Autonomous TDD Git Workflow

.taskmaster/docs/autonomous-tdd-git-workflow.md

Summary

  • Put the existing git and test workflows on rails: a repeatable, automated process that can run autonomously, with guardrails and a compact TUI for visibility.

  • Flow: for a selected task, create a branch named with the tag + task id → generate tests for the first subtask (red) using the Surgical Test Generator → implement code (green) → verify tests → commit → repeat per subtask → final verify → push → open PR against the default branch.

  • Build on existing rules: .cursor/rules/git_workflow.mdc, .cursor/rules/test_workflow.mdc, .claude/agents/surgical-test-generator.md, and existing CLI/core services.

Goals

  • Deterministic, resumable automation to execute the TDD loop per subtask with minimal human intervention.

  • Strong guardrails: never commit to the default branch; only commit when tests pass; enforce status transitions; persist logs/state for debuggability.

  • Visibility: a compact terminal UI (like lazygit) to pick tag, view tasks, and start work; right-side pane opens an executor terminal (via tmux) for agent coding.

  • Extensible: framework-agnostic test generation via the Surgical Test Generator; detect and use the repo’s test command for execution with coverage thresholds.

Non‑Goals (initial)

  • Full multi-language runner parity beyond detection and executing the project’s test command.

  • Complex GUI; start with CLI/TUI + tmux pane. IDE/extension can hook into the same state later.

  • Rich executor selection UX (codex/gemini/claude) — we’ll prompt per run; defaults can come later.

Success Criteria

  • One command can autonomously complete a task's subtasks via TDD and open a PR when done.

  • All commits made on a branch that includes the tag and task id (see Branch Naming); no commits to the default branch directly.

  • Every subtask iteration: failing tests added first (red), then code added to pass them (green), commit only after green.

  • End-to-end logs + artifacts stored in .taskmaster/reports/runs/<timestamp-or-id>/.

Success Metrics (Phase 1)

  • Adoption: 80% of tasks in a pilot repo completed via tm autopilot
  • Safety: 0 commits to default branch; 100% of commits have green tests
  • Efficiency: Average time from task start to PR < 30min for simple subtasks
  • Reliability: < 5% of runs require manual intervention (timeout/conflicts)

User Stories

  • As a developer, I can run tm autopilot <taskId> and watch a structured, safe workflow execute.

  • As a reviewer, I can inspect commits per subtask, and a PR summarizing the work when the task completes.

  • As an operator, I can see current step, active subtask, tests status, and logs in a compact CLI view and read a final run report.

Example Workflow Traces

Happy Path: Complete a 3-subtask feature

bash
# Developer starts
$ tm autopilot 42
→ Checks preflight: ✓ clean tree, ✓ npm test detected
→ Creates branch: analytics/task-42-user-metrics
→ Subtask 42.1: "Add metrics schema"
  RED: generates test_metrics_schema.test.js → 3 failures
  GREEN: implements schema.js → all pass
  COMMIT: "feat(metrics): add metrics schema (task 42.1)"
→ Subtask 42.2: "Add collection endpoint"
  RED: generates test_metrics_endpoint.test.js → 5 failures
  GREEN: implements api/metrics.js → all pass
  COMMIT: "feat(metrics): add collection endpoint (task 42.2)"
→ Subtask 42.3: "Add dashboard widget"
  RED: generates test_metrics_widget.test.js → 4 failures
  GREEN: implements components/MetricsWidget.jsx → all pass
  COMMIT: "feat(metrics): add dashboard widget (task 42.3)"
→ Final: all 3 subtasks complete
  ✓ Run full test suite → all pass
  ✓ Coverage check → 85% (meets 80% threshold)
  PUSH: confirms with user → pushed to origin
  PR: opens #123 "Task #42 [analytics]: User metrics tracking"

✓ Task 42 complete. PR: https://github.com/org/repo/pull/123
  Run report: .taskmaster/reports/runs/2025-01-15-142033/

Error Recovery: Failing tests timeout

bash
$ tm autopilot 42
→ Subtask 42.2 GREEN phase: attempt 1 fails (2 tests still red)
→ Subtask 42.2 GREEN phase: attempt 2 fails (1 test still red)
→ Subtask 42.2 GREEN phase: attempt 3 fails (1 test still red)

⚠️  Paused: Could not achieve green state after 3 attempts
📋 State saved to: .taskmaster/reports/runs/2025-01-15-142033/
    Last error: "POST /api/metrics returns 500 instead of 201"

Next steps:
  - Review diff: git diff HEAD
  - Inspect logs: cat .taskmaster/reports/runs/2025-01-15-142033/log.jsonl
  - Check test output: cat .taskmaster/reports/runs/2025-01-15-142033/test-results/subtask-42.2-green-attempt3.json
  - Resume after manual fix: tm autopilot --resume

# Developer manually fixes the issue, then:
$ tm autopilot --resume
→ Resuming subtask 42.2 GREEN phase
  GREEN: all tests pass
  COMMIT: "feat(metrics): add collection endpoint (task 42.2)"
→ Continuing to subtask 42.3...

Dry Run: Preview before execution

bash
$ tm autopilot 42 --dry-run
Autopilot Plan for Task #42 [analytics]: User metrics tracking
─────────────────────────────────────────────────────────────
Preflight:
  ✓ Working tree is clean
  ✓ Test command detected: npm test
  ✓ Tools available: git, gh, node, npm
  ✓ Current branch: main (will create new branch)

Branch & Tag:
  → Create branch: analytics/task-42-user-metrics
  → Set active tag: analytics

Subtasks (3 pending):
  1. 42.1: Add metrics schema
     - RED: generate tests in src/__tests__/schema.test.js
     - GREEN: implement src/schema.js
     - COMMIT: "feat(metrics): add metrics schema (task 42.1)"

  2. 42.2: Add collection endpoint [depends on 42.1]
     - RED: generate tests in src/api/__tests__/metrics.test.js
     - GREEN: implement src/api/metrics.js
     - COMMIT: "feat(metrics): add collection endpoint (task 42.2)"

  3. 42.3: Add dashboard widget [depends on 42.2]
     - RED: generate tests in src/components/__tests__/MetricsWidget.test.jsx
     - GREEN: implement src/components/MetricsWidget.jsx
     - COMMIT: "feat(metrics): add dashboard widget (task 42.3)"

Finalization:
  → Run full test suite with coverage
  → Push branch to origin (will confirm)
  → Create PR targeting main

Run without --dry-run to execute.

High‑Level Workflow

  1. Pre‑flight

    • Verify clean working tree or confirm staging/commit policy (configurable).

    • Detect repo type and the project’s test command (e.g., npm test, pnpm test, pytest, go test).

    • Validate tools: git, gh (optional for PR), node/npm, and (if used) claude CLI.

    • Load TaskMaster state and selected task; if no subtasks exist, automatically run “expand” before working.

  2. Branch & Tag Setup

    • Checkout default branch and update (optional), then create a branch using Branch Naming (below).

    • Map branch ↔ tag via existing tag management; explicitly set active tag to the branch’s tag.

  3. Subtask Loop (for each pending/in-progress subtask in dependency order)

    • Select next eligible subtask using tm-core TaskService getNextTask() and subtask eligibility logic.

    • Red: generate or update failing tests for the subtask

      • Use the Surgical Test Generator system prompt (.claude/agents/surgical-test-generator.md) to produce high-signal tests following project conventions.

      • Run tests to confirm red; record results. If not red (already passing), skip to next subtask or escalate.

    • Green: implement code to pass tests

      • Use executor to implement changes (initial: claude CLI prompt with focused context).

      • Re-run tests until green or timeout/backoff policy triggers.

    • Commit: when green

      • Commit tests + code with conventional commit message. Optionally update subtask status to done.

      • Persist run step metadata/logs.

  4. Finalization

    • Run full test suite and coverage (if configured); optionally lint/format.

    • Commit any final adjustments.

    • Push branch (ask user to confirm); create PR (via gh pr create) targeting the default branch. Title format: Task #<id> [<tag>]: <title>.

  5. Post‑Run

    • Update task status if desired (e.g., review).

    • Persist run report (JSON + markdown summary) to .taskmaster/reports/runs/<run-id>/.
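
The test-command detection in the pre-flight step could be sketched as follows. This is a hypothetical helper, not the shipped implementation; the lockfile and marker-file heuristics are assumptions for illustration.

```typescript
// Hypothetical preflight helper: detect the project's test command.
import * as fs from "node:fs";
import * as path from "node:path";

export function detectTestCommand(repoRoot: string): string | null {
  const pkgPath = path.join(repoRoot, "package.json");
  if (fs.existsSync(pkgPath)) {
    const pkg = JSON.parse(fs.readFileSync(pkgPath, "utf8"));
    if (pkg.scripts?.test) {
      // Prefer pnpm when a pnpm lockfile is present, otherwise npm.
      return fs.existsSync(path.join(repoRoot, "pnpm-lock.yaml"))
        ? "pnpm test"
        : "npm test";
    }
  }
  if (fs.existsSync(path.join(repoRoot, "pytest.ini"))) return "pytest";
  if (fs.existsSync(path.join(repoRoot, "go.mod"))) return "go test ./...";
  return null; // preflight should fail with an actionable error
}
```

A `null` result would surface as a preflight failure telling the user to configure `test.runner` explicitly.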

Guardrails

  • Never commit to the default branch.

  • Commit only if all tests (targeted and suite) pass; allow override flags.

  • Enforce 80% coverage thresholds (lines/branches/functions/statements) by default; configurable.

  • Timebox/model ops and retries; if not green within N attempts, pause with actionable state for resume.

  • Always log actions, commands, and outcomes; include dry-run mode.

  • Ask before branch creation, pushing, and opening a PR unless --no-confirm is set.
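
The coverage guardrail reduces to a simple comparison against the configured thresholds. A minimal sketch (the function name and return shape are illustrative, not an existing API):

```typescript
// Coverage gate: return the metrics that fall below threshold.
// An empty result means the commit is allowed to proceed.
type Coverage = { lines: number; branches: number; functions: number; statements: number };

const DEFAULT_THRESHOLDS: Coverage = { lines: 80, branches: 80, functions: 80, statements: 80 };

export function coverageGate(actual: Coverage, thresholds: Coverage = DEFAULT_THRESHOLDS): string[] {
  return (Object.keys(thresholds) as (keyof Coverage)[])
    .filter((metric) => actual[metric] < thresholds[metric])
    .map((metric) => `${metric}: ${actual[metric]}% < ${thresholds[metric]}%`);
}
```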

Integration Points (Current Repo)

  • CLI: apps/cli provides command structure and UI components.

    • New command: tm autopilot (alias: task-master autopilot).

    • Reuse UI components under apps/cli/src/ui/components/ for headers/task details/next-task.

  • Core services: packages/tm-core

    • TaskService for selection, status, tags.

    • TaskExecutionService for prompt formatting and executor prep.

    • Executors: claude executor and ExecutorFactory to run external tools.

    • Proposed new: WorkflowOrchestrator to drive the autonomous loop and emit progress events.

  • Tag/Git utilities: scripts/modules/utils/git-utils.js and scripts/modules/task-manager/tag-management.js for branch→tag mapping and explicit tag switching.

  • Rules: .cursor/rules/git_workflow.mdc and .cursor/rules/test_workflow.mdc to steer behavior and ensure consistency.

  • Test generation prompt: .claude/agents/surgical-test-generator.md.

Proposed Components

  • Orchestrator (tm-core): WorkflowOrchestrator (new)

    • State machine driving phases: Preflight → Branch/Tag → SubtaskIter (Red/Green/Commit) → Finalize → PR.

    • Exposes an evented API (progress events) that the CLI can render.

    • Stores run state artifacts.

  • Test Runner Adapter

    • Detects and runs tests via the project’s test command (e.g., npm test), with targeted runs where feasible.

    • API: runTargeted(files/pattern), runAll(), report summary (failures, duration, coverage), enforce 80% threshold by default.

  • Git/PR Adapter

    • Encapsulates git ops: branch create/checkout, add/commit, push.

    • Optional gh integration to open PR; fallback to instructions if gh unavailable.

    • Confirmation gates for branch creation and pushes.

  • Prompt/Exec Adapter

    • Uses existing executor service to call the selected coding assistant (initially claude) with tight prompts: task/subtask context, surgical tests first, then minimal code to green.

  • Run State + Reporting

    • JSONL log of steps, timestamps, commands, test results.

    • Markdown summary for PR description and post-run artifact.
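
Possible TypeScript shapes for the Test Runner and Git/PR adapters, to make the component boundaries concrete. Names and signatures here are assumptions, not the final tm-core API:

```typescript
// Illustrative adapter interfaces; the shipped API may differ.
export interface TestSummary {
  total: number;
  passed: number;
  failed: number;
  skipped: number;
  coverage?: { lines: number; branches: number; functions: number; statements: number };
}

export interface TestRunnerAdapter {
  detect(): Promise<string | null>;               // e.g. "npm test", or null if unknown
  runTargeted(pattern: string): Promise<TestSummary>;
  runAll(): Promise<TestSummary>;
}

export interface GitAdapter {
  createBranch(name: string): Promise<void>;
  commit(message: string, paths: string[]): Promise<string>;  // returns commit SHA
  push(branch: string): Promise<void>;
  // Returns the PR URL, or null when gh is unavailable (fall back to instructions).
  openPr(opts: { title: string; body: string; base: string }): Promise<string | null>;
}
```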

CLI UX (MVP)

  • Command: tm autopilot [taskId]

    • Flags: --dry-run, --no-push, --no-pr, --no-confirm, --force, --max-attempts <n>, --runner <auto|custom>, --commit-scope <scope>

    • Output: compact header (project, tag, branch), current phase, subtask line, last test summary, next actions.

  • Resume: If interrupted, tm autopilot --resume picks up from last checkpoint in run state.

TUI with tmux (Linear Execution)

  • Left pane: Tag selector, task list (status/priority), start/expand shortcuts; "Start" triggers the next task or a selected task.

  • Right pane: Executor terminal (tmux split) that runs the coding agent (claude-code/codex). Autopilot can hand over to the right pane during green.

  • MCP integration: use MCP tools for task queries/updates and for shell/test invocations where available.

TUI Layout (tmux-based)

Pane Structure

┌─────────────────────────────────────┬──────────────────────────────────┐
│ Task Navigator (left)               │ Executor Terminal (right)        │
│                                     │                                  │
│ Project: my-app                     │ $ tm autopilot --executor-mode   │
│ Branch: analytics/task-42           │ > Running subtask 42.2 GREEN...  │
│ Tag: analytics                      │ > Implementing endpoint...       │
│                                     │ > Tests: 3 passed, 0 failed      │
│ Tasks:                              │ > Ready to commit                │
│ → 42 [in-progress] User metrics     │                                  │
│   → 42.1 [done] Schema              │ [Live output from Claude Code]   │
│   → 42.2 [active] Endpoint ◀        │                                  │
│   → 42.3 [pending] Dashboard        │                                  │
│                                     │                                  │
│ [s] start  [p] pause  [q] quit      │                                  │
└─────────────────────────────────────┴──────────────────────────────────┘

Implementation Notes

  • Left pane: apps/cli/src/ui/tui/navigator.ts (new, uses blessed or ink)
  • Right pane: spawned via tmux split-window -h running tm autopilot --executor-mode
  • Communication: shared state file .taskmaster/state/current-run.json + file watching or event stream
  • Keybindings:
    • s - Start selected task
    • p - Pause/resume current run
    • q - Quit (with confirmation if run active)
    • ↑/↓ - Navigate task list
    • Enter - Expand/collapse subtasks
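
The shared-state handoff between the two panes could be as simple as an atomically replaced JSON file. A sketch, assuming a schema for current-run.json that is not yet fixed:

```typescript
// Shared-state file between navigator and executor panes (schema is illustrative).
import * as fs from "node:fs";
import * as path from "node:path";

export interface CurrentRunState {
  runId: string;
  taskId: string;
  subtask: string | null;
  phase: "RED" | "GREEN" | "COMMIT" | "IDLE";
}

// Write to a temp file, then rename over the target, so a watching pane
// never observes a half-written JSON document.
export function writeRunState(stateDir: string, state: CurrentRunState): void {
  const target = path.join(stateDir, "current-run.json");
  const tmp = target + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2));
  fs.renameSync(tmp, target);
}

export function readRunState(stateDir: string): CurrentRunState | null {
  const target = path.join(stateDir, "current-run.json");
  return fs.existsSync(target) ? JSON.parse(fs.readFileSync(target, "utf8")) : null;
}
```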

Prompt Composition (Detailed)

System Prompt Assembly

Prompts are composed in three layers:

  1. Base rules (loaded in order from .cursor/rules/ and .claude/agents/):

    • git_workflow.mdc → git commit conventions, branch policy, PR guidelines
    • test_workflow.mdc → TDD loop requirements, coverage thresholds, test structure
    • surgical-test-generator.md → test generation methodology, project-specific test patterns
  2. Task context injection:

    You are implementing:
    Task #42 [analytics]: User metrics tracking
    Subtask 42.2: Add collection endpoint
    
    Description:
    Implement POST /api/metrics endpoint to collect user metrics events
    
    Acceptance criteria:
    - POST /api/metrics accepts { userId, eventType, timestamp }
    - Validates input schema (reject missing/invalid fields)
    - Persists to database
    - Returns 201 on success with created record
    - Returns 400 on validation errors
    
    Dependencies:
    - Subtask 42.1 (metrics schema) is complete
    
    Current phase: RED (generate failing tests)
    Test command: npm test
    Test file convention: src/**/*.test.js (vitest framework detected)
    Branch: analytics/task-42-user-metrics
    Project language: JavaScript (Node.js)
    
  3. Phase-specific instructions:

    • RED phase: "Generate minimal failing tests for this subtask. Do NOT implement any production code. Only create test files. Confirm tests fail with clear error messages indicating missing implementation."
    • GREEN phase: "Implement minimal code to pass the failing tests. Follow existing project patterns in src/. Only modify files necessary for this subtask. Keep changes focused and reviewable."
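
The three layers above might be assembled like this (illustrative helper; the real prompt assembly lives in the executor service):

```typescript
// Concatenate the three prompt layers in order: rules, task context, phase instruction.
export interface PromptLayers {
  rules: string[];          // rule file contents, in configured load order
  taskContext: string;      // task/subtask context block
  phaseInstruction: string; // RED or GREEN instruction
}

export function assemblePrompt({ rules, taskContext, phaseInstruction }: PromptLayers): string {
  return [
    ...rules,
    "<TASK CONTEXT>",
    taskContext,
    "<INSTRUCTION>",
    phaseInstruction,
  ].join("\n\n");
}
```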

Example Full Prompt (RED Phase)

markdown

[Contents of .cursor/rules/git_workflow.mdc]
[Contents of .cursor/rules/test_workflow.mdc]
[Contents of .claude/agents/surgical-test-generator.md]

<TASK CONTEXT>
You are implementing:
Task #42.2: Add collection endpoint

Description:
Implement POST /api/metrics endpoint to collect user metrics events

Acceptance criteria:
- POST /api/metrics accepts { userId, eventType, timestamp }
- Validates input schema (reject missing/invalid fields)
- Persists to database using MetricsSchema from subtask 42.1
- Returns 201 on success with created record
- Returns 400 on validation errors with details

Dependencies: Subtask 42.1 (metrics schema) is complete

<INSTRUCTION>
Generate failing tests for this subtask. Follow project conventions:
- Test file: src/api/__tests__/metrics.test.js
- Framework: vitest (detected from package.json)
- Test cases to cover:
  * POST /api/metrics with valid payload → should return 201 (will fail: endpoint not implemented)
  * POST /api/metrics with missing userId → should return 400 (will fail: validation not implemented)
  * POST /api/metrics with invalid timestamp → should return 400 (will fail: validation not implemented)
  * POST /api/metrics should persist to database → should save record (will fail: persistence not implemented)

Do NOT implement the endpoint code yet. Only create test file(s).
Confirm tests fail with messages like "Cannot POST /api/metrics" or "endpoint not defined".

Output format:
1. File path to create: src/api/__tests__/metrics.test.js
2. Complete test code
3. Command to run: npm test src/api/__tests__/metrics.test.js

Example Full Prompt (GREEN Phase)

markdown

[Contents of .cursor/rules/git_workflow.mdc]
[Contents of .cursor/rules/test_workflow.mdc]

<TASK CONTEXT>
Task #42.2: Add collection endpoint
[same context as RED phase]

<CURRENT STATE>
Tests created in RED phase:
- src/api/__tests__/metrics.test.js
- 5 tests written, all failing as expected

Test output:

FAIL src/api/__tests__/metrics.test.js
  POST /api/metrics
    ✗ should return 201 with valid payload (endpoint not found)
    ✗ should return 400 with missing userId (endpoint not found)
    ✗ should return 400 with invalid timestamp (endpoint not found)
    ✗ should persist to database (endpoint not found)


<INSTRUCTION>
Implement minimal code to make all tests pass.

Guidelines:
- Create/modify file: src/api/metrics.js
- Use existing patterns from src/api/ (e.g., src/api/users.js for reference)
- Import MetricsSchema from subtask 42.1 (src/models/schema.js)
- Implement validation, persistence, and response handling
- Follow project error handling conventions
- Keep implementation focused on this subtask only

After implementation:
1. Run tests: npm test src/api/__tests__/metrics.test.js
2. Confirm all 5 tests pass
3. Report results

Output format:
1. File(s) created/modified
2. Implementation code
3. Test command and results

Prompt Loading Configuration

See the prompts section of .taskmaster/config.json for paths and load order.

Configuration Schema

.taskmaster/config.json

json
{
  "autopilot": {
    "enabled": true,
    "requireCleanWorkingTree": true,
    "commitTemplate": "{type}({scope}): {msg}",
    "defaultCommitType": "feat",
    "maxGreenAttempts": 3,
    "testTimeout": 300000
  },
  "test": {
    "runner": "auto",
    "coverageThresholds": {
      "lines": 80,
      "branches": 80,
      "functions": 80,
      "statements": 80
    },
    "targetedRunPattern": "**/*.test.js"
  },
  "git": {
    "branchPattern": "{tag}/task-{id}-{slug}",
    "pr": {
      "enabled": true,
      "base": "default",
      "bodyTemplate": ".taskmaster/templates/pr-body.md"
    }
  },
  "prompts": {
    "rulesPath": ".cursor/rules",
    "testGeneratorPath": ".claude/agents/surgical-test-generator.md",
    "loadOrder": ["git_workflow.mdc", "test_workflow.mdc"]
  }
}
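
Loading this config would likely merge user-provided values over defaults. A sketch (the merge strategy is an assumption; all autopilot fields are scalar, so a shallow merge suffices):

```typescript
// Defaults mirror the autopilot section of the schema above.
const AUTOPILOT_DEFAULTS = {
  enabled: true,
  requireCleanWorkingTree: true,
  commitTemplate: "{type}({scope}): {msg}",
  defaultCommitType: "feat",
  maxGreenAttempts: 3,
  testTimeout: 300_000, // 5 minutes
};

export type AutopilotConfig = typeof AUTOPILOT_DEFAULTS;

export function loadAutopilotConfig(userConfig: Partial<AutopilotConfig> = {}): AutopilotConfig {
  return { ...AUTOPILOT_DEFAULTS, ...userConfig };
}
```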

Configuration Fields

autopilot

  • enabled (boolean): Enable/disable autopilot functionality
  • requireCleanWorkingTree (boolean): Require clean git state before starting
  • commitTemplate (string): Template for commit messages (tokens: {type}, {scope}, {msg})
  • defaultCommitType (string): Default commit type (feat, fix, chore, etc.)
  • maxGreenAttempts (number): Maximum retry attempts to achieve green tests (default: 3)
  • testTimeout (number): Timeout in milliseconds per test run (default: 300000 = 5min)
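
Rendering commitTemplate is straightforward token substitution, e.g. (hypothetical helper):

```typescript
// Substitute the {type}, {scope}, {msg} tokens of commitTemplate.
export function renderCommitMessage(
  template: string,
  tokens: { type: string; scope: string; msg: string }
): string {
  return template
    .replace("{type}", tokens.type)
    .replace("{scope}", tokens.scope)
    .replace("{msg}", tokens.msg);
}
```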

test

  • runner (string): Test runner detection mode ("auto" or explicit command like "npm test")
  • coverageThresholds (object): Minimum coverage percentages required
    • lines, branches, functions, statements (number): Threshold percentages (0-100)
  • targetedRunPattern (string): Glob pattern for targeted subtask test runs

git

  • branchPattern (string): Branch naming pattern (tokens: {tag}, {id}, {slug})
  • pr.enabled (boolean): Enable automatic PR creation
  • pr.base (string): Target branch for PRs ("default" uses repo default, or specify like "main")
  • pr.bodyTemplate (string): Path to PR body template file (optional)

prompts

  • rulesPath (string): Directory containing rule files (e.g., .cursor/rules)
  • testGeneratorPath (string): Path to test generator prompt file
  • loadOrder (array): Order to load rule files from rulesPath

Environment Variables

bash
# Required for executor
ANTHROPIC_API_KEY=sk-ant-...          # Claude API key

# Optional: for PR creation
GITHUB_TOKEN=ghp_...                  # GitHub personal access token

# Optional: for other executors (future)
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...

Run Artifacts & Observability

Per-Run Artifact Structure

Each autopilot run creates a timestamped directory with complete traceability:

.taskmaster/reports/runs/2025-01-15-142033/
├── manifest.json          # run metadata (task id, start/end time, status)
├── log.jsonl              # timestamped event stream
├── commits.txt            # list of commit SHAs made during run
├── test-results/
│   ├── subtask-42.1-red.json
│   ├── subtask-42.1-green.json
│   ├── subtask-42.2-red.json
│   ├── subtask-42.2-green-attempt1.json
│   ├── subtask-42.2-green-attempt2.json
│   ├── subtask-42.2-green-attempt3.json
│   └── final-suite.json
└── pr.md                  # generated PR body

manifest.json Format

json
{
  "runId": "2025-01-15-142033",
  "taskId": "42",
  "tag": "analytics",
  "branch": "analytics/task-42-user-metrics",
  "startTime": "2025-01-15T14:20:33Z",
  "endTime": "2025-01-15T14:45:12Z",
  "status": "completed",
  "subtasksCompleted": ["42.1", "42.2", "42.3"],
  "subtasksFailed": [],
  "totalCommits": 3,
  "prUrl": "https://github.com/org/repo/pull/123",
  "finalCoverage": {
    "lines": 85.3,
    "branches": 82.1,
    "functions": 88.9,
    "statements": 85.0
  }
}

log.jsonl Format

Event stream in JSON Lines format for easy parsing and debugging:

jsonl
{"ts":"2025-01-15T14:20:33Z","phase":"preflight","status":"ok","details":{"testCmd":"npm test","gitClean":true}}
{"ts":"2025-01-15T14:20:45Z","phase":"branch","status":"ok","branch":"analytics/task-42-user-metrics"}
{"ts":"2025-01-15T14:21:00Z","phase":"red","subtask":"42.1","status":"ok","tests":{"failed":3,"passed":0}}
{"ts":"2025-01-15T14:22:15Z","phase":"green","subtask":"42.1","status":"ok","tests":{"passed":3,"failed":0},"attempts":2}
{"ts":"2025-01-15T14:22:20Z","phase":"commit","subtask":"42.1","status":"ok","sha":"a1b2c3d","message":"feat(metrics): add metrics schema (task 42.1)"}
{"ts":"2025-01-15T14:23:00Z","phase":"red","subtask":"42.2","status":"ok","tests":{"failed":5,"passed":0}}
{"ts":"2025-01-15T14:25:30Z","phase":"green","subtask":"42.2","status":"error","tests":{"passed":3,"failed":2},"attempts":3,"error":"Max attempts reached"}
{"ts":"2025-01-15T14:25:35Z","phase":"pause","reason":"max_attempts","nextAction":"manual_review"}
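
A post-run tool can parse this stream directly. An illustrative helper (the event fields follow the examples above; this is not part of the repo):

```typescript
// Parse log.jsonl into events and summarize which subtasks reached green.
export interface RunEvent {
  ts: string;
  phase: string;
  status?: string;
  subtask?: string;
  [key: string]: unknown;
}

export function parseRunLog(jsonl: string): RunEvent[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as RunEvent);
}

export function greenSubtasks(events: RunEvent[]): string[] {
  return events
    .filter((e) => e.phase === "green" && e.status === "ok" && e.subtask)
    .map((e) => e.subtask as string);
}
```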

Test Results Format

Each test run stores detailed results:

json
{
  "subtask": "42.2",
  "phase": "green",
  "attempt": 3,
  "timestamp": "2025-01-15T14:25:30Z",
  "command": "npm test src/api/__tests__/metrics.test.js",
  "exitCode": 1,
  "duration": 2340,
  "summary": {
    "total": 5,
    "passed": 3,
    "failed": 2,
    "skipped": 0
  },
  "failures": [
    {
      "test": "POST /api/metrics should return 201 with valid payload",
      "error": "Expected status 201, got 500",
      "stack": "..."
    }
  ],
  "coverage": {
    "lines": 78.5,
    "branches": 75.0,
    "functions": 80.0,
    "statements": 78.5
  }
}

Execution Model

Orchestration vs Direct Execution

The autopilot system uses an orchestration model rather than direct code execution:

Orchestrator Role (tm-core WorkflowOrchestrator):

  • Maintains state machine tracking current phase (RED/GREEN/COMMIT) per subtask
  • Validates preconditions (tests pass, git state clean, etc.)
  • Returns "work units" describing what needs to be done next
  • Records completion and advances to next phase
  • Persists state for resumability

Executor Role (Claude Code/AI session via MCP):

  • Queries orchestrator for next work unit
  • Executes the work (generates tests, writes code, runs tests, makes commits)
  • Reports results back to orchestrator
  • Handles file operations and tool invocations

Why This Approach?

  • Leverages existing AI capabilities (Claude Code) rather than duplicating them
  • MCP protocol provides clean separation between state management and execution
  • Allows human oversight and intervention at each phase
  • Simpler to implement: orchestrator is pure state logic, no code generation needed
  • Enables multiple executor types (Claude Code, other AI tools, human developers)

Example Flow:

typescript
// Claude Code (via MCP) queries orchestrator
const workUnit = await orchestrator.getNextWorkUnit('42');
// => {
//      phase: 'RED',
//      subtask: '42.1',
//      action: 'Generate failing tests for metrics schema',
//      context: { title, description, dependencies, testFile: 'src/__tests__/schema.test.js' }
//    }

// Claude Code executes the work (writes test file, runs tests)
// Then reports back
await orchestrator.completeWorkUnit('42', '42.1', 'RED', {
  success: true,
  testsCreated: ['src/__tests__/schema.test.js'],
  testsFailed: 3
});

// Query again for next phase
const nextWorkUnit = await orchestrator.getNextWorkUnit('42');
// => { phase: 'GREEN', subtask: '42.1', action: 'Implement code to pass tests', ... }
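
The state machine behind this API can be sketched minimally (illustrative; the real WorkflowOrchestrator also persists state and validates git/test preconditions before advancing):

```typescript
// Minimal RED → GREEN → COMMIT state machine over an ordered subtask list.
type Phase = "RED" | "GREEN" | "COMMIT";
const PHASE_ORDER: Phase[] = ["RED", "GREEN", "COMMIT"];

export class WorkflowStateMachine {
  private cursor = 0;   // index into subtasks
  private phaseIdx = 0; // index into PHASE_ORDER

  constructor(private subtasks: string[]) {}

  getNextWorkUnit(): { subtask: string; phase: Phase } | null {
    if (this.cursor >= this.subtasks.length) return null; // task complete
    return { subtask: this.subtasks[this.cursor], phase: PHASE_ORDER[this.phaseIdx] };
  }

  completeWorkUnit(subtask: string, phase: Phase): void {
    const current = this.getNextWorkUnit();
    if (!current || current.subtask !== subtask || current.phase !== phase) {
      throw new Error(`Out-of-order completion: ${subtask}/${phase}`);
    }
    this.phaseIdx += 1;
    if (this.phaseIdx === PHASE_ORDER.length) {
      this.phaseIdx = 0;
      this.cursor += 1; // advance to next subtask after COMMIT
    }
  }
}
```

Rejecting out-of-order completions is what lets the orchestrator stay authoritative even when the executor is an external AI session.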

Design Decisions

Why commit per subtask instead of per task?

Decision: Commit after each subtask's green state, not after the entire task.

Rationale:

  • Atomic commits make code review easier (reviewers can see logical progression)
  • Easier to revert a single subtask if it causes issues downstream
  • Matches the TDD loop's natural checkpoint and cognitive boundary
  • Provides resumability points if the run is interrupted

Trade-off: More commits per task (can use squash-merge in PRs if desired)

Why not support parallel subtask execution?

Decision: Sequential subtask execution in Phase 1; parallel execution deferred to Phase 3.

Rationale:

  • Subtasks often have implicit dependencies (e.g., schema before endpoint, endpoint before UI)
  • Simpler orchestrator state machine (less complexity = faster to ship)
  • Parallel execution requires explicit dependency DAG and conflict resolution
  • Can be added in Phase 3 once core workflow is proven stable

Trade-off: Slower for truly independent subtasks (mitigated by keeping subtasks small and focused)

Why require 80% coverage by default?

Decision: Enforce 80% coverage threshold (lines/branches/functions/statements) before allowing commits.

Rationale:

  • Industry standard baseline for production code quality
  • Forces test generation to be comprehensive, not superficial
  • Configurable per project via .taskmaster/config.json if too strict
  • Prevents "green tests" that only test happy paths

Trade-off: May require more test generation iterations; can be lowered per project

Why use tmux instead of a rich GUI?

Decision: MVP uses tmux split panes for TUI, not Electron/web-based GUI.

Rationale:

  • Tmux is universally available on dev machines; no installation burden
  • Terminal-first workflows match developer mental model (no context switching)
  • Simpler to implement and maintain; can add GUI later via extensions
  • State stored in files allows IDE/extension integration without coupling

Trade-off: Less visual polish than GUI; requires tmux familiarity

Why not support multiple executors (codex/gemini/claude) in Phase 1?

Decision: Start with Claude executor only; add others in Phase 2+.

Rationale:

  • Reduces scope and complexity for initial delivery
  • Claude Code already integrated with existing executor service
  • Executor abstraction already exists; adding more is straightforward later
  • Different executors may need different prompt strategies (requires experimentation)

Trade-off: Users locked to Claude initially; can work around with manual executor selection

Risks and Mitigations

  • Model hallucination/large diffs: restrict prompt scope; enforce minimal changes; show diff previews (optional) before commit.

  • Flaky tests: allow retries, isolate targeted runs for speed, then full suite before commit.

  • Environment variability: detect runners/tools; provide fallbacks and actionable errors.

  • PR creation fails: still push and print manual commands; persist PR body to reuse.

Open Questions

  1. Slugging rules for branch names: are there length limits or normalization requirements beyond sanitizing the {slug} token?

  2. PR body standard sections beyond run report (e.g., checklist, coverage table)?

  3. Default executor prompt fine-tuning once codex/gemini integration is available.

  4. Should persistent TUI state (pane layout, last selection) live in .taskmaster/state.json?

Branch Naming

  • Include both the tag and the task id in the branch name to make lineage explicit.

  • Default pattern: <tag>/task-<id>[-slug] (e.g., master/task-12, analytics/task-4-user-auth).

  • Configurable via .taskmaster/config.json: git.branchPattern supports tokens {tag}, {id}, {slug}.
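
Rendering the pattern with a sanitized slug might look like this (the sanitization rules here are assumptions; see Open Questions):

```typescript
// Render git.branchPattern, sanitizing the optional title into a {slug}.
export function renderBranchName(
  pattern: string,
  { tag, id, title }: { tag: string; id: string; title?: string }
): string {
  const slug = (title ?? "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumerics to hyphens
    .replace(/^-+|-+$/g, "")     // trim leading/trailing hyphens
    .slice(0, 40);               // keep branch names short
  return pattern
    .replace("{tag}", tag)
    .replace("{id}", id)
    .replace("{slug}", slug)
    .replace(/-$/, "");          // drop dangling hyphen when slug is empty
}
```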

PR Base Branch

  • Use the repository’s default branch (detected via git) unless overridden.

  • Title format: Task #<id> [<tag>]: <title>.

RPG Mapping (Repository Planning Graph)

Functional nodes (capabilities):

  • Autopilot Orchestration → drives TDD loop and lifecycle

  • Test Generation (Surgical) → produces failing tests from subtask context

  • Test Execution + Coverage → runs suite, enforces thresholds

  • Git/Branch/PR Management → safe operations and PR creation

  • TUI/Terminal Integration → interactive control and visibility via tmux

  • MCP Integration → structured task/status/context operations

Structural nodes (code organization):

  • packages/tm-core:

    • services/workflow-orchestrator.ts (new)

    • services/test-runner-adapter.ts (new)

    • services/git-adapter.ts (new)

    • existing: task-service.ts, task-execution-service.ts, executors/*

  • apps/cli:

    • src/commands/autopilot.command.ts (new)

    • src/ui/tui/ (new tmux/TUI helpers)

  • scripts/modules:

    • reuse utils/git-utils.js, task-manager/tag-management.js

  • .claude/agents/:

    • surgical-test-generator.md

Edges (data/control flow):

  • Autopilot → Test Generation → Test Execution → Git Commit → loop

  • Autopilot → Git Adapter (branch, tag, PR)

  • Autopilot → TUI (event stream) → tmux pane control

  • Autopilot → MCP tools for task/status updates

  • Test Execution → Coverage gate → Autopilot decision

Topological traversal (implementation order):

  1. Git/Test adapters (foundations)

  2. Orchestrator skeleton + events

  3. CLI autopilot command and dry-run

  4. Surgical test-gen integration and execution gate

  5. PR creation, run reports, resumability

Phased Roadmap

  • Phase 0: Spike

    • Implement CLI skeleton tm autopilot with dry-run showing planned steps from a real task + subtasks.

    • Detect test runner (package.json) and git state; render a preflight report.

  • Phase 1: Core Rails (State Machine & Orchestration)

    • Implement WorkflowOrchestrator in tm-core as a state machine that tracks TDD phases per subtask.

    • Orchestrator guides the current AI session (Claude Code/MCP client) rather than executing code itself.

    • Add Git/Test adapters for status checks and validation (not direct execution).

    • WorkflowOrchestrator API:

      • getNextWorkUnit(taskId) → returns next phase to execute (RED/GREEN/COMMIT) with context
      • completeWorkUnit(taskId, subtaskId, phase, result) → records completion and advances state
      • getRunState(taskId) → returns current progress and resumability data
    • MCP integration: expose work unit endpoints so Claude Code can query "what to do next" and report back.

    • Branch/tag mapping via existing tag-management APIs.

    • Run report persisted under .taskmaster/reports/runs/ with state checkpoints for resumability.

  • Phase 2: PR + Resumability

    • Add gh PR creation with well-formed body using the run report.

    • Introduce resumable checkpoints and --resume flag.

    • Add coverage enforcement and optional lint/format step.

  • Phase 3: Extensibility + Guardrails

    • Add support for basic pytest/go test adapters.

    • Add safeguards: diff preview mode, manual confirm gates, aggressive minimal-change prompts.

    • Optional: small TUI panel and extension panel leveraging the same run state file.

References (Repo)

  • Test Workflow: .cursor/rules/test_workflow.mdc

  • Git Workflow: .cursor/rules/git_workflow.mdc

  • CLI: apps/cli/src/commands/start.command.ts, apps/cli/src/ui/components/*.ts

  • Core Services: packages/tm-core/src/services/task-service.ts, task-execution-service.ts

  • Executors: packages/tm-core/src/executors/*

  • Git Utilities: scripts/modules/utils/git-utils.js

  • Tag Management: scripts/modules/task-manager/tag-management.js

  • Surgical Test Generator: .claude/agents/surgical-test-generator.md