get-shit-done/references/tdd.md
Principle: If you can describe the behavior as expect(fn(input)).toBe(output) before writing fn, TDD improves the result.
Key insight: TDD work is fundamentally heavier than standard tasks—it requires 2-3 execution cycles (RED → GREEN → REFACTOR), each with file reads, test runs, and potential debugging. TDD features get dedicated plans to ensure full context is available throughout the cycle. </overview>
<when_to_use_tdd>
TDD candidates (create a TDD plan): logic whose expected outputs can be stated up front — parsers, validators, calculations, data transformations.
Skip TDD (use standard plan with type="auto" tasks): work whose correct output can't be stated in advance — UI layout, configuration, glue and wiring code.
Heuristic: Can you write expect(fn(input)).toBe(output) before writing fn?
→ Yes: Create a TDD plan
→ No: Use standard plan, add tests after if needed
</when_to_use_tdd>
<tdd_plan_structure>
Each TDD plan implements one feature through the full RED-GREEN-REFACTOR cycle.
---
phase: XX-name
plan: NN
type: tdd
---
<objective>
[What feature and why]
Purpose: [Design benefit of TDD for this feature]
Output: [Working, tested feature]
</objective>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@relevant/source/files.ts
</context>
<feature>
<name>[Feature name]</name>
<files>[source file, test file]</files>
<behavior>
[Expected behavior in testable terms]
Cases: input → expected output
</behavior>
<implementation>[How to implement once tests pass]</implementation>
</feature>
<verification>
[Test command that proves feature works]
</verification>
<success_criteria>
- Failing test written and committed
- Implementation passes test
- Refactor complete (if needed)
- All 2-3 commits present
</success_criteria>
<output>
After completion, create SUMMARY.md with:
- RED: What test was written, why it failed
- GREEN: What implementation made it pass
- REFACTOR: What cleanup was done (if any)
- Commits: List of commits produced
</output>
One feature per TDD plan. If features are trivial enough to batch, they're trivial enough to skip TDD—use a standard plan and add tests after. </tdd_plan_structure>
<execution_flow>
RED - Write failing test:
- Write a test for each case in the <behavior> element
- Run it and confirm it fails for the right reason
- Commit: test({phase}-{plan}): add failing test for [feature]

GREEN - Implement to pass:
- Write the minimal implementation described in <implementation>
- Run the test and confirm it passes
- Commit: feat({phase}-{plan}): implement [feature]

REFACTOR (if needed):
- Clean up while keeping the tests green
- Commit: refactor({phase}-{plan}): clean up [feature]

Result: Each TDD plan produces 2-3 atomic commits. </execution_flow>
<test_quality>
Test behavior, not implementation: assert on what the function returns or does, not on how it does it.
One concept per test: each test should fail for exactly one reason.
Descriptive names: a test name should read as a statement of behavior (e.g. "rejects emails without a domain").
No implementation details: tests that reach into private state or spy on internals break under refactoring.
</test_quality>
<framework_setup>
When executing a TDD plan but no test framework is configured, set it up as part of the RED phase:
1. Detect project type:
```bash
# JavaScript/TypeScript
if [ -f package.json ]; then echo "node"; fi
# Python
if [ -f requirements.txt ] || [ -f pyproject.toml ]; then echo "python"; fi
# Go
if [ -f go.mod ]; then echo "go"; fi
# Rust
if [ -f Cargo.toml ]; then echo "rust"; fi
```
2. Install minimal framework:
| Project | Framework | Install |
|---|---|---|
| Node.js | Jest | npm install -D jest @types/jest ts-jest |
| Node.js (Vite) | Vitest | npm install -D vitest |
| Python | pytest | pip install pytest |
| Go | testing | Built-in |
| Rust | cargo test | Built-in |
3. Create config if needed:
- jest.config.js with ts-jest preset
- vitest.config.ts with test globals
- pytest.ini or pyproject.toml section

4. Verify setup:
```bash
# Run empty test suite - should pass with 0 tests
npm test       # Node
pytest         # Python
go test ./...  # Go
cargo test     # Rust
```
5. Create first test file: Follow project conventions for test location:
- *.test.ts / *.spec.ts next to source
- __tests__/ directory
- tests/ directory at root

Framework setup is a one-time cost included in the first TDD plan's RED phase. </framework_setup>
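For the Jest route in step 2, the config from step 3 can be as small as this sketch (assumes jest and ts-jest are installed; a jest.config.ts also needs ts-node to load):

```typescript
// jest.config.ts - minimal TypeScript Jest config (illustrative sketch)
import type { Config } from "jest";

const config: Config = {
  preset: "ts-jest",       // compile test files with ts-jest
  testEnvironment: "node", // no DOM needed for pure-logic TDD features
};

export default config;
```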
<error_handling>
Test doesn't fail in RED phase: the test is wrong or the behavior already exists. Fix the test (or reconsider the plan) before writing any implementation.
Test doesn't pass in GREEN phase: debug the implementation; never weaken the test to force a pass.
Tests fail in REFACTOR phase: the refactor changed behavior. Revert to the last green state and retry in smaller steps.
Unrelated tests break: stop and investigate before committing; the change has wider impact than the plan assumed.
</error_handling>
<commit_pattern>
TDD plans produce 2-3 atomic commits (one per phase):
```
test(08-02): add failing test for email validation

- Tests valid email formats accepted
- Tests invalid formats rejected
- Tests empty input handling

feat(08-02): implement email validation

- Regex pattern matches RFC 5322
- Returns boolean for validity
- Handles edge cases (empty, null)

refactor(08-02): extract regex to constant (optional)

- Moved pattern to EMAIL_REGEX constant
- No behavior changes
- Tests still pass
```
Comparison with standard plans:
Both follow the same format: {type}({phase}-{plan}): {description}
Benefits:
- Git history itself documents the TDD discipline (test before feat)
- Each phase is an atomic, independently revertable commit
- Gate validation can verify compliance by grepping commits for {phase}-{plan}
</commit_pattern>
<gate_enforcement>
When workflow.tdd_mode is enabled in config, the RED/GREEN/REFACTOR gate sequence is enforced for all type: tdd plans.
| Gate | Required | Commit Pattern | Validation |
|---|---|---|---|
| RED | Yes | test({phase}-{plan}): ... | Test exists AND fails before implementation |
| GREEN | Yes | feat({phase}-{plan}): ... | Test passes after implementation |
| REFACTOR | No | refactor({phase}-{plan}): ... | Tests still pass after cleanup |
If no test(...) commit precedes the feat(...) commit, the TDD discipline was violated; flag it in SUMMARY.md.

After completing a type: tdd plan, the executor validates the git log:
```bash
# Check for RED gate commit
git log --oneline --grep="^test(${PHASE}-${PLAN})" | head -1
# Check for GREEN gate commit
git log --oneline --grep="^feat(${PHASE}-${PLAN})" | head -1
# Check for optional REFACTOR gate commit
git log --oneline --grep="^refactor(${PHASE}-${PLAN})" | head -1
```
If RED or GREEN gate commits are missing, add a ## TDD Gate Compliance section to SUMMARY.md with the violation details.
</gate_enforcement>
<end_of_phase_review>
When workflow.tdd_mode is enabled, the execute-phase orchestrator inserts a collaborative review checkpoint after all waves complete but before phase verification.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TDD REVIEW — Phase {X}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TDD Plans: {count} | Gate violations: {count}
| Plan | RED | GREEN | REFACTOR | Status |
|------|-----|-------|----------|--------|
| {id} | ✓ | ✓ | ✓ | Pass |
| {id} | ✓ | ✗ | — | FAIL |
{If violations exist:}
⚠ Gate violations are advisory — review before advancing.
This checkpoint is advisory — it does not block phase completion but surfaces TDD discipline issues for human review. </end_of_phase_review>
<context_budget>
TDD plans target ~40% context usage (lower than standard plans' ~50%).
Why lower: each phase involves reading files, running commands, and analyzing output, so the back-and-forth is inherently heavier than linear task execution.
Single feature focus ensures full quality throughout the cycle. </context_budget>