Back to Beads

Testing Philosophy

docs/TESTING_PHILOSOPHY.md

1.0.37.9 KB
Original Source

Testing Philosophy

This document covers what to test and what not to test. For how to run tests, see TESTING.md.

The Test Pyramid

                  ┌─────────────────┐
                  │   E2E Tests     │  ← PR/Deploy only (slow, expensive)
                  │   ~5% of tests  │
                  └────────┬────────┘
                           │
            ┌──────────────┴──────────────┐
            │     Integration Tests       │  ← PR gate (moderate)
            │       ~15% of tests         │
            └──────────────┬──────────────┘
                           │
  ┌────────────────────────┴────────────────────────┐
  │              Unit Tests (Fast)                  │  ← Every save/commit
  │                 ~80% of tests                   │
  └─────────────────────────────────────────────────┘

Tier 1: Fast Tests (< 5 seconds total)

When: Every file save, pre-commit hooks, continuous during development

  • Pure function tests (no I/O)
  • In-memory data structure tests
  • Business logic validation
  • Mock all external dependencies

In beads: Core logic tests using newTestStore() with in-memory SQLite

Tier 2: Integration Tests (< 30 seconds)

When: Pre-push, PR checks

  • Real file system operations
  • Git operations with temp repos
  • Config file parsing
  • CLI argument handling

In beads: Tests tagged with //go:build integration, server mode tests

Tier 3: E2E / Smoke Tests (1-5 minutes)

When: PR merge, pre-deploy, nightly

  • Full bd initbd doctorbd doctor --fix workflow
  • Real API calls (to staging)
  • Cross-platform verification

What Makes a Test "Right"

A good test:

  1. Catches a bug you'd actually ship - not theoretical edge cases
  2. Documents expected behavior - serves as living documentation
  3. Runs fast enough to not skip - slow tests get disabled
  4. Isn't duplicated elsewhere - tests one thing, one way

What to Test (Priority Matrix)

PriorityWhatWhyExamples in beads
HighCore business logicThis is what users depend onsync, doctor, export, import
HighError paths that could corrupt dataData loss is catastrophicConfig handling, git operations, database integrity
MediumEdge cases from production bugsDiscovered through real issuesOrphan handling, ID collision detection
LowDisplay/formattingVisual output, can be manually verifiedTable formatting, color output

What NOT to Test Extensively

Simple utility functions

Trust the language. Don't test that strings.TrimSpace works.

Every permutation of inputs

Use table-driven tests with representative cases instead of exhaustive permutations.

go
// BAD: 10 separate test functions
func TestPriority0(t *testing.T) { ... }
func TestPriority1(t *testing.T) { ... }
func TestPriority2(t *testing.T) { ... }

// GOOD: One table-driven test
func TestPriorityMapping(t *testing.T) {
    cases := []struct{ in, want int }{
        {0, 4}, {1, 0}, {5, 3}, // includes boundary
    }
    for _, tc := range cases {
        t.Run(fmt.Sprintf("priority_%d", tc.in), func(t *testing.T) {
            got := mapPriority(tc.in)
            if got != tc.want { t.Errorf(...) }
        })
    }
}

Obvious behavior

Don't test "if file exists, return true" - trust the implementation.

Same logic through different entry points

If you test a function directly, don't also test it through every caller.


Anti-Patterns to Avoid

1. Trivial Assertions

Testing obvious happy paths that would pass with trivial implementations.

go
// BAD: What bug would this catch?
func TestValidateBeadsWorkspace(t *testing.T) {
    dir := setupTestWorkspace(t)
    if err := validateBeadsWorkspace(dir); err != nil {
        t.Errorf("expected no error, got: %v", err)
    }
}

// GOOD: Test the interesting error cases
func TestValidateBeadsWorkspace(t *testing.T) {
    cases := []struct{
        name    string
        setup   func(t *testing.T) string
        wantErr string
    }{
        {"missing .beads dir", setupNoBeadsDir, "not a beads workspace"},
        {"corrupted db", setupCorruptDB, "database is corrupted"},
        {"permission denied", setupNoReadAccess, "permission denied"},
    }
    // ...
}

2. Duplicate Error Path Testing

Testing the same logic multiple ways instead of once with table-driven tests.

go
// BAD: Repetitive individual assertions
if config.PriorityMap["0"] != 4 { t.Errorf(...) }
if config.PriorityMap["1"] != 0 { t.Errorf(...) }
if config.PriorityMap["2"] != 1 { t.Errorf(...) }

// GOOD: Table-driven
for k, want := range expectedMap {
    if got := config.PriorityMap[k]; got != want {
        t.Errorf("PriorityMap[%q] = %d, want %d", k, got, want)
    }
}

3. I/O Heavy Tests Without Mocking

Unit tests that execute real commands or heavy I/O when they could mock.

go
// BAD: Actually executes external commands in unit test
func TestServerFix(t *testing.T) {
    exec.Command("bd", "dolt", "stop").Run()
    // ...
}

// GOOD: Mock the execution or use integration test tag
func TestServerFix(t *testing.T) {
    executor := &mockExecutor{}
    fix := NewServerFix(executor)
    // ...
}

4. Testing Implementation, Not Behavior

Tests that break when you refactor, even though behavior is unchanged.

go
// BAD: Tests internal state
if len(server.connectionPool) != 3 { t.Error(...) }

// GOOD: Tests observable behavior
if resp, err := server.HandleRequest(req); err != nil { t.Error(...) }

5. Missing Boundary Tests

Testing known-good values but not boundaries and invalid inputs.

go
// BAD: Only tests middle values
TestPriority(1)  // works
TestPriority(2)  // works

// GOOD: Tests boundaries and invalid
TestPriority(-1) // invalid - expect error
TestPriority(0)  // boundary - min valid
TestPriority(4)  // boundary - max valid
TestPriority(5)  // boundary - first invalid

Target Metrics

MetricTargetCurrent (beads)Status
Test-to-code ratio0.5:1 - 1.5:10.85:1Healthy
Fast test suite< 5 seconds3.8 secondsGood
Integration tests< 30 seconds~15 secondsGood
Compilation overheadMinimize180 secondsBottleneck

Interpretation

  • 0.5:1 - Light coverage, fast iteration (acceptable for utilities)
  • 1:1 - Solid coverage for most projects (our target)
  • 1.5:1 - Heavy coverage for critical systems
  • 2:1+ - Over-engineered, maintenance burden

Beads-Specific Guidance

Well-Covered (Maintain)

AreaWhy It's Well-Tested
Sync/Export/ImportData integrity critical - comprehensive edge cases
SQLite transactionsRollback safety, atomicity guarantees
Merge operationsDolt-native cell-level merge
Database lockingPrevents corruption from multiple instances

Needs Attention

AreaGapPriority
Server lifecycleShutdown/signal handlingMedium
Concurrent operationsStress testing under loadMedium
Boundary validationEdge inputs in mapping functionsLow

Skip These

  • Display formatting tests (manually verify)
  • Simple getters/setters
  • Tests that duplicate SQLite's guarantees