# Testing Philosophy

This document covers what to test and what not to test. For how to run tests, see TESTING.md.

## The Test Pyramid
```text
                ┌─────────────────┐
                │    E2E Tests    │  ← PR/Deploy only (slow, expensive)
                │  ~5% of tests   │
                └────────┬────────┘
                         │
           ┌──────────────┴──────────────┐
           │      Integration Tests      │  ← PR gate (moderate)
           │        ~15% of tests        │
           └──────────────┬──────────────┘
                         │
┌────────────────────────┴────────────────────────┐
│                Unit Tests (Fast)                 │  ← Every save/commit
│                  ~80% of tests                   │
└─────────────────────────────────────────────────┘
```
### Unit Tests

- **When:** every file save, pre-commit hooks, continuously during development
- **In beads:** core logic tests using `newTestStore()` with in-memory SQLite
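The value of `newTestStore()` is fresh, isolated state per test with no disk I/O. A minimal sketch of the pattern, assuming nothing from beads: the real helper returns a store backed by in-memory SQLite (`":memory:"`), while a map stands in here to keep the sketch dependency-free.

```go
package main

import "fmt"

// inMemoryStore stands in for what newTestStore() provides in beads:
// a store whose state lives only for the duration of one test.
type inMemoryStore struct {
	issues map[string]string
}

// newTestStore returns a fresh, empty store; each test gets its own,
// so tests cannot leak state into each other.
func newTestStore() *inMemoryStore {
	return &inMemoryStore{issues: map[string]string{}}
}

func (s *inMemoryStore) Create(id, title string) { s.issues[id] = title }

func (s *inMemoryStore) Get(id string) (string, bool) {
	title, ok := s.issues[id]
	return title, ok
}

func main() {
	s := newTestStore()
	s.Create("bd-1", "fix sync ordering")
	title, ok := s.Get("bd-1")
	fmt.Println(ok, title) // true fix sync ordering
}
```

Because construction is cheap, every test can call the helper instead of sharing a fixture, which is what keeps the fast suite fast.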
### Integration Tests

- **When:** pre-push, PR checks
- **In beads:** tests tagged with `//go:build integration` (run via `go test -tags integration ./...`), server mode tests
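The build tag keeps slow tests out of the default `go test` run. A sketch of such a file, assuming nothing about beads' actual package layout (the package and test names here are illustrative):

```go
//go:build integration

// This file is compiled only when the "integration" tag is set:
//
//	go test -tags integration ./...
//
// Without the tag, the default unit-test run never sees these tests.
package server_test

import "testing"

func TestServerModeRoundTrip(t *testing.T) {
	// Would start a real server process and talk to it over the wire -
	// too slow and too stateful for the fast unit suite.
	t.Log("integration-only test ran")
}
```

The constraint must be the first non-comment line of the file, followed by a blank line, or the Go toolchain ignores it.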
### End-to-End Tests

- **When:** PR merge, pre-deploy, nightly
- **In beads:** the `bd init → bd doctor → bd doctor --fix` workflow

## What to Test

A good test protects behavior that matters:
| Priority | What | Why | Examples in beads |
|---|---|---|---|
| High | Core business logic | This is what users depend on | sync, doctor, export, import |
| High | Error paths that could corrupt data | Data loss is catastrophic | Config handling, git operations, database integrity |
| Medium | Edge cases from production bugs | Discovered through real issues | Orphan handling, ID collision detection |
| Low | Display/formatting | Visual output, can be manually verified | Table formatting, color output |
## Test Design Principles

- **Trust the language.** Don't test that `strings.TrimSpace` works.
- **Use table-driven tests** with representative cases instead of exhaustive permutations.
```go
// BAD: 10 separate test functions
func TestPriority0(t *testing.T) { ... }
func TestPriority1(t *testing.T) { ... }
func TestPriority2(t *testing.T) { ... }

// GOOD: one table-driven test
func TestPriorityMapping(t *testing.T) {
	cases := []struct{ in, want int }{
		{0, 4}, {1, 0}, {5, 3}, // includes boundary values
	}
	for _, tc := range cases {
		t.Run(fmt.Sprintf("priority_%d", tc.in), func(t *testing.T) {
			got := mapPriority(tc.in)
			if got != tc.want {
				t.Errorf("mapPriority(%d) = %d, want %d", tc.in, got, tc.want)
			}
		})
	}
}
```
- **Don't test trivial code.** "If the file exists, return true" needs no test - trust the implementation.
- **Test a function once.** If you test a function directly, don't also test it through every caller.
## Anti-patterns

**Anti-pattern: obvious happy paths.** Tests that would pass even with a trivial implementation catch no bugs.
```go
// BAD: what bug would this catch?
func TestValidateBeadsWorkspace(t *testing.T) {
	dir := setupTestWorkspace(t)
	if err := validateBeadsWorkspace(dir); err != nil {
		t.Errorf("expected no error, got: %v", err)
	}
}

// GOOD: test the interesting error cases
func TestValidateBeadsWorkspace(t *testing.T) {
	cases := []struct {
		name    string
		setup   func(t *testing.T) string
		wantErr string
	}{
		{"missing .beads dir", setupNoBeadsDir, "not a beads workspace"},
		{"corrupted db", setupCorruptDB, "database is corrupted"},
		{"permission denied", setupNoReadAccess, "permission denied"},
	}
	// ...
}
```
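The elided loop would run each case's setup and assert on the error text. A self-contained sketch of that shape, assuming nothing from beads: `validateBeadsWorkspace` here is a stand-in stub, and directories are inlined where the real test would call a setup func.

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-in stub so the sketch runs on its own; the real function
// inspects the directory and database.
func validateBeadsWorkspace(dir string) error {
	if dir == "" {
		return fmt.Errorf("not a beads workspace")
	}
	return nil
}

func main() {
	cases := []struct {
		name    string
		dir     string // the real test builds this with a setup func
		wantErr string // "" means no error expected
	}{
		{"missing .beads dir", "", "not a beads workspace"},
		{"valid workspace", "/tmp/ws", ""},
	}
	for _, tc := range cases {
		err := validateBeadsWorkspace(tc.dir)
		got := ""
		if err != nil {
			got = err.Error()
		}
		// Match on a substring, not the exact message, so cosmetic
		// rewording of errors doesn't break the test.
		if !strings.Contains(got, tc.wantErr) {
			fmt.Printf("%s: got %q, want substring %q\n", tc.name, got, tc.wantErr)
			continue
		}
		fmt.Printf("%s: ok\n", tc.name)
	}
}
```

Substring matching on error text is the usual compromise: exact matches are brittle, while checking only `err != nil` can't distinguish the three failure modes from each other.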
**Anti-pattern: redundant coverage.** Testing the same logic multiple ways instead of once with a table-driven test.
```go
// BAD: repetitive individual assertions
if config.PriorityMap["0"] != 4 { t.Errorf(...) }
if config.PriorityMap["1"] != 0 { t.Errorf(...) }
if config.PriorityMap["2"] != 1 { t.Errorf(...) }

// GOOD: table-driven
for k, want := range expectedMap {
	if got := config.PriorityMap[k]; got != want {
		t.Errorf("PriorityMap[%q] = %d, want %d", k, got, want)
	}
}
```
**Anti-pattern: heavy I/O in unit tests.** Unit tests that execute real commands or do heavy I/O when they could mock.
```go
// BAD: actually executes external commands in a unit test
func TestServerFix(t *testing.T) {
	exec.Command("bd", "dolt", "stop").Run()
	// ...
}

// GOOD: mock the execution, or move it behind //go:build integration
func TestServerFix(t *testing.T) {
	executor := &mockExecutor{}
	fix := NewServerFix(executor)
	// ...
}
```
**Anti-pattern: testing implementation details.** Tests that break when you refactor, even though behavior is unchanged.
```go
// BAD: tests internal state
if len(server.connectionPool) != 3 { t.Error(...) }

// GOOD: tests observable behavior
if resp, err := server.HandleRequest(req); err != nil { t.Error(...) }
```
**Anti-pattern: missing boundaries.** Testing known-good values but not boundaries and invalid inputs.
```go
// BAD: only tests middle values
TestPriority(1) // works
TestPriority(2) // works

// GOOD: tests boundaries and invalid inputs
TestPriority(-1) // invalid - expect error
TestPriority(0)  // boundary - min valid
TestPriority(4)  // boundary - max valid
TestPriority(5)  // boundary - first invalid
```
## Test Suite Health Metrics
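As a runnable illustration of the same boundary set, assuming a 0-4 priority range (the `validatePriority` function below is a stand-in, not from beads):

```go
package main

import "fmt"

// validatePriority accepts priorities in [0, 4], the range assumed
// for this sketch.
func validatePriority(p int) error {
	if p < 0 || p > 4 {
		return fmt.Errorf("priority %d out of range [0,4]", p)
	}
	return nil
}

func main() {
	cases := []struct {
		in      int
		wantErr bool
	}{
		{-1, true}, // invalid: below minimum
		{0, false}, // boundary: min valid
		{4, false}, // boundary: max valid
		{5, true},  // boundary: first invalid
	}
	for _, tc := range cases {
		gotErr := validatePriority(tc.in) != nil
		fmt.Printf("priority %d: err=%v want=%v\n", tc.in, gotErr, tc.wantErr)
	}
}
```

Note that the table brackets both edges of the valid range: an off-by-one in either comparison flips exactly one of these cases.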
| Metric | Target | Current (beads) | Status |
|---|---|---|---|
| Test-to-code ratio | 0.5:1 - 1.5:1 | 0.85:1 | Healthy |
| Fast test suite | < 5 seconds | 3.8 seconds | Good |
| Integration tests | < 30 seconds | ~15 seconds | Good |
| Compilation overhead | Minimize | 180 seconds | Bottleneck |
## Well-Tested Areas
| Area | Why It's Well-Tested |
|---|---|
| Sync/Export/Import | Data integrity critical - comprehensive edge cases |
| SQLite transactions | Rollback safety, atomicity guarantees |
| Merge operations | Dolt-native cell-level merge |
| Database locking | Prevents corruption from multiple instances |
## Known Gaps
| Area | Gap | Priority |
|---|---|---|
| Server lifecycle | Shutdown/signal handling | Medium |
| Concurrent operations | Stress testing under load | Medium |
| Boundary validation | Edge inputs in mapping functions | Low |