docs/pr-752-chaos-testing-review.md
- PR: https://github.com/gastownhall/beads/pull/752
- Author: jordanhubbard
- Bead: bd-kx1j
- Status: Under Review
Jordan proposes adding chaos testing and E2E test coverage to beads, and asks:
"Is this level of testing something you actually want with the current pace of progress? It comes with an implied obligation to update and add to the tests as well as follow the CICD feedback in github (very spammy if your tests don't pass!)"
Key files in the PR:

- `cmd/bd/doctor_repair_chaos_test.go` (378 lines) - Core chaos testing
- `cmd/bd/doctor/fix/database_integrity.go` (116 lines) - DB integrity fixes
- `cmd/bd/doctor/fix/jsonl_integrity.go` (87 lines) - JSONL integrity fixes
- `cmd/bd/doctor/fix/fs.go` (57 lines) - Filesystem fault injection
- `cmd/bd/doctor/fix/sqlite_open.go` (52 lines) - SQLite open handling
- `cmd/bd/doctor/jsonl_integrity.go` (123 lines) - JSONL checks
- `cmd/bd/doctor/git.go` (168 additions) - Git hygiene checks
- `internal/storage/memory/memory_more_coverage_test.go` (921 lines) - Memory storage tests
- `cmd/bd/cli_coverage_show_test.go` (426 lines) - CLI show command tests
- `cmd/bd/daemon_autostart_unit_test.go` (331 lines) - Server autostart tests
- `internal/rpc/client_gate_shutdown_test.go` (107 lines) - RPC client tests
- `internal/storage/sqlite/migrations/021_migrate_edge_fields.go` - Major migration fix
- `internal/storage/sqlite/migrations/022_drop_edge_columns.go` - Column cleanup
- `internal/storage/sqlite/migrations_template_pinned_regression_test.go` - Regression test

Implementation Quality: HIGH
The chaos testing code is well-structured. Key observations:
From doctor_repair_chaos_test.go:
Each test:
- `bd` binary for testing

The PR includes fixes for bugs found during testing:

- `pinned` and `is_template` columns were being clobbered

Tests are organized by build tags:

- `//go:build chaos` - Chaos/corruption tests (run separately)
- `//go:build e2e` - End-to-end CLI tests

This means chaos and E2E tests only run when explicitly requested, not on every `go test` run.
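To make the inject-then-check pattern concrete, here is a minimal table-driven sketch in Go. It is not the PR's code: `chaosScenario`, `jsonlCorrupt`, and `runScenarios` are hypothetical names, and the sketch damages an in-memory JSONL export instead of a real database file or `bd` binary. In an actual chaos suite this would live in a `_test.go` file carrying `//go:build chaos`; the constraint is omitted here so the sketch runs with a plain `go run`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// chaosScenario is a hypothetical shape for one fault-injection case: a
// name, a function that damages a healthy JSONL export, and whether the
// integrity check is expected to flag the damaged result.
type chaosScenario struct {
	name        string
	inject      func([]byte) []byte
	wantCorrupt bool
}

// jsonlCorrupt reports whether any non-blank line fails to parse as JSON.
func jsonlCorrupt(data []byte) bool {
	for _, line := range bytes.Split(data, []byte("\n")) {
		line = bytes.TrimSpace(line)
		if len(line) > 0 && !json.Valid(line) {
			return true
		}
	}
	return false
}

// scenarios lists the fault-injection cases; adding a case means adding an entry.
var scenarios = []chaosScenario{
	// Control case: an untouched export must not be flagged.
	{"no-op control", func(b []byte) []byte { return b }, false},
	// Simulates a crash mid-write by dropping the last 5 bytes
	// (assumes the export is longer than 5 bytes).
	{"truncated final record", func(b []byte) []byte { return b[:len(b)-5] }, true},
	// Simulates a torn write that leaves binary junk after the last record.
	{"garbage appended", func(b []byte) []byte { return append(b, "\x00\x00not json\n"...) }, true},
}

// runScenarios applies each scenario to a fresh copy of healthy and returns
// a description of every scenario whose verdict did not match expectations.
func runScenarios(healthy []byte) []string {
	var failures []string
	for _, sc := range scenarios {
		damaged := sc.inject(append([]byte(nil), healthy...)) // copy isolates scenarios
		if got := jsonlCorrupt(damaged); got != sc.wantCorrupt {
			failures = append(failures, fmt.Sprintf("%s: corrupt=%v, want %v", sc.name, got, sc.wantCorrupt))
		}
	}
	return failures
}

func main() {
	healthy := []byte(`{"id":"bd-1","title":"first"}` + "\n" + `{"id":"bd-2","title":"second"}` + "\n")
	fmt.Println("failures:", runScenarios(healthy))
}
```

Keeping scenarios in one table means adding a case is a one-entry change, and the no-op control guards against a check that flags everything.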
Is the testing worth the ongoing maintenance cost?
Beads is more robust than feared. If Jordan got these tests passing, it means:
- `bd doctor` actually recovers from corruption

This validates the core design: SQLite + JSONL + git backstop.
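The recovery result leans on the JSONL export being usable on its own. A minimal sketch of that parse-and-skip step, assuming the export holds one JSON object per issue with an `id` field: the `issue` type and `rebuildFromJSONL` are hypothetical and far simpler than the beads schema, and letting later lines win on duplicate IDs is an assumption of the sketch, not documented beads behavior.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// issue is a hypothetical, trimmed-down record; real beads issues carry more fields.
type issue struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// rebuildFromJSONL rebuilds an ID-keyed index from a JSONL export. Lines that
// fail to parse, or lack an ID, are skipped and their 1-based numbers returned.
// On duplicate IDs the later line wins (an assumption of this sketch).
func rebuildFromJSONL(data []byte) (map[string]issue, []int, error) {
	index := make(map[string]issue)
	var skipped []int
	sc := bufio.NewScanner(bytes.NewReader(data))
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // tolerate long records
	lineNo := 0
	for sc.Scan() {
		lineNo++
		text := bytes.TrimSpace(sc.Bytes())
		if len(text) == 0 {
			continue // blank lines carry no data
		}
		var rec issue
		if err := json.Unmarshal(text, &rec); err != nil || rec.ID == "" {
			skipped = append(skipped, lineNo)
			continue
		}
		index[rec.ID] = rec
	}
	return index, skipped, sc.Err()
}

func main() {
	export := []byte(`{"id":"bd-1","title":"first"}
{"id":"bd-2","tit
{"id":"bd-3","title":"third"}
`)
	index, skipped, err := rebuildFromJSONL(export)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(index), "issues recovered; skipped lines:", skipped)
}
```

A real repair would still have to reconcile the rebuilt index with SQLite and git history; the sketch only shows why a line-oriented format confines the damage from a torn write to the affected lines.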
Bugs already found. The migration 021/022 bugs are exactly the kind of subtle issues that would cause data loss in production. Finding them before they corrupt real data is a concrete return on the effort.
Build tag isolation. Chaos tests won't slow down regular development:
```bash
go test ./...              # Normal tests only
go test -tags=chaos ./...  # Include chaos tests
go test -tags=e2e ./...    # Include E2E tests
```
48% coverage is a floor, not a target. The PR doesn't enforce maintaining 48%. Jordan is asking: "Is this level worth it?" We can always add more later, or let coverage drift if priorities change.
Documentation value. E2E tests document expected user scenarios. When an AI agent asks "what should happen when X?", the tests provide executable answers.
Velocity tax is real. Every behavior change needs test updates. This is especially painful during rapid iteration phases.
CI noise. Failed tests block merges. With multiple agents working, flaky tests become coordination bottlenecks.
Framework maintenance. The chaos testing framework itself (side databases, build tags, test helpers) becomes another thing to maintain.
False confidence. Passing tests don't mean beads is production-ready; they mean the tested scenarios work. Edge cases that aren't covered can still fail silently.
If beads is still in the "rapid prototype" phase: the testing overhead is premature. Focus on features, fix crashes as they happen, and lean on the git backstop.

If beads is approaching the "reliable tool" phase: testing is essential. Multi-agent workflows amplify bugs, and corruption during a 10-agent batch is expensive.
Current reality: Beads is being dogfooded seriously. Multiple agents, real work, real data loss when things break. We're closer to "reliable tool" than "prototype."
Cost of NOT testing: When corruption happens:
Cost of testing:
If corruption happens once a month, testing ROI is marginal. If corruption happens weekly (or with each new feature), testing pays for itself.
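A rough break-even model makes the once-a-month versus weekly intuition explicit. The symbols are introduced for this sketch and are not measured: $C_t$ is the monthly cost of maintaining the tests (updates plus CI triage), $C_i$ is the cost of one corruption incident (recovery plus lost work), and $f$ is the number of incidents per month the tests would have prevented.

$$
\text{testing pays off} \iff f\,C_i > C_t \iff f > f^{*} = \frac{C_t}{C_i}
$$

At one prevented incident a month ($f = 1$), testing only wins if a single incident costs more than a full month of test upkeep. At weekly incidents ($f \approx 52/12 \approx 4.3$), it wins as soon as an incident costs more than about $C_t / 4.3$, roughly a quarter of a month's upkeep.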
Recommendation: MERGE WITH MODIFICATIONS
No hard coverage threshold in CI. Let coverage drift naturally. The value is in the chaos tests catching corruption, not in hitting a percentage.
Chaos tests optional in CI. Run chaos tests on release branches, not every PR. This reduces CI noise during active development.
Clear ownership. Jordan should document how to add new chaos scenarios. Future contributors need to know when to add vs skip tests.
If you answer YES to 2+ of these, merge:
If you answer NO to all, defer the PR until beads stabilizes.