docs/INTERNALS.md
This document describes internal implementation details of bd, with particular focus on concurrency guarantees and data consistency.
For the overall architecture (data model, sync mechanism, component overview), see ARCHITECTURE.md.
The original auto-flush implementation suffered from a critical race condition when multiple concurrent operations accessed shared state:
Concurrent access points:
Shared mutable state:
- isDirty flag
- needsFullExport flag
- flushTimer instance
- storeActive flag

Impact:
The race condition was eliminated by replacing timer-based shared state with an event-driven architecture using a single-owner pattern.
┌─────────────────────────────────────────────────────────┐
│ Command/Agent │
│ │
│ markDirtyAndScheduleFlush() ─┐ │
│ markDirtyAndScheduleFullExport() ─┐ │
└────────────────────────────────────┼───┼────────────────┘
│ │
v v
┌────────────────────────────────────┐
│ FlushManager │
│ (Single-Owner Pattern) │
│ │
│ Channels (buffered): │
│ - markDirtyCh │
│ - timerFiredCh │
│ - flushNowCh │
│ - shutdownCh │
│ │
│ State (owned by run() goroutine): │
│ - isDirty │
│ - needsFullExport │
│ - debounceTimer │
└────────────────────────────────────┘
│
v
┌────────────────────────────────────┐
│ flushWithState() │
│ │
│ - Validates store is active │
│ - Checks data integrity │
│ - Performs Dolt commit │
│ - Updates sync state │
└────────────────────────────────────┘
1. Single Owner Pattern
All flush state (isDirty, needsFullExport, debounceTimer) is owned by a single background goroutine (FlushManager.run()). This eliminates the need for mutexes to protect this state.
2. Channel-Based Communication
External code communicates with FlushManager via buffered channels:
- markDirtyCh: Request to mark DB dirty (incremental or full export)
- timerFiredCh: Debounce timer expired notification
- flushNowCh: Synchronous flush request (returns error)
- shutdownCh: Graceful shutdown with final flush

3. No Shared Mutable State
The only shared state is accessed via atomic operations (channel sends/receives). The storeActive flag and store pointer still use a mutex, but only to coordinate with store lifecycle, not flush logic.
4. Debouncing Without Locks
The timer callback sends to timerFiredCh instead of directly manipulating state. The run() goroutine processes timer events in its select loop, eliminating timer-related races.
Thread-Safety:
- MarkDirty(fullExport bool) - Safe from any goroutine, non-blocking
- FlushNow() error - Safe from any goroutine, blocks until flush completes
- Shutdown() error - Idempotent, safe to call multiple times

Debouncing Guarantees:
- MarkDirty() calls within the debounce window → single flush

Shutdown Guarantees:
- sync.Once - safe for multiple calls

Store Lifecycle:
- storeActive flag is checked before every flush
- storeMutex coordinates with the store lifecycle

The implementation maintains backward compatibility:
- When flushManager == nil, falls back to the old timer-based logic
- markDirtyAndScheduleFlush() and markDirtyAndScheduleFullExport() delegate to FlushManager when available

This allows existing tests to pass without modification while fixing the race condition in production.
Comprehensive race detector tests ensure concurrency safety:
- TestFlushManagerConcurrentMarkDirty - Many goroutines marking dirty
- TestFlushManagerConcurrentFlushNow - Concurrent immediate flushes
- TestFlushManagerMarkDirtyDuringFlush - Interleaved mark/flush operations
- TestFlushManagerShutdownDuringOperation - Shutdown while operations ongoing
- TestMarkDirtyAndScheduleFlushConcurrency - Integration test with legacy API

Run with: go test -race -run TestFlushManager ./cmd/bd
The FlushManager is designed to work correctly when commands run multiple times in the same process (common in tests):
- PersistentPreRun creates a new FlushManager
- PersistentPostRun shuts down the manager
- Shutdown() is idempotent via sync.Once

When running with Dolt server mode (orchestrator), the CLI communicates with the Dolt SQL server for database operations. The FlushManager is NOT used in server mode; the server process has its own flush coordination.
The server mode check in PersistentPostRun ensures FlushManager shutdown only occurs in embedded mode (standalone users).
Auto-import runs in PersistentPreRun before FlushManager is used. It may call markDirtyAndScheduleFlush() or markDirtyAndScheduleFullExport() if remote changes are detected.
Hash-based comparison (not mtime) prevents git pull false positives (issue bd-84).
flushWithState() validates database state before flush:
- getDebounceDuration() (default 5s)

The bd ready command originally computed blocked issues using a recursive CTE on every query. On a 10K issue database, each query took ~752ms, making the command feel sluggish and impractical for large projects.
The blocked_issues_cache table materializes the blocking computation, storing issue IDs for all currently blocked issues. Queries now use a simple NOT EXISTS check against this cache, completing in ~29ms (25x speedup).
┌─────────────────────────────────────────────────────────┐
│ GetReadyWork Query │
│ │
│ SELECT ... FROM issues WHERE status IN (...) │
│ AND NOT EXISTS ( │
│ SELECT 1 FROM blocked_issues_cache │
│ WHERE issue_id = issues.id │
│ ) │
│ │
│ Performance: 29ms (was 752ms with recursive CTE) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Cache Invalidation Triggers │
│ │
│ 1. AddDependency (blocks/parent-child only) │
│ 2. RemoveDependency (blocks/parent-child only) │
│ 3. UpdateIssue (on any status change) │
│ 4. CloseIssue (changes status to closed) │
│ │
│ NOT triggered by: related, discovered-from deps │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ Cache Rebuild Process │
│ │
│ 1. DELETE FROM blocked_issues_cache │
│ 2. INSERT INTO blocked_issues_cache │
│ WITH RECURSIVE CTE: │
│ - Find directly blocked issues (blocks deps) │
│ - Propagate to children (parent-child deps) │
│ 3. Happens in same transaction as triggering change │
│ │
│ Performance: <50ms full rebuild on 10K database │
└─────────────────────────────────────────────────────────┘
An issue is blocked if:
- it has a blocks dependency on an open/in_progress/blocked issue, or
- it is the child of a blocked issue (via a parent-child dependency)

Closed issues never block others. Related and discovered-from dependencies don't affect blocking.
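These semantics can be illustrated with a small in-memory sketch (the real implementation is the recursive CTE shown above). The Issue/Dep types and the dependency direction (From = dependent or child, To = blocker or parent) are assumptions made for illustration:

```go
package main

// In-memory sketch of the blocking semantics described above; bd's real
// implementation is a recursive CTE rebuilding blocked_issues_cache.

type Issue struct {
	ID     string
	Status string // "open", "in_progress", "blocked", or "closed"
}

type Dep struct {
	From, To, Type string // Type: "blocks", "parent-child", "related", ...
}

// blockedSet mirrors the cache rebuild: find directly blocked issues,
// then propagate blocking to children via parent-child deps.
func blockedSet(issues map[string]*Issue, deps []Dep) map[string]bool {
	blocking := map[string]bool{"open": true, "in_progress": true, "blocked": true}
	blocked := map[string]bool{}
	// Directly blocked: a "blocks" dependency on a non-closed issue.
	for _, d := range deps {
		if d.Type == "blocks" {
			if blocker, ok := issues[d.To]; ok && blocking[blocker.Status] {
				blocked[d.From] = true
			}
		}
	}
	// Propagate to children through parent-child deps until fixpoint.
	for changed := true; changed; {
		changed = false
		for _, d := range deps {
			if d.Type == "parent-child" && blocked[d.To] && !blocked[d.From] {
				blocked[d.From] = true
				changed = true
			}
		}
	}
	return blocked
}
```

Note that closed blockers are excluded up front, and "related"/"discovered-from" deps are simply ignored, matching the rules above.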
Full rebuild on every change
Instead of incremental updates, the cache is completely rebuilt (DELETE + INSERT) on any triggering change: a full rebuild completes in under 50ms on a 10K issue database, and it avoids the complexity and correctness risk of tracking incremental deltas.
Transaction safety
All cache operations happen within the same transaction as the triggering change, so readers never observe a stale or partially rebuilt cache.
Selective invalidation
Only blocks and parent-child dependencies trigger rebuilds, since they are the only types that affect blocking semantics. Related and discovered-from dependencies don't trigger invalidation, avoiding unnecessary work.
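The selective check amounts to a tiny predicate on the dependency type. shouldInvalidate is a hypothetical name, sketched here for illustration:

```go
package main

// shouldInvalidate is a hypothetical sketch of the selective-invalidation
// check: only dependency types that affect blocking semantics force a
// cache rebuild.
func shouldInvalidate(depType string) bool {
	switch depType {
	case "blocks", "parent-child":
		return true
	default: // "related", "discovered-from", etc.
		return false
	}
}
```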
Query performance (GetReadyWork): ~29ms with the cache, down from ~752ms with the recursive CTE (25x speedup).
Write overhead: a full cache rebuild adds <50ms to each triggering change on a 10K issue database.
- Parent-child transitive blocking
- Multiple blockers
- Status changes
- Dependency removal
- Foreign key cascades
Comprehensive test coverage in blocked_cache_test.go:
Run tests: go test -v ./internal/storage/dolt -run TestCache
- internal/storage/dolt/blocked_cache.go - Cache rebuild and invalidation
- internal/storage/dolt/ready.go - Uses cache in GetReadyWork queries
- internal/storage/dolt/dependencies.go - Invalidates on dep changes
- internal/storage/dolt/queries.go - Invalidates on status changes

If rebuild becomes a bottleneck in very large databases (>100K issues):
However, current performance is excellent for realistic workloads.
Potential enhancements for multi-agent scenarios:
- Flush coordination across agents
- Adaptive debounce window
- Flush progress tracking
- Per-issue dirty tracking optimization