docs/design/dolt-concurrency.md
Status: Implemented — all-on-main is live, branch-per-worker retired
Date: 2026-02-22 (implemented 2026-02-24)
Authors: Steve Yegge
Input: Tim Sehn (Dolt co-founder), DoltHub blog 2026-02-18
Scope: Beads (primary), orchestrator (operational), Wasteland (federation)
Beads is the universal data plane for multi-agent systems. Every agent role — workers, coordinators, observers, processors, patrols — reads and writes beads as their primary means of coordination. The Dolt concurrency model must serve all of them, not just individual workers.
The system previously used a branch-per-worker strategy for Dolt concurrency: workers got their own Dolt branches, wrote in isolation, and merged to main later.
This was designed to eliminate optimistic lock contention between concurrent writers, and it works: 50 concurrent writers, 250 Dolt commits, 100% success rate in tests. But the isolation that delivers those wins defeats the purpose of a shared data plane:
- Workers can't see each other's beads. A bead created by agent A is invisible to agent B until A's branch merges to main. This breaks cross-agent visibility for dispatching, dependency tracking, and status queries.
- Shared state must live on main. Beads is the coordination layer for the entire system. Every role — workers doing tasks, coordinators dispatching, observers monitoring, processors validating, assistants helping — needs the same view of bead state. Branch isolation is the opposite of what a shared data plane requires.
- Merge-at-done introduces staleness. Long-running agents accumulate divergence. The merge at completion is a batch reconciliation point, not a continuous shared view.
- Branch proliferation. Each sling creates a branch; cleanup relies on bd done or branch cleanup. Orphaned branches accumulate. The BD_BRANCH safety analysis (#1796) adds code complexity across the codebase.
Tim Sehn's guidance (2026-02-21): "It is far simpler to use one branch, so start there. You can get hundreds of transactions per second on a single branch. We fixed the bug you ran into."
"I think you dolt commit every sql statement. If you don't you want to wrap writes in a BEGIN and finish with a CALL DOLT_COMMIT(), ie in a transaction otherwise connections will commit each other's writes."
This identifies the core issue: without explicit SQL transactions, Dolt's auto-commit mode means any connection can inadvertently commit another connection's uncommitted working set changes. The fix is proper transaction boundaries.
From the DoltHub concurrency blog post, Dolt has two concurrency layers:
1. SQL transactions. Standard SQL transaction semantics with a twist: conflict detection uses three-way merge against the branch HEAD, not row-level locking. This is more permissive than traditional databases; two agents updating different fields of the same bead will both succeed without conflict (demonstrated in the sketch after this list). Isolation level is repeatable read, which is vulnerable to lost updates if two connections read-then-write the same cell.
2. Version control operations. DOLT_COMMIT, DOLT_MERGE, DOLT_BRANCH, etc. acquire a global lock, execute atomically, then release. Merge work happens outside the lock; only final graph writes are serialized. Performance: hundreds of commit graph operations per second in normal operation. The global lock is planned to become per-branch but hasn't been a bottleneck in practice.
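To make the cell-level merge concrete, here is a minimal Go sketch: two goroutines on separate pooled connections update different fields of the same bead, each inside its own BEGIN ... DOLT_COMMIT, and both succeed. The priority column, DSN, and helper are hypothetical illustrations, not code from the codebase.

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"sync"

	_ "github.com/go-sql-driver/mysql" // dolt sql-server speaks the MySQL protocol
)

// update runs one write inside BEGIN ... DOLT_COMMIT on its own pooled connection.
func update(ctx context.Context, db *sql.DB, query, msg string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds
	if _, err := tx.ExecContext(ctx, query); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx, "CALL DOLT_COMMIT('-Am', ?)", msg); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	ctx := context.Background()
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:3306)/beads") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}

	// Agents A and B touch different cells of the same row concurrently.
	// Dolt's three-way merge against branch HEAD sees disjoint cells, so both succeed.
	writes := []struct{ query, msg string }{
		{"UPDATE issues SET status = 'in_progress' WHERE id = 'gt-abc'", "agent A: start gt-abc"},
		{"UPDATE issues SET priority = 2 WHERE id = 'gt-abc'", "agent B: reprioritize gt-abc"},
	}
	var wg sync.WaitGroup
	for _, w := range writes {
		wg.Add(1)
		go func(query, msg string) {
			defer wg.Done()
			if err := update(ctx, db, query, msg); err != nil {
				log.Println(err)
			}
		}(w.query, w.msg)
	}
	wg.Wait()
}
```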
Pattern 1: Transaction-Wrapped Dolt Commits (recommended for us)

```sql
BEGIN;
INSERT INTO issues (id, title, status) VALUES ('gt-abc', 'Fix bug', 'open');
INSERT INTO dependencies (issue_id, depends_on_id, type) VALUES ('gt-abc', 'gt-def', 'blocks');
CALL DOLT_COMMIT('-Am', 'bd: create gt-abc');
-- Transaction ends, changes are atomically visible
```
Pattern 2: Branch-Per-Client (our previous approach, now retired)

```sql
CALL DOLT_BRANCH('worker-ace-1708642800');
CALL DOLT_CHECKOUT('worker-ace-1708642800');
-- Isolated writes, invisible to other agents
-- Merge later
```
All beads live on main. Concurrent access is managed through SQL transactions with explicit DOLT_COMMIT at transaction boundaries. No per-worker branches.

Every logical write operation (create bead, update status, close bead, add dependency, etc.) must be wrapped in BEGIN ... CALL DOLT_COMMIT().
```go
// BEFORE (pre-migration): auto-commit per statement, no transaction boundary
func (s *DoltStore) CreateIssue(ctx context.Context, issue *Issue, actor string) error {
	_, err := s.db.ExecContext(ctx, "INSERT INTO issues ...") // auto-committed
	// DOLT_COMMIT happens later via maybeAutoCommit()
	return err
}
```

```go
// AFTER: explicit transaction with DOLT_COMMIT inside
func (s *DoltStore) CreateIssue(ctx context.Context, issue *Issue, actor string) error {
	tx, err := s.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds
	if _, err := tx.ExecContext(ctx, "INSERT INTO issues ..."); err != nil {
		return err
	}
	// INSERT INTO labels / dependencies here, if applicable
	if _, err := tx.ExecContext(ctx, "CALL DOLT_COMMIT('-Am', ?)", msg); err != nil {
		return err
	}
	return tx.Commit()
}
```
This ensures that DOLT_COMMIT is part of the transaction, so it only includes our changes.

Simple reads (SELECT from issues, dependencies, etc.) can use bare connections. They'll see the latest committed state of main. Dolt's repeatable-read isolation means reads within a transaction see a consistent snapshot.
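For example, a status query can run on the pool directly, with no BEGIN. A minimal sketch against the issues table used throughout:

```go
// A bare read: no transaction needed, sees the latest committed state of main.
var status string
err := s.db.QueryRowContext(ctx,
	"SELECT status FROM issues WHERE id = ?", "gt-abc").Scan(&status)
if errors.Is(err, sql.ErrNoRows) {
	// bead doesn't exist
}
```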
The current batch mode (accumulate changes, commit at a logical boundary) maps naturally to a long-lived transaction:
```go
// Batch: open a transaction, do multiple writes, single DOLT_COMMIT at end
tx, err := db.BeginTx(ctx, nil)
if err != nil {
	return err
}
for _, issue := range issues {
	tx.ExecContext(ctx, "INSERT INTO issues ...")
}
tx.ExecContext(ctx, "CALL DOLT_COMMIT('-Am', ?)", batchMsg)
return tx.Commit()
```
### store.go: Remove Branch-Per-Worker

Remove the BD_BRANCH initialization block (lines 336-358). The store always operates on main. Remove SetMaxOpenConns(1) / SetMaxIdleConns(1) — we want connection pooling now since all connections share the same branch.
```go
// DELETE this block:
if bdBranch := os.Getenv("BD_BRANCH"); bdBranch != "" {
	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)
	// ... branch creation/checkout logic
}
```
### transaction.go: Add DOLT_COMMIT to Transactions

The current RunInTransaction does sql.BeginTx / Commit but never calls DOLT_COMMIT. This means SQL changes are committed to the working set but not to Dolt's version history. Another connection could then inadvertently include those changes in its own DOLT_COMMIT.
```go
// BEFORE:
func (s *DoltStore) runDoltTransaction(ctx context.Context, fn func(*doltTransaction) error) error {
	sqlTx, _ := s.db.BeginTx(ctx, nil)
	tx := &doltTransaction{tx: sqlTx, store: s}
	fn(tx)
	return sqlTx.Commit() // SQL commit only — no Dolt commit!
}
```
```go
// AFTER: DOLT_COMMIT inside the transaction
func (s *DoltStore) runDoltTransaction(ctx context.Context, fn func(*doltTransaction) error, commitMsg string) error {
	sqlTx, err := s.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	tx := &doltTransaction{tx: sqlTx, store: s}
	if err := fn(tx); err != nil {
		sqlTx.Rollback()
		return err
	}
	// Dolt commit INSIDE the SQL transaction — atomic with the writes
	_, err = sqlTx.Exec("CALL DOLT_COMMIT('-Am', ?, '--author', ?)",
		commitMsg, s.commitAuthorString())
	if err != nil && !isNothingToCommit(err) {
		sqlTx.Rollback()
		return err
	}
	return sqlTx.Commit()
}
```
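One possible shape for the isNothingToCommit helper referenced above — a sketch that matches on the error text, since the exact error surface depends on the Dolt version:

```go
import "strings"

// isNothingToCommit reports whether DOLT_COMMIT failed only because the
// working set had no changes. Matching on the message text is a sketch;
// verify the exact error your Dolt version returns.
func isNothingToCommit(err error) bool {
	return err != nil && strings.Contains(err.Error(), "nothing to commit")
}
```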
### dolt_autocommit.go: Retire or Simplify

With DOLT_COMMIT moving inside transactions, the external auto-commit wrapper (maybeAutoCommit) becomes unnecessary for most operations. It may still be useful as a safety net for bare writes that escape the transaction pattern (migration scripts, one-off fixes), but the primary write path should use transaction-scoped commits.
### versioned.go: Merge/Branch Operations

Merge() and DeleteBranch() are only needed during the migration period (cleaning up existing worker branches). After migration, they become dead code for the normal write path. Retain them for federation use cases (DoltHub remote merge) and standalone Beads.
Note: The following sections reference the orchestrator's internal commands (gt sling, gt done). They are documented here for historical context, as this design was originally written for the orchestrator migration.
The orchestrator's worker dispatch previously created a Dolt branch and injected BD_BRANCH into the worker environment. After migration, workers spawn with no branch and no BD_BRANCH env var; they read and write main directly.

The orchestrator's task completion previously checked out main, merged the worker's branch, and deleted it. After migration, there is no worker branch to merge or delete; the Dolt side of completion is a no-op.
### BD_BRANCH Safety Infrastructure (#1796)

The entire BD_BRANCH safety analysis (bdbranch/analyzer.go, OnMain(), StripBdBranch(), the arch test registry) can be retired. This is a significant simplification of the codebase.
### session_manager.go: No DoltBranch Option

Remove the DoltBranch field from session options and the injection of BD_BRANCH into tmux sessions.
With all connections on main, we need proper pooling:

```go
db.SetMaxOpenConns(10)                 // allow concurrent readers + writers
db.SetMaxIdleConns(5)                  // keep warm connections
db.SetConnMaxLifetime(5 * time.Minute)
```
The exact numbers depend on the rig's concurrency level. A typical orchestrator rig with 6 workers + coordinator + observer + processor + patrol = ~10 concurrent agents, each potentially holding a connection.
Wisps already live in dolt_ignored tables. They are not version-tracked, not branched, and not federated (except digest publication). The wisps concurrency model is purely SQL: standard MySQL-compatible concurrent writes with no Dolt-specific concerns.
Wisps are worker-local by design. An agent's wisps are only meaningful to that agent's session. Cross-agent wisp visibility (e.g., molecule step coordination) uses the events/comments tables, which are also dolt_ignored.
No changes to wisps are needed for this migration.
With multiple connections writing to main concurrently, conflicts are possible but rare due to Dolt's cell-level merge semantics:
| Scenario | Conflict? | Resolution |
|---|---|---|
| Two agents create different beads | No | Different rows, auto-merged |
| Two agents update different beads | No | Different rows, auto-merged |
| Two agents update different fields of same bead | No | Different cells, auto-merged |
| Two agents update same field of same bead | Yes | Last writer wins (updated_at) |
| One agent writes while another reads | No | Read sees committed state |
The "same field of same bead" case is rare in practice — beads are typically owned by one agent at a time (assigned via sling). The main risk is concurrent status updates (e.g., one agent closes a bead while another also updates it). Mitigation: use optimistic concurrency checks where needed (check expected status before update).
Phase 1: modify RunInTransaction to include DOLT_COMMIT inside the SQL transaction. This works regardless of branch-per-worker; it's strictly additive safety.

- transaction.go: add a commit message parameter, call DOLT_COMMIT before tx.Commit()
- Update RunInTransaction callers to provide commit messages (see the caller sketch below)
- Keep maybeAutoCommit as a fallback for bare writes
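A hypothetical caller under the Phase 1 signature — the message parameter position and the closure shape are assumptions, to be adapted to the real RunInTransaction API:

```go
// Hypothetical: the commit message rides along with the closure, and
// runDoltTransaction (see AFTER above) issues DOLT_COMMIT inside the SQL tx.
err := store.RunInTransaction(ctx, "bd: close gt-abc", func(tx *doltTransaction) error {
	_, err := tx.Exec("UPDATE issues SET status = 'closed' WHERE id = ?", "gt-abc")
	return err
})
if err != nil {
	log.Fatalf("close failed: %v", err)
}
```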
Phase 2, conditional on Phase 1 being stable in production, removes branch-per-worker entirely:

- BD_BRANCH injection from worker spawn and session management
- BD_BRANCH env var handling from store.go
- OnMain(), StripBdBranch(), and the analyzer infrastructure
- bdbranch/analyzer.go and the arch test registry
- BD_BRANCH references from documentation, including the dolt-storage.md design doc

The move to all-on-main simplifies federation:
- Push/pull is branch-clean. A single main branch means dolt push and dolt pull operate on a single linear history (modulo Dolt's content-addressed merge commits). No branch namespace pollution.
- Commit graph is simpler. Branch-per-worker created a complex DAG that was hard to reason about when syncing with DoltHub remotes. All-on-main produces a cleaner commit history.
- Cross-rig bead visibility is immediate. When rig A pushes to DoltHub, rig B pulls and sees all beads. No branch reconciliation needed.
- Federation transactions. The same BEGIN ... DOLT_COMMIT pattern applies to federation sync: pull remote changes, resolve conflicts, commit. This is the same flow as Dolt's native replication (sketched below).
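A hedged sketch of that sync flow, assuming a remote named origin and Dolt's dolt_pull stored procedure and dolt_conflicts system table; the surrounding function is hypothetical:

```go
// syncFromRemote pulls federated changes into main. DOLT_PULL fetches and
// merges; depending on Dolt version, unresolved conflicts either surface as
// an error here or as rows in the dolt_conflicts system table.
func syncFromRemote(ctx context.Context, db *sql.DB) error {
	if _, err := db.ExecContext(ctx, "CALL DOLT_PULL('origin', 'main')"); err != nil {
		return err
	}
	// dolt_conflicts lists one row per table with unresolved conflicts;
	// cell-level merge semantics make this rare in practice.
	rows, err := db.QueryContext(ctx, "SELECT `table`, num_conflicts FROM dolt_conflicts")
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var table string
		var n int
		if err := rows.Scan(&table, &n); err != nil {
			return err
		}
		log.Printf("federation sync: %d conflicts in %s need resolution", n, table)
	}
	return rows.Err()
}
```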
Wisps are dolt_ignored and never pushed. This is correct: wisps are worker-local and only meaningful within their agent's session.
The bd CLI supports both embedded Dolt and server mode. This design applies to server mode only (the orchestrator's deployment). Embedded mode is single-process and doesn't have the multi-connection concurrency concerns described here.
For standalone bd with embedded Dolt, no changes are required; the single-process model sidesteps the multi-connection issues above.
Tim's guidance: hundreds of transactions per second on a single branch. Our workload: ~10 concurrent agents per rig writing on the order of 120 bead updates/minute (~2 writes/second) in total.

This is well within Dolt's single-branch capacity. Even at 10x scale (60+ agents across a large rig), we'd hit ~1200 writes/minute = ~20 writes/second, far below the hundreds-per-second ceiling.
Commit granularity. Should every bd create produce its own Dolt commit, or should we batch at a higher level (e.g., per-molecule, per-formula step)? Per-operation gives better auditability; batching reduces commit graph size. Tim's model suggests per-operation is fine at our scale.
Connection pool sizing. What's the right pool size per rig? Need to test under load. Start conservative (10 max connections) and tune.
Lost update protection. Dolt's repeatable-read isolation doesn't prevent lost updates. Do we need application-level optimistic locking (e.g., WHERE updated_at = ? AND status = ?) for high-contention fields like status and assignee?
Existing branch cleanup. Production orchestrator rigs have accumulated worker branches. Need a migration script to merge-or-delete these before switching to all-on-main.
Embedded mode fallback. If a standalone bd user runs multiple processes against the same embedded Dolt (unlikely but possible), they'd hit the same issues. Document as unsupported, or add transaction discipline to embedded mode too?
- beads/internal/storage/dolt/store.go — DoltStore, branch-per-worker init
- beads/internal/storage/dolt/transaction.go — RunInTransaction
- beads/cmd/bd/dolt_autocommit.go — auto-commit wrapper
- done.go — merge flow (removed)
- session_manager.go — BD_BRANCH injection (removed)
- bdbranch/ — BD_BRANCH safety analyzer (removed)