Beads Performance Benchmarks

This document describes the performance benchmarks available in the beads project and how to use them.

Running Benchmarks

All Dolt Benchmarks

```bash
go test -tags=bench -bench=. -benchmem ./internal/storage/dolt/...
```

Specific Benchmark

```bash
go test -tags=bench -bench=BenchmarkGetReadyWork_Large -benchmem ./internal/storage/dolt/...
```

With CPU Profiling

```bash
# go test rejects -cpuprofile when the package pattern matches more than one package
go test -tags=bench -bench=BenchmarkGetReadyWork_Large -cpuprofile=cpu.prof ./internal/storage/dolt
go tool pprof -http=:8080 cpu.prof
```

Benchmark Categories

Compaction Operations

  • BenchmarkGetTier1Candidates - Identify L1 compaction candidates
  • BenchmarkGetTier2Candidates - Identify L2 compaction candidates
  • BenchmarkCheckEligibility - Check whether an issue is eligible for compaction

Cycle Detection

Tests on graphs with different topologies (linear chains, trees, dense graphs):

  • BenchmarkCycleDetection_Linear_100/1000/5000 - Linear dependency chains
  • BenchmarkCycleDetection_Tree_100/1000 - Tree-structured dependencies
  • BenchmarkCycleDetection_Dense_100/1000 - Dense graphs
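The three topologies can be made concrete with a minimal Go sketch of fixture generators and an "adding this edge would create a cycle" check. The Graph type, the Linear/Tree/Dense generators, and WouldCreateCycle are all hypothetical illustrations, not beads code; the real checks presumably run inside the Dolt storage layer. The window parameter w in Dense is also an assumption about what "dense" means here.

```go
package main

// Graph is a hypothetical fixture type: deps[i] lists the issues that issue i
// depends on. It is not the beads storage model.
type Graph struct {
	deps [][]int
}

// NewGraph returns a graph with n issues and no dependencies.
func NewGraph(n int) *Graph { return &Graph{deps: make([][]int, n)} }

// AddEdge records that issue from depends on issue to.
func (g *Graph) AddEdge(from, to int) { g.deps[from] = append(g.deps[from], to) }

// Linear builds the chain 0 -> 1 -> ... -> n-1.
func Linear(n int) *Graph {
	g := NewGraph(n)
	for i := 0; i+1 < n; i++ {
		g.AddEdge(i, i+1)
	}
	return g
}

// Tree builds a binary tree in which issue i depends on issues 2i+1 and 2i+2.
func Tree(n int) *Graph {
	g := NewGraph(n)
	for i := 0; i < n; i++ {
		for _, c := range []int{2*i + 1, 2*i + 2} {
			if c < n {
				g.AddEdge(i, c)
			}
		}
	}
	return g
}

// Dense links each issue to the next w issues. Edges only point to higher
// indices, so the fixture stays acyclic however large w is.
func Dense(n, w int) *Graph {
	g := NewGraph(n)
	for i := 0; i < n; i++ {
		for j := i + 1; j < n && j <= i+w; j++ {
			g.AddEdge(i, j)
		}
	}
	return g
}

// WouldCreateCycle reports whether adding from -> to would close a cycle,
// i.e. whether from is already reachable from to. The DFS uses an explicit
// stack instead of recursion, so chain depth does not become call depth.
func (g *Graph) WouldCreateCycle(from, to int) bool {
	if from == to {
		return true
	}
	seen := make([]bool, len(g.deps))
	seen[to] = true
	stack := []int{to}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, d := range g.deps[n] {
			if d == from {
				return true
			}
			if !seen[d] {
				seen[d] = true
				stack = append(stack, d)
			}
		}
	}
	return false
}
```

Each shape stresses a different part of the traversal. Linear chains stress depth, trees stress branching, and dense graphs stress the number of edges visited. That is why the same node count can produce very different timings.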

Ready Work / Filtering

  • BenchmarkGetReadyWork_Large - Filter unblocked issues (10K dataset)
  • BenchmarkGetReadyWork_XLarge - Filter unblocked issues (20K dataset)
  • BenchmarkGetReadyWork_FromJSONL - Ready work on a database imported from JSONL
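As a rough illustration of what "unblocked" means for these benchmarks, the sketch below treats an issue as ready when it is open and every issue it depends on is closed. Several details are assumptions for this example only:

- the Issue type;
- the status strings;
- the rule that unknown dependency IDs do not block.

The real query also handles cases such as deferred parents and result limits (see the perf PRs referenced later in this document), which are not modeled here.

```go
package main

import "sort"

// Issue is a hypothetical, pared-down issue record; the real schema has more fields.
type Issue struct {
	ID        string
	Status    string   // assumed values: "open", "in_progress", "closed"
	BlockedBy []string // IDs of the issues this one depends on
}

// ReadyWork returns, in sorted order, the IDs of open issues whose dependencies
// are all closed. Statuses are indexed once, so each dependency check is a map
// lookup and the filtering pass is linear in issues plus dependency edges.
func ReadyWork(issues []Issue) []string {
	status := make(map[string]string, len(issues))
	for _, iss := range issues {
		status[iss.ID] = iss.Status
	}
	var ready []string
	for _, iss := range issues {
		if iss.Status != "open" {
			continue
		}
		blocked := false
		for _, dep := range iss.BlockedBy {
			// Dependencies on unknown IDs do not block in this sketch.
			if st, ok := status[dep]; ok && st != "closed" {
				blocked = true
				break
			}
		}
		if !blocked {
			ready = append(ready, iss.ID)
		}
	}
	sort.Strings(ready)
	return ready
}
```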

Search Operations

  • BenchmarkSearchIssues_Large_NoFilter - Search all open issues (10K dataset)
  • BenchmarkSearchIssues_Large_ComplexFilter - Search with priority/status filters (10K dataset)
  • BenchmarkPerfSearchTypedLabelFilter_5K - Label/type search over a 5K issue/label catalog
  • BenchmarkPerfResolvePartialIDInvalidInput_5K - Invalid partial-ID rejection without a broad fallback scan
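The "without a broad fallback scan" idea can be illustrated in memory with two steps:

1. Reject syntactically invalid input before touching any index.
2. Resolve valid prefixes with a binary search over sorted IDs.

This is a hypothetical Go sketch. The ID alphabet (lowercase letters, digits, '-'), the sample ID shape, and the error values are assumptions. The real resolver presumably queries Dolt rather than a slice.

```go
package main

import (
	"errors"
	"sort"
	"strings"
)

var (
	ErrInvalidPartialID = errors.New("invalid partial ID")
	ErrNotFound         = errors.New("no issue matches partial ID")
	ErrAmbiguous        = errors.New("partial ID matches more than one issue")
)

// validPartialID assumes IDs consist of lowercase ASCII letters, digits and '-'.
func validPartialID(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !(r >= 'a' && r <= 'z' || r >= '0' && r <= '9' || r == '-') {
			return false
		}
	}
	return true
}

// ResolvePartialID resolves prefix against sortedIDs (ascending order).
// Malformed input is rejected before any lookup, and a valid prefix is resolved
// with a binary search, so neither path scans every ID.
func ResolvePartialID(sortedIDs []string, prefix string) (string, error) {
	if !validPartialID(prefix) {
		return "", ErrInvalidPartialID
	}
	// IDs sharing a prefix are contiguous in sorted order, starting at the
	// first ID >= prefix.
	i := sort.SearchStrings(sortedIDs, prefix)
	if i == len(sortedIDs) || !strings.HasPrefix(sortedIDs[i], prefix) {
		return "", ErrNotFound
	}
	if sortedIDs[i] == prefix {
		return prefix, nil // an exact match wins over longer IDs with the same prefix
	}
	if i+1 < len(sortedIDs) && strings.HasPrefix(sortedIDs[i+1], prefix) {
		return "", ErrAmbiguous
	}
	return sortedIDs[i], nil
}
```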

CRUD Operations

  • BenchmarkCreateIssue_Large - Create new issue in 10K database
  • BenchmarkUpdateIssue_Large - Update existing issue in 10K database
  • BenchmarkBulkCloseIssues - Close 100 issues sequentially (NEW)

Specialized Operations

  • BenchmarkLargeDescription - Handling 100KB+ issue descriptions (NEW)
  • BenchmarkSyncMerge - Simulate sync cycle with create/update operations (NEW)

Recent Perf Regression References

These benchmarks cover the May 2026 Dolt hot-path changes so future perf PRs can run before/after checks against the same fixture shapes:

| PR / change | Benchmark |
|---|---|
| #3966 perf(deps): narrow recursive cycle checks | BenchmarkPerfAddDependencyCycleCheck_DiamondDAG |
| #3967 perf(search): tighten label and partial-id queries | BenchmarkPerfSearchTypedLabelFilter_5K, BenchmarkPerfResolvePartialIDInvalidInput_5K |
| #3968 perf(ready): page blocked checks for limited ready work | BenchmarkPerfReadyWorkLimited_LargeBlockedGraph |
| #4001 perf(ready): narrow deferred-parent child filtering | BenchmarkPerfReadyWorkDeferredParentExclusion_5K |
| #4002 perf(ready): restrict blocked dependency scans to active IDs | BenchmarkPerfBlockedIssues_ClosedDependencySkew |
| #4003 perf(get): query primary issues before wisp fallback | BenchmarkPerfGetIssuePrimaryFirst_PermanentWithWisps |
| #4004 perf(deps): scan one cycle table for same-storage edges | No standalone executable perf diff in the landed squash; covered by the cycle-check benchmark above |

Measured with -benchtime=1x -benchmem -count=1 on the same host, with this benchmark file copied onto each before/after ref:

| PR / path | Benchmark | Before | After | Time gain | Alloc gain |
|---|---|---:|---:|---:|---:|
| #3967 label/type search | BenchmarkPerfSearchTypedLabelFilter_5K | 134.8 ms | 51.8 ms | 61.6% | -0.1% |
| #3967 invalid partial-ID fallback | BenchmarkPerfResolvePartialIDInvalidInput_5K | 124.3 ms | 22.5 ms | 81.9% | 43.6% |
| #3966 dependency cycle check | BenchmarkPerfAddDependencyCycleCheck_DiamondDAG | 80.0 ms | 25.8 ms | 67.7% | 1.4% |
| #3968 limited ready work | BenchmarkPerfReadyWorkLimited_LargeBlockedGraph | 1677.4 ms | 341.7 ms | 79.6% | 85.4% |
| #4001 deferred parent exclusion | BenchmarkPerfReadyWorkDeferredParentExclusion_5K | 3257.3 ms | 130.8 ms | 96.0% | 83.1% |
| #4002 active blocked-dep scan | BenchmarkPerfBlockedIssues_ClosedDependencySkew | 44.3 ms | 36.2 ms | 18.1% | 96.0% |
| #4003 primary issue lookup | BenchmarkPerfGetIssuePrimaryFirst_PermanentWithWisps | 9.0 ms | 6.4 ms | 28.7% | 10.7% |
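The Time gain and Alloc gain columns read as relative reductions, (before - after) / before. Assuming that definition, a small Go helper reproduces the time-gain column from the listed timings. The helper name is hypothetical.

```go
package main

import "math"

// percentGain returns the relative reduction from before to after as a
// percentage rounded to one decimal place: (before - after) / before * 100.
// Positive values mean the "after" measurement is smaller (faster or leaner).
func percentGain(before, after float64) float64 {
	return math.Round((before-after)/before*1000) / 10
}
```

For example, percentGain(134.8, 51.8) gives 61.6, matching the #3967 label/type search row. Recomputing from the rounded table values does not reproduce every row exactly. The #4002 row gives 18.3% rather than the listed 18.1%, which suggests the table was computed from unrounded timings.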

Run the recent perf reference set with:

```bash
go test -run=^$ -bench='BenchmarkPerf(SearchTypedLabelFilter|ResolvePartialIDInvalidInput|AddDependencyCycleCheck|ReadyWorkLimited|BlockedIssues|ReadyWorkDeferredParentExclusion|GetIssuePrimaryFirst)' -benchtime=1x -benchmem ./internal/storage/dolt
```

For production-shaped CLI timeout and index experiments, use:

```bash
go run ./scripts/repro-dolt-prod-timeouts --bd ./bd --scenario all
go run ./scripts/bench-ready-indexes --dsn 'root@tcp(127.0.0.1:33307)/mc?timeout=30s&readTimeout=30s&writeTimeout=30s'
```

When repro-dolt-prod-timeouts targets an existing workspace via --workspace, fixture seeding defaults to --seed-mode=none. Pass --seed-mode=full or --seed-mode=dep-only only if you intend to write and commit synthetic fixture rows into that workspace.

By default, bench-ready-indexes drops the candidate indexes it created before exiting. Pass --keep-indexes only if you intend to leave the final index set installed.

Performance Targets

Typical Results (M2 Pro)

| Operation | Time | Memory | Notes |
|---|---:|---:|---|
| GetReadyWork (10K) | 30 ms | 16.8 MB | Filters ~200 open issues |
| Search (10K, no filter) | 12.5 ms | 6.3 MB | Returns all open issues |
| Cycle Detection (5000 linear) | 70 ms | 15 KB | Detects transitive deps |
| Create Issue (10K db) | 2.5 ms | 8.9 KB | Insert into index |
| Update Issue (10K db) | 18 ms | 17 KB | Status change |
| Large Description (100KB) | 3.3 ms | 874 KB | String handling overhead |
| Bulk Close (100 issues) | 1.9 s | 1.2 MB | 100 sequential writes |
| Sync Merge (20 ops) | 29 ms | 198 KB | Create 10 + update 10 |

Dataset Caching

Benchmark datasets are cached in /tmp/beads-bench-cache/:

  • large.db - 10,000 issues (16.6 MB)
  • xlarge.db - 20,000 issues (generated on demand)

Cached databases are reused across runs. To regenerate:

```bash
rm /tmp/beads-bench-cache/*.db
```

Adding New Benchmarks

Follow the pattern in sqlite_bench_test.go:

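```go
import (
	"context"
	"testing"
)

// BenchmarkMyTest benchmarks a specific operation.
func BenchmarkMyTest(b *testing.B) {
	runBenchmark(b, setupLargeBenchDB, func(store *SQLiteStorage, ctx context.Context) error {
		// Your test code here: call store methods with ctx and
		// return any error they produce (nil on success).
		return nil
	})
}
```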

Or for custom setup:

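```go
import (
	"context"
	"testing"
)

func BenchmarkMyTest(b *testing.B) {
	store, cleanup := setupLargeBenchDB(b)
	defer cleanup()
	ctx := context.Background()

	b.ResetTimer()
	b.ReportAllocs()

	for i := 0; i < b.N; i++ {
		// Your test code here, using store and ctx.
		_, _ = store, ctx // placeholder so the template compiles; remove once used
	}
}
```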

CPU Profiling

The benchmark suite automatically enables CPU profiling on the first benchmark run:

```text
CPU profiling enabled: bench-cpu-2025-12-07-174417.prof
View flamegraph: go tool pprof -http=:8080 bench-cpu-2025-12-07-174417.prof
```

The resulting profile covers all benchmarks in the run. Opening it with go tool pprof shows a flamegraph of where time is spent.

Performance Optimization Strategy

  1. Identify bottleneck - Run benchmarks to find slow operations
  2. Profile - Use CPU profiling to see which functions consume time
  3. Measure - Run baseline benchmark before optimization
  4. Optimize - Make targeted changes
  5. Verify - Re-run benchmark to measure improvement

Example:

```bash
# Baseline
go test -tags=bench -bench=BenchmarkGetReadyWork_Large -benchmem ./internal/storage/dolt/...

# Make changes...

# Measure improvement
go test -tags=bench -bench=BenchmarkGetReadyWork_Large -benchmem ./internal/storage/dolt/...
```
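To compare the baseline and post-change runs without eyeballing them, parse the standard result lines that go test prints with -bench and -benchmem. This is a minimal Go sketch; BenchResult, parseBenchLine and pctChange are hypothetical helpers. Single runs are noisy, so a sturdier approach is to run each side with -count (for example -count=10) and compare the two outputs with benchstat (golang.org/x/perf/cmd/benchstat).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BenchResult holds the standard metrics from one "go test -bench -benchmem" line.
type BenchResult struct {
	Name        string // includes the -GOMAXPROCS suffix when present, e.g. "-12"
	NsPerOp     float64
	BytesPerOp  float64
	AllocsPerOp float64
}

// parseBenchLine parses a result line shaped like
// "BenchmarkGetReadyWork_Large-12  40  30000000 ns/op  16800000 B/op  1200 allocs/op".
// Metrics other than ns/op, B/op and allocs/op are ignored.
func parseBenchLine(line string) (BenchResult, error) {
	f := strings.Fields(line)
	if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") {
		return BenchResult{}, fmt.Errorf("not a benchmark result line: %q", line)
	}
	r := BenchResult{Name: f[0]}
	// f[1] is the iteration count; the remaining fields are value/unit pairs.
	for i := 2; i+1 < len(f); i += 2 {
		v, err := strconv.ParseFloat(f[i], 64)
		if err != nil {
			return BenchResult{}, fmt.Errorf("bad value %q in %q: %w", f[i], line, err)
		}
		switch f[i+1] {
		case "ns/op":
			r.NsPerOp = v
		case "B/op":
			r.BytesPerOp = v
		case "allocs/op":
			r.AllocsPerOp = v
		}
	}
	return r, nil
}

// pctChange returns (after - before) / before * 100; negative means the change
// reduced the metric (faster, or fewer bytes/allocations).
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}
```

Feed it the matching BenchmarkGetReadyWork_Large line from the baseline and post-change runs, then report pctChange for ns/op, B/op and allocs/op.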