CI Test Surface Audit

docs/CI_TEST_SURFACE_AUDIT.md


Last reviewed: 2026-05-26

Freshness source: origin/main at 4990c8309, Makefile, scripts/test*.sh, .buildflags, .test-skip, .github/workflows/*.yml, and test-file inventory from git ls-files, go list, and rg.

This is an audit snapshot, not the final CI policy. Use it to reason from first principles about what the repository can validate, what CI currently validates, and what should be cleaned up next.

Accepted cleanup decisions and implementation order are tracked in CI_CLEANUP_PLAN.md.

Executive Summary

The repository has a large Go test surface and several non-Go package surfaces, but CI is not organized around a single canonical test contract.

Current facts:

| Surface | Current size / command | CI status |
| --- | --- | --- |
| Go packages | 69 packages from go list ./... | Core PR/main CI runs direct go test, not scripts/test.sh. |
| Go test files | 610 *_test.go files | Mostly covered through Linux/macOS ./..., with exclusions and tags. |
| Go test functions | 4362 func Test... declarations | PR/main CI uses -short and skips ^TestEmbedded. |
| Go benchmarks | 46 func Benchmark... declarations | Local/manual only. |
| Embedded Dolt tests | 157 TestEmbedded* declarations | Conditional 20-shard CI matrix on risk paths, always on main push/merge queue. |
| Integration-tagged tests | 31 files with integration build tag | Nightly only as a broad sweep; selected Docker-backed suites run in PR CI. |
| MCP Python package | uv run pytest, uv run ruff, uv run mypy are documented/configured | Build/publish workflows do not run the checks. |
| npm package | npm test, npm run test:integration, npm run test:all | Release publish does not run package tests. |
| Website | npm run build, npm run typecheck | Deploy workflow builds on main pushes only; typecheck is unused in CI. |

Main conclusion: before changing CI mechanics, define named validation tiers and make each tier map to one repository-owned command. Today, local docs, Make targets, wrapper scripts, and Actions jobs describe overlapping but different contracts.
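The size figures in this summary are inventory counts, so they can be re-derived when the audit is refreshed. The sketch below is one rough way to do that, not the method used for this audit: it uses find and grep rather than git ls-files and rg so that it runs outside a checkout, and the function names are hypothetical. Counts can differ slightly from the table depending on how methods and generated files are treated.

```sh
# Hypothetical inventory helpers; not part of the repository.

# Count *_test.go files under a directory.
count_test_files() {
    find "$1" -type f -name '*_test.go' | wc -l | tr -d ' '
}

# Count top-level declarations in *_test.go files whose name starts with
# the given prefix, for example Test, TestEmbedded, or Benchmark.
# Methods such as "func (s *Suite) TestX" are deliberately not counted.
count_test_decls() {
    find "$1" -type f -name '*_test.go' -exec grep -hE "^func ${2}" {} + |
        wc -l | tr -d ' '
}
```

For example, count_test_decls . TestEmbedded would give a figure comparable to the embedded Dolt row.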

Local Test Surface

Canonical Build Flags

.buildflags is the source for normal local shell scripts:

  • Defaults CGO_ENABLED=1 unless the caller already set it.
  • Exports BEADS_BUILD_TAGS=gms_pure_go.
  • Adds -tags=gms_pure_go to GOFLAGS.

The normal shipped path is CGO-enabled with pure-Go regex support. The ICU path exists only as an opt-in maintainer path.
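The three behaviors listed above can be sketched as a small shell function. This is an illustration of the described behavior, not the contents of .buildflags; the function name apply_buildflags is hypothetical, and how the real file merges with an existing GOFLAGS value is an assumption.

```sh
# Hypothetical sketch of the behavior described for .buildflags.
apply_buildflags() {
    # Default CGO_ENABLED to 1 when the caller left it unset or empty.
    : "${CGO_ENABLED:=1}"
    export CGO_ENABLED

    # Export the canonical build tag.
    BEADS_BUILD_TAGS=gms_pure_go
    export BEADS_BUILD_TAGS

    # Append the tag flag to GOFLAGS, keeping any flags already present
    # (assumption: existing flags are preserved rather than replaced).
    GOFLAGS="${GOFLAGS:+$GOFLAGS }-tags=$BEADS_BUILD_TAGS"
    export GOFLAGS
}
```

A caller that exports CGO_ENABLED=0 before invoking the function keeps that value, which is how the no-CGO jobs would opt out of the default.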

Make Targets

| Target | Command path | Purpose |
| --- | --- | --- |
| make build | go build -tags "$(BUILD_TAGS)" ./cmd/bd | Build local bd binary with gms_pure_go. |
| make test | TEST_COVER=1 ./scripts/test.sh | Local default suite with coverage and .test-skip handling. |
| make test-icu-path | ./scripts/test-icu-path.sh ./... | Opt-in ICU regex path, not normal validation. |
| make test-full-cgo | Alias to make test-icu-path | Deprecated compatibility target. |
| make test-regression | go test -tags=regression,$(BUILD_TAGS) ./tests/regression/... | Differential regression suite against baseline release. |
| make test-upgrade | Build then scripts/upgrade-smoke-test.sh | Previous-release upgrade smoke gate. |
| make test-cross-version | Build then scripts/cross-version-smoke-test.sh | Cross-version upgrade smoke coverage. |
| make test-migration | Build then scripts/migration-test/run.sh | Migration fidelity harness across storage eras. |
| make bench | go test -bench=. ./internal/storage/dolt/ | Full Dolt benchmark suite. |
| make bench-quick | Shorter benchmark run | Local performance iteration. |
| make fmt-check | gofmt -l . | Formatting gate. |
| make check-docs | Build no-CGO binary, then scripts/check-doc-flags.sh | CLI-doc flag freshness. |

Go Test Runner

scripts/test.sh is the local wrapper recommended by docs/TESTING.md.

It:

  • Sources .buildflags.
  • Reads .test-skip and passes a composed -skip regex to go test.
  • Defaults to go test -timeout 3m ./....
  • Supports -v, -timeout, -run, package arguments, and extra -skip.
  • Enables coverage when TEST_COVER=1.
  • Can start one shared Dolt test server when BEADS_TEST_SHARED_SERVER=1.

At this audit point, .test-skip contains only comments and no active skip patterns.
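To make the .test-skip handling concrete, here is a minimal sketch of how a skip file could be turned into a single -skip regex. It assumes one pattern per line and # for comment lines; the real file format and the logic in scripts/test.sh may differ, and compose_skip_regex is a hypothetical name.

```sh
# Hypothetical helper: join the active patterns in a skip file with "|".
# Comment lines (starting with #) and blank lines are ignored.
compose_skip_regex() {
    awk '
        /^[ \t]*(#|$)/ { next }                  # skip comments and blanks
        { printf "%s%s", sep, $0; sep = "|" }    # join patterns with "|"
        END { if (sep != "") print "" }          # newline only if non-empty
    ' "$1"
}

# Possible use in a wrapper:
#   skip=$(compose_skip_regex .test-skip)
#   if [ -n "$skip" ]; then go test -skip "$skip" ./...; else go test ./...; fi
```

With a file that contains only comments, as at this audit point, the function prints nothing and the wrapper would run without a -skip flag.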

Build Tags and Special Test Modes

| Build tag / mode | Observed surface | Current command |
| --- | --- | --- |
| default CGO + gms_pure_go | Most Go tests | ./scripts/test.sh, make test, PR Linux/macOS jobs. |
| !cgo | 3 test files | Only partially covered by the CI pure-Go cmd/bd job. |
| cgo | 186 test files | Covered when CGO is enabled, including normal Linux/macOS CI. |
| integration | 31 test files mention integration | Nightly broad run; selected PR jobs. |
| regression | 2 test files | make test-regression, regression workflow. |
| e2e | 1 test file | No routine CI gate observed. |
| scripttests | 1 test file | No routine CI gate observed. |
| chaos | 1 test file | No routine CI gate observed. |
| regression && discovery | 1 test file | Manual/specialized only. |

testing.Short() appears in 25 test files. The main PR Linux/macOS matrix runs with -short, so some slow or external-path tests are intentionally skipped there even when their build tags are selected.
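The per-tag file counts above depend on how build constraints are matched, so a refresh should use a repeatable query. The following sketch lists test files whose //go:build line mentions a given tag while keeping cgo and !cgo distinct. It is an assumption about a workable method, not the query used for this audit; test_files_with_tag is a hypothetical name, and legacy // +build lines are not handled.

```sh
# Hypothetical helper: list *_test.go files whose //go:build line mentions
# the tag in $2. Pass "cgo" or "!cgo"; the "!" is matched literally, and a
# tag preceded by "!" does not count as a match for the bare tag.
test_files_with_tag() {
    find "$1" -type f -name '*_test.go' -exec \
        grep -lE "^//go:build(.*[^!A-Za-z0-9_.])?${2}([^A-Za-z0-9_.]|\$)" {} +
}

# Possible use:
#   test_files_with_tag . '!cgo' | wc -l
```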

Embedded Dolt Surface

Embedded Dolt tests are split out from the PR/core matrix:

  • CI core Linux/macOS uses -skip '^TestEmbedded'.
  • .github/scripts/ci-embedded-tier.sh decides whether to run full embedded coverage.
  • build-embedded prebuilds /tmp/bd-embedded-test, /tmp/embeddeddolt-test, and /tmp/bd-cmd-test.
  • test-embedded-storage runs the embedded storage test binary with BEADS_TEST_EMBEDDED_DOLT=1.
  • test-embedded-cmd shards cmd/bd TestEmbedded* across 20 jobs using .github/scripts/embedded-test-shard.sh.
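As an illustration of the sharding step, the sketch below assigns sorted test names to shards round-robin and prints a regex suitable for go test -run. The assignment strategy is an assumption: the audit does not describe how .github/scripts/embedded-test-shard.sh distributes tests, and shard_run_regex is a hypothetical name.

```sh
# Hypothetical sharding sketch.
# $1 = zero-based shard index, $2 = shard count; test names arrive on stdin.
# Prints an anchored alternation such as ^(TestEmbeddedA|TestEmbeddedC)$,
# or nothing when the shard receives no tests.
shard_run_regex() {
    sort | awk -v idx="$1" -v total="$2" '
        (NR - 1) % total == idx { printf "%s%s", sep, $0; sep = "|" }
        END { if (sep != "") printf "\n" }
    ' | sed 's/.*/^(&)$/'
}

# Possible use in shard 3 of 20:
#   go test -list '^TestEmbedded' ./cmd/bd | grep '^TestEmbedded' | shard_run_regex 3 20
```

Sorting before assignment keeps the mapping deterministic across jobs, which matters because each shard computes its own list independently.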

The tier detector runs full embedded coverage on:

  • Push to main.
  • Merge queue.
  • PRs touching cmd/, internal/, tests/, scripts/, .github/scripts/, .github/workflows/, Go module/build inputs, root agent/release docs, or any *.go file.

It skips the embedded matrix on docs/metadata-only PRs.
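The decision rule above can be sketched as a path classifier. This is an approximation, not the logic of .github/scripts/ci-embedded-tier.sh: the function name and the full/skip outputs are hypothetical, go.mod and go.sum stand in for "Go module/build inputs", and the root agent/release docs are omitted because the audit does not name them.

```sh
# Hypothetical tier detector.
# $1 = GitHub event name (push, merge_group, pull_request);
# remaining arguments = changed file paths. Prints "full" or "skip".
needs_full_embedded() {
    event=$1
    shift
    case "$event" in
        push | merge_group)
            echo full
            return 0
            ;;
    esac
    for path in "$@"; do
        case "$path" in
            cmd/* | internal/* | tests/* | scripts/* | \
            .github/scripts/* | .github/workflows/* | \
            go.mod | go.sum | *.go)
                echo full
                return 0
                ;;
        esac
    done
    echo skip
}
```

A PR that changes only docs/README.md would print skip, while adding a single .go file anywhere in the tree would print full.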

Release and Migration Surface

The release-related local scripts are separate from the normal Go test runner:

| Script / target | Scope |
| --- | --- |
| scripts/upgrade-smoke-test.sh | Previous-release upgrade scenarios covering data, mode, role, doctor, and mutations. |
| scripts/cross-version-smoke-test.sh | Candidate readability after data creation by older releases. |
| scripts/migration-test/run.sh | Rich migration datasets, snapshots, fidelity checks, and recipe discovery. |
| tests/regression/... | Differential CLI/storage behavior against tests/regression/BASELINE_VERSION. |

These are valuable but are not one unified release gate in CI today.

Non-Go Package Surface

| Area | Available local commands | Current issue |
| --- | --- | --- |
| integrations/beads-mcp | uv run pytest, uv run pytest --cov=beads_mcp tests/, uv run ruff check src/beads_mcp, uv run mypy src/beads_mcp, uv build | Tests/lint/typecheck are documented but not run before publish in current workflows. |
| npm-package | npm test, npm run test:integration, npm run test:all | Release workflow publishes without running package tests. |
| website | npm run build, npm run typecheck | Deploy workflow builds and link-checks on main pushes; typecheck is not run, and PRs do not get a website gate. |
| plugins/beads | Manifest files and generated plugin assets | Version consistency is checked, but there is no obvious plugin manifest/schema gate. |

Current GitHub Actions Map

ci.yml

Triggers: push to main, pull request to main, merge queue.

Jobs:

  • detect-ci-tier: decides whether to run full embedded Dolt coverage.
  • check-build-tags: runs scripts/check-build-tags.sh and scripts/check-go-install-guidance.sh.
  • check-cmd-bd-purego-tests: CGO-disabled cmd/bd build, test-binary compile, and a focused pure-Go test subset.
  • check-version-consistency: runs scripts/check-versions.sh.
  • check-doc-flags: builds a no-CGO bd and validates docs against CLI flags.
  • check-no-beads-changes: PR-only guard for .beads/issues.jsonl.
  • test: Linux/macOS matrix, installs Dolt, builds bd, then runs:
    • Linux: gotestsum -- -tags gms_pure_go -race -short -coverprofile=coverage.out -skip '^TestEmbedded' ./...
    • macOS: go test -tags gms_pure_go -v -race -short -skip '^TestEmbedded' ./...
  • test-domain-uow: pulls dolthub/dolt-sql-server:1.88.1, prebuilds bd, and runs internal/storage/domain/... plus internal/storage/uow/....
  • build-embedded, test-embedded-storage, test-embedded-cmd: conditional embedded Dolt matrix.
  • test-windows: Windows build plus version and help smoke tests only.
  • fmt-check: make fmt-check.
  • lint: golangci-lint with --build-tags=gms_pure_go.
  • test-nix: nix run .#default -- --help and validates first help line.

Other Workflows

| Workflow | Triggers | Main validation |
| --- | --- | --- |
| regression.yml | Push to main, PR to main, manual | Detector runs regression on push/manual, PR label run-regression, or risky paths; test command is go test -tags=regression,gms_pure_go -timeout=20m -v ./tests/regression/.... |
| cross-version-smoke.yml | Tags, PRs, manual | PRs test latest 5 releases, tags test latest 30, via scripts/upgrade-smoke-test.sh. |
| migration-test.yml | Tags, manual | Builds candidate and runs scripts/migration-test/run.sh; not a PR/main gate. |
| nightly.yml | Daily schedule, manual | go test -v -race -tags=integration,gms_pure_go -coverprofile=coverage.out -timeout=30m ./... with BEADS_TEST_SKIP=dolt; checks coverage >= 30%. |
| nix-build.yml | PR/push paths for Nix or Go module files, manual | nix build .#default --print-build-logs. |
| deploy-docs.yml | Push to main paths website/** or scripts/generate-llms-full.sh, manual | npm ci, generate llms-full.txt, npm run build, internal link check, non-blocking external link check, deploy Pages. |
| release.yml | Tags, manual from tag | GoReleaser, native macOS builds, macOS embedded smoke, release attestations/SBOM, Homebrew formula update, PyPI build/publish, npm publish. |
| test-pypi.yml | Manual | Builds MCP package and publishes to TestPyPI. |
| update-flake-lock.yml | Weekly, manual | Updates flake.lock, scope-checks diff, opens PR. |
| update-vendor-hash.yml | Dependabot pull_request_target for go.mod/go.sum | Updates default.nix vendor hash and pushes to dependabot branch. |
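The nightly coverage floor is the kind of check that is easy to move out of workflow YAML. A minimal sketch, assuming the threshold is evaluated against the total line printed by go tool cover -func; the function name coverage_ok is hypothetical, and the actual workflow step may compute the figure differently.

```sh
# Hypothetical coverage gate.
# stdin: output of `go tool cover -func=coverage.out`; $1: minimum percent.
# Exits 0 when the "total:" line meets the minimum, 1 otherwise
# (including when no total line is present).
coverage_ok() {
    awk -v min="$1" '
        $1 == "total:" { gsub(/%/, "", $NF); pct = $NF + 0; found = 1 }
        END { exit !(found && pct >= min) }
    '
}

# Possible use after the nightly test run:
#   go tool cover -func=coverage.out | coverage_ok 30
```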

Gap Analysis

P0: Define the CI Contract

There is no single source of truth for which checks are required for PRs, main pushes, nightly, and releases. The workflows encode this implicitly.

Impact: agents and maintainers cannot reliably answer "what must pass before merge" or "which local command reproduces this status check" without reading multiple workflow files.

P0: Local and CI Go Commands Diverge

Local docs recommend make test / scripts/test.sh; PR CI runs direct go test invocations with -race, -short, and -skip '^TestEmbedded'.

Impact:

  • The skip, timeout, coverage, and shared-server behavior of scripts/test.sh is not part of the PR contract.
  • make test is not a faithful local reproduction of PR CI.
  • The testing.Short() boundary is part of CI behavior but not prominent in the local test docs.

P1: No-CGO Coverage Is Partial

CI has a focused CGO-disabled cmd/bd compile/run job, but !cgo tests outside that subset are not obviously covered by a full no-CGO ./... job.

Impact: regressions in server-mode/no-CGO behavior outside the focused subset can escape the main PR gate.

P1: Integration Coverage Is Fragmented

Some Docker-backed Dolt suites run in PR CI, the broad integration tag runs nightly with Docker Dolt tests explicitly skipped, and several slow paths are suppressed by -short.

Impact: this may be intentional, but the contract is undocumented. It is hard to distinguish "not worth PR time" from "accidentally uncovered."

P1: Release Gates Are Spread Across Workflows

Regression, cross-version smoke, migration fidelity, release builds, PyPI, npm, and Homebrew publishing are separate workflows with different triggers.

Impact: tag-time validation exists, but the release-blocking order and rerun strategy are not represented as one release gate document or workflow summary.

P1: Non-Go Package Tests Are Not CI Gates

The MCP Python package, npm wrapper, and website each have local tests or checks. Current workflows mostly build or publish these artifacts but do not run their full local validation before publishing or on PRs touching those paths.

Impact: package-specific regressions can reach release/publish workflows without the package's own tests having run.

P2: Windows Is Smoke-Only

Windows CI builds and runs version and help, but it does not run Go tests.

Impact: Windows-specific filesystem, path, shell, and CGO behavior is exercised only by build and smoke checks. This may be the right tradeoff, but it should be an explicit tier with a known owner and escape hatch.

P2: Benchmark and Performance Regression Checks Are Manual

Benchmarks and production-shaped repro tools are documented, but no workflow captures benchmark artifacts or runs labeled performance checks.

Impact: performance-sensitive changes rely on human/agent discipline rather than a repeatable CI path.

P2: Existing Docs Have Drift

Examples:

  • docs/TESTING.md says the wrapper script is consistent with CI, but PR CI uses direct go test.
  • docs/LINTING.md says CI may not fail on known lint issues, while the workflow uses the standard golangci-lint action without configuring a non-failing exit code for reported issues.
  • Older staged audit docs contain obsolete test counts.

Impact: stale docs undermine the cleanup effort because they hide the actual current contract.

Roadmap

Phase 1: Name the Tiers

Create a CI policy doc or convert this audit into one with explicit tiers:

| Tier | Purpose | Example trigger | Local command |
| --- | --- | --- | --- |
| pr-core | Required fast PR signal | Every PR/main push | To be defined. |
| pr-risk | Extra checks for risky paths | Go/storage/scripts/workflow changes | To be defined. |
| nightly-full | Expensive broad sweep | Schedule/manual | To be defined. |
| release-gate | Must pass before publishing | Tags/manual release | To be defined. |
| package-gates | Python/npm/website path checks | Path-gated PRs and release | To be defined. |
| perf-manual | Benchmark evidence | Manual or label | To be defined. |

Then make every status check name include its tier.

Phase 2: Add Reproducible CI Wrapper Commands

Add repository-owned wrapper scripts or Make targets for each tier, for example:

  • make test-pr-core
  • make test-pr-risk
  • make test-nocgo
  • make test-integration-nightly
  • make test-release-gate

Workflow YAML should not be the only place where command policy lives.
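One possible shape for such a wrapper is a dispatcher that maps a tier name to its command and prints it, so that Make targets and workflow steps share one definition. The sketch below is illustrative only: tier_command is a hypothetical name, the commands are copied from the workflows described in this audit, and treating them as the definitions of pr-core, nightly-full, and regression is an assumption, since the tier contents are not yet decided.

```sh
# Hypothetical tier dispatcher: prints the command for a named tier.
# Returns 2 and writes to stderr for an unknown tier.
tier_command() {
    case "$1" in
        pr-core)
            echo "go test -tags gms_pure_go -race -short -skip '^TestEmbedded' ./..."
            ;;
        nightly-full)
            echo "BEADS_TEST_SKIP=dolt go test -v -race -tags=integration,gms_pure_go -coverprofile=coverage.out -timeout=30m ./..."
            ;;
        regression)
            echo "go test -tags=regression,gms_pure_go -timeout=20m -v ./tests/regression/..."
            ;;
        *)
            echo "unknown tier: $1" >&2
            return 2
            ;;
    esac
}

# Possible use from a Make target or a workflow step:
#   eval "$(tier_command pr-core)"
```

Printing rather than executing keeps the mapping easy to review and to diff against the workflow files during the migration.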

Phase 3: Align Current PR CI With the Wrappers

Once the tier commands exist, change ci.yml to call those commands. Preserve useful CI-only behavior such as gotestsum/JUnit output by wrapping around the same underlying command rather than maintaining a separate test definition.
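A sketch of "wrapping around the same underlying command": the go test arguments are defined once, and only the runner in front of them changes between local use and CI. The function name run_pr_core, the DRY_RUN variable, and the junit.xml file name are hypothetical; the argument list mirrors the PR test command described in this audit, without the Linux-only coverage flag.

```sh
# Hypothetical PR-core runner.
# $1 = runner: "go" (default) or "gotestsum".
# Set DRY_RUN=1 to print the command instead of executing it.
run_pr_core() {
    runner=${1:-go}

    # Single definition of the test arguments shared by both runners.
    set -- -tags gms_pure_go -race -short -skip '^TestEmbedded' ./...

    case "$runner" in
        go) set -- go test "$@" ;;
        gotestsum) set -- gotestsum --junitfile junit.xml -- "$@" ;;
        *)
            echo "unknown runner: $runner" >&2
            return 2
            ;;
    esac

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Locally, run_pr_core would execute plain go test; a CI step could call run_pr_core gotestsum to get JUnit output from the same argument list.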

Phase 4: Fill Package-Specific Gaps

Add path-gated checks:

  • MCP: uv run ruff check, uv run mypy, uv run pytest.
  • npm package: npm ci or equivalent package install, then npm run test:all.
  • Website: npm ci, npm run typecheck, npm run build, and possibly the internal link check on PRs that touch website/**.
  • Plugins: validate manifest JSON and version consistency beyond the existing version script if schema tooling exists or can be added cheaply.
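Path gating for these checks can be sketched as a function that maps changed files to the package gates they should trigger. The directory names come from the Non-Go Package Surface section; the function name package_gates and the gate labels it prints are hypothetical, and plugins are left out because no concrete check exists for them yet.

```sh
# Hypothetical path gate: arguments are changed file paths.
# Prints one gate name per line (mcp, npm, website) for each package
# touched by at least one path; prints nothing when none apply.
package_gates() {
    mcp=0
    npm=0
    web=0
    for path in "$@"; do
        case "$path" in
            integrations/beads-mcp/*) mcp=1 ;;
            npm-package/*) npm=1 ;;
            website/*) web=1 ;;
        esac
    done
    if [ "$mcp" -eq 1 ]; then echo mcp; fi
    if [ "$npm" -eq 1 ]; then echo npm; fi
    if [ "$web" -eq 1 ]; then echo website; fi
    return 0
}
```

A workflow could feed this the PR's changed-file list and run, for example, the uv-based checks only when mcp appears in the output.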

Phase 5: Rationalize Expensive Coverage

Decide and document:

  • Which integration-tagged tests must be PR-gated.
  • Whether nightly should continue using BEADS_TEST_SKIP=dolt, or whether Docker-backed Dolt hangs are resolved enough to split and re-enable them.
  • Which release migration paths must block tags versus run manually.
  • Which Windows tests are worth adding without making the matrix unstable.

Phase 6: Add CI Observability

For every tier, capture enough artifacts to debug failures without rerunning:

  • JUnit for Go tests where practical.
  • Coverage artifacts by tier.
  • Embedded shard selected-test lists.
  • Package test logs.
  • Release-gate summary that links regression, cross-version, migration, and publish checks.

Immediate Next Steps

  1. Turn this audit into a shorter docs/CI.md policy once maintainers agree on the tier names.
  2. Add make test-pr-core that exactly reproduces the current Linux PR test command, including -race, -short, and -skip '^TestEmbedded'.
  3. Add a no-CGO all-package compile/test gate or explicitly document why the focused cmd/bd subset is enough.
  4. Add path-gated MCP, npm package, and website checks before touching the main Go matrix.
  5. Update docs/TESTING.md after the wrapper commands exist so local guidance points at the same contract CI runs.