docs/CI_TEST_SURFACE_AUDIT.md
Last reviewed: 2026-05-26
Freshness source: origin/main at 4990c8309, Makefile,
scripts/test*.sh, .buildflags, .test-skip, .github/workflows/*.yml,
and test-file inventory from git ls-files, go list, and rg.
This is an audit snapshot, not the final CI policy. Use it to reason from first principles about what the repository can validate, what CI currently validates, and what should be cleaned up next.
Accepted cleanup decisions and implementation order are tracked in
CI_CLEANUP_PLAN.md.
The repository has a large Go test surface and several non-Go package surfaces, but CI is not organized around a single canonical test contract.
Current facts:
| Surface | Current size / command | CI status |
|---|---|---|
| Go packages | 69 packages from go list ./... | Core PR/main CI runs direct go test, not scripts/test.sh. |
| Go test files | 610 *_test.go files | Mostly covered through Linux/macOS ./..., with exclusions and tags. |
| Go test functions | 4362 func Test... declarations | PR/main CI uses -short and skips ^TestEmbedded. |
| Go benchmarks | 46 func Benchmark... declarations | Local/manual only. |
| Embedded Dolt tests | 157 TestEmbedded* declarations | Conditional 20-shard CI matrix on risk paths, always on main push/merge queue. |
| Integration-tagged tests | 31 files with integration build tag | Nightly only as a broad sweep; selected Docker-backed suites run in PR CI. |
| MCP Python package | uv run pytest, uv run ruff, uv run mypy are documented/configured | Build/publish workflows do not run the checks. |
| npm package | npm test, npm run test:integration, npm run test:all | Release publish does not run package tests. |
| Website | npm run build, npm run typecheck | Deploy workflow builds on main pushes only; typecheck is unused in CI. |
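The counts in the table come from git ls-files, go list, and rg. As a rough approximation, and not the exact commands used for this audit, the file and declaration counts could be reproduced with find and grep. The helper names count_test_files and count_decls are hypothetical. The pattern only matches top-level func declarations at the start of a line, and find also sees untracked files, so results may differ slightly from a git ls-files inventory.

```shell
# Hypothetical inventory helpers; the audit itself used git ls-files, go list, and rg.

# Count *_test.go files under a root directory.
count_test_files() {
  find "$1" -type f -name '*_test.go' | wc -l | tr -d ' '
}

# Count top-level declarations such as "func Test..." or "func Benchmark...".
# $1 = root directory, $2 = name prefix (Test, TestEmbedded, Benchmark).
count_decls() {
  find "$1" -type f -name '*_test.go' \
    -exec grep -hE "^func $2[A-Za-z0-9_]*\(" {} + | wc -l | tr -d ' '
}
```

For example, count_decls . Test, count_decls . TestEmbedded, and count_decls . Benchmark correspond to the three declaration rows, and go list ./... | wc -l gives the package count.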
Main conclusion: before changing CI mechanics, define named validation tiers and make each tier map to one repository-owned command. Today, local docs, Make targets, wrapper scripts, and Actions jobs describe overlapping but different contracts.
.buildflags is the source for normal local shell scripts:
- Sets CGO_ENABLED=1 unless the caller already set it.
- Sets BEADS_BUILD_TAGS=gms_pure_go.
- Adds -tags=gms_pure_go to GOFLAGS.

The normal shipped path is CGO-enabled with pure-Go regex support. The ICU path exists only as an opt-in maintainer path.
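The .buildflags behavior described above can be sketched as a small function. This is a reconstruction from the three stated effects, not the contents of the real file. The function name apply_buildflags and the check that avoids appending the tag twice are assumptions.

```shell
# Sketch of the documented .buildflags effects (hypothetical function name).
apply_buildflags() {
  # Keep a caller-provided CGO_ENABLED; otherwise default to 1.
  export CGO_ENABLED="${CGO_ENABLED:-1}"
  export BEADS_BUILD_TAGS="gms_pure_go"
  # Assumption: append the tag flag only if it is not already present.
  case " ${GOFLAGS:-} " in
    *" -tags=${BEADS_BUILD_TAGS} "*) ;;
    *) export GOFLAGS="${GOFLAGS:+$GOFLAGS }-tags=${BEADS_BUILD_TAGS}" ;;
  esac
}
```

With this shape, a caller that exports CGO_ENABLED=0 before sourcing keeps a no-CGO build, which is the "unless the caller already set it" case.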
| Target | Command path | Purpose |
|---|---|---|
| make build | go build -tags "$(BUILD_TAGS)" ./cmd/bd | Build local bd binary with gms_pure_go. |
| make test | TEST_COVER=1 ./scripts/test.sh | Local default suite with coverage and .test-skip handling. |
| make test-icu-path | ./scripts/test-icu-path.sh ./... | Opt-in ICU regex path, not normal validation. |
| make test-full-cgo | Alias to make test-icu-path | Deprecated compatibility target. |
| make test-regression | go test -tags=regression,$(BUILD_TAGS) ./tests/regression/... | Differential regression suite against baseline release. |
| make test-upgrade | Build then scripts/upgrade-smoke-test.sh | Previous-release upgrade smoke gate. |
| make test-cross-version | Build then scripts/cross-version-smoke-test.sh | Cross-version upgrade smoke coverage. |
| make test-migration | Build then scripts/migration-test/run.sh | Migration fidelity harness across storage eras. |
| make bench | go test -bench=. ./internal/storage/dolt/ | Full Dolt benchmark suite. |
| make bench-quick | Shorter benchmark run | Local performance iteration. |
| make fmt-check | gofmt -l . | Formatting gate. |
| make check-docs | Build no-CGO binary, then scripts/check-doc-flags.sh | CLI-doc flag freshness. |
scripts/test.sh is the local wrapper recommended by docs/TESTING.md.
It:
- Loads .buildflags.
- Reads .test-skip and passes a composed -skip regex to go test.
- Runs go test -timeout 3m ./... by default.
- Accepts -v, -timeout, -run, package arguments, and extra -skip.
- Enables coverage with TEST_COVER=1.
- Applies shared-server behavior through BEADS_TEST_SHARED_SERVER=1.

At this audit point, .test-skip contains only comments and no active skip patterns.
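To make the "composed -skip regex" concrete, here is a minimal sketch of how a skip file could be turned into one alternation. The audit does not describe the .test-skip format, so the one-pattern-per-line layout with # comments is an assumption, and compose_skip_regex is a hypothetical name rather than a function from scripts/test.sh.

```shell
# Hypothetical helper: join the active patterns of a skip file with '|'.
# $1 = skip file (assumed: one pattern per line, '#' starts a comment)
# $2 = optional extra pattern, such as one passed through an extra -skip
compose_skip_regex() {
  local file="$1" extra="${2:-}" line regex=""
  if [ -f "$file" ]; then
    while IFS= read -r line || [ -n "$line" ]; do
      line="${line%%#*}"   # drop comments
      line="$(printf '%s' "$line" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
      [ -z "$line" ] && continue
      regex="${regex:+$regex|}$line"
    done < "$file"
  fi
  if [ -n "$extra" ]; then
    regex="${regex:+$regex|}$extra"
  fi
  printf '%s\n' "$regex"
}
```

A file that contains only comments yields an empty string, which matches the current state of .test-skip: a wrapper built this way would add no -skip flag unless the caller supplies one.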
| Build tag / mode | Observed surface | Current command |
|---|---|---|
| default CGO + gms_pure_go | Most Go tests | ./scripts/test.sh, make test, PR Linux/macOS jobs. |
| !cgo | 3 test files | Only partially covered by the CI pure-Go cmd/bd job. |
| cgo | 186 test files | Covered when CGO is enabled, including normal Linux/macOS CI. |
| integration | 31 test files mention integration | Nightly broad run; selected PR jobs. |
| regression | 2 test files | make test-regression, regression workflow. |
| e2e | 1 test file | No routine CI gate observed. |
| scripttests | 1 test file | No routine CI gate observed. |
| chaos | 1 test file | No routine CI gate observed. |
| regression && discovery | 1 test file | Manual/specialized only. |
testing.Short() appears in 25 test files. The main PR Linux/macOS matrix runs
with -short, so some slow or external-path tests are intentionally skipped
there even when their build tags are selected.
Embedded Dolt tests are split out from the PR/core matrix:
- The core PR jobs pass -skip '^TestEmbedded'.
- .github/scripts/ci-embedded-tier.sh decides whether to run full embedded coverage.
- build-embedded prebuilds /tmp/bd-embedded-test, /tmp/embeddeddolt-test, and /tmp/bd-cmd-test.
- test-embedded-storage runs the embedded storage test binary with BEADS_TEST_EMBEDDED_DOLT=1.
- test-embedded-cmd shards cmd/bd TestEmbedded* across 20 jobs using .github/scripts/embedded-test-shard.sh.

The tier detector runs full embedded coverage on:

- Pushes to main and merge queue runs.
- PRs that touch cmd/, internal/, tests/, scripts/, .github/scripts/, .github/workflows/, Go module/build inputs, root agent/release docs, or any *.go file.

It skips the embedded matrix on docs/metadata-only PRs.
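The tier decision described above can be sketched as a path classifier. This is an illustration of the documented rules, not the contents of .github/scripts/ci-embedded-tier.sh. The function name embedded_tier and its interface are hypothetical, go.mod and go.sum stand in for "Go module/build inputs" as an assumption, and the root agent/release docs are left out because the audit does not name them.

```shell
# Hypothetical sketch: print "full" or "skip" for the embedded Dolt matrix.
# $1 = GitHub event name; changed paths are read from stdin, one per line.
embedded_tier() {
  local event="$1" path
  # Pushes to main and merge queue runs always get full coverage.
  case "$event" in
    push|merge_group) echo full; return 0 ;;
  esac
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      cmd/*|internal/*|tests/*|scripts/*) echo full; return 0 ;;
      .github/scripts/*|.github/workflows/*) echo full; return 0 ;;
      go.mod|go.sum) echo full; return 0 ;;   # assumed build inputs
      *.go) echo full; return 0 ;;
    esac
  done
  echo skip
}
```

A PR job could feed it the changed file list, for example git diff --name-only origin/main...HEAD | embedded_tier pull_request, and gate the embedded jobs on the printed value.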
The release-related local scripts are separate from the normal Go test runner:
| Script / target | Scope |
|---|---|
| scripts/upgrade-smoke-test.sh | Previous-release upgrade scenarios covering data, mode, role, doctor, and mutations. |
| scripts/cross-version-smoke-test.sh | Candidate readability after data creation by older releases. |
| scripts/migration-test/run.sh | Rich migration datasets, snapshots, fidelity checks, and recipe discovery. |
| tests/regression/... | Differential CLI/storage behavior against tests/regression/BASELINE_VERSION. |
These are valuable but are not one unified release gate in CI today.
| Area | Available local commands | Current issue |
|---|---|---|
| integrations/beads-mcp | uv run pytest, uv run pytest --cov=beads_mcp tests/, uv run ruff check src/beads_mcp, uv run mypy src/beads_mcp, uv build | Tests/lint/typecheck are documented but not run before publish in current workflows. |
| npm-package | npm test, npm run test:integration, npm run test:all | Release workflow publishes without running package tests. |
| website | npm run build, npm run typecheck | Deploy workflow builds and link-checks on main pushes; typecheck is not run, and PRs do not get a website gate. |
| plugins/beads | Manifest files and generated plugin assets | Version consistency is checked, but there is no obvious plugin manifest/schema gate. |
ci.yml

Triggers: push to main, pull request to main, merge queue.
Jobs:
- detect-ci-tier: decides whether to run full embedded Dolt coverage.
- check-build-tags: runs scripts/check-build-tags.sh and scripts/check-go-install-guidance.sh.
- check-cmd-bd-puregeo-tests: CGO-disabled cmd/bd build, test-binary compile, and a focused pure-Go test subset.
- check-version-consistency: runs scripts/check-versions.sh.
- check-doc-flags: builds a no-CGO bd and validates docs against CLI flags.
- check-no-beads-changes: PR-only guard for .beads/issues.jsonl.
- test: Linux/macOS matrix, installs Dolt, builds bd, then runs:
  - gotestsum -- -tags gms_pure_go -race -short -coverprofile=coverage.out -skip '^TestEmbedded' ./...
  - go test -tags gms_pure_go -v -race -short -skip '^TestEmbedded' ./...
- test-domain-uow: pulls dolthub/dolt-sql-server:1.88.1, prebuilds bd, and runs the internal/storage/domain/... and internal/storage/uow/... packages.
- build-embedded, test-embedded-storage, test-embedded-cmd: conditional embedded Dolt matrix.
- test-windows: Windows build plus version and help smoke tests only.
- fmt-check: make fmt-check.
- lint: golangci-lint with --build-tags=gms_pure_go.
- test-nix: runs nix run .#default -- --help and validates the first help line.

| Workflow | Triggers | Main validation |
|---|---|---|
| regression.yml | Push to main, PR to main, manual | Detector runs regression on push/manual, PR label run-regression, or risky paths; test command is go test -tags=regression,gms_pure_go -timeout=20m -v ./tests/regression/.... |
| cross-version-smoke.yml | Tags, PRs, manual | PRs test latest 5 releases, tags test latest 30, via scripts/upgrade-smoke-test.sh. |
| migration-test.yml | Tags, manual | Builds candidate and runs scripts/migration-test/run.sh; not a PR/main gate. |
| nightly.yml | Daily schedule, manual | go test -v -race -tags=integration,gms_pure_go -coverprofile=coverage.out -timeout=30m ./... with BEADS_TEST_SKIP=dolt; checks coverage >= 30%. |
| nix-build.yml | PR/push paths for Nix or Go module files, manual | nix build .#default --print-build-logs. |
| deploy-docs.yml | Push to main paths website/** or scripts/generate-llms-full.sh, manual | npm ci, generate llms-full.txt, npm run build, internal link check, non-blocking external link check, deploy Pages. |
| release.yml | Tags, manual from tag | GoReleaser, native macOS builds, macOS embedded smoke, release attestations/SBOM, Homebrew formula update, PyPI build/publish, npm publish. |
| test-pypi.yml | Manual | Builds MCP package and publishes to TestPyPI. |
| update-flake-lock.yml | Weekly, manual | Updates flake.lock, scope-checks diff, opens PR. |
| update-vendor-hash.yml | Dependabot pull_request_target for go.mod/go.sum | Updates default.nix vendor hash and pushes to dependabot branch. |
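The nightly row mentions a coverage floor of 30%. The audit does not show how the workflow computes it, so the following is only a sketch of one common approach: read the total: line printed by go tool cover -func=coverage.out and compare it to a threshold. The function name coverage_ok is hypothetical.

```shell
# Hypothetical threshold check; reads `go tool cover -func` output on stdin.
# $1 = minimum total coverage in percent.
# Exit status: 0 if total >= minimum, 1 if below, 2 if no total line was found.
coverage_ok() {
  awk -v min="$1" '
    $1 == "total:" { gsub(/%/, "", $NF); pct = $NF + 0; found = 1 }
    END {
      if (!found) exit 2
      exit ((pct >= min) ? 0 : 1)
    }
  '
}
```

Usage would look like go tool cover -func=coverage.out | coverage_ok 30. Keeping the check in a repository-owned script would let the same floor be reproduced locally instead of living only in workflow YAML.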
There is no single source of truth for which checks are required for PRs, main pushes, nightly, and releases. The workflows encode this implicitly.
Impact: agents and maintainers cannot reliably answer "what must pass before merge" or "which local command reproduces this status check" without reading multiple workflow files.
Local docs recommend make test / scripts/test.sh; PR CI runs direct
go test invocations with -race, -short, and -skip '^TestEmbedded'.
Impact:
- scripts/test.sh skip, timeout, coverage, and shared-server behavior is not the PR contract.
- make test is not a faithful local reproduction of PR CI.
- The testing.Short() boundary is part of CI behavior but not prominent in the local test docs.

CI has a focused CGO-disabled cmd/bd compile/run job, but !cgo tests outside that subset are not obviously covered by a full no-CGO ./... job.
Impact: regressions in server-mode/no-CGO behavior outside the focused subset can escape the main PR gate.
Some Docker-backed Dolt suites run in PR CI, the broad integration tag runs
nightly with Docker Dolt tests explicitly skipped, and several slow paths are
suppressed by -short.
Impact: this may be intentional, but the contract is undocumented. It is hard to distinguish "not worth PR time" from "accidentally uncovered."
Regression, cross-version smoke, migration fidelity, release builds, PyPI, npm, and Homebrew publishing are separate workflows with different triggers.
Impact: tag-time validation exists, but the release-blocking order and rerun strategy are not represented as one release gate document or workflow summary.
The MCP Python package, npm wrapper, and website each have local tests or checks. Current workflows mostly build or publish these artifacts but do not run their full local validation before publishing or on PRs touching those paths.
Impact: package-specific regressions can reach release/publish workflows without the package's own tests having run.
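One way to close this gap is to map changed paths to package gates. The sketch below uses the three directory names from the package table and is not taken from any existing workflow. The function name package_gates and the gate labels mcp, npm, and website are hypothetical.

```shell
# Hypothetical sketch: read changed paths on stdin, print the package gates to run.
package_gates() {
  local path mcp=0 npm=0 web=0
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      integrations/beads-mcp/*) mcp=1 ;;
      npm-package/*) npm=1 ;;
      website/*) web=1 ;;
    esac
  done
  # Print each gate at most once, in a fixed order.
  [ "$mcp" -eq 1 ] && echo mcp
  [ "$npm" -eq 1 ] && echo npm
  [ "$web" -eq 1 ] && echo website
  return 0
}
```

Each printed gate would then select that package's own documented commands, for example uv run pytest for mcp or npm run test:all for npm.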
Windows CI builds and runs version and help, but it does not run Go tests.
Impact: Windows-specific filesystem, path, shell, and CGO behavior depends on limited coverage. This may be the right tradeoff, but it should be an explicit tier with a known owner and escape hatch.
Benchmarks and production-shaped repro tools are documented, but no workflow captures benchmark artifacts or runs labeled performance checks.
Impact: performance-sensitive changes rely on human/agent discipline rather than a repeatable CI path.
Examples:
- docs/TESTING.md says the wrapper script is consistent with CI, but PR CI uses direct go test.
- docs/LINTING.md says CI may not fail on known lint issues, while the workflow uses the standard golangci-lint action without an explicit non-failing issues exit code.

Impact: stale docs undermine the cleanup effort because they hide the actual current contract.
Create a CI policy doc or convert this audit into one with explicit tiers:
| Tier | Purpose | Example trigger | Local command |
|---|---|---|---|
| pr-core | Required fast PR signal | Every PR/main push | To be defined. |
| pr-risk | Extra checks for risky paths | Go/storage/scripts/workflow changes | To be defined. |
| nightly-full | Expensive broad sweep | Schedule/manual | To be defined. |
| release-gate | Must pass before publishing | Tags/manual release | To be defined. |
| package-gates | Python/npm/website path checks | Path-gated PRs and release | To be defined. |
| perf-manual | Benchmark evidence | Manual or label | To be defined. |
Then make every status check name include its tier.
Add repository-owned wrapper scripts or Make targets for each tier, for example:
- make test-pr-core
- make test-pr-risk
- make test-nocgo
- make test-integration-nightly
- make test-release-gate

Do not let workflow YAML be the only place where command policy lives.
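As an illustration of "one tier, one repository-owned command", a small lookup could translate a tier name into the Make target that owns it. None of these targets exist yet. The pairing of tier names to the example targets above is an assumption, and tier_target is a hypothetical helper.

```shell
# Hypothetical lookup from a tier name to its proposed Make target.
tier_target() {
  case "$1" in
    pr-core)      echo test-pr-core ;;
    pr-risk)      echo test-pr-risk ;;
    nightly-full) echo test-integration-nightly ;;
    release-gate) echo test-release-gate ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}
```

A workflow step or a maintainer could then run make "$(tier_target pr-core)", so the tier name in a status check and the local reproduction command stay tied together.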
Once the tier commands exist, change ci.yml to call those commands. Preserve
useful CI-only behavior such as gotestsum/JUnit output by wrapping around the
same underlying command rather than maintaining a separate test definition.
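A minimal sketch of that wrapping idea: keep the flags in one place and let the CI form add only reporting options. The flags are the ones listed for the PR test job in this audit. The names CORE_FLAGS, CORE_PKGS, and core_test_cmd and the junit.xml file name are hypothetical, and the function prints the command rather than running it so the two forms are easy to compare.

```shell
# Shared definition of the core PR test invocation (flags from the audit).
# Plain strings are used so the sketch also works in POSIX sh; none of the
# values contain spaces, so unquoted expansion is safe when executing them.
CORE_FLAGS="-tags gms_pure_go -race -short -skip ^TestEmbedded"
CORE_PKGS="./..."

# Print the command for a mode:
#   plain: what a local Make target might run
#   ci:    the same flags wrapped in gotestsum with JUnit and coverage output
core_test_cmd() {
  case "${1:-plain}" in
    plain) echo "go test $CORE_FLAGS $CORE_PKGS" ;;
    ci)    echo "gotestsum --junitfile junit.xml -- $CORE_FLAGS -coverprofile=coverage.out $CORE_PKGS" ;;
    *)     echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

Because both modes read the same variables, a change to the skip pattern or tags cannot drift between the local target and the workflow.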
Add path-gated checks:
- integrations/beads-mcp: uv run ruff check, uv run mypy, uv run pytest.
- npm-package: npm ci or equivalent package install, then npm run test:all.
- website: npm ci, npm run typecheck, npm run build, and possibly the internal link check on PRs that touch website/**.

Decide and document:

- Whether the nightly run should keep BEADS_TEST_SKIP=dolt, or whether Docker-backed Dolt hangs are resolved enough to split and re-enable them.

For every tier, capture enough artifacts to debug failures without rerunning.

- Create a docs/CI.md policy once maintainers agree on the tier names.
- Add make test-pr-core that exactly reproduces the current Linux PR test command, including -race, -short, and -skip '^TestEmbedded'.
- Update docs/TESTING.md after the wrapper commands exist so local guidance points at the same contract CI runs.