docs/CI_CLEANUP_PLAN.md
Last reviewed: 2026-05-28
Freshness source: docs/CI_TEST_SURFACE_AUDIT.md, .github/workflows/*.yml,
.buildflags, .golangci.yml, package test manifests, and maintainer decision
review on 2026-05-28.
This document records the agreed target shape for CI cleanup. It is the policy
and roadmap layer; the current inventory remains in
CI_TEST_SURFACE_AUDIT.md.
- `main`, manual dispatch, or
  scheduled background jobs after measuring wall-clock cost.
- `main` success.
- `.test-skip` part of CI. It is a local human optimization file.
- `main`.
- `pull_request_target` usage for package validation.

| Tier | Trigger | Required | Platform | Purpose |
|---|---|---|---|---|
| `pr-core` | Every PR and merge queue run | Yes | Linux | Fast baseline Go validation for the shipped default path. |
| `pr-policy` | Every PR and merge queue run | Yes | Linux | Repository policy checks that should fail before expensive tests matter. |
| `pr-lint` | Every PR and merge queue run | Yes | Linux | Required gofmt and golangci-lint gate. |
| `pr-risk-*` | PRs matching risky paths or maintainer labels | Yes when applicable | Linux | Descriptive risk checks such as embedded, regression, Nix, packages, and release paths. |
| `main-*` | Every push to main | Yes for branch health | Linux plus selected macOS/Windows | Detect after-merge issues from direct pushes and platform-specific behavior. |
| `measure-*` | Manual dispatch | No | Per suite | Collect wall-clock and sharding data before promoting suites. |
| `nightly-*` | Scheduled/manual | No, but failures require triage | Linux unless measured otherwise | Expensive background coverage not ready for every main push. |
| `release-*` | Tags/manual release | Yes before publish | Per artifact | Re-run release-critical checks and publish only after package gates pass. |

merge_group is the event emitted by GitHub Merge Queue. Treat it like a PR
event: run the Linux pr-core, pr-policy, and pr-lint jobs plus the same risk
checks, and do not add macOS or Windows there.
Every PR, including docs-only PRs, should run the required Linux baseline:
pr-core, pr-policy, and pr-lint.
Shell scripts under scripts/ci/ are the source of truth. Make targets should
be aliases for discoverability, not a second implementation of command policy.
Wrapper rules:

- `.buildflags` except for explicitly special modes such as no-CGO or
  unsupported-install checks.
- `scripts/ci/lib/timing.sh` for new measured command blocks.
- `gotestsum`, JUnit, and artifacts in the
  workflow layer unless the wrapper needs to own the behavior.

`pr-core`

Initial wrapper behavior must preserve the current Linux PR command exactly:
source ./.buildflags
go test -race -short -skip '^TestEmbedded' ./...
CI may wrap that command with gotestsum for logs and JUnit, but the underlying
test contract must remain identical during the first migration.
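
As a rough illustration of that first migration step, a wrapper could look like the sketch below. The script path `scripts/ci/pr-core.sh`, the `run` helper, and the `CI_DRY_RUN` variable are assumptions made for this example, not existing project names; only the two commands come from this plan.

```shell
#!/usr/bin/env bash
# Hypothetical scripts/ci/pr-core.sh (a sketch, not the project's actual wrapper).
# Assumes it is invoked from the repository root.
set -eu

# run: execute a command, or only print it when CI_DRY_RUN=1 (assumed variable).
# Dry-run output is meant for humans; shell quoting is not preserved in it.
run() {
  if [ "${CI_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

pr_core() {
  # The same two steps as the current Linux PR command, in the same order.
  run source ./.buildflags
  run go test -race -short -skip '^TestEmbedded' ./...
}
```

A real script would end by calling `pr_core`. Keeping the commands in one function means a Make alias or a workflow step can call the wrapper without restating the flags.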
Additional rules:

- `.buildflags` so `gms_pure_go` remains the default shipped path.
- `-race`.
- `-short` initially only to avoid behavior drift.
- `.test-skip`.

`pr-policy`

pr-policy should be a separate wrapper from pr-core. It should include:

- `scripts/check-build-tags.sh`.
- `go install` guidance: `scripts/check-go-install-guidance.sh`.
- `scripts/check-versions.sh`.
- `make check-docs` or the underlying doc flag script.
- `.beads/issues.jsonl` changes.

`pr-lint`

pr-lint is required. It should stay separate from policy so lint failures are
easy to identify and rerun. It includes:

- `make fmt-check`.
- `golangci-lint run --timeout=5m --build-tags=gms_pure_go ./...`.

Known false positives must be handled in .golangci.yml or with targeted
//nolint comments. CI should not use a tolerated failing lint baseline.
Use separate, descriptive jobs rather than one broad "extra tests" job:

- `pr-risk-embedded`
- `pr-risk-regression`
- `pr-risk-nix`
- `pr-risk-packages`
- `pr-risk-release`

Use the robust path-gated required-check pattern:
Embedded Dolt coverage is risk-gated on PRs and always runs on main. Add a
maintainer run-embedded label and a rare maintainer-only skip-embedded
override. Regression coverage follows the same pattern with run-regression
and skip-regression, while still running on every main push.
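
One way to keep a path-gated check required is to let the job always start and decide inside it whether the expensive suite runs. The sketch below shows that decision for embedded coverage. The function name, the argument format, and the path patterns are hypothetical; the label names and the always-on-main rule come from the paragraph above.

```shell
# decide_embedded EVENT LABELS CHANGED_FILES
# Prints "run" or "skip". LABELS and CHANGED_FILES are space-separated lists
# (file names containing spaces are out of scope for this sketch).
decide_embedded() {
  event=$1
  labels=" $2 "
  changed=$3

  # Pushes are assumed to be main pushes; the workflow trigger restricts the branch.
  if [ "$event" = "push" ]; then
    echo run
    return 0
  fi

  # pull_request and merge_group are treated alike. The skip override wins
  # because its pattern is checked first.
  case "$labels" in
    *" skip-embedded "*) echo skip; return 0 ;;
    *" run-embedded "*)  echo run;  return 0 ;;
  esac

  # Hypothetical risky paths; the real list belongs to the workflow policy.
  for file in $changed; do
    case "$file" in
      internal/storage/*|go.mod|go.sum) echo run; return 0 ;;
    esac
  done
  echo skip
}
```

A workflow step could call it with the event name, the PR labels, and the changed file list, then report success without running the suite when it prints `skip`. The same shape would apply to run-regression and skip-regression.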
main should run as much as practical after measurement. Direct pushes to
main are allowed in this repository, so after-merge detection matters.
Initial main policy:

- `pr-core`.
- `main`, not on PRs or merge queue.
- `go test -tags gms_pure_go -v -race -short -skip '^TestEmbedded' ./...`.
- `version`, and `help`.
- `main` push.
- `main` push.

Candidate promotions for every main push must be measured first. The working
wall-clock target is about 25 minutes total. Suites that exceed that target or
create too much queue pressure should stay manual/scheduled until sharded.
No-short integration is an intended every-main candidate, not nightly-only by policy; promote it after measurement if wall-clock data supports it.
Coverage collection should block on local coverage generation/test failures, not on upload service failures. Do not introduce a coverage threshold during the first promotion step.
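
The blocking rule for coverage can be sketched as a small helper that fails on generation errors but only warns on upload errors. The function name and the idea of passing both steps as command strings are assumptions for illustration.

```shell
# coverage_step GENERATE_CMD UPLOAD_CMD
# Both arguments are shell command strings. A generation or test failure
# blocks (non-zero return); an upload failure only produces a warning.
coverage_step() {
  generate_cmd=$1
  upload_cmd=$2

  if ! sh -c "$generate_cmd"; then
    echo "coverage generation or tests failed" >&2
    return 1
  fi

  if ! sh -c "$upload_cmd"; then
    # "::warning::" is the GitHub Actions workflow command for an annotation.
    echo "::warning::coverage upload failed; not blocking"
  fi
  return 0
}
```

For example, `coverage_step "go test -coverprofile=coverage.out ./..." "./upload-coverage.sh coverage.out"` would block on test failures only; `upload-coverage.sh` is a placeholder for whatever upload command the workflow uses.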
The main branch may fail after merge as a cost tradeoff, but failures should
be fixed forward or reverted promptly.
Add a manual-dispatch workflow before changing tier breadth.
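
Measured command blocks need consistent timing output. The following is a minimal sketch of what a shared helper such as `scripts/ci/lib/timing.sh` could contain; the `timed` function name and the table-row output format are assumptions, not an existing interface.

```shell
# Hypothetical contents for scripts/ci/lib/timing.sh.
# timed LABEL CMD [ARGS...]: run CMD, then record elapsed whole seconds and
# the exit status. The command's own exit status is returned unchanged.
timed() {
  local label start rc elapsed line
  label=$1
  shift
  start=$(date +%s)
  rc=0
  "$@" || rc=$?
  elapsed=$(( $(date +%s) - start ))
  line="| ${label} | ${elapsed}s | exit ${rc} |"
  echo "$line" >&2
  # GITHUB_STEP_SUMMARY is set by GitHub Actions; skip it when running locally.
  if [ -n "${GITHUB_STEP_SUMMARY:-}" ]; then
    echo "$line" >> "$GITHUB_STEP_SUMMARY"
  fi
  return "$rc"
}
```

A wrapper could then run `timed pr-core go test -race -short -skip '^TestEmbedded' ./...` and get one summary row per measured block, both in the log and in the job summary.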
Measurement requirements:

- `scripts/ci/lib/timing.sh` helper in new wrappers.
- `$GITHUB_STEP_SUMMARY`.
- `gotestsum`; replace current `gotestsum@latest` opportunistically when
  touching nearby workflow code.
- `gotestsum` for Linux Go measurement outputs. Install it one-off in the
  workflow rather than making wrappers depend on it.

Measure at least:

- `pr-core`, policy, and lint timing.
- `BEADS_TEST_SKIP=dolt` with `-tags=integration,gms_pure_go`.
- `nix build`.

Package checks should be reusable from PR risk jobs, measurement jobs, main,
and release workflows.
Build a candidate bd once and put it on PATH. Test only the bd binary
name, not the beads alias.
Wrapper command sequence:
go build -tags gms_pure_go -o /tmp/bd-mcp-test ./cmd/bd
cd integrations/beads-mcp
uv sync --all-groups
uv run ruff check src/beads_mcp tests
uv run mypy src/beads_mcp
uv run pytest --durations=50
uv build
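
The build step above writes to /tmp/bd-mcp-test; how that binary ends up on PATH under the name bd is not spelled out here. One possible approach, sketched under the assumption that the candidate is built into a dedicated directory as `bd`, is to prepend that directory to PATH and verify the lookup. The function name is hypothetical.

```shell
# use_candidate_bd DIR
# Put DIR first on PATH and confirm that "bd" now resolves to DIR/bd.
use_candidate_bd() {
  candidate_dir=$1
  if [ ! -x "$candidate_dir/bd" ]; then
    echo "no executable bd in $candidate_dir" >&2
    return 1
  fi
  PATH="$candidate_dir:$PATH"
  export PATH
  # Forget any cached command locations so the new PATH order takes effect.
  hash -r 2>/dev/null || true
  [ "$(command -v bd)" = "$candidate_dir/bd" ]
}
```

With the binary built as `"$dir/bd"` (an assumed variation on the output path), calling `use_candidate_bd "$dir"` before `uv run pytest` would make the tests pick up the candidate rather than any installed bd.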
npm-package currently has no lockfile, so use npm install until a separate
packaging cleanup adds one. Build the native binary expected by
npm-package/bin/bd.js, and clean it up on exit by default.
Wrapper command sequence:
go build -tags gms_pure_go -o npm-package/bin/bd ./cmd/bd
cd npm-package
npm install
npm run test:all
npm pack --dry-run
The existing integration test already exercises a real npm pack; keep both
that real pack and the explicit dry-run file-list check.
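
The clean-up-on-exit behavior can be sketched with an EXIT trap inside a subshell. The `with_cleanup` name and the `CI_KEEP_ARTIFACTS` opt-out variable are assumptions for this example; the plan only states that cleanup is the default.

```shell
# with_cleanup FILE CMD [ARGS...]
# Run CMD in a subshell and remove FILE when the subshell exits, whether CMD
# succeeded or not. Set CI_KEEP_ARTIFACTS=1 (assumed variable) to keep FILE.
with_cleanup() {
  (
    cleanup_path=$1
    shift
    if [ "${CI_KEEP_ARTIFACTS:-0}" != "1" ]; then
      trap 'rm -f "$cleanup_path"' EXIT
    fi
    "$@"
  )
}
```

A wrapper might call `with_cleanup npm-package/bin/bd sh -c '...'` with the command sequence above as the quoted script, so the native binary is removed even when a test step fails.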
Classify scripts/generate-llms-full.sh as a docs/website check, not generic
policy.
Wrapper command sequence:
cd website
npm ci
npm run typecheck
cd ..
./scripts/generate-llms-full.sh
cd website
npm run build
Keep the internal link check in Actions through the existing Lychee workflow step. External link checking remains non-blocking.
`testing.Short()` Cleanup

`-short` currently does double duty: it suppresses genuinely slow tests and also
acts as an implicit integration/e2e tier boundary. Separating those two roles
is mandatory cleanup before the tier policy is considered complete.
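
A starting inventory for that cleanup could come from a plain text search. The helper names below are hypothetical, and the search is purely textual, so it cannot tell a runtime skip from a tier-boundary skip; that classification still needs a human.

```shell
# list_short_uses [DIR]: print file:line:text for each testing.Short() call
# found in Go test files under DIR (default: current directory).
list_short_uses() {
  grep -rn --include='*_test.go' 'testing\.Short()' "${1:-.}" || true
}

# count_short_by_dir [DIR]: number of matches per directory, busiest first.
count_short_by_dir() {
  list_short_uses "${1:-.}" \
    | cut -d: -f1 \
    | sed 's|/[^/]*$||' \
    | sort | uniq -c | sort -rn
}
```

Running `count_short_by_dir .` from the repository root gives a per-directory view of where the skips are concentrated.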
Plan:

- `testing.Short()` use.
- `testing.Short()` only for runtime, stress, or large-fixture skips.
- `docs/TESTING.md` after wrapper commands exist.

Release/tag workflows must independently re-run release-critical checks even
when main was green. Publishing should happen only after package-specific
checks pass.

- `scripts/ci/package-*` wrappers in release jobs.
- `dist/*` produced by the same validated `uv build`.
- `pull_request_target`.
- `scripts/ci/*` wrappers and `scripts/ci/lib/timing.sh`; add Make aliases.
- `gotestsum`.
- `testing.Short()` audit and cleanup.
- `main` or scheduled jobs based on wall-clock data.
- `pr.yml`, `pr-risk.yml`, `main.yml`, `release.yml`, and `nightly.yml`.
- `npm-package`.