Build and Testing

Building and testing the project depends on the components involved and the platform where development is taking place. Because of the amount of context required, it is usually best not to build or test the project unless the user requests it. If you must build the project, inspect the Makefile in the project root and the Makefiles of any backends that are affected by the changes you are making. The workflows in .github/workflows can also be used as a reference when it is unclear how to build or test a component. The primary Makefile contains targets for building inside or outside Docker; if the user has not previously specified a preference, ask which they would like to use.

Building a specified backend

Let's say the user wants to build a particular backend for a given platform, for example coqui for ROCM/hipblas.

The Makefile has targets like docker-build-coqui created with generate-docker-build-target at the time of writing. Recently added backends may require a new target.
At a minimum we need to set the BUILD_TYPE and BASE_IMAGE build-args:
- Use .github/backend-matrix.yml as a reference — it's the data-only YAML that lists every backend variant's build-type, base-image, platforms, etc. (backend.yml and backend_pr.yml consume it via scripts/changed-backends.js).
- l4t and cublas also require the CUDA major and minor version.
- For llama-cpp / ik-llama-cpp / turboquant the matrix also sets builder-base-image pointing at a prebuilt quay.io/go-skynet/ci-cache:base-grpc-* tag. Local make backends/<name> defaults to BUILDER_TARGET=builder-fromsource and doesn't need it — the Dockerfile's from-source stage installs everything itself.
You can pretty-print a command like this one: DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=hipblas BASE_IMAGE=rocm/dev-ubuntu-24.04:7.2.1 make docker-build-coqui
Unless the user asks you to run the command, just print it: not all agent frontends handle long-running jobs well, and the output may overflow your context.
The user may say they want to build for AMD or ROCM instead of hipblas, Intel instead of SYCL, or NVIDIA instead of l4t or cublas. Ask for confirmation if there is ambiguity.
Sometimes the user may need extra parameters to be added to docker build (e.g. --platform for cross-platform builds or --progress to view the full logs), in which case you can generate the docker build command directly.
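
As an illustration of the print-rather-than-run approach described above, here is a minimal shell sketch. The helper name print_backend_build_cmd is hypothetical and not part of the repository; the variables, the docker-build-<backend> target pattern, and the example values are the ones given in this section. It only composes the command line and leaves execution to the user.

```shell
#!/bin/sh
# Sketch only: print_backend_build_cmd is a hypothetical helper.
# It composes the make invocation for a Docker backend build without running it.
print_backend_build_cmd() {
    backend=$1
    build_type=$2
    base_image=$3
    # The single-quoted format keeps $(nproc --ignore=1) literal, so it is
    # expanded later by the shell of whoever runs the printed command.
    printf 'DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=%s BASE_IMAGE=%s make docker-build-%s\n' \
        "$build_type" "$base_image" "$backend"
}

# Example: coqui for ROCM/hipblas, using the base image quoted above.
print_backend_build_cmd coqui hipblas rocm/dev-ubuntu-24.04:7.2.1
```

For l4t and cublas builds the printed command would also need the CUDA major and minor version, which this sketch does not handle; take the exact variable names from .github/backend-matrix.yml and the Makefile.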

Test coverage gate

The core Go suites (./pkg, ./core, plus the in-process integration suite ./tests/e2e) are covered by a strict, monotonic coverage ratchet:

make test-coverage — runs the suites with covermode=atomic instrumentation and writes a merged profile to coverage/coverage.out. Uses the same prerequisites as make test.
- --coverpkg (COVERAGE_COVERPKG = core/...,pkg/...): coverage is attributed to the core+pkg packages, not just the package under test. This is what lets the in-process tests/e2e suite (which drives the real HTTP server over loopback via application.New) credit the core/http/endpoints/... handlers it exercises — folding it in roughly doubled endpoint coverage (e.g. endpoints/openai 13.6% → 52%). The denominator is therefore all of core+pkg (minus generated proto, dropped via COVERAGE_EXCLUDE_RE), so the number isn't comparable to a plain per-package figure.
- Integration suites (COVERAGE_E2E_ROOTS = ./tests/e2e) run non-recursively (excludes tests/e2e/distributed, which needs containers) with --label-filter=!real-models (those need a downloaded model) against the mock backend built by prepare-test. tests/integration is deliberately excluded — it needs make backends/local-store, which the coverage CI job doesn't build.
- Flake note: folding integration tests into a strict gate means a hard e2e failure (or a spec that silently stops running) can fail the coverage gate, not just the test. --flake-attempts absorbs transient retryable failures; covermode=atomic keeps line coverage deterministic otherwise.
- Why one ginkgo run per root (scripts/run-coverage.sh): passing several recursive roots to a single ginkgo invocation (e.g. ginkgo -r ./pkg ./core) only merges one root's coverprofile into --output-dir/--coverprofile — the others are silently dropped. Verified with ginkgo 2.29.0: -r ./pkg ./core yields only ./pkg coverage, while -r ./core alone yields all 34 core packages. So the script runs each root separately and concatenates the (disjoint) profiles. Don't "simplify" it back to a single multi-root invocation — that's how core/ (including all of core/http, ~7.4k statements) silently vanished from the number before.
- Build tags (COVERAGE_TAGS, passed via GINKGO_TAGS): defaults to debug auth. The auth tag is required to compile the real (sqlite-backed) auth implementation and its ~150 //go:build auth tests — without it those files aren't built, the tests don't run, and the gate scores auth against a stub (~3.7% instead of ~38%). If you add new tag-gated tests, extend COVERAGE_TAGS or they won't count (and likely won't run in CI at all).
make test-coverage-check — runs test-coverage, then scripts/coverage-check.sh fails the build if total coverage is below the committed baseline in coverage-baseline.txt. The Linux job in .github/workflows/test.yml runs this instead of make test.
make test-coverage-baseline — regenerates and overwrites coverage-baseline.txt from the current run.
make install-hooks — sets core.hooksPath to the versioned .githooks/, whose pre-commit runs checks scoped to what's staged: Go changes → make lint + make test-coverage-check; core/http/react-ui/ changes → make test-ui-coverage-check (Playwright e2e + UI coverage gate). A commit touching neither is skipped; bypass with git commit --no-verify. The hook resolves golangci-lint's new-from base to upstream/master → origin/master → master, so it works from a fork clone where origin/master is stale (passed to make lint via LINT_NEW_FROM).
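
The per-root approach described above for scripts/run-coverage.sh can be sketched roughly as follows. This is not the script's actual contents: merge_coverprofiles and the sample file names are hypothetical, and the sketch assumes every input profile was produced with covermode=atomic. It relies on the Go coverprofile format, in which each file starts with one mode: header followed by one line per coverage block.

```shell
#!/bin/sh
# Sketch only: merge_coverprofiles is a hypothetical helper, not the real script.
# Usage: merge_coverprofiles OUTPUT PROFILE...
merge_coverprofiles() {
    out=$1
    shift
    # A merged profile may carry only one "mode:" header (atomic assumed here).
    printf 'mode: atomic\n' > "$out"
    for profile in "$@"; do
        # Drop each input's own header and append its coverage blocks.
        # "|| true" tolerates a profile that contains nothing but the header.
        grep -v '^mode:' "$profile" >> "$out" || true
    done
}

# Demo with two tiny hand-written profiles standing in for the per-root runs.
tmp=$(mktemp -d)
printf 'mode: atomic\nexample.com/m/pkg/a.go:1.1,2.2 1 1\n' > "$tmp/pkg.out"
printf 'mode: atomic\nexample.com/m/core/b.go:3.1,4.2 2 0\n' > "$tmp/core.out"
merge_coverprofiles "$tmp/merged.out" "$tmp/pkg.out" "$tmp/core.out"
cat "$tmp/merged.out"
```

Plain concatenation is only safe because the roots are disjoint, as noted above; overlapping profiles would need their block counts summed instead.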

React UI coverage

The React UI (core/http/react-ui/) has no component/unit tests. Its only tests are the Playwright e2e specs in e2e/, which run against the real app served by tests/e2e-ui/ui-test-server (the dist is embedded with //go:embed, so the server is rebuilt per coverage run). Those specs do genuinely exercise the UI (clicks, fill, setInputFiles, getByRole/getByText, visibility/value assertions).

make test-ui-coverage: builds an istanbul-instrumented bundle (COVERAGE=true, via vite-plugin-istanbul with forceBuildInstrument: true, because the plugin skips production builds otherwise), re-embeds it into ui-test-server (the dist is embedded with //go:embed), runs the Playwright specs, and writes an nyc report to core/http/react-ui/coverage/. The specs import { test, expect } from e2e/coverage-fixtures.js, which re-exports Playwright's and harvests window.__coverage__ into .nyc_output/ after each test. Instrumentation is off unless COVERAGE=true, so dev/prod builds and plain make test-ui-e2e are unaffected (the fixture no-ops when window.__coverage__ is absent).
Browser: the flake dev shell ships chromium and exports PLAYWRIGHT_CHROMIUM_PATH; playwright.config.js uses it via launchOptions.executablePath, and the Makefile skips playwright install when it's set. This avoids Playwright's downloaded browser, which can't resolve system libs (libglib-2.0, …) on NixOS. In CI (no PLAYWRIGHT_CHROMIUM_PATH) the Makefile falls back to playwright install --with-deps chromium.
The app is a React SPA, so coverage accumulates across in-app navigation within a test; a full page.goto/reload resets it.
.nycrc.json uses all: true, so every src/** file is in the report, including 0%-coverage ones — that's how you spot features with no test at all (sort the HTML report or coverage-summary.json by line% ascending).
UI coverage gate: make test-ui-coverage-check runs the suite then scripts/ui-coverage-check.sh, failing if total line coverage drops more than UI_COVERAGE_TOLERANCE (default 1.0pp) below core/http/react-ui/coverage-baseline.txt. make test-ui-coverage-baseline regenerates the baseline. Why a tolerance (unlike the strict Go gate): UI e2e line coverage is non-deterministic — async/debounced paths (e.g. the VRAM estimate's 500ms debounce) make identical specs vary ~0.5pp run-to-run, so a zero-tolerance gate would flake. Keep the tolerance just above the observed jitter. Run in CI (tests-ui-e2e.yml) and pre-commit on core/http/react-ui/ changes.
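
The tolerance rule for the UI gate can be sketched as below. This is an illustration, not the contents of scripts/ui-coverage-check.sh: the function name ui_coverage_ok is hypothetical, and it assumes both values are plain line-coverage percentages. The gate fails only when coverage drops more than the tolerance below the baseline.

```shell
#!/bin/sh
# Sketch only: ui_coverage_ok is a hypothetical helper.
# Usage: ui_coverage_ok CURRENT BASELINE [TOLERANCE]
# Exit status 0 means the gate passes, 1 means it fails.
ui_coverage_ok() {
    current=$1
    baseline=$2
    tolerance=${3:-1.0}   # mirrors the UI_COVERAGE_TOLERANCE default of 1.0pp
    # The shell has no floating-point arithmetic, so awk does the comparison.
    awk -v c="$current" -v b="$baseline" -v t="$tolerance" \
        'BEGIN { exit !(c + t >= b) }'
}

if ui_coverage_ok 61.5 62.0; then
    echo "UI coverage gate: pass"
else
    echo "UI coverage gate: FAIL"
fi
```

With the default tolerance, 61.5 against a baseline of 62.0 passes (a 0.5pp drop), while 60.5 would fail.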

Rules:

The Go coverage gate is strict: there is no tolerance. Any decrease fails, regardless of how many lines a PR adds or deletes. covermode=atomic makes line coverage deterministic, so there's no run-to-run jitter to excuse.
When a change legitimately raises coverage, run make test-coverage-baseline and commit the updated coverage-baseline.txt so the ratchet moves up. Never lower the baseline by hand.
If you can't get coverage back to baseline, the fix is to add tests, not to edit the baseline.
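
The ratchet rules above reduce to a three-way comparison, sketched here. The helper ratchet_verdict and its output words are hypothetical and do not reflect how scripts/coverage-check.sh reports results; the sketch assumes both arguments are plain percentages.

```shell
#!/bin/sh
# Sketch only: ratchet_verdict is a hypothetical helper.
# Usage: ratchet_verdict CURRENT BASELINE
# Prints one of: fail | ok | raise
ratchet_verdict() {
    awk -v c="$1" -v b="$2" 'BEGIN {
        if (c < b)
            print "fail"     # any decrease fails, there is no tolerance
        else if (c > b)
            print "raise"    # a legitimate gain: regenerate and commit the baseline
        else
            print "ok"
    }'
}

ratchet_verdict 42.1 42.3   # prints "fail"
ratchet_verdict 42.3 42.3   # prints "ok"
ratchet_verdict 43.0 42.3   # prints "raise"
```

On "raise" the follow-up is make test-coverage-baseline plus a commit of the updated coverage-baseline.txt; on "fail" the fix is to add tests.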