Branch libs cache (tag-addressable scratch): `pr.yml` + tier2/weekly ↔ `pr_libs_cache.py` / `resolve_libs_cache.py --branch-cache` ↔ branch scripts ↔ branch prune ↔ warmer promote

.known-couplings/branch-libs-cache.md


The branch libs cache is separate from the main one (see ghcr-libs-cache.md). It has its own namespace, ponyc-branch-libs-cache/<platform>-<arch>; its own push/pull script (.ci-scripts/libs-cache/branch_libs_cache.py); and its own retention (.ci-scripts/libs-cache/prune_branch_libs_cache.py). The cache is tag-addressable: the package name is just the platform/arch and the version is the same hashFiles content hash, so a branch package's name is exactly the main cache's name under a different namespace prefix, with no -pr<N> component.

It exists so that a build that changes an LLVM-determining input (a non-fork PR, or an ad-hoc workflow_dispatch of tier2/weekly on a branch) builds LLVM once and reuses it on later runs instead of cold-building every time, and so that the warmer can promote that build into the main cache after merge (promote_libs_cache.py) instead of rebuilding. Because the name is the main name under a different prefix, the warmer constructs it by hand and finds a promotable artifact with one exists HEAD request, with no enumeration. The tag is the content key, so two builds that share a tag share identical content, which gives cross-PR and cross-tier dedup for free; the -pr<N> partitioning that used to isolate PRs is gone because it did no correctness work.

Two writers push to it: PR jobs (pr_libs_cache.py) and the tier2/weekly consumers (resolve_libs_cache.py --branch-cache), both gated to non-fork by --branch-cache. The warmer reads it (to promote) but never pushes a branch package. The branch scripts do not import the main cache's scripts; the main ponyc-libs-cache is always the source of truth.
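The naming relationship can be illustrated with a minimal sketch. The two namespace strings come from this document; the helper names (`package_ref`, `branch_ref_for`), the `name:tag` rendering, and the sample platform, arch, and hash values are hypothetical and are not taken from the real scripts.

```python
MAIN_NAMESPACE = "ponyc-libs-cache"
BRANCH_NAMESPACE = "ponyc-branch-libs-cache"


def package_ref(namespace, platform, arch, content_hash):
    """Render '<namespace>/<platform>-<arch>:<content_hash>' (hypothetical helper)."""
    return f"{namespace}/{platform}-{arch}:{content_hash}"


def branch_ref_for(main_ref):
    """Derive the branch-cache reference by swapping only the namespace prefix."""
    prefix = MAIN_NAMESPACE + "/"
    if not main_ref.startswith(prefix):
        raise ValueError(f"not a main-cache reference: {main_ref}")
    return BRANCH_NAMESPACE + "/" + main_ref[len(prefix):]


# Sample values are placeholders, not real platform names or hashes.
main_ref = package_ref(MAIN_NAMESPACE, "ubuntu", "x86_64", "abc123")
branch_ref = branch_ref_for(main_ref)
```

Because the derivation is a pure string swap, a promoter needs no package listing: it can build the branch reference from the main one and probe it directly.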

  1. The orchestration is a CI script, not a Make target. Makefile / make.ps1 are the developer-facing build files and know only how to build libs (make libs); they carry no branch-cache logic. .ci-scripts/libs-cache/pr_libs_cache.py owns the PR-job flow (and resolve_libs_cache.py's --branch-cache consumer mode does the same for the tier jobs); both just sequence the existing cache primitives plus the build command the workflow hands it after --. pr_libs_cache.py has two modes:

    • consumer mode (default): check main → (with --branch-cache) check the branch via pull (downloads the blob on a hit) → on miss run the build → (with --branch-cache) best-effort push the branch cache (a push failure logs a warning and degrades to "rebuild next run" rather than failing the job).
    • maybe-build mode (--ensure): the same sequence, but checks via the exists subcommand (no blob download — the job only needs to know whether to build, not the blob) and a branch push failure HARD-fails the job, so a registry write problem surfaces here instead of leaving each consumer to cold-build. --ensure requires --branch-cache.

    The exists subcommand lives in both oci_libs_cache.py and branch_libs_cache.py (manifest check only, no blob download; exit 0 present / 1 absent; any HTTP/network error routes through die → exit 1, so it fails safe to "build"). A main-cache hit short-circuits before any push — the warmer stays the only pusher to ponyc-libs-cache. (The other consumers' resolve_libs_cache.py calls are the main cache's own coupling (see ghcr-libs-cache.md); the branch cache extends to them via the stress workflows' and ponyc-tier3.yml's --require-cache-hit --branch-cache path, which pulls it read-only and never writes it.)
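The two modes differ only in how they probe the caches and in how they treat a failed branch push. The sketch below captures that sequencing under stated assumptions: the function name and its callable parameters are hypothetical, and the real pr_libs_cache.py shells out to the cache scripts rather than taking callables.

```python
def run_cache_flow(main_hit, branch_hit, build, push_branch,
                   branch_cache=False, ensure=False, warn=print):
    """Return which step satisfied the job: 'main', 'branch' or 'built'.

    In consumer mode main_hit/branch_hit stand in for a pull (blob download);
    in --ensure mode they stand in for the manifest-only exists check.
    """
    if ensure and not branch_cache:
        raise ValueError("--ensure requires --branch-cache")
    if main_hit():
        return "main"      # main-cache hit short-circuits before any push
    if branch_cache and branch_hit():
        return "branch"
    build()                # cache miss: run the build command
    if branch_cache:
        try:
            push_branch()
        except Exception as exc:
            if ensure:
                raise      # maybe-build mode: a push failure fails the job
            # consumer mode: best effort, degrade to "rebuild next run"
            warn(f"warning: branch push failed ({exc}); will rebuild next run")
    return "built"
```

Without `branch_cache` (the fork path) the branch cache is neither probed nor written, so the flow reduces to "check main, else build".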

  2. One workflow drives it: pr.yml (the merged PR workflow; it replaced pr-ponyc.yml + pr-pony-compiler.yml + pr-tools.yml). The merge is what makes cross-workflow dedup possible: the three suites used to fire as three concurrent workflows, each cold-building the same shared LLVM platform on a cache miss, and needs: is intra-workflow only. Now, for each platform shared by ≥2 suites (ubuntu glibc, macOS, Windows), a maybe-build-<plat> job runs pr_libs_cache.py --ensure once; the consumer jobs needs: it and pull.

    • Consumer gating is !cancelled() && needs.changes.outputs.<suite> == 'true' && needs.maybe-build-<plat>.result != 'failure'. !cancelled() lets a consumer run when its maybe-build was skipped (the fork path — the consumer then pull-or-builds without --branch-cache); result != 'failure' skips the consumer when the shared build genuinely failed, so an LLVM build failure reports once, not 3×.
    • --branch-cache is gated to non-fork in the expression itself: a consumer's LIBS_BRANCH_CACHE env var is ${{ head.repo.full_name == github.repository && '--branch-cache' || '' }} (the flag for non-fork, empty for forks → no branch pull/push), and the run line references it unquoted ($LIBS_BRANCH_CACHE, or pwsh $env:LIBS_BRANCH_CACHE on Windows), so an empty value disappears instead of passing as a stray empty argument. Quoting it would break forks (argparse would see an empty positional and exit 2). Both bash and pwsh drop an empty unquoted variable; pwsh does so in its default Standard native-argument mode (PowerShell ≥ 7.3, which the Windows runners ship; a pre-7.3 Legacy runner would instead pass '' and break a fork's Windows job, loudly). The maybe-build jobs' if: already requires non-fork, so they pass --branch-cache literally. --branch-cache is a boolean and the only non-fork signal the script reads (it carries no PR number; the cache is tag-addressable). pr.yml carries permissions: packages: write for the non-fork branch push; a fork's pull_request token is read-only regardless. Never switch this to pull_request_target (it would hand fork code a write token). The ponyc Windows job's msys2/lldb install is extracted to .ci-scripts/windows-install-deps.ps1 (only that one job needs it; see the comment in the script).
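The quoting hazard is easy to reproduce with argparse alone. The parser below is a hypothetical stand-in, not the real pr_libs_cache.py argument parser: it assumes the script accepts only boolean flags before `--` and treats everything after `--` as the build command.

```python
import argparse


def parse(argv):
    """Split argv at '--' and parse the flags before it (hypothetical layout)."""
    if "--" in argv:
        split = argv.index("--")
        flags, build_cmd = argv[:split], argv[split + 1:]
    else:
        flags, build_cmd = argv, []
    parser = argparse.ArgumentParser(prog="pr_libs_cache_sketch")
    parser.add_argument("--branch-cache", action="store_true")
    parser.add_argument("--ensure", action="store_true")
    opts = parser.parse_args(flags)  # an unexpected positional exits with status 2
    if opts.ensure and not opts.branch_cache:
        parser.error("--ensure requires --branch-cache")
    return opts, build_cmd


# Non-fork: the variable expands to the flag.
opts, cmd = parse(["--branch-cache", "--", "make", "libs"])
# Fork, unquoted: the empty variable vanishes from argv.
fork_opts, fork_cmd = parse(["--", "make", "libs"])
# Fork, quoted: argv carries a literal "" and argparse rejects it.
try:
    parse(["", "--", "make", "libs"])
    quoted_exit = None
except SystemExit as exc:
    quoted_exit = exc.code
```

In this sketch the quoted fork case fails with argparse's usage error rather than silently misbehaving, which matches the "exit 2" described above.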
  3. Per-suite triggering is a home-grown classifier, and the workflow-level paths: MUST be the union of the three suites' rules. The changes job runs gh api .../pulls/<n>/files --jq '.[].filename' | .ci-scripts/pr_changed_suites.py, which writes <suite>=true|false for ponyc/pony_compiler/tools to $GITHUB_OUTPUT; every downstream job gates on those outputs. Draft gating lives only on changes — when it's skipped the outputs are empty, so the == 'true' checks all fail and every job skips. The classifier's per-suite rules are transcribed (plain prefix/suffix/exact tests, no glob engine) from the original three paths: blocks. Invariant: any file that triggers a suite in pr_changed_suites.py must also match the top-of-pr.yml paths: filter, or the workflow never starts and that suite silently never runs. ponyc ⊆ tools, so the union is tools + pony_compiler + the workflow file; the load-bearing gotcha is a src/libponyc/*.md/*.yml change (it triggers pony_compiler, whose filter has no doc/yaml exclusion, but not tools), hence the src/libponyc/** and tools/lib/ponylang/pony_compiler/** re-includes in paths:. pr_changed_suites_test.py pins this case. The pulls/<n>/files API caps at 3000 files; a truncated listing runs all suites (fail-safe). Keep the classifier rules, the pr.yml paths: union, and the test in sync.
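The shape of such a classifier can be sketched as follows. The rules here are illustrative placeholders chosen only to reproduce the properties stated above (ponyc ⊆ tools, and a doc/yaml change under src/libponyc/ triggering pony_compiler alone); they are not the real rules in pr_changed_suites.py, and the way truncation is detected (a listing at the 3000-file cap) is also an assumption.

```python
SUITES = ("ponyc", "pony_compiler", "tools")
API_FILE_CAP = 3000  # pulls/<n>/files cap stated in the text


def is_doc_or_yaml(path):
    return path.endswith((".md", ".yml"))


def triggers(path):
    """Plain prefix/suffix tests, no glob engine. Placeholder rules only."""
    hit = set()
    if path.startswith(("src/libponyc/", "tools/lib/ponylang/pony_compiler/")):
        hit.add("pony_compiler")  # this filter has no doc/yaml exclusion
    if path.startswith(("src/", "tools/")) and not is_doc_or_yaml(path):
        hit.add("tools")
        if path.startswith("src/"):
            hit.add("ponyc")      # keeps ponyc a subset of tools
    return hit


def classify(paths):
    """Map each suite to True/False; a possibly truncated listing runs everything."""
    if len(paths) >= API_FILE_CAP:
        return {suite: True for suite in SUITES}
    hit = set()
    for path in paths:
        hit |= triggers(path)
    return {suite: suite in hit for suite in SUITES}


def output_lines(result):
    """Render the <suite>=true|false lines destined for $GITHUB_OUTPUT."""
    return [f"{suite}={'true' if result[suite] else 'false'}" for suite in SUITES]
```

A test pinning the src/libponyc/*.md case against both the classifier and the workflow-level paths: filter is what keeps the invariant from drifting.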

  4. Retention is age-based and deliberately different from the main cache's keep-N. prune-branch-libs-cache.yml (daily schedule + workflow_dispatch) runs prune_branch_libs_cache.py, deleting branch-cache artifacts older than 2 weeks and dropping a package once all its versions are stale (a platform that stopped receiving builds, e.g. a retired builder image — the per-platform packages are long-lived now, not per-PR). keep-N would never delete an idle package, hence a separate script. Same two-token split as the main retention (enumerate with PONYLANG_MAIN_READ_PACKAGE_TOKEN, delete with GITHUB_TOKEN); it enumerates and deletes only within ponyc-branch-libs-cache/, so it cannot touch the main cache. There is no clear/escape-hatch workflow for this cache — age-prune is the only reclaim.
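The age-based policy can be sketched as a pure planning step. The input shape (a mapping of package name to version timestamps), the function name, and the use of a creation timestamp as the age reference are assumptions for illustration; the real prune_branch_libs_cache.py works against the GitHub packages API.

```python
from datetime import datetime, timedelta, timezone

BRANCH_PREFIX = "ponyc-branch-libs-cache/"
MAX_AGE = timedelta(weeks=2)


def plan_prune(packages, now):
    """Return (versions_to_delete, packages_to_drop).

    packages: {package_name: {version_id: created_at}} (hypothetical shape).
    """
    cutoff = now - MAX_AGE
    versions_to_delete, packages_to_drop = [], []
    for name, versions in sorted(packages.items()):
        if not name.startswith(BRANCH_PREFIX):
            continue  # never touch anything outside the branch namespace
        stale = sorted(v for v, created in versions.items() if created < cutoff)
        if versions and len(stale) == len(versions):
            packages_to_drop.append(name)  # every version is stale: drop the package
        else:
            versions_to_delete.extend((name, v) for v in stale)
    return versions_to_delete, packages_to_drop


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
sample = {
    "ponyc-branch-libs-cache/ubuntu-x86_64": {
        "old": now - timedelta(days=15),
        "new": now - timedelta(days=1),
    },
    "ponyc-branch-libs-cache/retired-arm64": {
        "a": now - timedelta(days=30),
        "b": now - timedelta(days=20),
    },
    "ponyc-libs-cache/ubuntu-x86_64": {"main": now - timedelta(days=400)},
}
to_delete, to_drop = plan_prune(sample, now)
```

A keep-N policy would retain the newest versions of the retired package forever, which is the case the age rule exists to handle.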

  5. The warmer reads the branch cache (to promote on a main miss) but never pushes a branch package — promotion writes only the main cache, so the warmer stays the main cache's sole writer. And the main cache's prune_libs_cache.py filters on ponyc-libs-cache/, so it never sees branch packages — the two prunes can't cross. The libs-cache scripts are thin entry points over a shared support library: the registry-v2 client (request/auth/blobs/manifest/archive/copy/derive_platform/IMAGE_RE/repo_root) lives in registry.py, the GitHub-packages REST plumbing (gh_request/paginate/encode) in ghpackages.py, arch normalization in cache_arch.py, and die/info/the orchestrator helpers plus the shared entry-script path constants (MAIN_CACHE/BRANCH_CACHE/PROMOTE) in common.py. So oci_libs_cache.py/branch_libs_cache.py are just NAMESPACE + cache_package (which adds the arch via cache_arch.canonical) + a main calling registry.dispatch; promote_libs_cache.py resolves both namespaces and calls registry.copy. A builder-image-naming change is now a one-place edit in registry.py; only the cache_package shape is per-script.
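The warmer's read-only relationship to the branch cache can be summarized in a short sketch. The function and parameter names are hypothetical, and the fallback of building and pushing to main on a branch miss is an assumption inferred from the warmer being the main cache's sole writer.

```python
def warm_main_cache(main_exists, branch_exists, copy_branch_to_main,
                    build_and_push_main):
    """Return 'hit', 'promoted' or 'built'. Never writes the branch cache."""
    if main_exists():
        return "hit"                 # nothing to do
    if branch_exists():              # manifest-only probe of the derived name
        copy_branch_to_main()        # promotion writes only the main cache
        return "promoted"
    build_and_push_main()            # assumed fallback: cold build
    return "built"
```

Every write path in this sketch targets the main namespace, which is why the two prune scripts can each filter on their own prefix without coordinating.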