docs/rfc/0012-oci-default-package-delivery.md
RFC-0001 built two OCI delivery paths — fetcher pull (Path A) and kubelet image volumes (Path B) — but left them opt-in for users who bring pre-built images, and left the tarball/storagesvc pipeline as the out-of-the-box default. This RFC makes OCI the default in three moves:
executor.enableOCIImageVolume defaults to true, so every eligible OCI package rides Path B wherever the cluster supports it (the runtime gate already degrades to Path A below k8s 1.33).packageRegistry chart block; when configured, every successful build publishes its deployment archive as a digest-pinned single-layer OCI image and the Package is rewritten to Archive{Type: oci} — fission fn create --code users get OCI delivery with zero per-package action.
Tarball output remains automatic when no registry is configured.Path B eliminates the executor→fetcher RPC, the per-pod full-archive download from storagesvc, and the unzip from the cold-start critical path, and gets node-level package caching for free from the kubelet image cache. The tarball path is not deprecated; it remains the fallback and the default for installs without a registry.
The poolmgr cold-start critical path today (fetcher path, verified against main):
getServiceForFunction RPC (pkg/router/resolver_executor.go:202).choosePod) and RPCs the pod's fetcher sidecar (pkg/executor/executortype/poolmgr/gp_specialize.go:125).pkg/fetcher/fetcher.go:377-392; the only cache is the pod's own shared volume).fetcher.go:413-424), then load via the env's /v2/specialize.Path B (shipped, opt-in) replaces steps 2–4 with a single load-only POST to the env (pkg/executor/executortype/poolmgr/oci_specialize.go:30-83): the code is already mounted read-only by the kubelet, digest-pinned, and cached on the node like any image layer.
N pods specializing the same function on one node cost one pull instead of N downloads.
Adoption is structurally limited, though: RFC-0001 deferred the producer ("BuildKit builder + --oci-push" — RFC-0001 line 53), so only users who bring pre-built images can use OCI at all.
The common path — fission fn create --code, buildermgr builds, updatePackage writes a storagesvc URL (pkg/buildermgr/common.go:166-186) — never produces an OCI artifact.
"OCI as default" without a producer would be marketing for a minority; this RFC's centerpiece is the producer.
packageRegistry.enabled stays false by default because Fission cannot conjure one.--deploy file.zip packages (different credential/trust model — follow-up; see Open questions).archivePruner has no registry analog; operator's retention policy applies — documented).Who pushes: the builder pod's fetcher, via an extended /upload mode.
buildermgr already delegates artifact handling to the builder-pod fetcher (fetcherC.Upload(ctx, uploadReq) at pkg/buildermgr/common.go:125-140); the fetcher has the built artifact on its shared volume and already links pkg/oci and the kauth keychain.
Wire change (pkg/fetcher/types.go):
ArchiveUploadRequest gains OCIPush *OCIPushSpec{Repository, Tag, PushSecretName, InsecureHosts}.ArchiveUploadResponse gains OCI *fv1.OCIArchive alongside the existing URL+Checksum fields.When OCIPush is set, the fetcher pushes the deployment directory (pre-zip — the upload handler archives only when ArchivePackage is true, common.go:123) via a new pkg/oci/push.go:
// PushDirectory publishes dir as a single-layer OCI image and returns its digest.
func PushDirectory(ctx context.Context, dir, repoTag string, kc authn.Keychain) (digest string, err error)
Artifact shape: a real OCI image manifest — empty.Image + mutate.AppendLayers with one tar+gzip layer of the directory contents at the image root, minimal config.
This is deliberately not an ORAS artifact manifest: kubelet image volumes and containerd's image store consume image manifests, and pkg/oci.ExtractImage's mutate.Extract flattens it — byte-compatible with both consumption paths by construction.
Package rewrite: updatePackage (common.go:166-186) is the single chokepoint that writes the built archive; it writes Archive{Type: ArchiveTypeOCI, OCI: &OCIArchive{Image, Digest, ImagePullSecrets: [packageRegistry.pullSecret]}} when the upload response carries OCI.
The digest is always set (returned by the push) — digest-pinned by default; the tag (<repositoryPrefix>/<namespace>/<pkg>:<short-digest>) is a human affordance only.
Existing rollout machinery (PackageRef.ResourceVersion propagation, fsCache invalidation) is untouched: the Package update is the trigger, exactly as today.
Push failure: packageRegistry.fallbackToStorage (default true) → log + fall back to the storagesvc tarball upload; the build still succeeds and a OCIPublishDegraded Package condition surfaces the degradation.
With false, the build fails (strict mode for shops requiring OCI provenance).
executor.enableOCIImageVolume flips to true in values.yaml (currently false at values.yaml:235).
The chart flips the default; the runtime gate does the degrading: ImageVolumeGate (pkg/executor/util/imagevolume.go:123-149) requires both the env flag and k8s ≥ 1.33 (detected once at startup, vendor-suffix tolerant), so a 1.32 cluster with the flag on simply runs Path A — correct and already shipped.
No Helm version templating; the flip is one values line, and false is the escape hatch (the old default).
Today getFunctionOCIArchive (pkg/executor/executortype/poolmgr/oci.go:55-87) excludes functions with Secrets/ConfigMaps from Path B because the fetcher materializes them onto the shared volume (pkg/fetcher/fetcher.go:439-550) and Path B pods have no fetcher.
Closure: a second Path B pool variant, B-fetcher, which is exactly newdeploy's shipped Path B pattern (pkg/executor/executortype/newdeploy/newdeploy.go:319-336):
RootStat(storePath) early-exit skips the pull.FetchSecretsAndCfgMaps and the load — zero new fetcher code.:8000/specialize) instead of loadOnlySpecialize.The variant is pod-spec-affecting, so it goes into ociPoolHash (extend the parts slice at oci.go:31 with a variant marker), preserving RFC-0001's invariant that the pool hash covers every pod-spec-affecting archive field.
Eligibility selection becomes: secrets/configmaps present → B-fetcher; absent → B-direct (today's fetcherless variant).
NetworkPolicy: executor→8000 ingress is already admitted.
Rejected alternatives: native projected volumes (structurally impossible — pool pods exist before function identity, and per-image pools are per-package while secrets are per-function); secrets relayed via the specialize payload (leaks secret material through the executor and the HTTP path, size limits — rejected on posture).
With the producer on, every built package becomes its own pool: N packages × poolsize warm pods, today never shrinking (only specialized pods are reaped; empty per-image pool deployments live forever).
Spec:
/, poolKey at oci.go:39-47) records lastActive, touched at creation and on every specialization routed to it.POOL_OCI_IMAGE_HASH pod label) AND now − lastActive > executor.ociPoolIdleReapTime (new value, default 5m, distinct from the pod idle timeout).imageHash == "") are never reaped — exact parity with today.GetFuncSvc recreates cleanly (on-demand pool creation is already the cold-start path).| Exclusion | Why structural |
|---|---|
| Env v1 | Speaks only the v1 specialize contract with store path user; loadOnlySpecialize posts /v2/specialize. Legacy and frozen. |
AllowedFunctionsPerContainerInfinite | Store path is per-function UID; one shared image mount cannot serve it without per-function pools. These envs already amortize cold starts by design. |
KeepArchive envs (new) | JVM-style envs expect LoadReq.FilePath to be an archive file; OCI artifacts are directory-shaped and image-volume subPaths must be directories (RFC-0001 amendment). Excluded on both producer (tarball output) and consumer sides. |
All three keep working via Path A / tarball, documented as permanent, not TODOs.
| Package archive | Cluster | Function | Delivery |
|---|---|---|---|
| url/literal (tarball) | any | any | fetcher download (unchanged) |
| oci | <1.33 or flag off | any | Path A: fetcher OCI pull |
| oci | ≥1.33, flag on | no secrets/cfgmaps, env v2, non-Infinite, non-KeepArchive | Path B-direct (no fetcher) |
| oci | ≥1.33, flag on | secrets/cfgmaps present | Path B-fetcher (sidecar for secrets only) |
| oci | ≥1.33, flag on | env v1 / Infinite / KeepArchive | Path A |
packageRegistry:
enabled: false # producer master switch (no registry = tarball, today's behavior)
repositoryPrefix: "" # e.g. ghcr.io/myorg/fission-packages; repo layout <prefix>/<namespace>/<pkg>
pushSecret: "" # dockerconfigjson secret in the builder namespace (write creds)
pullSecret: "" # read-only secret stamped into Archive.OCI.ImagePullSecrets
insecureHosts: "" # comma list, reuses FETCHER_ALLOW_INSECURE_REGISTRIES semantics
fallbackToStorage: true # push failure -> tarball + OCIPublishDegraded condition; false = fail the build
executor:
enableOCIImageVolume: true # FLIPPED (runtime-gated on k8s >= 1.33)
ociPoolIdleReapTime: 5m # new: per-image pool idle reap window
CLI surface: no new flags for the default path — the producer is a server-side decision, which is the feature.
fission pkg info/getdeploy display the image reference + digest; an escape-hatch annotation fission.io/package-delivery: tarball opts a single package out of the producer.
Push and pull credentials are deliberately distinct (least privilege).
pkg/oci/extract.go:72-80) both verify it — tag mutation in the registry cannot change what runs.<prefix>/<namespace>/<pkg> layout for multi-tenant separation.pkg/oci.Keychain (kauth); fetcher RBAC already includes secrets: get.| Step | Today (tarball) | Path A (OCI pull) | Path B-direct | Path B-fetcher |
|---|---|---|---|---|
| Router → executor RPC | ✓ | ✓ | ✓ | ✓ |
| choosePod | ✓ | ✓ | ✓ | ✓ |
| Executor → fetcher RPC | ✓ | ✓ | — | ✓ (localhost, no download) |
| Package transfer | full download per pod | registry pull per pod | — (kubelet-cached mount) | — |
| Unzip | ✓ | — (extracted on pull) | — | — |
| Secrets materialization | fetcher | fetcher | n/a (ineligible) | fetcher (k8s API gets only) |
Env load /v2/specialize | ✓ | ✓ | ✓ | ✓ |
What moves off the critical path under Path B: the pull happens at pool creation (amortized by the kubelet cache and shared across all functions of the package), not at specialization. The producer adds push time to build wall-clock (not cold start); measured and published as part of Gate B. First cold start after a pool reap pays pool creation with a warm node cache.
| Failure | Path | Behavior |
|---|---|---|
| Registry down at build | producer | push fails → tarball fallback + OCIPublishDegraded condition (default), or build fails (strict) |
| Registry down at cold start | A | fetcher pull fails → specialize error → executor error/retry — parity with storagesvc-down today |
| Registry down at cold start | B, pool exists | unaffected — image already mounted in warm pods (strictly better than today) |
| Registry down at cold start | B, new pool | kubelet ImagePullBackOff → pool ready-wait timeout → cold start errors; kubelet retries, self-heals on recovery; no automatic tarball fallback (post-producer there may be no tarball) — documented |
| Digest mismatch | A / B | fetcher rejects / kubelet refuses the pinned ref — fail closed |
| Image deleted from registry | both | cold starts fail until rebuild; registry retention is the operator's job (explicit non-goal) |
| Artifact > 2 GiB cap | A | fetcher rejects; the producer warns at build time when the artifact exceeds the cap (Path B unaffected — kubelet has no such cap) |
| Executor restart with per-image pools | B | adoption shipped in RFC-0001 (instanceID annotation patch, adoptPerImagePoolDeployments) |
| Flag flipped off with live per-image pools | B | pool reconciler cleans up; functions fall back to Path A on next cold start |
| Push-secret rotation | producer | next build picks up the new secret (resolved per build); stale creds → push failure path above |
enableOCIImageVolume: false restores the old default.fission pkg rebuild after configuring packageRegistry (the rebuild re-pushes and rewrites the archive).Phase 1 — per-image idle pool reaper.
Touch: poolmgr/gpm.go (pool map bookkeeping), idle strategy / pool reconciler, oci.go; chart executor.ociPoolIdleReapTime.
Tests: reap-eligibility table (active pods present / fresh lastActive / generic pool never reaped); race with concurrent GetFuncSvc recreate; integration: Path B pool disappears after idle window and recreates on next request.
Gate: cold-start benchmark unchanged for non-OCI; recreate-after-reap cold start measured and published.
Phase 2 — B-fetcher variant (secrets/configmaps closure).
Touch: oci.go (variant selection + hash bit), gp_deployment.go (fetcher-retained per-image pod spec — copy newdeploy's pattern), gp.go (route specialize via fetcher for B-fetcher pools).
Tests: hash includes variant (no pool aliasing); pod-spec invariants for both variants; integration TestOCIPathBSecrets (secret-using function rides Path B with the secret materialized).
Gate: B-fetcher cold start within a small bound of B-direct.
Phase 3 — consumer default flip.
Touch: values.yaml only (gate logic already shipped).
Tests: full integration suite green with the new default on both CI legs (the 1.32-floor leg proves auto-degrade; the 1.36 leg proves Path B); upgrade test (existing generic pools unaffected; pre-existing OCI functions migrate pools on next cold start).
Gate A: published off-vs-on cold-start p50/p95 delta for an OCI package; zero regression for non-OCI.
Phase 4 — producer.
Touch: pkg/oci/push.go (new), pkg/fetcher/types.go + Upload handler (push mode + fallback), pkg/buildermgr/ (registry config plumbing, updatePackage OCI write, OCIPublishDegraded condition), chart packageRegistry block, KeepArchive producer exclusion.
Tests: PushDirectory against an in-memory registry (digest round-trip, layer content == dir); Upload-handler mode table; updatePackage OCI + fallback writes; integration: source build with registry configured ⇒ digest-pinned Spec.Deployment.OCI ⇒ function serves via Path B; registry-down build ⇒ tarball fallback + condition; pkg rebuild migrates a tarball package.
Reuses the existing CI localhost-NodePort test registry.
Gates B + C: E2E built-function cold start ≈ pre-built Path B and push overhead published; many-package steady-state warm-pod count bounded by the reaper.
Phase 5 — docs, UX, migration.
Quickstart with registry config as the recommended install; migration guide; spec save/apply round-trip for OCI packages; benchmark publication in rfc/perf-results; escape-hatch annotation.
test/benchmark/tests/cold-start (p50/p95 over 30 sequential cold starts) drives Gates A and B; the scale pattern from test/benchmark/tests/scale-index inspires the Gate C many-package scenario.enableOCIImageVolume=false pinned to cover Path A until the floor reaches 1.33./v1/archive consumers expect download URLs — storagesvc isn't even in the build upload path today.fallbackToStorage provides on demand.<prefix>/<namespace>/<pkg> proposed).--deploy packages (client credential model) — follow-up RFC or phase 6.OCIPublishDegraded also be surfaced as a Prometheus metric on buildermgr?Phases 1–4 landed together (per-phase commits on one PR). Deltas from the proposal, driven by implementation and the pre-merge review:
packageRegistry.publishedPrefix (new knob): decouples the push endpoint from the recorded consumption reference — the registry split-brain (risk 1) made concrete by CI itself, where builds push via cluster DNS but the kubelet pulls the NodePort name. The digest pins identity across both names.oci.NoServiceAccount): the builder pod runs as fission-builder, which has no RBAC on the fission-fetcher SA the pull keychain resolves — and SA imagePullSecrets are a pull concept; pushes authenticate via the explicit push secret only.fission_buildermgr_oci_publish_total{result=published|degraded} + a buildermgr error log on degradation (the per-package condition alone left a fleet-wide registry outage invisible); fission_executor_oci_pool_reap_failures_total keeps the Gate C reap counter honest; PACKAGE_REGISTRY_ENABLED parse failures hard-fail startup like the missing prefix.packageRegistry.insecureHosts into the function-pod fetcher allowlist so Path A consumption of produced packages works on <1.33 clusters with insecure registries.TestOCIProducerBuild asserting the full chain; the v1.34 leg pins image volumes off; v1.32 proves the runtime auto-degrade./userfunc at producer scale: opt-in users self-selected; the producer routes every built function through a read-only mount (__pycache__, JVM scratch, envs that write beside code).
Mitigation: per-supported-environment compatibility matrix early in Phase 4 drives the exclusion lists (this is where KeepArchive came from); exclusions are producer-side too, so incompatible envs simply keep tarballs.poolsize=1 (spec'd as the fallback lever).