docs/craft/sandbox/sandbox-podtemplate.md
The per-sandbox Pod spec is constructed field-by-field in Python
(KubernetesSandboxManager._create_sandbox_pod, ~230 lines). Everything in it
except a handful of per-pod values is static infrastructure config — container
images, ports, volumes, security contexts, init/sidecar containers, node
selector, tolerations, resource sizing, proxy CA wiring. This duplicates what
Helm already owns for the rest of the sandbox stack (namespace, RBAC, network
policy, egress proxy — all in deployment/helm/charts/onyx/templates/) and
means any change to the pod's shape requires a backend image rebuild + deploy
rather than a helm upgrade.
Goal: move the static shape of the sandbox Pod into a Helm-rendered
core/v1 PodTemplate, leaving Python to read it and overlay only the
genuinely dynamic per-pod fields.

core/v1 PodTemplate is the typed, first-class K8s object designed exactly for
"declare the shape at deploy time, instantiate at runtime." Prefer it over a
ConfigMap-carried YAML (untyped, unvalidated).

Python overlays four dynamic per-pod fields:

- metadata.name (sandbox-{uuid[:8]})
- metadata.labels: LABEL_SANDBOX_ID and LABEL_TENANT_ID, merged onto the
  template's base labels
- secretKeyRef.name env entries on the sandbox container, pointing at the
  per-pod {pod}-opencode-auth Secret
- spec.hostAliases[0].ip: the proxy ClusterIP resolved at runtime by
  _resolve_proxy_ip() (DNS to the proxy is blocked by the firewall, so this
  can't be static)

Everything else is static and belongs in the template: the containers
(sandbox + sidecar), the sandbox-init init container (NET_ADMIN +
firewall-init.sh), workspace/managed emptyDirs, CA source/bundle volumes,
pod + container security contexts, nodeSelector, tolerations,
enableServiceLinks: false, probes, and the proxy env constants from
_proxy_main_container_env_vars() / _proxy_init_container().

Kubernetes has no ServiceTemplate object, and _create_sandbox_service is ~30
lines (names + the Next.js port range). Do NOT try to template it. Instead
single-source the port range (SANDBOX_NEXTJS_PORT_START/END) so the template's
container/service ports and the Python service can't drift.

Resource sizing currently comes from SANDBOX_POD_CPU_REQUEST etc.
(configs.py), injected from the Helm ConfigMap. These move into the
PodTemplate values and the env vars in configs.py are retired (or kept only as
the template's value source; pick one source of truth, prefer the template).

The chart's image default is onyxdotapp/sandbox:${global.version}.
SANDBOX_CONTAINER_IMAGE remains an internal override, but the chart owns the
normal Kubernetes image default.

The backend reads the template from SANDBOX_NAMESPACE at provision time. Add a
clear error (and ideally a one-time startup check when ENABLE_CRAFT +
SANDBOX_BACKEND=kubernetes) so a missing/misnamed template fails loudly rather
than deep inside provision().

Gate the template on ENABLE_CRAFT via the existing onyx.craftEnabled helper in
_helpers.tpl, matching the other sandbox templates.

Implementation steps:

- Add templates/sandbox-podtemplate.yaml rendering a core/v1 PodTemplate
  named e.g. sandbox-pod into SANDBOX_NAMESPACE. Its .template.spec is the
  full static pod spec; .template.metadata.labels carries the static labels
  (LABEL_K8S_COMPONENT, LABEL_K8S_MANAGED_BY). Drive all tunables from a new
  sandboxPod: block in values.yaml.
- Add the sandboxPod: values block exposing: image, imagePullPolicy, the
  Next.js port range, the three resource sets (sandbox / sidecar / init),
  nodeSelector, tolerations, and CA mount config. Wire CI/localdev overlays
  (values-ci.yaml, values-localdev.yaml); CI overrides the resource requests
  for the 4-vCPU kind runner exactly as the env vars do today.
- Rewrite _create_sandbox_pod to:
  read_namespaced_pod_template(name, SANDBOX_NAMESPACE) →
  copy.deepcopy(tpl.template.spec) → overlay the four dynamic fields →
  return V1Pod(metadata=..., spec=...). Delete the static construction.
  Keep _proxy_init_container/_proxy_main_container_env_vars only if still
  referenced; otherwise remove.
- Single-source the port range between the template and
  _create_sandbox_service (config constant feeding both). Leave the Service
  construction in Python.
- Add a startup/preflight check that the PodTemplate exists when
  ENABLE_CRAFT and the K8s backend are active, raising a clear error naming
  the expected template + namespace.
- Retire the now-template-owned env vars from configs.py
  (SANDBOX_POD_CPU_*, SANDBOX_POD_MEMORY_*, and the image default if fully
  migrated), updating any other readers.
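
The read, deep-copy, overlay flow for _create_sandbox_pod could look roughly like the sketch below. It is not the actual implementation: it works on the plain-dict (camelCase manifest) form of the template rather than the kubernetes client's typed objects, and the label key values, the function name build_sandbox_pod, the container name "sandbox", and the rule "every secretKeyRef on that container points at the per-pod Secret" are all assumptions made for illustration.

```python
import copy

# Hypothetical label keys; the real LABEL_SANDBOX_ID / LABEL_TENANT_ID values
# live in the backend and are not shown in this document.
LABEL_SANDBOX_ID = "example.invalid/sandbox-id"
LABEL_TENANT_ID = "example.invalid/tenant-id"


def build_sandbox_pod(pod_template: dict, sandbox_id: str, tenant_id: str, proxy_ip: str) -> dict:
    """Overlay the four dynamic fields onto a deep copy of the template."""
    tpl = copy.deepcopy(pod_template["template"])  # never mutate the source template
    pod_name = f"sandbox-{sandbox_id[:8]}"

    # Fields 1 and 2: pod name, plus per-pod labels merged onto the base labels.
    labels = dict(tpl.get("metadata", {}).get("labels", {}))
    labels.update({LABEL_SANDBOX_ID: sandbox_id, LABEL_TENANT_ID: tenant_id})
    metadata = {"name": pod_name, "labels": labels}

    spec = tpl["spec"]

    # Field 3: point the sandbox container's secretKeyRef entries at the
    # per-pod Secret (assumes all of them refer to that one Secret).
    for container in spec["containers"]:
        if container["name"] != "sandbox":
            continue
        for env in container.get("env", []):
            ref = env.get("valueFrom", {}).get("secretKeyRef")
            if ref is not None:
                ref["name"] = f"{pod_name}-opencode-auth"

    # Field 4: the proxy ClusterIP, resolved at runtime by the caller.
    spec["hostAliases"][0]["ip"] = proxy_ip

    return {"apiVersion": "v1", "kind": "Pod", "metadata": metadata, "spec": spec}
```

The deep copy matters if the template is ever cached between provisions: without it, one sandbox's Secret name or proxy IP would leak into the next pod built from the same object.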
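
For the port-range step, one possible shape is to build the Service ports from the same two constants the chart renders into the template, and to fail fast if the template does not expose them. This is a sketch under stated assumptions: the range is treated as inclusive, the values 3010 and 3019 are placeholders, both helper names are hypothetical, and the Service ports are plain dicts rather than V1ServicePort objects.

```python
# Placeholder values for illustration; the real range is whatever the chart
# and the backend config agree on.
SANDBOX_NEXTJS_PORT_START = 3010
SANDBOX_NEXTJS_PORT_END = 3019  # assumed inclusive


def nextjs_service_ports(start: int, end: int) -> list[dict]:
    """Build Service port entries for the inclusive range [start, end]."""
    if end < start:
        raise ValueError(f"invalid port range {start}-{end}")
    return [
        {"name": f"nextjs-{port}", "port": port, "targetPort": port, "protocol": "TCP"}
        for port in range(start, end + 1)
    ]


def assert_ports_in_template(pod_spec: dict, service_ports: list[dict]) -> None:
    """Fail if the Service targets a port the template's containers do not expose."""
    exposed = {
        p["containerPort"]
        for c in pod_spec.get("containers", [])
        for p in c.get("ports", [])
    }
    missing = sorted(p["targetPort"] for p in service_ports if p["targetPort"] not in exposed)
    if missing:
        raise ValueError(f"template does not expose Service ports: {missing}")
```

Running the second helper against the template read at provision time (or in the preflight check) would turn a chart/backend mismatch into an explicit error instead of a Service that silently routes to closed ports.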
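
The preflight check could be sketched as follows. To stay independent of a live cluster, the sketch takes the reader as a callable (in the backend this would presumably wrap CoreV1Api.read_namespaced_pod_template) and treats any exception whose status attribute is 404 as "not found", which matches how the kubernetes client's ApiException reports HTTP status. The function name, the error class, and the wording of the message are illustrative assumptions.

```python
from typing import Any, Callable


class SandboxPodTemplateMissingError(RuntimeError):
    """Raised when the Helm-rendered PodTemplate cannot be found."""


def check_pod_template_exists(
    read_template: Callable[[str, str], Any],
    name: str,
    namespace: str,
) -> Any:
    """Return the template, or raise an error naming the template and namespace."""
    try:
        return read_template(name, namespace)
    except Exception as exc:
        # Only translate "not found"; auth or network errors propagate unchanged.
        if getattr(exc, "status", None) == 404:
            raise SandboxPodTemplateMissingError(
                f"PodTemplate '{name}' not found in namespace '{namespace}'. "
                "Check that the chart was deployed with craft enabled."
            ) from exc
        raise
```

Calling this once at startup when ENABLE_CRAFT and the Kubernetes backend are active, and again on the provision path, would cover both a chart that was never deployed and a template deleted later.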

Tests:

- Kind integration test: provision a sandbox and assert the resulting pod
  carries the per-pod overlay, including the secretKeyRef names and a
  resolved hostAliases IP. This is the real coverage: it exercises
  template-read + overlay end to end.
- helm template + lint with ENABLE_CRAFT=true across default/CI/localdev
  values to confirm the PodTemplate renders and the resource/port values flow
  through. Confirm it does NOT render when ENABLE_CRAFT is false.
- Unit test: mock read_namespaced_pod_template to return a known template and
  assert the overlay sets exactly the four dynamic fields and nothing else.
  This guards the version-skew contract.

Do not overtest: the kind integration test plus helm template lint is the core of the coverage.
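
The "exactly the four dynamic fields and nothing else" assertion can be made mechanical by diffing the template against the overlay result. The sketch below uses a hypothetical helper, changed_paths, that works on plain dicts and lists (for example the serialized form of both objects); it makes no assumption about how the overlay itself is implemented.

```python
def changed_paths(before, after, prefix: str = "") -> set[str]:
    """Return the leaf paths whose values differ between two nested structures."""
    if isinstance(before, dict) and isinstance(after, dict):
        paths: set[str] = set()
        for key in before.keys() | after.keys():
            child = f"{prefix}.{key}" if prefix else str(key)
            if key not in before or key not in after:
                paths.add(child)  # key added or removed
            else:
                paths |= changed_paths(before[key], after[key], child)
        return paths
    if isinstance(before, list) and isinstance(after, list) and len(before) == len(after):
        paths = set()
        for i, (b, a) in enumerate(zip(before, after)):
            paths |= changed_paths(b, a, f"{prefix}[{i}]")
        return paths
    # Scalars, mismatched types, or lists of different length: compare whole.
    return set() if before == after else {prefix}
```

A unit test would call the overlay on a known template, then assert that changed_paths(template, pod) equals the expected set of paths (name, the added label keys, the secretKeyRef names, and spec.hostAliases[0].ip). Any static field the overlay touches by accident shows up as an extra path.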