docs/craft/features/approvals/phase-5-docker.md
Reference: approvals-plan.md for architecture.
Depends on Phase 1 (SandboxIPLookup, CAStore, firewall-init.sh
bootstrap-mode switch) and Phase 2 (gate addon, approval data layer,
decision API, chat-stream announce path).
Bring the existing K8s sandbox-egress proxy + action-approval gate to
self-hosted docker-compose deployments (SANDBOX_BACKEND=docker).
The proxy core (backend/onyx/sandbox_proxy/), gate addon, action
matcher, approval data layer, decision API, chat-stream announce path,
and frontend wiring are all unchanged. This phase is exclusively
the docker-compose infrastructure delta:
- Backend dispatch in server.py that selects docker stores instead
  of K8s ones when SANDBOX_BACKEND=docker.
- A CAStore over a shared compose-named volume.
- A SandboxIPLookup watching sandbox containers.
- DockerSandboxManager changes to install the firewall init, mount
  the CA bundle, set HTTPS_PROXY + SDK CA env vars, and register the
  opencode session-tag plugin (currently K8s-only).
- Changes to firewall-init.sh's entrypoint mode
  so the agent process does not retain CAP_NET_ADMIN after init.
- A sandbox-proxy service in docker-compose.craft.yml.

Local sandbox backend (SANDBOX_BACKEND=local) is out of scope and
already removed in configs.py::_parse_sandbox_backend.
After this phase ships, a docker-compose deployment with --include-craft
gets the same approval gating behavior as a Helm-deployed K8s cluster:

- ... sandbox-proxy:8080.
- ... Proxy-Authorization tag the opencode plugin emits.
- ... action_approval rows, and forward or
  reject on the same APPROVED/REJECTED/EXPIRED logic.

Reused unchanged. The bulk of Phases 1 and 2 has nothing K8s-specific in it:

- backend/onyx/sandbox_proxy/addons/gate.py
- backend/onyx/sandbox_proxy/identity.py (the Protocol + resolver)
- backend/onyx/sandbox_proxy/ca.py (the Protocol + bootstrap)
- backend/onyx/sandbox_proxy/approval_cache.py
- backend/onyx/sandbox_proxy/action_matcher.py
- backend/onyx/sandbox_proxy/snapshot_egress.py
- backend/onyx/server/features/build/db/action_approval.py
- backend/onyx/server/features/build/approvals/api.py
- backend/onyx/server/features/build/session/manager.py::_merge_acp_with_announces
- backend/onyx/server/features/build/sandbox/image/firewall-init.sh
  (the entrypoint mode is already plumbed)
- backend/onyx/server/features/build/sandbox/image/opencode-plugins/session-proxy-tag.ts
- ... web/src/app/craft/hooks/useBuildStreaming.ts,
  parsePacket.ts, packetTypes.ts, SWR keys.

New. Strictly the backend implementations of the Phase 1 interfaces plus the docker-side sandbox provisioning changes:

    backend/onyx/sandbox_proxy/
    +-- ca_docker.py                  # FileCAStore over a named volume
    +-- identity_docker.py            # DockerEventsLookup over docker events
    +-- backend.py                    # SANDBOX_BACKEND-driven dispatch helpers
    backend/onyx/server/features/build/sandbox/docker/
    +-- docker_sandbox_manager.py     # MODIFIED: proxy plumbing, security ctx, command
    backend/onyx/server/features/build/sandbox/image/
    +-- firewall-init.sh              # MODIFIED: drop CAP_NET_ADMIN before exec
    +-- Dockerfile                    # MODIFIED: install libcap2-bin
    deployment/docker_compose/
    +-- docker-compose.craft.yml      # MODIFIED: add sandbox-proxy service + ca volume
    backend/onyx/sandbox_proxy/server.py                                 # MODIFIED: backend dispatch
    backend/onyx/server/features/build/configs.py                        # MODIFIED: SANDBOX_PROXY_CA_VOLUME_PATH
    backend/onyx/server/features/build/sandbox/util/opencode_config.py   # MODIFIED: plugins= on single-provider config

No new DB tables. No new API endpoints. No new constants
in approval_cache.py. The whole feature surface is steady.

- CAStore Protocol (ca.py) specifies load and persist, with
  persist raising CAStoreConflictError on a lost cold-start race.
  K8sSecretCAStore realises this via conditional create on a Secret;
  the docker impl realises it via O_EXCL on a file.
- SandboxIPLookup Protocol (identity.py) specifies start,
  lookup, wait_for_initial_sync, is_synced, stop. The K8s impl
  is informer-backed; the docker impl is DockerClient.events()-backed.
- firewall-init.sh already dispatches on SANDBOX_PROXY_BOOTSTRAP_MODE
  with initcontainer and entrypoint modes; only entrypoint runs
  in compose. The script ends in exec gosu 1000:1000 "$@" today --
  Task T5.5 changes that to bound capabilities.
- The K8s manager has _proxy_main_container_env_vars()
  (HTTPS_PROXY + SDK CA env vars) and _proxy_init_container()
  (the firewall init) gated on SANDBOX_PROXY_HOST. The docker
  manager does neither today.
- build_container_create_kwargs in docker_sandbox_manager.py
  defines a fixed env allowlist that is enforced by
  tests/unit/onyx/server/features/build/sandbox/test_docker_manager_config.py.
  Widening it (necessary in this phase) requires updating that test
  alongside the code.

sandbox_proxy/server.py

Today server.py imports K8sSecretCAStore and K8sInformerLookup
directly at module level. Replace those with dispatch helpers in a new
module backend/onyx/sandbox_proxy/backend.py:

    from onyx.sandbox_proxy.ca import CAStore
    from onyx.sandbox_proxy.identity import SandboxIPLookup
    from onyx.server.features.build.configs import SANDBOX_BACKEND
    from onyx.server.features.build.configs import SandboxBackend


    def build_ca_store() -> CAStore:
        if SANDBOX_BACKEND is SandboxBackend.KUBERNETES:
            from onyx.sandbox_proxy.ca_k8s import K8sSecretCAStore

            return K8sSecretCAStore()
        if SANDBOX_BACKEND is SandboxBackend.DOCKER:
            from onyx.sandbox_proxy.ca_docker import FileCAStore

            return FileCAStore()
        raise RuntimeError(f"unsupported SANDBOX_BACKEND={SANDBOX_BACKEND!r}")


    def build_ip_lookup() -> SandboxIPLookup:
        if SANDBOX_BACKEND is SandboxBackend.KUBERNETES:
            from onyx.sandbox_proxy.identity_k8s import K8sInformerLookup

            return K8sInformerLookup()
        if SANDBOX_BACKEND is SandboxBackend.DOCKER:
            from onyx.sandbox_proxy.identity_docker import DockerEventsLookup

            return DockerEventsLookup()
        raise RuntimeError(f"unsupported SANDBOX_BACKEND={SANDBOX_BACKEND!r}")

server.py::main calls build_ca_store() and build_ip_lookup()
instead of the direct constructors. _bootstrap_ca and _build_lookup
collapse into the dispatch calls. The signal handler, healthz server,
DumpMaster setup, drain logic, and identity factory are unchanged.
Lazy imports keep the K8s kubernetes client out of the docker
process's import graph and vice-versa (kubernetes is large and the
SDK opens config files at import time on some paths).

CAStore for compose (ca_docker.py)

A shared named compose volume is the "source of truth" analogue of the
K8s Secret. Both the proxy and every sandbox container mount it: the
proxy read-write so it can persist on cold start, sandboxes
read-only so firewall-init.sh can read ca.crt and install it
into the trust store.

    class FileCAStore(CAStore):
        """File-backed CA persistence over a shared compose volume.

        Layout on disk:
            $SANDBOX_PROXY_CA_VOLUME_PATH/
                ca.crt  # public cert; mounted into sandboxes
                ca.key  # private key; readable only by the proxy

        Cold-start race: `O_EXCL` create on `ca.crt` ensures exactly one
        writer wins. The loser sees EEXIST and re-loads -- the same
        semantics K8s gets from `409 Conflict` on conditional Secret
        create.
        """

        _CA_CERT_FILENAME = "ca.crt"
        _CA_KEY_FILENAME = "ca.key"

        def __init__(self, root: str | Path = SANDBOX_PROXY_CA_VOLUME_PATH) -> None: ...

        def load(self) -> tuple[bytes, bytes] | None: ...

        def persist(self, cert_pem: bytes, key_pem: bytes) -> None:
            """Atomic write: O_EXCL on cert first; on success write key next
            to it with 0o600. On EEXIST raise CAStoreConflictError so
            CABootstrap re-load()s the winner's CA."""

Implementation specifics:

- ... ca.crt, restart proxy.
- ca.key is mode 0o600; ca.crt is mode 0o644.
- ... load() so a
  deleted ConfigMap self-heals. The docker store has no analogue --
  the file IS the bundle, sandboxes mount the same volume directly.

Config:

    # configs.py
    SANDBOX_PROXY_CA_VOLUME_PATH = os.environ.get(
        "SANDBOX_PROXY_CA_VOLUME_PATH", "/var/lib/sandbox-proxy/ca"
    )

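
The O_EXCL cold-start race described for FileCAStore.persist can be sketched as follows. This is a minimal illustration, not the shipped implementation: persist_exclusive, _write_file, and the local CAStoreConflictError class are hypothetical stand-ins (the real error lives in ca.py and its shape is assumed here).

```python
import os
import tempfile
from pathlib import Path


class CAStoreConflictError(Exception):
    """Local stand-in for the error ca.py defines (shape assumed)."""


def _write_file(path: Path, data: bytes, mode: int, exclusive: bool) -> None:
    # O_EXCL makes the create fail with FileExistsError if the file is present.
    flags = os.O_WRONLY | os.O_CREAT | (os.O_EXCL if exclusive else os.O_TRUNC)
    fd = os.open(path, flags, mode)
    with os.fdopen(fd, "wb") as f:
        os.fchmod(f.fileno(), mode)  # enforce the mode regardless of umask
        f.write(data)


def persist_exclusive(root: Path, cert_pem: bytes, key_pem: bytes) -> None:
    """Write ca.crt with O_EXCL first, then ca.key at 0o600."""
    root.mkdir(parents=True, exist_ok=True)
    try:
        _write_file(root / "ca.crt", cert_pem, 0o644, exclusive=True)
    except FileExistsError as exc:
        # Lost the race: the caller is expected to re-load the winner's CA.
        raise CAStoreConflictError("ca.crt already exists") from exc
    _write_file(root / "ca.key", key_pem, 0o600, exclusive=False)


if __name__ == "__main__":
    demo_root = Path(tempfile.mkdtemp()) / "ca"
    persist_exclusive(demo_root, b"CERT", b"KEY")
    try:
        persist_exclusive(demo_root, b"OTHER", b"OTHER")
    except CAStoreConflictError:
        print("second writer lost the race; first CA kept")
```

Note that a crash between the two writes leaves ca.crt without ca.key. In this sketch nothing repairs that state automatically, which is the case the test_ca_docker.py plan expects load to report with a clear error.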
SandboxIPLookup (identity_docker.py)

The K8s informer maintains {pod_ip: SandboxIdentity} by watching pods
with the sandbox label selector. The docker analogue maintains
{container_ip: SandboxIdentity} by:

- client.containers.list(filters={"label": f"{LABEL_COMPONENT}={LABEL_COMPONENT_VALUE}"}) -- inspect each,
  pull labels (onyx.app/sandbox-id, onyx.app/tenant-id), pull IP
  from NetworkSettings.Networks[<network>].IPAddress. The network
  name is SANDBOX_DOCKER_NETWORK.
- client.events(filters={"type": "container", "label": f"{LABEL_COMPONENT}={LABEL_COMPONENT_VALUE}"}) -- on
  start events, inspect and upsert; on die/destroy, evict by
  container id.

    class DockerEventsLookup(SandboxIPLookup):
        def __init__(
            self,
            docker_client: DockerClient | None = None,
            network: str = SANDBOX_DOCKER_NETWORK,
        ) -> None: ...

        def start(self) -> None: ...

        def lookup(self, src_ip: str) -> SandboxIdentity | None: ...

        def wait_for_initial_sync(self, timeout_seconds: float) -> bool: ...

        def is_synced(self) -> bool: ...

        def stop(self) -> None: ...

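
The upsert-on-start / evict-on-die bookkeeping described above can be sketched without a Docker daemon. Everything here is illustrative: IPTable and apply_event are hypothetical names, the SandboxIdentity fields are assumed, and inspect stands in for whatever returns the container's inspect attributes. The event and inspect dictionaries follow the shape of the Docker Engine API JSON (Action, Actor.ID, Config.Labels, NetworkSettings.Networks).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SandboxIdentity:
    """Stand-in; the real class lives in identity.py (fields assumed)."""

    sandbox_id: str
    tenant_id: str


class IPTable:
    """Maps container IP -> identity, with eviction by container id."""

    def __init__(self) -> None:
        self._by_ip: dict[str, SandboxIdentity] = {}
        self._ip_by_container: dict[str, str] = {}

    def upsert(self, container_id: str, ip: str, identity: SandboxIdentity) -> None:
        self.evict(container_id)  # drop a stale IP if the container restarted
        self._by_ip[ip] = identity
        self._ip_by_container[container_id] = ip

    def evict(self, container_id: str) -> None:
        ip = self._ip_by_container.pop(container_id, None)
        if ip is not None:
            self._by_ip.pop(ip, None)

    def lookup(self, src_ip: str) -> Optional[SandboxIdentity]:
        return self._by_ip.get(src_ip)


def apply_event(
    table: IPTable,
    event: dict,
    inspect: Callable[[str], dict],
    network: str,
) -> None:
    """Apply one container event to the table."""
    container_id = event["Actor"]["ID"]
    action = event["Action"]
    if action == "start":
        attrs = inspect(container_id)
        labels = attrs["Config"]["Labels"]
        ip = attrs["NetworkSettings"]["Networks"].get(network, {}).get("IPAddress")
        if ip:  # containers not attached to the sandbox network are ignored
            identity = SandboxIdentity(
                sandbox_id=labels["onyx.app/sandbox-id"],
                tenant_id=labels["onyx.app/tenant-id"],
            )
            table.upsert(container_id, ip, identity)
    elif action in ("die", "destroy"):
        table.evict(container_id)
```

Keeping a reverse index from container id to IP is one way to make eviction cheap, since die and destroy events carry the container id rather than the IP.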
Implementation specifics:

- ... LABEL_COMPONENT=sandbox-proxy (distinct from
  LABEL_COMPONENT_VALUE=craft-sandbox), so the label filter already
  handles this -- belt and braces, double-check via assertion in
  _identity_from_container.
- ... docker.errors.APIError,
  requests.exceptions.ConnectionError, generic OSError; backoff
  starts at 1s, caps at 30s. _synced clears on disconnect so
  /healthz flips to 503, matching K8s semantics.
- ... docker_sandbox_manager.py already documents at the top of the
  module. No new trust elevation in this phase.
- tenant_id is read from the container label (set by
  build_container_create_kwargs), not from any DB lookup -- identical
  to the K8s path.

DockerSandboxManager proxy plumbing

The K8s manager has three pieces of proxy plumbing the docker manager needs:

- _proxy_main_container_env_vars() equivalent. Inject
  HTTPS_PROXY/HTTP_PROXY (lowercase + uppercase), NO_PROXY, and
  the SDK CA env vars: NODE_EXTRA_CA_CERTS, REQUESTS_CA_BUNDLE,
  SSL_CERT_FILE, AWS_CA_BUNDLE, CURL_CA_BUNDLE, GIT_SSL_CAINFO.
  SANDBOX_PROXY_HOST is the compose service name sandbox-proxy
  (Docker's embedded DNS resolves it -- no /etc/hosts injection
  needed, unlike K8s).
- ... command=["/workspace/entrypoint.sh"]. With the proxy enabled this
  becomes command=["/workspace/firewall-init.sh", "/workspace/entrypoint.sh"] with
  SANDBOX_PROXY_BOOTSTRAP_MODE=entrypoint. The script execs the
  second argument after setting up iptables + CA.
- ... firewall-init.sh reads from (/sandbox-ca/ca.crt by default,
  overridable via SANDBOX_PROXY_CA_BUNDLE_SRC). The materialised
  bundle output (/etc/ssl/sandbox/ca-bundle.crt by default) lives in
  the container's writable layer -- no shared volume needed, only one
  process reads it.

The big change is the env allowlist. Today
build_container_create_kwargs enforces a four-key allowlist that
test_docker_manager_config.py locks down. It must be widened to:

    env = {
        "ONYX_PAT": onyx_pat,
        "ONYX_SERVER_URL": api_server_url,
        OPENCODE_SERVER_PASSWORD: opencode_password,
        "OPENCODE_CONFIG_CONTENT": opencode_config_json,
    }
    if sandbox_proxy_host:
        env |= _proxy_env_vars(sandbox_proxy_host, sandbox_proxy_port,
                               api_server_url)

_proxy_env_vars lives in docker_sandbox_manager.py next to the
existing security-invariant code (it does not belong in the K8s
manager). The proxy-disabled posture (no SANDBOX_PROXY_HOST) keeps
the original 4-key allowlist intact for dev / tests that run without
the proxy.
test_docker_manager_config.py gets two new cases: with proxy
configured, the resulting env contains the expected proxy + CA keys
and nothing else; without proxy configured, the env is exactly the
four legacy keys.
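
As a rough sketch of what a helper like _proxy_env_vars could return, the function below builds the proxy and CA keys listed earlier. The function name proxy_env_vars, the http:// proxy URL scheme, and the NO_PROXY contents (loopback plus the API server host) are assumptions made for illustration; only the key names and the default bundle path /etc/ssl/sandbox/ca-bundle.crt come from this document.

```python
from urllib.parse import urlsplit

# Default materialised bundle path named in this document.
CA_BUNDLE_PATH = "/etc/ssl/sandbox/ca-bundle.crt"

# SDK CA env vars listed in this document.
CA_ENV_KEYS = (
    "NODE_EXTRA_CA_CERTS",
    "REQUESTS_CA_BUNDLE",
    "SSL_CERT_FILE",
    "AWS_CA_BUNDLE",
    "CURL_CA_BUNDLE",
    "GIT_SSL_CAINFO",
)


def proxy_env_vars(host: str, port: int, api_server_url: str) -> dict[str, str]:
    """Build the proxy + CA env for a sandbox container (illustrative)."""
    proxy_url = f"http://{host}:{port}"  # assumed scheme
    api_host = urlsplit(api_server_url).hostname or ""
    # Assumption: the API server and loopback bypass the proxy.
    no_proxy = ",".join(h for h in ("localhost", "127.0.0.1", api_host) if h)

    env: dict[str, str] = {}
    for key in ("HTTPS_PROXY", "HTTP_PROXY"):
        env[key] = proxy_url
        env[key.lower()] = proxy_url  # some tools only read the lowercase form
    env["NO_PROXY"] = no_proxy
    for key in CA_ENV_KEYS:
        env[key] = CA_BUNDLE_PATH
    return env
```

Returning a plain dict keeps the allowlist test simple: the proxy-enabled case can assert that the env keys are exactly the four legacy keys plus the keys this function returns.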
Security context changes:

- cap_add=["NET_ADMIN", "SETPCAP", "SETUID", "SETGID"] only when
  proxy is enabled. NET_ADMIN runs iptables; SETPCAP authorises
  the PR_CAPBSET_DROP syscall that setpriv --bounding-set=-all
  uses to clear the bounding set; SETUID/SETGID gate setpriv's
  --reuid / --regid / --init-groups calls (under cap_drop=ALL
  even root needs them). All four are dropped from the bounding set
  by setpriv before the agent execve, so the running container ends
  up with no caps at all. With proxy disabled, cap_drop=ALL stays
  in effect with no additions.

  Originally specced as just NET_ADMIN + SETPCAP; the smoke pass
  discovered that capsh/setpriv's user-switch needs SETUID + SETGID
  in the effective set even when invoked as UID 0, because
  cap_drop=ALL strips them from the default capability set Docker
  would otherwise grant. The bounding set still drops to empty before
  agent exec.
- user is dropped from the create kwargs when proxy is enabled. The
  container starts as root (uid 0) so firewall-init.sh can run
  iptables; the script's final exec drops to UID 1000 (see T5.5 for
  the capability-bounding wrinkle). With proxy disabled, the legacy
  user="1000:1000" stays.

The proxy-enabled / proxy-disabled split is gated on a single
SANDBOX_PROXY_HOST truthiness check, mirroring the K8s manager.
A dev who explicitly unsets SANDBOX_PROXY_HOST gets the pre-gate
posture, which matters for the existing test surface that runs without
the proxy stack up.
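
The proxy-enabled / proxy-disabled split can be summarised as a small function over the create kwargs. This is a sketch under assumptions: security_kwargs is a hypothetical helper name, and the real build_container_create_kwargs sets many more fields. The keys used (cap_drop, cap_add, user, command) are the ones docker-py's container create call accepts.

```python
from typing import Any, Optional


def security_kwargs(sandbox_proxy_host: Optional[str]) -> dict[str, Any]:
    """Return the security-relevant create kwargs for either posture."""
    kwargs: dict[str, Any] = {"cap_drop": ["ALL"]}
    if sandbox_proxy_host:
        # Root start so firewall-init.sh can run iptables; the script
        # drops to UID 1000 and clears the bounding set before exec.
        kwargs["cap_add"] = ["NET_ADMIN", "SETPCAP", "SETUID", "SETGID"]
        kwargs["command"] = [
            "/workspace/firewall-init.sh",
            "/workspace/entrypoint.sh",
        ]
    else:
        # Legacy posture: no added caps, non-root from the start.
        kwargs["user"] = "1000:1000"
        kwargs["command"] = ["/workspace/entrypoint.sh"]
    return kwargs
```

Deriving both branches from one truthiness check keeps the two postures from drifting apart, which is what the allowlist test relies on.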
firewall-init.sh (entrypoint mode)

Today the script ends in:

    exec gosu 1000:1000 "$@"

gosu calls setuid(1000) then execve(). The kernel's execve
transition rules with no file capabilities (fileP=0, fileI=0)
compute the new process's Permitted set as zero, so in practice the
agent process runs with no caps in Effective/Permitted. But the
Bounding set still contains CAP_NET_ADMIN. This is a ceiling, not
an active grant -- the agent cannot run iptables -- but it represents
a weaker security argument than the K8s init-container model (where
NET_ADMIN is granted only to the init container's security context,
not to the running sandbox container at all).
Change the script's tail to explicitly clear the Bounding set before
exec, using setpriv from util-linux:

    exec setpriv --reuid=1000 --regid=1000 --init-groups \
        --bounding-set=-all -- "$@"

(util-linux ships in the node:20-slim base, so no Dockerfile
dependency change is needed.) setpriv drops every capability from
the bounding set, switches UID/GID to 1000:1000 with the right
supplementary groups, then execve's the target. The subsequent
execve has no file capabilities, so the agent process ends up with
zero caps in any set. After this:

- ... NET_ADMIN even
  if a setuid-NET_ADMIN binary somehow ended up in its filesystem.

The K8s init path is unaffected: initcontainer mode exits before
the privilege-drop section.

Why setpriv, not capsh. Originally specced as capsh --drop=all --user=sandbox -- "$@". Smoke discovered that capsh -- args
actually invokes /bin/bash and treats the rest as
script script-args -- which works for the prod case (the entrypoint
IS a script) but silently breaks for any binary target and made the
local-dev smoke fail with "cannot execute binary file" on
non-script stand-ins. setpriv's -- directly execve's the
target with no shell wrapper. Same security posture, cleaner
semantics, no extra package dependency.
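
One way to check the outcome from inside a running sandbox is to read the capability masks the kernel exposes in /proc/<pid>/status. The helper below is a hypothetical verification aid, not part of firewall-init.sh; it parses the CapInh/CapPrm/CapEff/CapBnd/CapAmb hex masks and tests a single bit. CAP_NET_ADMIN is capability number 12 on Linux.

```python
CAP_NET_ADMIN = 12  # Linux capability number

_CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_sets(status_text: str) -> dict[str, int]:
    """Extract the capability bitmasks from /proc/<pid>/status text."""
    caps: dict[str, int] = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in _CAP_FIELDS:
            caps[name] = int(value.strip(), 16)  # masks are printed in hex
    return caps


def has_cap(mask: int, cap: int) -> bool:
    """True if capability number `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        sets = parse_cap_sets(f.read())
    for field, mask in sets.items():
        print(field, hex(mask), "NET_ADMIN" if has_cap(mask, CAP_NET_ADMIN) else "-")
```

With the gosu tail, the expectation from the analysis above is that CapBnd still shows the NET_ADMIN bit while CapEff and CapPrm are zero; with the setpriv tail all five masks should read zero.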
server.py originally hard-coded /var/run/sandbox-proxy/mitmproxy-confdir.
Make it env-tunable so local-dev runs (proxy under the user's venv,
no root) can point at /tmp:

    _MITM_CONFDIR = os.environ.get(
        "SANDBOX_PROXY_MITM_CONFDIR",
        "/var/run/sandbox-proxy/mitmproxy-confdir",
    )

_bootstrap_ca passes pem_path=f"{_MITM_CONFDIR}/mitmproxy-ca.pem"
explicitly to CABootstrap so the CA-bootstrap path tracks the
confdir override. The default is unchanged for prod (K8s pods run
the proxy as root with the tmpfs-mounted /var/run location).
The K8s path uses
build_multi_provider_opencode_config(..., plugins=[_OPENCODE_SESSION_TAG_PLUGIN_PATH])
to register the session-tag
plugin. The docker path uses build_opencode_config(...) which does
not currently accept a plugins argument. Extend the single-provider
builder to accept plugins: list[str] | None and emit the same
plugin field opencode expects.
The docker manager mirrors the K8s gating ("only register when proxy is deployed; otherwise it would no-op"):

    session_tag_plugins = (
        [_OPENCODE_SESSION_TAG_PLUGIN_PATH] if sandbox_proxy_host else None
    )
    opencode_config_json = json.dumps(
        build_opencode_config(
            provider=...,
            plugins=session_tag_plugins,
        )
    )

_OPENCODE_SESSION_TAG_PLUGIN_PATH is a module-level constant in
docker_sandbox_manager.py matching the path the K8s manager uses
(/workspace/opencode-plugins/session-proxy-tag.ts). The plugin file
is already baked into the sandbox image and shared with K8s -- the path
exists today on the docker sandbox image too.
tests/unit/onyx/server/features/build/sandbox/test_opencode_config.py
gets a case verifying the docker single-provider builder emits the
plugin path when supplied.
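
The builder extension could look roughly like the sketch below. It is illustrative only: with_plugins is a hypothetical helper, the base config shape is invented for the example, and the top-level key name plugin is taken from the "plugin field" wording above rather than from the opencode schema.

```python
import json
from typing import Any, Optional

# Path named in this document for the session-tag plugin.
SESSION_TAG_PLUGIN_PATH = "/workspace/opencode-plugins/session-proxy-tag.ts"


def with_plugins(
    config: dict[str, Any], plugins: Optional[list[str]]
) -> dict[str, Any]:
    """Return a copy of `config` with the plugin list attached when supplied."""
    merged = dict(config)  # shallow copy so the caller's dict is untouched
    if plugins:
        merged["plugin"] = list(plugins)  # assumed key name
    return merged


if __name__ == "__main__":
    base = {"model": "example/model"}  # placeholder config for the demo
    print(json.dumps(with_plugins(base, [SESSION_TAG_PLUGIN_PATH])))
    print(json.dumps(with_plugins(base, None)))
```

Omitting the key entirely when no plugins are supplied keeps the proxy-disabled config byte-identical to what the single-provider builder emits today.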

sandbox-proxy compose service

Add to deployment/docker_compose/docker-compose.craft.yml:

    services:
      sandbox-proxy:
        image: ${ONYX_BACKEND_IMAGE:-onyxdotapp/backend:latest}
        command: ["python", "-m", "onyx.sandbox_proxy.server"]
        environment:
          - SANDBOX_BACKEND=docker
          - SANDBOX_PROXY_CA_VOLUME_PATH=/var/lib/sandbox-proxy/ca
          - SANDBOX_DOCKER_NETWORK=${SANDBOX_DOCKER_NETWORK:-onyx_craft_sandbox}
          # DB + Redis credentials (mirrors api_server's env)
          - POSTGRES_HOST=relational_db
          - POSTGRES_USER=${POSTGRES_USER:-postgres}
          - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
          - POSTGRES_DB=${POSTGRES_DB:-postgres}
          - REDIS_HOST=cache
          # Listen port matches default; healthz on 8081
        volumes:
          - sandbox_proxy_ca:/var/lib/sandbox-proxy/ca
          - ${SANDBOX_DOCKER_SOCKET:-/var/run/docker.sock}:/var/run/docker.sock:ro
        networks:
          - default             # reach relational_db, cache
          - onyx_craft_sandbox  # be reachable from sandboxes by name
        restart: unless-stopped
        healthcheck:
          test: ["CMD", "wget", "-q", "-O-", "http://localhost:8081/healthz"]
          interval: 10s
          timeout: 3s
          retries: 3

      api_server:
        environment:
          - SANDBOX_PROXY_HOST=sandbox-proxy
          - SANDBOX_PROXY_PORT=8080
          # (existing env unchanged)

      background:
        environment:
          - SANDBOX_PROXY_HOST=sandbox-proxy
          - SANDBOX_PROXY_PORT=8080

    volumes:
      sandbox_proxy_ca:

Notes:

- The proxy joins two networks: default (to reach postgres + redis by
  compose-DNS) and onyx_craft_sandbox (to be reachable from sandboxes
  by the name sandbox-proxy). Docker DNS resolves the service name to
  whichever interface IP is on the network the resolver runs on.
- ... api_server already crosses (mounted RW) for docker exec-driven
  sandbox control.
- No replicas: 2. Compose deployments are single-host; horizontal
  scaling of the proxy is a multi-host story we are not signing up for
  in this phase. The risk is documented in approvals-plan.md already.
- install.sh --include-craft already creates the
  onyx_craft_sandbox network and treats it as external: true. No
  install-flow change needed.
- ... /healthz which the proxy already serves
  (Phase 1). Compose's healthcheck flips the container's health status
  to unhealthy on consecutive failures. Note: restart: unless-stopped
  does NOT auto-recover unhealthy containers --
  Docker only restarts on process exit, not on healthcheck failure
  (Swarm's deploy.restart_policy and third-party watchdogs like
  autoheal are the mechanisms that do). The _run loop's
  reconnect-with-backoff covers the common transient failures
  (daemon hiccup, stream EOF) and /healthz flips back to 200
  without intervention; truly stuck-unhealthy states would require
  an operator-initiated restart. Acceptable for the single-replica
  MVP; a self-crash on detected unrecoverable state is a Phase 6+
  follow-up if we see this in practice.

No new packages needed. setpriv ships in util-linux which is in
the node:20-slim base image. The Dockerfile already installs
iptables (egress lockdown) and ca-certificates (trust-store
population); both are required regardless of backend. gosu is
retained for compatibility with any external tooling that still
expects it.

Originally specced to add libcap2-bin for capsh; dropped after
the smoke pass switched to setpriv (T5.5's "Why setpriv" note).

Compose deltas from the K8s posture:

- ... restart: unless-stopped brings the
  proxy back. The risk is documented in approvals-plan.md § Risks.
  No HA story for compose.
- ... server.py is unchanged.
  Compose sends SIGTERM on docker stop with a configurable timeout
  (default 10s -- matches _DRAIN_TIMEOUT_S exactly, but worth
  setting stop_grace_period: 20s on the service to give the drain
  the same outer window K8s uses).
- ... onyxdotapp/backend image. CI does not gain a sandbox-proxy build
  step.
- ... pool_size=4, max_overflow=4
  (set in server.py). For single-host compose with a handful of
  active sandboxes this is comfortable. Promote to env-tunable if a
  real ops need surfaces.

Test-tier conventions per CLAUDE.md. WAIT_TIMEOUT_S is
monkey-patched to <1s in tests where wall-clock waits would otherwise
poison CI.

Unit (backend/tests/unit/sandbox_proxy/):

- test_backend_dispatch.py -- build_ca_store() and
  build_ip_lookup() return the right type for each
  SANDBOX_BACKEND value; raise on unknown.
- test_ca_docker.py -- file persistence happy path; O_EXCL
  conflict path raises CAStoreConflictError; missing ca.key
  after a cold-cluster crash raises with a clear message rather
  than silently regenerating.
- test_identity_docker.py -- container labels parse correctly
  into SandboxIdentity; duplicate IPs on initial sync raise;
  reconnect-after-error path resets _synced.
- test_docker_manager_config.py -- existing test gains two cases:
  (a) with SANDBOX_PROXY_HOST set, env contains the expected proxy
  + CA keys and cap_add includes NET_ADMIN; (b) without
  SANDBOX_PROXY_HOST set, the env is exactly the legacy 4-key
  allowlist and cap_drop=ALL stands alone.
- test_opencode_config.py -- single-provider builder emits the
  plugin path when supplied.

External-dependency unit
(backend/tests/external_dependency_unit/sandbox_proxy/):

- test_identity_docker_resolver.py -- spin up real sandbox-labelled
  containers via docker run, assert lookup finds them, evicts on
  removal. Skip if /var/run/docker.sock is absent.

Integration (CI lane mirroring pr-craft-k8s-tests.yml): a new
pr-craft-compose-tests.yml lane stands up the docker-compose stack
with the --include-craft overlay, provisions a sandbox, triggers a
gated Slack request via a stand-in matcher, POSTs APPROVE via the
decision API, and asserts the upstream forward happened. The whole
test reuses test_approval_gate.py's end-to-end shape -- the gate
logic is identical, only the infrastructure underneath differs.
Smoke (runbook, not automated): on a fresh compose deployment with
--include-craft, provision a sandbox, run curl https://example.com
from inside it and confirm the chain shows the proxy CA;
curl --noproxy '*' https://example.com fails (iptables denies);
nslookup example.com fails (DNS is closed); curl -6 ... fails
(IPv6 dropped); a real Slack send through the gate triggers an
approval card in the chat UI.

- ... --include-craft
  (i.e. the onyx_craft_sandbox external network created and the
  Docker socket mounted into api_server / background).
- cache and relational_db reachable from the proxy on the default
  compose network. Already true.
- ... libcap2-bin (T5.8).

- ... deploy.replicas: 1
  or rely on the absence of a deploy: section. Cosmetic; pick one
  and document it.
- ... stop_grace_period: 20s should also be backported onto
  api_server / background to give their celery beat workers a similar
  drain window (out of scope for this phase, but worth noting if
  the answer is "yes we should").

- ... --include-craft brings up a
  sandbox-proxy service alongside the existing api_server /
  background / postgres / redis / minio / web_server.
- ... DockerSandboxManager run
  firewall-init.sh as their entrypoint, install the proxy CA into
  the trust store, lock down egress via iptables, self-verify the
  lockdown, drop to UID 1000 with no capabilities in the bounding
  set, and start the agent.
- curl https://api.slack.com/... from inside a sandbox succeeds, is
  MITM'd with a leaf cert signed by the proxy CA, and the proxy logs
  the flow with a resolved SessionContext.
- curl https://example.com --noproxy '*' from inside a sandbox fails
  (iptables denies). nslookup example.com fails. curl -6 fails.
- ... action_approval row, an
  announce on approval:announce:{session_id}, an
  ApprovalRequestedPacket on the open chat SSE, and routes APPROVED
  forwards / REJECTED 403s identically to the K8s integration test.
- ... stop_grace_period.
- test_docker_manager_config.py allowlist invariants hold for both
  proxy-enabled and proxy-disabled postures.

- ... setpriv is removed from
  the image (e.g. switching to a -distroless base that strips
  util-linux), the script's setpriv line fails and
  firewall-init.sh exits non-zero, taking the sandbox down at
  startup. This is the safe failure mode -- noisy, not silent. The
  earlier alternative ("just gosu, trust the kernel to drop caps
  on execve") is the unsafe failure mode: silently runs with
  NET_ADMIN in the bounding set and rests on kernel-transition
  subtleties for safety. Stick with setpriv.
- ... api_server already
  crosses. Containers with socket access are root-equivalent on the
  host even with RO mode (RO blocks writes, not info disclosure that
  could enable lateral movement -- but the events API itself is the
  intended surface here). Documented as part of the existing trust
  posture; no new boundary.
- ... restart: unless-stopped brings it back. For compose this is acceptable;
  the K8s two-replica posture is the upgrade path if a self-hosted
  deployment ever needs higher availability.
- ... sandbox-proxy name is
  resolvable only inside the onyx_craft_sandbox network. If a
  deployer ever runs multiple compose stacks side-by-side with
  different project names, the external network sharing requires
  care -- external: true keeps them on the same bridge by
  construction. Documented in docker-compose.craft.yml.
- ... setpriv (in util-linux, always present), gosu is only
  retained for tooling compatibility. A follow-up image cleanup could
  drop it.