docs/craft/features/streaming/docker-opencode-serve.md
Port DockerSandboxManager from the per-message opencode acp exec to the long-lived opencode serve HTTP transport that the Kubernetes backend already uses. Prerequisite for drop-acp-layer.md, which deletes the ACP transport entirely.
DockerSandboxManager.send_message (backend/onyx/server/features/build/sandbox/docker/docker_sandbox_manager.py:1039-1103) spawns a DockerACPExecClient per user message — same per-process startup cost, same session-lifetime-tied-to-one-turn, same opencode-1.15.7-drops-the-terminator bug enumerated in opencode-serve-migration.md §Issues. The Kubernetes backend already migrated; self-hosted docker-compose deployments are stuck on the buggy path.
The blockers, all docker-specific:
- `send_message` takes `opencode_session_id` / `agent_provider` / `agent_model` kwargs but marks them `noqa: ARG002` (serve-only) and ignores them. There is no `_send_message_via_serve`, no `ensure_opencode_session` override, no `prompt_slot` impl, no event bus.
- K8s provisions a V1Secret holding `OPENCODE_SERVER_PASSWORD` + `OPENCODE_CONFIG_CONTENT` (kubernetes_sandbox_manager.py:372-429). Docker has no equivalent: `build_container_create_kwargs` (docker_sandbox_manager.py:347-350) is an env allowlist of `{ONYX_PAT, ONYX_SERVER_URL}` enforced by test_docker_manager_config.py.
- Docker sets no `OPENCODE_CONFIG_CONTENT` at provision time. The K8s path uses pod-wide `build_multi_provider_opencode_config` so per-prompt model overrides can switch providers without restarting opencode (opencode_config.py:1-7; opencode-serve does not hot-reload config). The Docker path writes per-session `opencode.json` files via `build_opencode_config` in `setup_session_workspace` (docker_sandbox_manager.py:668) and `_regenerate_session_config` (:1005-1033, write at :1018-1028), which serve cannot pick up since it loaded its provider list at startup.
- The image entrypoint runs `opencode serve`, but only when `AGENT_TRANSPORT=serve` (backend/onyx/server/features/build/sandbox/image/entrypoint.sh:36 sets `TRANSPORT="${AGENT_TRANSPORT:-acp}"`, gate at :46 (`[ "$TRANSPORT" != "serve" ]` → idle), serve branch at :56-80). The Docker manager today never sets `AGENT_TRANSPORT`, so the entrypoint falls through to the `tail -f /dev/null` idle branch and `opencode acp` is exec'd per message.
- K8s reaches serve through a ClusterIP Service at `service_name.namespace.svc.cluster.local:4096` (kubernetes_sandbox_manager.py:2183-2194). Docker would need to reach the sandbox container over the `onyx_craft_sandbox` bridge, by container name on port 4096. No host port mapping (it would break isolation); api_server must be on the same bridge or have a route into it.

drop-acp-layer.md

Two distinct kinds of risk:
- This PR adds `opencode serve` running in a Docker container. The blast radius is self-hosted users.
- drop-acp-layer.md deletes code that has already soaked. The blast radius is "did we miss a branch."

Bundling them turns the deletion PR into a feature PR with a deletion riding along. A reviewer can't tell whether a failing test means "Docker serve has a bug" or "we missed an ACP branch." Keep them separate even if they land back-to-back.
The K8s manager owns six pieces of serve plumbing that map cleanly to Docker:
| K8s | Docker equivalent | Reuse strategy |
|---|---|---|
| `_get_service_name` → DNS name | container name on bridge | Different impl, same interface; keep on each manager |
| `_get_opencode_secret_name` + `_provision_opencode_secret` (V1Secret) | per-container env var injected at create | Different impl, same interface, but factor the password generation (`secrets.token_urlsafe(32)`) and the cleartext value into a small helper if it's used in both |
| `_read_opencode_password` | dict lookup from container env (read back via Docker inspect) | Different impl |
| `_serve_base_url` | `f"http://{container_name}:{OPENCODE_SERVE_PORT}"` | Trivial |
| `_wait_for_opencode_serve_ready` | identical logic against the new base_url | Move to base.py as a default impl taking base_url + password; both managers call it |
| `_get_or_create_event_bus` + `_build_serve_client` + `_event_buses` cache | identical logic | Move to base.py as a mixin or default impl; the only manager-specific bit is `_serve_base_url` |
The push daemon (PUSH_DAEMON_PORT=8731) is already reached via container name on the Docker bridge from the api_server (see DockerSandboxManager._docker and docker_sandbox_manager.py:32-46 docstring). Serve reachability is the same pattern; no new networking design is needed.
test_docker_manager_config.py locks down build_container_create_kwargs to a fixed env allowlist. Adding OPENCODE_SERVER_PASSWORD, OPENCODE_CONFIG_CONTENT, OPENCODE_SERVE_PORT, and AGENT_TRANSPORT=serve (transitional — see §Provisioning) means updating both the function and its test in the same change. Keep the invariant ("no S3/MinIO/Postgres/Redis creds, no compose service hostnames") and add only the four serve-related vars.
Password generation happens during provision(), before build_container_create_kwargs is called, and the cleartext is passed through as a parameter. Do not store it on disk on the api_server — the only persistent store is the Docker container's env, which _read_opencode_password recovers via client.containers.get(name).attrs["Config"]["Env"].
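The password round trip described above can be sketched as follows. This is a minimal illustration, not the manager's actual code: `generate_opencode_password`, `build_serve_env`, and `read_password_from_env` are hypothetical helper names, and the only assumptions are the variable names listed in this document and the fact that Docker reports `Config.Env` as a list of `KEY=VALUE` strings.

```python
import secrets
from typing import Optional

# Keys taken from this document: the existing allowlist plus the four serve vars.
BASE_ENV_KEYS = frozenset({"ONYX_PAT", "ONYX_SERVER_URL"})
SERVE_ENV_KEYS = frozenset(
    {
        "OPENCODE_SERVER_PASSWORD",
        "OPENCODE_CONFIG_CONTENT",
        "OPENCODE_SERVE_PORT",
        "AGENT_TRANSPORT",
    }
)
ALLOWED_ENV_KEYS = BASE_ENV_KEYS | SERVE_ENV_KEYS


def generate_opencode_password() -> str:
    """Hypothetical helper: one fresh password per provision() call."""
    return secrets.token_urlsafe(32)


def build_serve_env(
    base_env: dict, password: str, config_json: str, port: int = 4096
) -> dict:
    """Hypothetical helper: add the serve vars and enforce the allowlist."""
    env = dict(base_env)
    env.update(
        {
            "OPENCODE_SERVER_PASSWORD": password,
            "OPENCODE_CONFIG_CONTENT": config_json,
            "OPENCODE_SERVE_PORT": str(port),
            "AGENT_TRANSPORT": "serve",
        }
    )
    unexpected = sorted(set(env) - ALLOWED_ENV_KEYS)
    if unexpected:
        raise ValueError(f"env keys outside the allowlist: {unexpected}")
    return env


def read_password_from_env(env_list: list) -> Optional[str]:
    """Recover the password from a Config.Env style list of KEY=VALUE strings."""
    for entry in env_list:
        # Split on the first "=" only, since values may themselves contain "=".
        key, sep, value = entry.partition("=")
        if sep and key == "OPENCODE_SERVER_PASSWORD":
            return value
    return None
```

In the real manager the `env_list` argument would come from `client.containers.get(name).attrs["Config"]["Env"]`; keeping the parser a pure function makes it testable without a Docker daemon.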
Because opencode-serve loads providers at startup, the Docker manager must:
1. Drop the `opencode.json` writing in `setup_session_workspace` and `_regenerate_session_config` (mirrors K8s, which already skips this branch when `AGENT_TRANSPORT=serve`; see kubernetes_sandbox_manager.py:1459-1471).
2. Call `build_multi_provider_opencode_config` at provision time and inject it as `OPENCODE_CONFIG_CONTENT`.
3. Rely on the per-prompt model override that `OpencodeServeClient.send_message` already passes (see `_send_message_via_serve` in K8s).

The per-session `opencode.json` files become dead but harmless. Leave them out rather than writing them: serve never reads them and they pollute snapshots.
K8s pulls LLMProviderConfig for every configured provider from the DB at provision (kubernetes_sandbox_manager.py:1166-1174). Docker's provision() (docker_sandbox_manager.py:499-530) takes a single LLMProviderConfig. To call build_multi_provider_opencode_config, the signature has to accept a list. Either:
1. Change `provision()` to take `llm_configs: list[LLMProviderConfig]` and have the session manager pass all configured providers (consistent with K8s), or
2. Use `build_opencode_config` to build a single-provider `OPENCODE_CONFIG_CONTENT`: same behavior as today's per-session file, but injected at startup instead.

Pick the second for this PR; it minimizes the surface area of the change. The first is a follow-up if and when Docker users actually need cross-provider per-prompt switching.
.opencode-data

SnapshotManager (backend/onyx/server/features/build/sandbox/manager/snapshot_manager.py) tars from the sandbox container. The shared /workspace volume that holds XDG_DATA_HOME=/workspace/.opencode-data (set indirectly by entrypoint.sh:35-38: WORKSPACE_DATA_HOME, then export XDG_DATA_HOME="${XDG_DATA_HOME:-$WORKSPACE_DATA_HOME}") is already snapshotted by both backends. No snapshot-format change.
The K8s plan flagged "snapshot mid-turn could capture a half-written SQLite WAL" — same mitigation here. The Docker sidecar daemon doesn't exist, so the abort-before-tar logic lives in DockerSandboxManager.create_snapshot directly: call OpencodeServeClient.abort for any active session on the sandbox before invoking tar. Out of scope for this PR if create_snapshot is already not called mid-turn; verify and document.
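The abort-before-tar ordering can be sketched as below. This is an illustration under assumptions: `snapshot_after_abort` and the `SupportsAbort` protocol are hypothetical, and the real `OpencodeServeClient.abort` signature may differ from the single `session_id` argument assumed here.

```python
from typing import Callable, Iterable, Protocol, TypeVar

T = TypeVar("T")


class SupportsAbort(Protocol):
    """Hypothetical stand-in for the serve client; only abort() is assumed."""

    def abort(self, session_id: str) -> None: ...


def snapshot_after_abort(
    client: SupportsAbort,
    active_session_ids: Iterable[str],
    run_tar: Callable[[], T],
) -> T:
    """Abort every active session first, then run the tar step."""
    for session_id in active_session_ids:
        # Stop in-flight turns so tar does not capture a half-written WAL.
        client.abort(session_id)
    return run_tar()
```

Passing the tar step in as a callable keeps the ordering guarantee in one place and lets a unit test assert it without a container.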
The api_server container needs to be on the onyx_craft_sandbox bridge network to resolve sandbox-{id} by name on port 4096. This is already true for the push-daemon path (PUSH_DAEMON_PORT=8731 on the same bridge); no compose change. Verify by reading the compose file and the existing push-daemon code path before claiming "no change" in the PR description.
base.py

Today SandboxManager (backend/onyx/server/features/build/sandbox/base.py) already defines prompt_slot, ensure_opencode_session, list_subagents, and subscribe_to_opencode_session as abstract / no-op defaults. K8s overrides all four with the serve-only real implementations, each gated on AGENT_TRANSPORT == AgentTransport.SERVE. The other serve helpers (_wait_for_opencode_serve_ready, _get_or_create_event_bus, _build_serve_client, _send_message_via_serve, _event_buses/_event_buses_lock/_terminated_sandboxes state) live only on the K8s subclass.
Move these to base:
- `_wait_for_opencode_serve_ready`: promote to a concrete base method that calls two new abstracts, `_serve_base_url(sandbox_id) -> str` and `_read_opencode_password(sandbox_id) -> str | None`.
- `_get_or_create_event_bus` + the `_event_buses` cache + `_event_buses_lock` + `_terminated_sandboxes` set.
- `_build_serve_client`.
- `_send_message_via_serve`: the whole body; it's already manager-agnostic once `_serve_base_url` and `_read_opencode_password` are abstract.
- Replace the base defaults (`prompt_slot`, `ensure_opencode_session`, `list_subagents`, `subscribe_to_opencode_session`) with the real K8s implementations, and delete the K8s overrides.

Subclasses implement only `_serve_base_url` and `_read_opencode_password`.
Doing this before writing the Docker serve path means the new path is ~50 lines, not ~500.
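The shape of the refactor, with a concrete base method built on two abstract hooks, might look like the sketch below. `ServeCapableManager` and `DockerLikeManager` are hypothetical stand-ins for the real classes, and the `probe` callable is an assumption made so the example does not guess at the readiness endpoint; the `sandbox-{id}` container name follows the naming used elsewhere in this document.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Optional


class ServeCapableManager(ABC):
    """Hypothetical stand-in for SandboxManager; only the serve hooks are shown."""

    @abstractmethod
    def _serve_base_url(self, sandbox_id: str) -> str: ...

    @abstractmethod
    def _read_opencode_password(self, sandbox_id: str) -> Optional[str]: ...

    def _wait_for_opencode_serve_ready(
        self,
        sandbox_id: str,
        probe: Callable[[str, str], bool],
        timeout_s: float = 30.0,
        interval_s: float = 0.5,
    ) -> bool:
        """Poll probe(base_url, password) until it succeeds or the timeout passes."""
        base_url = self._serve_base_url(sandbox_id)
        password = self._read_opencode_password(sandbox_id)
        if password is None:
            return False
        deadline = time.monotonic() + timeout_s
        while True:
            if probe(base_url, password):
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


class DockerLikeManager(ServeCapableManager):
    """Hypothetical subclass: implements only the two manager-specific hooks."""

    def __init__(self, passwords: dict, port: int = 4096) -> None:
        self._passwords = passwords
        self._port = port

    def _serve_base_url(self, sandbox_id: str) -> str:
        return f"http://sandbox-{sandbox_id}:{self._port}"

    def _read_opencode_password(self, sandbox_id: str) -> Optional[str]:
        return self._passwords.get(sandbox_id)
```

With this split, a second backend only has to say where serve lives and how to recover its password; the polling loop is written once.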
OpencodeServeClient.send_message takes model_provider / model_id, but KubernetesSandboxManager.send_message (and the manager interface generally) takes agent_provider / agent_model. The K8s _send_message_via_serve does the rename at the call site. When the body moves to base.py, the rename moves with it — Docker's send_message keeps the manager-side names. Don't introduce a third naming convention.
DockerSandboxManager to the new base

Password lifecycle. In provision(), generate secrets.token_urlsafe(32), build the OPENCODE_CONFIG_CONTENT JSON (single-provider via build_opencode_config), pass both into build_container_create_kwargs.
Env allowlist. Extend build_container_create_kwargs to take opencode_password: str and opencode_config_json: str, add them to the env dict alongside OPENCODE_SERVE_PORT=4096 and AGENT_TRANSPORT=serve. Update test_docker_manager_config.py to assert the new allowlist (still no S3/MinIO/Postgres/Redis creds, still no compose hostnames).
_serve_base_url. Return f"http://{_sandbox_container_name(sandbox_id)}:{OPENCODE_SERVE_PORT}".
_read_opencode_password. Read the container's env via self._docker.containers.get(name).attrs["Config"]["Env"], parse the OPENCODE_SERVER_PASSWORD=... line.
send_message. Replace the entire body (minus packet logging) with a call to the new base-class _send_message_via_serve (which lives on base after the refactor) — or, if that turns out to be too invasive to factor, copy the K8s _send_message_via_serve body verbatim and minimize from there.
ensure_opencode_session override. Same as K8s — build a serve client, call client.ensure_session(None, cwd=session_path, title=...). Or just inherit the base-class impl after the refactor.
Session setup. In setup_session_workspace (docker_sandbox_manager.py:682-744), drop the printf '%s' '{opencode_json}' > {session_path}/opencode.json line (around :668). Same for _regenerate_session_config (:1005-1033, write at :1018-1028), which is called from restore_snapshot (:920). AGENTS.md stays.
Cleanup on terminate. When terminate() removes the container, the event bus + the cached opencode password go with it. The base-class _terminated_sandboxes machinery handles this generically; verify it fires for Docker by hooking it into terminate().
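One plausible shape for the terminate cleanup is sketched below. It is a guess at the pattern, not a description of the existing `_event_buses` / `_terminated_sandboxes` code: `EventBusCache` and its method names are hypothetical, and the choice to refuse new buses after termination is an assumption.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

B = TypeVar("B")


class EventBusCache(Generic[B]):
    """Hypothetical per-sandbox bus cache that stays closed after terminate."""

    def __init__(self, factory: Callable[[str], B]) -> None:
        self._factory = factory
        self._buses: dict = {}
        self._terminated: set = set()
        self._lock = threading.Lock()

    def get_or_create(self, sandbox_id: str) -> B:
        with self._lock:
            if sandbox_id in self._terminated:
                raise RuntimeError(f"sandbox {sandbox_id} is terminated")
            bus = self._buses.get(sandbox_id)
            if bus is None:
                bus = self._factory(sandbox_id)
                self._buses[sandbox_id] = bus
            return bus

    def mark_terminated(self, sandbox_id: str) -> Optional[B]:
        """Drop the cached bus and block re-creation; returns the old bus, if any."""
        with self._lock:
            self._terminated.add(sandbox_id)
            return self._buses.pop(sandbox_id, None)
```

Under this design, `terminate()` would call `mark_terminated(sandbox_id)` after removing the container, so a late subscriber cannot resurrect a bus for a sandbox that no longer exists.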
No image change required — the existing image already runs opencode serve when AGENT_TRANSPORT=serve. Once this PR lands and Docker injects AGENT_TRANSPORT=serve at container create, the entrypoint takes the serve branch.
The AGENT_TRANSPORT=serve env var is transitional. It goes away in drop-acp-layer.md, at which point the entrypoint becomes unconditional.
Update docs/craft/opencode-serve-migration.md §"Migration phases" to note Docker is now serve-by-default. Add a one-paragraph entry to docs/craft/issues/opencode-serve-deploy-gotchas.md for the docker-compose specifics: api_server needs to be on onyx_craft_sandbox bridge, the password lives in container env, snapshot/restore needs the abort-before-tar guard.
External-dependency-unit (backend/tests/external_dependency_unit/craft/):
- `test_docker_sandbox_serve_streaming.py`: mirror `test_opencode_serve_streaming.py` but against a Docker-provisioned sandbox container. Asserts the ordered event sequence from `send_message`. Use a no-tools prompt for determinism.
- Update `test_kubernetes_sandbox_file_ops.py` if any imports churn from the base.py refactor.

Unit (backend/tests/unit/onyx/server/features/build/sandbox/):

- `test_docker_manager_config.py`: extend the env-allowlist assertion to include the four new serve env vars. Assert the OLD allowlist no longer matches (catches regressions in either direction).
- `test_docker_provision_opencode_secret.py`: assert password generation is per-provision and that `OPENCODE_CONFIG_CONTENT` is a valid `build_opencode_config` JSON.

Integration (backend/tests/integration/tests/craft/):

- `test_messages_api.py` already covers the serve path generically; verify it runs against the Docker backend (`SANDBOX_BACKEND=docker`) in CI. If not, parametrize.

Existing tests that must not regress:

- `test_docker_acp_exec_client.py`: this test exercises the path being deleted by drop-acp-layer.md, not this PR. It must still pass here (the ACP code is still in the tree).

Out of scope for this PR:

- Multi-provider `OPENCODE_CONFIG_CONTENT` for Docker. Single-provider is sufficient and matches today's behavior; follow-up when needed.
- Deleting the ACP transport (`DockerACPExecClient`). That happens in drop-acp-layer.md.