docs/developers/qwen-serve-protocol.md
qwen serve HTTP protocol referenceStage 1 of the qwen-code daemon design. All routes live under the daemon's base URL (default http://127.0.0.1:4170).
When the daemon was started with --token or QWEN_SERVER_TOKEN, every route except /health on loopback binds must carry:
Authorization: Bearer <token>
Without a configured token (loopback dev default) the header is optional. Token comparison is constant-time. 401 responses are uniform across missing header / wrong scheme / wrong token.
/health exemption (Bctum): on loopback binds (127.0.0.1 / localhost / ::1 / [::1]) /health is registered BEFORE the bearer middleware, so liveness probes inside the pod don't need to carry the token even when the daemon was started with --token. Non-loopback binds (--hostname 0.0.0.0 etc.) gate /health behind the bearer like every other route — see the GET /health section for the rationale.
--require-auth (#4175 PR 15). Pass this flag at boot to extend the "must have a token" rule to loopback as well. Boot fails without a token; the /health exemption is dropped (so /health also requires Authorization: Bearer …).
When the flag is on, the global bearerAuth middleware gates every route — including /capabilities. An unauthenticated client therefore cannot pre-flight caps.features to discover that auth is required: the discovery surface for that case is the 401 response body itself (uniform across all routes per the Authentication section). The require_auth capability tag is a post-authentication confirmation — once a client successfully authenticates and reads /capabilities, the tag's presence confirms the daemon was started with --require-auth (useful for audit / compliance UIs and for SDK clients to surface "this deployment is hardened" in a settings panel). Mutation routes that opt into per-route strict mode (Wave 4 follow-ups) refuse with 401 { code: "token_required", error: "…" } when reached on a no-token loopback default — but with --require-auth enabled the global bearer middleware short-circuits the request before the per-route gate, so the legacy Unauthorized body is what unauthenticated callers actually see.
5xx responses carry the original error's code and data when present (JSON-RPC style — the ACP SDK forwards {code, message, data} from the agent):
{
"error": "Internal error",
"code": -32000,
"data": { "reason": "model quota exceeded" }
}
Malformed JSON in a request body returns:
{ "error": "Invalid JSON in request body" }
with status 400.
SessionNotFoundError for an unknown session id returns:
{ "error": "No session with id \"<sid>\"", "sessionId": "<sid>" }
with status 404.
WorkspaceMismatchError for a POST /session whose cwd doesn't canonicalize to the daemon's bound workspace (#3803 §02 — 1 daemon = 1 workspace) returns 400 with:
{
"error": "Workspace mismatch: daemon is bound to \"…\" but request asked for \"…\". …",
"code": "workspace_mismatch",
"boundWorkspace": "/path/the/daemon/binds",
"requestedWorkspace": "/path/in/the/request"
}
Use this to detect mismatch pre-flight: read workspaceCwd off /capabilities and omit cwd from POST /session (it falls back to the bound workspace), or route the request to a daemon bound to requestedWorkspace.
POST /session past the daemon's --max-sessions cap returns 503 with a Retry-After: 5 header and:
{
"error": "Session limit reached (20)",
"code": "session_limit_exceeded",
"limit": 20
}
Attaches to existing sessions are NOT counted toward the cap, so an idle daemon's reconnects keep working even when at-capacity.
RestoreInProgressError — only emitted by POST /session/:id/load and POST /session/:id/resume — returns 409 with a Retry-After: 5 header (matching session_limit_exceeded) and:
{
"error": "Session \"<sid>\" is already being restored via session/<resume|load>; retry session/<load|resume> after it completes",
"code": "restore_in_progress",
"sessionId": "<sid>",
"activeAction": "load",
"requestedAction": "resume"
}
Fired when a session/load is issued for an id that already has a session/resume in flight (or vice versa). Wait at least Retry-After seconds and retry — the underlying restore completes within initTimeoutMs (default 10s). Same-action races (load vs load, resume vs resume) coalesce instead of erroring.
The daemon advertises its supported feature tags from the serve capability
registry. Clients must gate UI off features, not off mode (per design
§10).
['health', 'capabilities', 'session_create', 'session_scope_override',
'session_load', 'unstable_session_resume',
'session_list', 'session_prompt', 'session_cancel', 'session_events',
'slow_client_warning', 'typed_event_schema',
'session_set_model', 'client_identity', 'client_heartbeat',
'session_permission_vote', 'permission_vote', 'workspace_mcp', 'workspace_skills',
'workspace_providers', 'workspace_env', 'workspace_preflight',
'session_context', 'session_supported_commands',
'session_close', 'session_metadata', 'mcp_guardrails',
'mcp_guardrail_events',
'workspace_file_read', 'workspace_file_bytes', 'workspace_file_write',
'session_approval_mode_control', 'workspace_tool_toggle',
'workspace_init', 'workspace_mcp_restart']
session_scope_override is the negotiation handle for the per-request sessionScope field on POST /session (see below). Older daemons silently ignore the field, so SDK clients should pre-flight caps.features for this tag before sending it.
session_load and unstable_session_resume advertise the explicit-restore routes (POST /session/:id/load and POST /session/:id/resume). Older daemons return 404 for these paths, so SDK clients should pre-flight caps.features before calling. The unstable_ prefix on unstable_session_resume mirrors the underlying ACP method (connection.unstable_resumeSession) — the daemon's wire shape is committed for v1, but the ACP method name itself may change before ACP marks resume stable.
slow_client_warning covers two co-released SSE backpressure knobs introduced in #4175 Wave 2.5 PR 10: (a) the daemon emits a slow_client_warning synthetic event-stream frame when a subscriber's queue crosses 75% full, once per overflow episode (rearmed after the queue drains below 37.5%); (b) GET /session/:id/events accepts a ?maxQueued=N query param (range [16, 2048]) to pre-size the per-subscriber backlog for cold reconnects against a large replay ring. The daemon-wide ring size is controlled by --event-ring-size (default 8000, per #3803 §02). Old daemons silently lack both — pre-flight this tag before opting in.
typed_event_schema advertises daemon event payloads that match the SDK's KnownDaemonEvent schema. Older daemons may still stream compatible frames, but SDK clients should pre-flight this tag before assuming typed event coverage.
client_heartbeat advertises POST /session/:id/heartbeat. Older daemons return 404; pre-flight this tag before issuing periodic heartbeats.
session_close and session_metadata advertise DELETE /session/:id and PATCH /session/:id/metadata. Older daemons return 404; pre-flight these tags before exposing close or rename affordances.
session_approval_mode_control, workspace_tool_toggle, workspace_init, and workspace_mcp_restart (issue #4175 PR 17) advertise the four mutation control routes documented under "Mutation: approval, tools, init, MCP restart" below. All four are strict-gated by the PR 15 mutation gate (a daemon configured without a bearer token rejects them with 401 token_required). Older daemons return 404; pre-flight each tag before exposing the corresponding affordance.
mcp_guardrails (issue #4175 PR 14) covers the MCP budget surface: the clientCount / clientBudget / budgetMode / budgets[] fields on GET /workspace/mcp, the disabledReason field on per-server cells, and the --mcp-client-budget / --mcp-budget-mode CLI flags. Older daemons omit the new fields entirely; SDK clients pre-flight this tag before relying on budgets[] semantics. The registry descriptor also carries modes: ['warn', 'enforce'] for future feature-modes exposure — for now, clients infer mode from the snapshot's budgetMode field. Server refusal under enforce mode is deterministic by Object.entries(mcpServers) declaration order; a future scope-precedence layer (if qwen-code adopts one) would shift this to "lowest-precedence first" to mirror claude-code's plugin < user < project < local convention.
⚠️ PR 14 v1 scope: per-session, not per-workspace. Each ACP session inside the daemon constructs its own
Config+McpClientManager(viaacpAgent.newSessionConfig). The budget caps live MCP clients per session; each session independently readsQWEN_SERVE_MCP_CLIENT_BUDGETfrom the forwarded env. With--mcp-client-budget=10and 5 concurrent ACP sessions, the actual live MCP client count can reach 5 × 10 = 50 across the daemon. TheGET /workspace/mcpsnapshot reads the bootstrap session'sMcpClientManageraccounting only — thebudgets[0].scope: 'session'value is the honest signal that this is per-session, not aggregated. Wave 5 PR 23 (shared MCP pool) will introduce a workspace-scoped manager and add ascope: 'workspace'cell alongside the per-session cell for true cross-session aggregation. v1 is the in-process counter + soft enforcement foundation that PR 23 builds on.
workspace_file_read covers the text/list/stat/glob workspace file routes
(GET /file, GET /list, GET /glob, GET /stat). workspace_file_bytes
covers GET /file/bytes, which was added later so clients can pre-flight raw
byte-window support against PR19-era daemons. workspace_file_write covers
the hash-aware text mutation routes (POST /file/write, POST /file/edit).
The write tag means the route contract exists; it does not mean the current
deployment is open for anonymous mutation. Write/edit are strict mutation
routes and require a configured bearer token even on loopback.
Conditional tags. A small number of feature tags are advertised only when the matching deployment toggle is on. Tag presence = behavior is on; absence = either an older daemon predating the tag, OR a current daemon where the operator did not opt in. Currently:
| Tag | Advertised when … |
|---|---|
require_auth | the daemon was started with --require-auth (or requireAuth: true via the embedded API). Bearer token is mandatory on every route, including /health on loopback binds. |
mcp_guardrails is not in this conditional table — it's an always-on tag, advertised whenever the binary supports the new /workspace/mcp budget fields, regardless of whether the operator configured a budget. Operators who haven't set --mcp-client-budget still get the new fields (with budgetMode: 'off', budgets: []).
mcp_guardrail_events (issue #4175 PR 14b) advertises the typed SSE push events that surface MCP budget state crossings without a poll loop. Two frame types arrive on GET /session/:id/events:
mcp_budget_warning — fires once on the upward 75% crossing of reservedSlots.size / clientBudget. Re-arms only after the ratio drops below 37.5% (MCP_BUDGET_REARM_FRACTION). Mirrors PR 10's slow_client_warning hysteresis, but at the manager level rather than the per-subscriber backlog level. Payload: { liveCount, reservedCount, budget, thresholdRatio: 0.75, mode: 'warn' | 'enforce' }. Fires under both warn and enforce modes; never under off.mcp_child_refused_batch — fires at end of each discoverAllMcpTools* pass when one or more servers were refused, AND as a length-1 batch on the readResource lazy-spawn refusal path. Payload: { refusedServers: [{ name, transport, reason: 'budget_exhausted' }, ...], budget, liveCount, reservedCount, mode: 'enforce' }. mode is the literal 'enforce' because warn mode never refuses.Both events live in the per-session SSE replay ring (they carry an id) so a client reconnecting with Last-Event-ID resumes through them; the snapshot at GET /workspace/mcp is still the source-of-truth for state-after-extended-disconnect. Always-on once advertised — there is no conditional toggle. SDK reducer state (DaemonSessionViewState) exposes mcpBudgetWarningCount, lastMcpBudgetWarning, mcpChildRefusedBatchCount, lastMcpChildRefusedBatch for adapters that want simple lag-style UI.
GET /healthLiveness probe. Default form returns 200 {"status":"ok"} if the listener is up — cheap, no bridge access, suitable for high-frequency k8s/Compose liveness probes.
Pass ?deep=1 (also accepts ?deep=true or bare ?deep) for a probe that exposes bridge counters (informational only, not a true liveness check):
{ "status": "ok", "sessions": 3, "pendingPermissions": 1 }
⚠️ The deep probe is informational, not a real liveness verification. It reads counter accessors (
bridge.sessionCount,bridge.pendingPermissionCount) which are simple Map-size getters; they don't ping individual child processes / channels and so won't detect a wedged-but-still-counted session. Use it for capacity dashboards (current concurrency vs.--max-sessions, queue depth) rather than as the trigger for "pull this daemon out of rotation". A503 {"status":"degraded"}response is theoretically possible if a custom bridge implementation's getters throw, but the real bridge's getters never do — under normal operation the deep probe always returns 200. For real liveness, rely on whether the listener accepts a TCP connection at all (i.e. the default/healthwithout?deep).
Auth: required only on non-loopback binds. On loopback (127.0.0.1, ::1, [::1]) /health is registered before the bearer middleware so k8s/Compose probes inside the pod don't need to carry the token. On non-loopback (--hostname 0.0.0.0 etc.) the route is registered after the bearer middleware and returns 401 without a valid token — otherwise an unauthenticated caller could probe arbitrary addresses to confirm a qwen serve exists, a low-severity info leak that combines poorly with port scanning. CORS deny + Host allowlist still apply on the loopback exemption.
GET /capabilities{
"v": 1,
"protocolVersions": {
"current": "v1",
"supported": ["v1"]
},
"mode": "http-bridge",
"features": ["health", "capabilities", "..."],
"modelServices": [],
"workspaceCwd": "/canonical/path/to/workspace"
}
Stable contract: when v increments the frame layout has changed in a backwards-incompatible way.
protocolVersionsdescribes the serve protocol versions the daemon can speak.currentis the daemon's preferred protocol version andsupportedis the compatible set. Clients that require a specific protocol should checksupported; feature-specific UI should still gate onfeatures. Additive to v=1: older v=1 daemons omit this field, so SDK clients that target older builds should treat it as optional.
modelServicesis always[]in Stage 1. The agent uses its single default model service and doesn't enumerate it over the wire. Stage 2 will populate this from registered model adapters so SDK clients can build service-pickers; until then, do NOT rely on this field being non-empty.
workspaceCwdis the canonical absolute path this daemon binds to (#3803 §02 — 1 daemon = 1 workspace). Use it to (a) detect mismatch before posting/sessionand (b) omitcwdonPOST /session(the route falls back to this path). Multi-workspace deployments expose multiple daemons on different ports, each with its ownworkspaceCwd. Additive to v=1: pre-§02 v=1 daemons omit the field — clients that target older builds should null-check before consuming it.
These routes report daemon-side runtime snapshots. They are additive v1 routes,
do not mutate state, and do not change the serve protocol version. Workspace
status routes intentionally do not start the ACP child process just because
a client polls a GET route: if the daemon is idle, they return
initialized: false with an empty snapshot. Session status routes require a
live session and use the standard 404 SessionNotFoundError shape for unknown
ids.
Capability tags:
workspace_mcp → GET /workspace/mcpworkspace_skills → GET /workspace/skillsworkspace_providers → GET /workspace/providersworkspace_env → GET /workspace/envworkspace_preflight → GET /workspace/preflightsession_context → GET /session/:id/contextsession_supported_commands → GET /session/:id/supported-commandsCommon status cell:
type DaemonStatus =
| 'ok'
| 'warning'
| 'error'
| 'disabled'
| 'not_started'
| 'unknown';
type DaemonErrorKind =
| 'missing_binary'
| 'blocked_egress'
| 'auth_env_error'
| 'init_timeout'
| 'protocol_error'
| 'missing_file'
| 'parse_error';
interface DaemonStatusCell {
kind: string;
status: DaemonStatus;
error?: string;
errorKind?: DaemonErrorKind;
hint?: string;
}
errorKind is a closed enum shared by /workspace/preflight,
/workspace/env, and (eventually) MCP guardrails so SDK clients can render
remediation per category instead of parsing free-form messages. PR 13
(#4175) introduced the seven literals listed above; PR 14 will populate
blocked_egress once the egress probe lands.
Status payloads never expose MCP env values, headers, OAuth/service-account
details, provider API keys, provider baseUrl / envKey, skill body, skill
filesystem paths, hook definitions, or values of secret environment
variables. /workspace/env reports the presence of whitelisted env
vars only; proxy URLs are stripped of credentials and reduced to
host:port before they hit the wire.
GET /workspace/mcp{
"v": 1,
"workspaceCwd": "/canonical/path",
"initialized": true,
"discoveryState": "completed",
"servers": [
{
"kind": "mcp_server",
"status": "ok",
"name": "docs",
"mcpStatus": "connected",
"transport": "stdio",
"disabled": false,
"description": "Documentation server",
"extensionName": "docs-ext"
}
]
}
discoveryState is one of not_started, in_progress, or completed.
transport is one of stdio, sse, http, websocket, sdk, or
unknown. errors is omitted when discovery succeeds.
MCP client guardrails (issue #4175 PR 14). Post-PR-14 daemons extend the payload with four additive fields and one workspace-level cell:
{
"v": 1,
"workspaceCwd": "/canonical/path",
"initialized": true,
"discoveryState": "completed",
"clientCount": 3,
"clientBudget": 2,
"budgetMode": "enforce",
"budgets": [
{
"kind": "mcp_budget",
"scope": "session",
"status": "error",
"errorKind": "budget_exhausted",
"hint": "Raise --mcp-client-budget or remove servers from mcpServers config.",
"liveCount": 2,
"budget": 2,
"mode": "enforce",
"refusedCount": 1,
},
],
"servers": [
{
"kind": "mcp_server",
"status": "ok",
"name": "a",
"mcpStatus": "connected",
"transport": "stdio",
"disabled": false,
},
{
"kind": "mcp_server",
"status": "ok",
"name": "b",
"mcpStatus": "connected",
"transport": "stdio",
"disabled": false,
},
{
"kind": "mcp_server",
"status": "error",
"name": "c",
"mcpStatus": "disconnected",
"transport": "stdio",
"disabled": false,
"disabledReason": "budget",
"errorKind": "budget_exhausted",
"hint": "...",
},
],
}
budgetMode is one of enforce, warn, or off. clientBudget is absent when no budget was set. budgets[] is always an array on post-PR-14 daemons (possibly empty when budgetMode === 'off'); pre-PR-14 daemons omit the field entirely. v1 emits one cell with scope: 'session' (per-session enforcement — see the capabilities section above for why). Consumers MUST tolerate additional budgets[] entries with unrecognized scope values — Wave 5 PR 23 will add scope: 'workspace' (or 'pool') alongside the per-session cell without a schema bump.
disabledReason on per-server cells distinguishes operator-disabled ('config' — disabledMcpServers config list) from budget-refused ('budget' — discovered but never connected due to enforce mode). Refusals are deterministic by Object.entries(mcpServers) declaration order. The per-server status: 'error', errorKind: 'budget_exhausted' shadows the raw mcpStatus: 'disconnected' (which is true but not the operator-facing severity).
Budget enforcement in PR 14 v1 is per-session, not per-workspace. Although Mode B daemons are 1 daemon = 1 workspace × N sessions post-#4113 at the process level, the McpClientManager is constructed inside each ACP session's Config via acpAgent.newSessionConfig, so N sessions each enforce their own copy of the cap. The snapshot represents the bootstrap session's view. Wave 5 PR 23 introduces a workspace-scoped shared MCP pool that graduates this to true per-workspace enforcement.
Detecting budget pressure. Two surfaces, both populated post-PR-14b:
Push events (advertised via mcp_guardrail_events): subscribe to GET /session/:id/events and narrow mcp_budget_warning / mcp_child_refused_batch frames through KnownDaemonEvent. The state machine fires once per upward 75% crossing (re-armed below 37.5%); refusals are coalesced once per discovery pass under enforce mode.
Snapshot poll (advertised via mcp_guardrails): GET /workspace/mcp and inspect the per-session budget cell (budgets[0]):
budgets[0].status === 'warning' ⇔ liveCount >= 0.75 * clientBudget (matches the hysteresis threshold PR 14b's push event will use).
budgets[0].status === 'error' ⇔ refusedCount > 0 (one or more servers refused this discovery pass).
budgets[0].status === 'ok' ⇔ below the 75% threshold AND no refusals.
Recommended poll cadence: aligned with whatever already polls /workspace/mcp; the snapshot is cheap and the budget cell carries no extra discovery cost. SDK clients that subscribe to push events still benefit from the snapshot for state-after-extended-disconnect (the SSE replay ring depth is finite — --event-ring-size, default 8000 — so a client offline longer than the ring's coverage falls back to snapshot resync).
GET /workspace/skills{
"v": 1,
"workspaceCwd": "/canonical/path",
"initialized": true,
"skills": [
{
"kind": "skill",
"status": "ok",
"name": "review",
"description": "Review code",
"level": "project",
"modelInvocable": true,
"argumentHint": "[path]"
}
]
}
level is one of project, user, extension, or bundled. errors is
omitted when discovery succeeds.
GET /workspace/providers{
"v": 1,
"workspaceCwd": "/canonical/path",
"initialized": true,
"current": { "authType": "qwen", "modelId": "qwen3(qwen)" },
"providers": [
{
"kind": "model_provider",
"status": "ok",
"authType": "qwen",
"current": true,
"models": [
{
"modelId": "qwen3(qwen)",
"baseModelId": "qwen3",
"name": "Qwen 3",
"description": null,
"contextLimit": 4096,
"isCurrent": true,
"isRuntime": false
}
]
}
]
}
Models are grouped by auth type. Provider connection diagnostics live on
/workspace/preflight's providers cell; environment preflight lives on
/workspace/preflight and /workspace/env (below). errors is omitted
when snapshot construction succeeds.
GET /workspace/envReports the daemon process's runtime, platform, sandbox, proxy, and the
presence of whitelisted secret environment variables. Always answers
from process.* state — the daemon never spawns an ACP child to serve
this route, and the response is identical whether ACP is up or idle. The
acpChannelLive field is informational only.
{
"v": 1,
"workspaceCwd": "/canonical/path",
"initialized": true,
"acpChannelLive": false,
"cells": [
{ "kind": "runtime", "name": "node", "status": "ok", "value": "22.4.0" },
{ "kind": "platform", "name": "darwin", "status": "ok", "value": "arm64" },
{
"kind": "sandbox",
"name": "SANDBOX",
"status": "disabled",
"present": false
},
{
"kind": "proxy",
"name": "HTTPS_PROXY",
"status": "ok",
"present": true,
"value": "proxy.internal:1080"
},
{
"kind": "proxy",
"name": "NO_PROXY",
"status": "disabled",
"present": false
},
{
"kind": "env_var",
"name": "OPENAI_API_KEY",
"status": "ok",
"present": true
},
{
"kind": "env_var",
"name": "ANTHROPIC_BASE_URL",
"status": "disabled",
"present": false
}
]
}
Cell shape:
type DaemonEnvKind =
| 'runtime' // name: 'node' | 'bun' | 'unknown'; value: process.versions.node
| 'platform' // name: process.platform; value: process.arch
| 'sandbox' // name: 'SANDBOX' | 'SEATBELT_PROFILE'; value optional
| 'proxy' // name: HTTP_PROXY | HTTPS_PROXY | NO_PROXY | ALL_PROXY; value: redacted host
| 'env_var'; // presence-only; value field is ALWAYS omitted
interface DaemonEnvCell extends DaemonStatusCell {
kind: DaemonEnvKind;
name: string;
present?: boolean;
value?: string;
}
Redaction policy. kind: 'env_var' cells never include a value
field; clients see present: boolean only. kind: 'proxy' cells run the
raw env value through credential redaction (redactProxyCredentials) and
then through URL parsing so the wire only carries host:port. NO_PROXY
is passed through redaction verbatim because it is a host list rather than
a URL. The whitelist of enumerated secret env vars currently includes
OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, GOOGLE_API_KEY,
DASHSCOPE_API_KEY, OPENROUTER_API_KEY, and QWEN_SERVER_TOKEN. Other
env vars are not enumerated, so accidentally-set secrets stay invisible.
GET /workspace/preflightReports daemon readiness checks. Daemon-level cells (node_version,
cli_entry, workspace_dir, ripgrep, git, npm) are always
populated from process.* and node:fs. ACP-level cells (auth,
mcp_discovery, skills, providers, tool_registry, egress)
require a live ACP child — when the daemon is idle they emit
status: 'not_started' placeholders. The route never spawns ACP solely
to populate cells; the corresponding cells fall back to not_started.
Idle response (no ACP child):
{
"v": 1,
"workspaceCwd": "/canonical/path",
"initialized": true,
"acpChannelLive": false,
"cells": [
{
"kind": "node_version",
"status": "ok",
"locality": "daemon",
"detail": { "version": "22.4.0", "required": ">=22" }
},
{
"kind": "cli_entry",
"status": "ok",
"locality": "daemon",
"detail": { "path": "/usr/local/bin/qwen", "source": "process.argv[1]" }
},
{
"kind": "workspace_dir",
"status": "ok",
"locality": "daemon",
"detail": { "path": "/canonical/path" }
},
{ "kind": "ripgrep", "status": "ok", "locality": "daemon" },
{
"kind": "git",
"status": "ok",
"locality": "daemon",
"detail": { "version": "2.45.0" }
},
{
"kind": "npm",
"status": "ok",
"locality": "daemon",
"detail": { "version": "10.7.0" }
},
{
"kind": "auth",
"status": "not_started",
"locality": "acp",
"hint": "spawn a session to populate"
},
{
"kind": "mcp_discovery",
"status": "not_started",
"locality": "acp",
"hint": "spawn a session to populate"
},
{
"kind": "skills",
"status": "not_started",
"locality": "acp",
"hint": "spawn a session to populate"
},
{
"kind": "providers",
"status": "not_started",
"locality": "acp",
"hint": "spawn a session to populate"
},
{
"kind": "tool_registry",
"status": "not_started",
"locality": "acp",
"hint": "spawn a session to populate"
},
{
"kind": "egress",
"status": "not_started",
"locality": "acp",
"hint": "egress probing lands in PR 14 (#4175)"
}
]
}
Cell shape:
type DaemonPreflightKind =
| 'node_version'
| 'cli_entry'
| 'workspace_dir'
| 'ripgrep'
| 'git'
| 'npm'
| 'auth'
| 'mcp_discovery'
| 'skills'
| 'providers'
| 'tool_registry'
| 'egress';
interface DaemonPreflightCell extends DaemonStatusCell {
kind: DaemonPreflightKind;
locality: 'daemon' | 'acp';
detail?: Record<string, unknown>;
}
errorKind semantics:
missing_binary — Node version below required, missing QWEN_CLI_ENTRY,
ripgrep / git / npm not on PATH (warnings rather than errors for the
optional binaries).missing_file — boundWorkspace does not exist or is not a directory;
skill parse error pointing at a missing or unreadable file.parse_error — SKILL.md parse failure, malformed config JSON.auth_env_error — validateAuthMethod returned a non-null failure
string, or a ModelConfigError subclass propagated from provider
resolution.init_timeout — withTimeout reject in the bridge (an actual timeout
while waiting on an ACP roundtrip). Recognized via the
BridgeTimeoutError typed class. Note: a transient mcp_discovery
warning cell with connecting > 0 does NOT carry this kind — that's
a normal handshake-in-progress state, distinct from a real timeout.protocol_error — ACP extMethod rejected because the channel closed
mid-request, or because tool registry was unexpectedly absent.blocked_egress — reserved for PR 14 (#4175). PR 13 leaves the
egress cell as status: 'not_started'.If the bridge fails to reach the ACP child while serving a preflight
request (e.g. a mid-request channel close), the envelope's errors array
carries a single ServeStatusCell describing the failure and the cells
fall back to not_started ACP placeholders. Daemon-level cells are still
returned.
All file paths are resolved through the daemon's bound workspace. Responses use workspace-relative paths and never return absolute filesystem paths for normal success cases. Successful file responses include:
Cache-Control: no-store
X-Content-Type-Options: nosniff
Filesystem errors use this JSON shape:
{
"errorKind": "hash_mismatch",
"error": "expected sha256:..., found sha256:...",
"hint": "re-read the file and retry with the latest hash",
"status": 409
}
errorKind values include path_outside_workspace, symlink_escape,
path_not_found, binary_file, file_too_large, untrusted_workspace,
permission_denied, parse_error, hash_mismatch,
file_already_exists, text_not_found, and ambiguous_text_match.
GET /fileReads a text file. Query params: path (required), maxBytes, line, and
limit. The daemon rejects binary files and files above the text read cap.
The response includes hash, a SHA-256 digest over the raw on-disk bytes for
the whole file, even when line, limit, or maxBytes returned a slice.
{
"kind": "file",
"path": "src/index.ts",
"content": "export {};\n",
"encoding": "utf-8",
"bom": false,
"lineEnding": "lf",
"sizeBytes": 11,
"returnedBytes": 11,
"truncated": false,
"hash": "sha256:...",
"matchedIgnore": null,
"originalLineCount": null
}
GET /file/bytesReads raw bytes from a file without decoding. Query params: path (required),
offset (default 0), and maxBytes (default 65536, max 262144). This
route supports bounded windows on large binary files without slurping the whole
file. The response includes hash only when the returned window covers the
entire file.
{
"kind": "file_bytes",
"path": "assets/logo.png",
"offset": 0,
"sizeBytes": 3912,
"returnedBytes": 3912,
"truncated": false,
"contentBase64": "...",
"hash": "sha256:..."
}
POST /file/writeCreates or replaces a text file. This is a strict mutation route: on loopback
without a configured token it returns 401 { "code": "token_required" }.
With --require-auth, the global bearer middleware rejects unauthenticated
requests before the route runs.
Body:
{
"path": "src/new.ts",
"content": "export const value = 1;\n",
"mode": "create"
}
{
"path": "src/existing.ts",
"content": "export const value = 2;\n",
"mode": "replace",
"expectedHash": "sha256:..."
}
mode must be create or replace. create never overwrites an existing
file (409 file_already_exists). replace requires expectedHash; missing or
malformed hashes are 400 parse_error, and stale hashes are
409 hash_mismatch. expectedHash is sha256: plus 64 lowercase hex
characters, computed over raw on-disk bytes.
bom, encoding, and lineEnding may be supplied. Replacement preserves the
existing file's encoding profile by default; explicit fields override it.
Binary writes are out of scope.
The daemon writes to a random temp file in the target directory, fsyncs where
supported, re-checks the current hash immediately before rename(), then
renames into place. This prevents partial-file observation and serializes
daemon-originated writes to the same file, but it is not a cross-process
kernel compare-and-swap: an external editor can still race in the tiny window
between final hash check and rename.
{
"kind": "file_write",
"path": "src/existing.ts",
"mode": "replace",
"created": false,
"sizeBytes": 24,
"hash": "sha256:...",
"encoding": "utf-8",
"bom": false,
"lineEnding": "lf",
"matchedIgnore": null
}
POST /file/editApplies one exact text replacement to an existing text file. This is also a
strict mutation route and requires expectedHash.
{
"path": "src/config.ts",
"oldText": "timeout: 30000",
"newText": "timeout: 60000",
"expectedHash": "sha256:..."
}
oldText must be non-empty and occur exactly once. No match returns
422 text_not_found; multiple matches return 422 ambiguous_text_match.
The route preserves encoding, BOM, and line endings, and re-checks
expectedHash immediately before the atomic rename.
Explicit writes/edits to ignored paths are allowed because the authenticated
caller named the path. Success responses and audit events include
matchedIgnore: "file" | "directory" | null.
{
"kind": "file_edit",
"path": "src/config.ts",
"replacements": 1,
"sizeBytes": 128,
"hash": "sha256:...",
"encoding": "utf-8",
"bom": false,
"lineEnding": "lf",
"matchedIgnore": null
}
GET /session/:id/context{
"v": 1,
"sessionId": "<sid>",
"workspaceCwd": "/canonical/path",
"state": {
"models": {},
"modes": {},
"configOptions": []
}
}
state mirrors the same ACP model/mode/config-option shapes used by
POST /session, POST /session/:id/load, and POST /session/:id/resume.
GET /session/:id/supported-commands{
"v": 1,
"sessionId": "<sid>",
"availableCommands": [
{
"name": "init",
"description": "Initialize the project",
"input": null,
"_meta": { "source": "builtin" }
}
],
"availableSkills": ["review"]
}
availableCommands is the same command snapshot used by the
available_commands_update SSE notification. availableSkills lists skill
names only; clients must not expect skill bodies or paths over this route.
POST /sessionSpawn a new agent or attach to an existing one (under sessionScope: 'single', the default).
Request:
{
"cwd": "/absolute/path/to/workspace",
"modelServiceId": "qwen-prod",
"sessionScope": "thread"
}
| Field | Required | Notes |
|---|---|---|
cwd | no | Absolute path matching the daemon's bound workspace. If omitted, the route falls back to boundWorkspace (read it off /capabilities.workspaceCwd). A mismatched non-empty cwd returns 400 workspace_mismatch (#3803 §02 — 1 daemon = 1 workspace). Workspace paths are canonicalized via realpathSync.native (with a resolve-only fallback for non-existent paths) so case-insensitive filesystems don't reject sessions per spelling. |
modelServiceId | no | Selects which configured model service the agent will route through (the back-end provider — Alibaba ModelStudio, OpenRouter, etc). If omitted the agent uses its default. If the workspace already has a session, this calls setSessionModel on the existing one and broadcasts model_switched. Distinct from modelId on POST /session/:id/model, which selects the model within an already-bound service. The modelServices array on /capabilities is reserved for advertising configured services; in Stage 1 it is always [] (the agent's default service is used and not enumerated over HTTP). |
sessionScope | no | Per-request override for session sharing. 'single' (the daemon-wide default) makes a second same-workspace POST /session reuse the existing session (attached: true); 'thread' forces a fresh distinct session every call. Omit to inherit the daemon-wide default. Values outside the enum return 400 { code: 'invalid_session_scope' }. Old daemons (pre-#4175 PR 5) silently ignore the field — pre-flight caps.features.session_scope_override before sending. The daemon-wide default is hardcoded to 'single' in production today; #4175 may add a --sessionScope CLI flag in a follow-up. |
Response:
{
"sessionId": "<uuid>",
"workspaceCwd": "/canonical/path",
"attached": false
}
attached: true means a session for that workspace already existed and you're now sharing it.
Concurrent POST /session calls for the same workspace are coalesced to one spawn — both callers get the same sessionId, exactly one reports attached: false. If the underlying spawn fails (init timeout, malformed agent output, OOM), all coalesced callers receive the same error — the in-flight slot is cleared so a follow-up call can retry from scratch.
⚠️
modelServiceIdrejection on a fresh session is silent on the HTTP response. A badmodelServiceId(typo, unconfigured service) does NOT 500 the create — the session stays operational on the agent's default model so the caller still gets asessionIdthey can retry the model switch against (viaPOST /session/:id/model). The visible failure signal is amodel_switch_failedevent on the session's SSE stream, fired between the spawn handshake and your first subscribe. Subscribers that need to observe this event should passLast-Event-ID: 0on their firstGET /session/:id/eventsto replay from the ring's oldest available event (covers the spawn-timemodel_switch_failedeven if the subscribe lands a few ms after the create response).
POST /session/:id/loadRestore a persisted ACP session by id and replay its history through SSE. The path id is authoritative; any sessionId field in the body is ignored. Pre-flight caps.features.session_load — older daemons return 404 for this route.
Request:
{
"cwd": "/absolute/path/to/workspace"
}
| Field | Required | Notes |
|---|---|---|
cwd | no | Same canonicalization + workspace_mismatch rules as POST /session. Omit to inherit /capabilities.workspaceCwd. mcpServers is intentionally NOT accepted here — daemon-wide MCP is settings-driven (matches POST /session). |
Response:
{
"sessionId": "persisted-1",
"workspaceCwd": "/canonical/path",
"attached": false,
"state": {
"models": { ... },
"modes": { ... },
"configOptions": [ ... ]
}
}
state mirrors ACP's LoadSessionResponse — models is a SessionModelState, modes a SessionModeState, configOptions an array of SessionConfigOption. Missing fields are agent-decided. Late attachers (the attached: true paths below) get the SAME state snapshot the original load caller saw — the daemon caches it on the entry; runtime mutations (e.g. model_switched) are delivered on the SSE stream, not on subsequent attach responses.
attached: true means the session was already live (either from a prior session/load/session/resume, or because a coalesced concurrent caller raced just ahead).
History replay over SSE. While loadSession is in flight on the agent side, the agent emits session_update notifications for every persisted turn. The daemon buffers them onto the session's event-bus before the route response returns, so subscribers that immediately call GET /session/:id/events with Last-Event-ID: 0 see the full replay. The replay ring is bounded (default 4000 frames per session). Long histories with many tool-call / thought-stream turns can exceed that — the oldest frames are dropped silently. Clients that need full history should subscribe immediately after load returns; alternatively they can persist the SSE event ids and use Last-Event-ID to resume from a later turn boundary.
Errors:
404 — persisted session id doesn't exist (SessionNotFoundError).400 — workspace_mismatch (same shape as POST /session).503 — session_limit_exceeded (counts against --max-sessions; in-flight restores are accounted for too).409 — restore_in_progress (a session/resume for the same id is already in flight). Retry-After: 5. Same-action races (two concurrent session/load for the same id) coalesce — exactly one returns attached: false, the rest return attached: true with the same state.POST /session/:id/resumeRestore a persisted ACP session by id WITHOUT replaying history through SSE. The model context is restored internally on the agent side (via geminiClient.initialize reading config.getResumedSessionData); the SSE stream stays clean for clients that already have history rendered. Pre-flight caps.features.unstable_session_resume.
Same request shape as /load. Same response shape — state mirrors ACP's ResumeSessionResponse. Same error envelope, including 409 restore_in_progress (which fires when a session/load is in flight; session/resume racing behind another session/resume coalesces).
Use /load when the client has no history rendered (cold reconnect, picker → open). Use /resume when the client already has the turns on screen and only needs the daemon-side handle back.
⚠️ Why
unstable_on the capability tag? The route is wire-stable for the daemon's v1, but it's backed by ACP'sconnection.unstable_resumeSessionwhich is still subject to ACP-side breaking changes. The daemon insulates the wire shape from those changes; the prefix is a courtesy signal so SDK consumers know the underlying agent contract is not yet locked.
GET /workspace/:id/sessionsList all live sessions whose canonical workspace matches :id (URL-encoded absolute cwd).
curl http://127.0.0.1:4170/workspace/$(jq -rn --arg c "$PWD" '$c|@uri')/sessions
Response:
{
"sessions": [
{
"sessionId": "<uuid>",
"workspaceCwd": "/canonical/path",
"createdAt": "2026-05-17T08:30:00.000Z",
"displayName": "My Session",
"clientCount": 2,
"hasActivePrompt": false
}
]
}
Empty array (not 404) when no sessions exist — a session-picker UI shouldn't error just because the workspace is idle.
POST /session/:id/promptForward a prompt to the agent. Multi-prompt callers FIFO-queue per session (ACP guarantees one active prompt per session).
Request:
{
"prompt": [{ "type": "text", "text": "What does src/main.ts do?" }]
}
Validation: prompt must be a non-empty array of objects. Other failures return 400 before reaching the bridge.
Response:
{ "stopReason": "end_turn" }
Other stop reasons: cancelled, max_tokens, error, length (per ACP spec).
If the HTTP client disconnects mid-prompt, the daemon sends an ACP cancel notification to the agent, which winds the prompt down with stopReason: "cancelled".
Stage 1 limitation — no server-side prompt timeout. The bridge only races the agent's
prompt()againsttransportClosedReject(the agent child crashing) and the caller's HTTP-disconnect AbortSignal. A wedged-but-alive agent (e.g. a model call that hangs) blocks the per-session FIFO until the HTTP client times out on its end and disconnects. Long-running prompts are legitimate (deep research, large-codebase analysis) so a default deadline is deliberately not set; Stage 2 will expose a configurablepromptTimeoutMsopt-in. Until then, callers should set their own client-side timeout and disconnect (or callPOST /session/:id/cancel) on expiry.
POST /session/:id/cancelCancel the currently active prompt on the session. ACP-side this is a notification, not a request — the agent acknowledges by resolving the active prompt() with cancelled.
curl -X POST http://127.0.0.1:4170/session/$SID/cancel
# → 204 No Content
Multi-prompt contract: cancel only affects the active prompt. Any prompts the same client previously POSTed and are still queued behind the active one will continue to execute. Multi-prompt queueing is a daemon-introduced behavior (not in ACP spec); the contract for queued prompts is "they keep running unless you cancel each, or kill the session via channel exit".
DELETE /session/:idExplicitly close a live session. Force-closes even when other clients are attached — cancels any active prompt, resolves pending permissions as cancelled, publishes session_closed event, closes the EventBus, and removes the session from daemon maps. On-disk persisted sessions are NOT deleted — they can be reloaded via POST /session/:id/load. Pre-flight caps.features.session_close.
curl -X DELETE http://127.0.0.1:4170/session/$SID
# → 204 No Content
Idempotent: returns 404 for unknown sessions (same SessionNotFoundError shape as other routes).
session_closedevent. SSE subscribers receive a terminalsession_closedevent with{ sessionId, reason: 'client_close', closedBy?: '<clientId>' }before the stream ends. SDK reducers treat this identically tosession_died(setsalive: false, clearspendingPermissions).
PATCH /session/:id/metadataUpdate mutable session metadata. Currently supports displayName only. Pre-flight caps.features.session_metadata.
Request:
{ "displayName": "My Investigation Session" }
| Field | Required | Notes |
|---|---|---|
displayName | no | String, max 256 characters. Empty string clears the name. Omit to leave as-is. |
Response:
{ "sessionId": "<uuid>", "displayName": "My Investigation Session" }
Publishes a session_metadata_updated event on the session's SSE stream with { sessionId, displayName }.
POST /session/:id/heartbeatBump the daemon's last-seen bookkeeping for this session. Long-lived adapters (TUI/IDE/web) ping this on an interval so future revocation policy (Wave 5 PR 24) can distinguish dead clients from quiet ones.
Headers:
| Header | Required | Notes |
|---|---|---|
X-Qwen-Client-Id | no | Echoes the daemon-issued id from POST /session. Identified clients also bump their per-client timestamp; anonymous heartbeats only bump the per-session watermark. Must satisfy the same [A-Za-z0-9._:-]{1,128} shape as elsewhere. |
Request body is empty ({} is fine — no fields are read today).
Response:
{
"sessionId": "<sid>",
"clientId": "<cid>",
"lastSeenAt": 1700000000123
}
clientId is echoed only when a trusted X-Qwen-Client-Id was supplied. lastSeenAt is the daemon-side Date.now() epoch (ms) the bridge stored.
Errors:
400 — { code: 'invalid_client_id' } when the header is malformed (header-shape rule) or when it carries a clientId that isn't registered for this session (the bridge throws InvalidClientIdError before bumping any timestamp).404 — unknown session.Capability gating: pre-flight caps.features.client_heartbeat. Older daemons return 404 for this path.
POST /session/:id/modelSwitch the active model within the session's currently bound model service. Serialized through the per-session model-change queue.
(For switching the service itself — Alibaba ModelStudio vs OpenRouter etc — pass modelServiceId on POST /session for a fresh session. Stage 1 has no live service-switch route.)
Request:
{ "modelId": "qwen-staging" }
Response:
{ "modelId": "qwen-staging" }
On success, publishes model_switched to the SSE stream. On failure, publishes model_switch_failed (so passive subscribers see the failure, not just the caller). Races against the agent channel exit so a wedged child can't block the HTTP handler.
Issue #4175 Wave 4 PR 17 adds four mutation control routes that let remote clients change runtime posture without touching the daemon host's CLI. All four:
401 {code: 'token_required'}. Configure --token (or QWEN_SERVER_TOKEN) before opting in.X-Qwen-Client-Id header (PR 7 audit chain). When the header carries a trusted id, the daemon emits originatorClientId on the corresponding SSE event so cross-client UIs can suppress echoes of their own mutations.404 for the route.Three of the four routes (tools/:name/enable, init, mcp/:server/restart) emit workspace-scoped events: every active session SSE bus receives the event, regardless of which session was attached when the mutation was triggered. approval-mode emits a session-scoped event because the change is local to one session's Config.
POST /session/:id/approval-modeCapability tag: session_approval_mode_control. Bridge → ACP extMethod qwen/control/session/approval_mode.
Change the approval mode of a live session. The new mode lands inside the ACP child's per-session Config immediately. Settings are NOT written to disk by default — pass persist: true to also write tools.approvalMode to workspace settings.
Request:
{ "mode": "auto-edit", "persist": false }
mode must be one of 'plan' | 'default' | 'auto-edit' | 'auto' | 'yolo' (mirror of core's ApprovalMode enum; the SDK exports DAEMON_APPROVAL_MODES for runtime validation). persist defaults to false.
Response (200):
{
"sessionId": "sess:42",
"mode": "auto-edit",
"previous": "default",
"persisted": false
}
Errors:
400 {code: 'invalid_approval_mode', allowed: [...]} — unknown mode literal.400 {code: 'invalid_persist_flag'} — persist is non-boolean.403 {code: 'trust_gate', errorKind: 'auth_env_error'} — the requested mode requires a trusted folder (privileged modes in untrusted workspaces are rejected by core's Config.setApprovalMode).404 — session unknown.SSE event (session-scoped): approval_mode_changed with {sessionId, previous, next, persisted, originatorClientId?}.
POST /workspace/tools/:name/enableCapability tag: workspace_tool_toggle. Pure file IO — no ACP roundtrip.
Toggle a tool name in the workspace's tools.disabled settings list. Tools listed there are not registered at all (distinct from permissions.deny, which keeps the tool registered and rejects invocation). Both built-in tools and MCP-discovered tools flow through ToolRegistry.registerTool, which consults the disabled set.
⚠️ Names must match the registry's exposed identifier exactly. No alias resolution happens — the route stores whatever string is in the path parameter into
tools.disabled, and the next ACP child compares againsttool.nameat register time. Built-ins use their canonical registry name (snake_case verb form):run_shell_command,read_file,write_file,list_directory,glob,search_file_content,ripgrep,web_fetch, etc. — NOT the display labels (Shell,Read,Write) that the CLI surfaces. MCP-discovered tools use the qualifiedmcp__<server>__<name>form (which is also the formtool_toggledevents broadcast and whatGET /workspace/mcplists). DisablingBashwill NOT preventrun_shell_commandfrom registering on the next session.
Live ACP children retain already-registered tools — the toggle takes effect on the next ACP child spawn. Combine with POST /workspace/mcp/:server/restart (for MCP-sourced tools) or new-session creation to make the change effective in the current daemon.
Unknown tool names are accepted: pre-disabling a not-yet-installed MCP tool is a legitimate use case.
Request:
{ "enabled": false }
Response (200):
{ "toolName": "run_shell_command", "enabled": false }
Errors:
400 {code: 'invalid_tool_name'} — empty path parameter, or path parameter exceeds the 256-character cap.400 {code: 'invalid_enabled_flag'} — enabled missing or non-boolean.SSE event (workspace-scoped): tool_toggled with {toolName, enabled, originatorClientId?}.
POST /workspace/initCapability tag: workspace_init. Pure file IO — no ACP roundtrip, no LLM invocation.
Scaffold an empty QWEN.md (or whatever getCurrentGeminiMdFilename() returns under --memory-file-name overrides) at the daemon's bound workspace root. Mechanical only — for AI-driven content fill, follow up with POST /session/:id/prompt.
Default refuses to overwrite when the target file exists with non-whitespace content. Whitespace-only files are treated as absent (matches the local /init slash command).
Request:
{ "force": false }
Response (200):
{ "path": "/work/bound/QWEN.md", "action": "created" }
action is 'created' for fresh creates, 'noop' when an existing whitespace-only file was left untouched (no write performed), and 'overwrote' when force: true replaced non-empty content. The workspace_initialized SSE event mirrors the response action — observers can filter for action !== 'noop' to react only to actual on-disk changes.
Errors:
400 {code: 'invalid_force_flag'} — force is non-boolean.409 {code: 'workspace_init_conflict', path, existingSize} — file exists with non-whitespace content and force is omitted/false. Body carries the absolute path and size (bytes) so SDK clients can render an "overwrite N bytes?" prompt without re-stat'ing.SSE event (workspace-scoped): workspace_initialized with {path, action, originatorClientId?}.
POST /workspace/mcp/:server/restartCapability tag: workspace_mcp_restart. Bridge → ACP extMethod qwen/control/workspace/mcp/restart.
Restart a configured MCP server through the ACP child's McpClientManager.discoverMcpToolsForServer (disconnect + reconnect + rediscover). Pre-checks the live budget snapshot from PR 14 v1's accounting so a restart on a budget-saturated workspace returns a soft refusal rather than triggering a BudgetExhaustedError cascade.
Request body is empty ({}). The path parameter is the URL-encoded server name as it appears in mcpServers config.
Response (200) — discriminated union on restarted:
{ "serverName": "docs", "restarted": true, "durationMs": 1234 }
{
"serverName": "docs",
"restarted": false,
"skipped": true,
"reason": "budget_would_exceed"
}
Soft skip reasons (all return 200):
reason | Meaning |
|---|---|
'in_flight' | Another discovery / restart for this server is already in progress. The route returns immediately rather than awaiting the original promise. Caller should retry after a short delay. |
'disabled' | Server is configured but listed in excludedMcpServers. Re-enable before restart. |
'budget_would_exceed' | Daemon is --mcp-budget-mode=enforce, the target server is not currently in reservedSlots, and the live total has reached clientBudget. Caller should free a slot first. |
Errors (non-2xx):
400 {code: 'invalid_server_name'} — empty path parameter.404 — server name not in mcpServers config, or no live ACP channel exists (restart inherently requires a live McpClientManager instance).500 — internal error (e.g. ToolRegistry not initialized).SSE events (workspace-scoped): mcp_server_restarted with {serverName, durationMs, originatorClientId?} on success; mcp_server_restart_refused with {serverName, reason, originatorClientId?} on soft skip.
GET /session/:id/events (SSE)Subscribe to the session's event stream.
Headers:
Accept: text/event-stream
Last-Event-ID: 42 ← optional, replays from after id 42
Query params:
| Param | Required | Notes |
|---|---|---|
maxQueued | no | Per-subscriber live-backlog cap. Range [16, 2048], default 256. Replay frames force-pushed at subscribe time are exempt from the cap; what actually consumes it is live events that arrive while the subscriber is still draining a large Last-Event-ID: 0 replay. Bump for cold reconnects so the live tail doesn't trip the slow-client warning / eviction before the consumer catches up. Out-of-range / non-decimal / present-but-empty values return 400 invalid_max_queued before the SSE handshake opens. Pre-flight caps.features.slow_client_warning — old daemons silently ignore the param. |
Frame format. The data: line is the full event envelope, JSON-stringified on a single line — {id?, v, type, data, originatorClientId?}. The ACP-specific payload (sessionUpdate, requestPermission arguments, etc.) sits under the envelope's data field; the envelope's own type matches the SSE event: line.
id: 7
event: session_update
data: {"id":7,"v":1,"type":"session_update","data":{"sessionUpdate":"agent_message_chunk","content":{"type":"text","text":"…"}}}
id: 8
event: permission_request
data: {"id":8,"v":1,"type":"permission_request","data":{"requestId":"<uuid>","sessionId":"<sid>","toolCall":{...},"options":[...]}}
: heartbeat ← every 15s, no payload
event: client_evicted ← terminal frame, no id (synthetic)
data: {"v":1,"type":"client_evicted","data":{"reason":"queue_overflow","droppedAfter":42}}
The SSE-level id: / event: lines duplicate envelope.id / envelope.type for EventSource compatibility. Raw-fetch consumers (the SDK's parseSseStream) read everything off the JSON envelope and ignore the SSE preamble lines.
| Event type | Trigger |
|---|---|
session_update | Any ACP sessionUpdate notification (LLM chunks, tool calls, usage) |
permission_request | Agent asked for tool approval |
permission_resolved | Some client voted on a permission via POST /permission/:requestId |
model_switched | POST /session/:id/model succeeded |
model_switch_failed | POST /session/:id/model rejected |
session_died | Agent child crashed unexpectedly. Terminal: SSE stream closes after this frame; the session is gone from byId. Subscribers should reconnect via POST /session to spawn a fresh one. |
slow_client_warning | Subscriber-local: queue ≥ 75% full. Non-terminal — the stream continues; the warning is a heads-up before eviction. Carries {queueSize, maxQueued, lastEventId}. Fires ONCE per overflow episode; re-arms after the queue drains below 37.5%. No id (synthetic). Pre-flight caps.features.slow_client_warning. |
client_evicted | Subscriber-local: queue overflow. Terminal: SSE stream closes after this frame (no id — synthetic). Other subscribers on the same session continue. |
stream_error | Daemon-side error during fan-out. Terminal: SSE stream closes after this frame (no id — synthetic). |
Reconnect semantics:
Last-Event-ID: <n> to replay events with id > n from the per-session ring (default depth 8000, tunable via qwen serve --event-ring-size <n>)<n> predates the oldest event still in the ring (e.g. you reconnect with Last-Event-ID: 50 but the ring now holds 200–1199), the daemon replays from the oldest available event without raising. Compare the first replayed event's id against n + 1; any difference is the size of the lost window. Stage 2 will inject an explicit stream_gap synthetic frame on the daemon side; in Stage 1 detection is the client's responsibility.client_evicted, slow_client_warning, stream_error) intentionally omit id so they don't burn a sequence slot for other subscribersBackpressure:
maxQueued: 256 live items (replay frames during reconnect bypass the cap). Override via ?maxQueued=N (range [16, 2048]) on the SSE request.slow_client_warning synthetic frame to that subscriber (once per overflow episode; re-armed after drain below 37.5%). The stream stays open — the warning is a heads-up so the client can drain faster or detach + reconnect cleanly.client_evicted terminal frame and closes the subscription.POST /permission/:requestIdCast a vote on a pending permission_request. First responder wins — once one client answers, every other client trying to answer the same id gets 404.
Stage 1 limitation — no permission timeout. A
permission_requeststays pending until: (a) some client votes here, (b)POST /session/:id/cancelfires, (c) the HTTP client driving the prompt disconnects (mid-prompt cancel resolves outstanding permissions ascancelled), (d) the session is killed, or (e) the daemon shuts down. In a fully-headless deployment with no SSE subscriber,requestPermissionblocks the agent indefinitely — there's nothing to time out the wait. Stage 2 will add a configurablepermissionTimeoutMs. Until then, headless callers should keep an SSE subscription open or wrap their prompt loop in their own timeout
POST /session/:id/cancelon expiry.
Request:
{
"outcome": {
"outcome": "selected",
"optionId": "proceed_once"
}
}
Outcomes:
{ "outcome": "selected", "optionId": "<one-of-the-options>" } — accept / reject / proceed-once / etc, per the agent's offered choices{ "outcome": "cancelled" } — drop the request (matches what cancelSession / shutdown do internally)Response:
200 {} — your vote was accepted404 { "error": "..." } — the requestId is unknown (already resolved, never existed, or session torn down)After a successful vote, every connected client sees permission_resolved with the same requestId and the chosen outcome.
The daemon brokers an OAuth 2.0 Device Authorization Grant (RFC 8628) so a remote SDK client can trigger a login whose tokens land on the daemon filesystem — not on the client. The daemon polls the IdP itself; the client's only job is to display the verification URL + user code and (optionally) subscribe to SSE for completion events.
Capability tag: auth_device_flow (always advertised). Supported providers in v1: qwen-oauth.
Runtime locality. The daemon never spawns a browser — even if it can. The client decides whether to call open(verificationUri) locally; on a headless pod (the canonical Mode B deployment) the user opens the URL on whatever device they have a browser on. See docs/users/qwen-serve.md for the recommended UX.
No token leakage in events. auth_device_flow_started carries {deviceFlowId, providerId, expiresAt} only. The user code and verification URL come back point-to-point in the POST 201 body and via GET /workspace/auth/device-flow/:id; they are never broadcast on SSE.
Per-provider singleton. A second POST for the same provider while a flow is pending is an idempotent take-over — it returns the existing entry with attached: true rather than starting a fresh IdP request.
POST /workspace/auth/device-flowStrict mutation gate: requires a bearer token even on token-less loopback defaults (401 token_required).
Request:
{ "providerId": "qwen-oauth" }
Response (201 fresh start, 200 idempotent take-over):
{
"deviceFlowId": "fa07c61b-…",
"providerId": "qwen-oauth",
"status": "pending",
"userCode": "USER-1",
"verificationUri": "https://chat.qwen.ai/api/v1/oauth2/device",
"verificationUriComplete": "https://chat.qwen.ai/api/v1/oauth2/device?user_code=USER-1",
"expiresAt": 1700000600000,
"intervalMs": 5000,
"attached": false
}
Errors:
400 unsupported_provider — unknown providerId (response includes supportedProviders)409 too_many_active_flows — workspace cap (4) reached; cancel one with DELETE401 token_required — strict gate denied a token-less request502 upstream_error — IdP returned an unexpected errorGET /workspace/auth/device-flow/:idRead the current state. Pending entries echo userCode/verificationUri/expiresAt/intervalMs; terminal entries (5-min grace) drop them and surface status + optional errorKind/hint.
Returns 404 device_flow_not_found for unknown ids and post-grace evicted entries.
DELETE /workspace/auth/device-flow/:idIdempotent cancel:
204 + emit auth_device_flow_cancelled204 no-op (no event re-emit)404GET /workspace/auth/statusSnapshot of pending flows + supported providers:
{
"v": 1,
"workspaceCwd": "/work/bound",
"providers": [],
"pendingDeviceFlows": [
{
"deviceFlowId": "fa07c61b-…",
"providerId": "qwen-oauth",
"expiresAt": 1700000600000
}
],
"supportedDeviceFlowProviders": ["qwen-oauth"]
}
Five typed events (workspace-scoped, fanned out to every active session bus):
auth_device_flow_started {deviceFlowId, providerId, expiresAt} — POST succeeded; SDK should subscribe (no userCode here, fetch via GET if needed)auth_device_flow_throttled {deviceFlowId, intervalMs} — daemon honored upstream slow_down; clients polling GET should bump their interval to matchauth_device_flow_authorized {deviceFlowId, providerId, expiresAt?, accountAlias?} — credentials persisted; accountAlias is a non-PII label (never email/phone)auth_device_flow_failed {deviceFlowId, errorKind, hint?} — terminal; errorKind is one of expired_token | access_denied | invalid_grant | upstream_error | persist_failed. persist_failed is daemon-internal: the IdP exchange succeeded but the daemon couldn't durably store credentials (EACCES / EROFS / ENOSPC). The user should retry once the underlying disk condition is fixed.auth_device_flow_cancelled {deviceFlowId} — DELETE succeeded against a pending entryNot MCP-compatible. The MCP authorization spec (2025-06-18) mandates OAuth 2.1 + PKCE auth-code with a redirect callback, which doesn't work for headless-pod daemons. Mode B's device-flow surface is daemon-private — clients targeting MCP-compliant servers should use a different auth path.
Events are emitted as standard EventSource frames. The daemon writes one data: line per frame (the JSON has no embedded newlines after JSON.stringify); the SDK parser at packages/sdk-typescript/src/daemon/sse.ts handles both that and the spec-allowed multi-data: form on the receive side.
If the bridge iterator throws while serving an SSE subscriber, the daemon emits a terminal stream_error frame (no id). The data: line is the full envelope (same shape as every other SSE frame in this doc); the actual error message lives under envelope.data.error:
event: stream_error
data: {"v":1,"type":"stream_error","data":{"error":"<message>"}}
The connection then closes.
| Var | Purpose |
|---|---|
QWEN_SERVER_TOKEN | Bearer token. Stripped of leading/trailing whitespace at boot. |
SKIP_LLM_TESTS | Set to 1 to skip LLM-required integration tests in integration-tests/cli/qwen-serve-streaming.test.ts (default-on for CI envs that lack provider API keys). |
| Path | Purpose |
|---|---|
packages/cli/src/commands/serve.ts | yargs command + flag schema |
packages/cli/src/serve/runQwenServe.ts | listener lifecycle + signal handling |
packages/cli/src/serve/server.ts | Express routes + middleware |
packages/cli/src/serve/auth.ts | bearer + Host allowlist + CORS deny |
packages/cli/src/serve/httpAcpBridge.ts | spawn-or-attach + per-session FIFO + permission registry |
packages/cli/src/serve/status.ts | read-only daemon status wire types + ServeErrorKind + BridgeTimeoutError + mapDomainErrorToErrorKind |
packages/cli/src/serve/envSnapshot.ts | pure helper that builds /workspace/env payloads from process.* state, including credential redaction |
packages/cli/src/serve/eventBus.ts | bounded async queue + replay ring |
packages/sdk-typescript/src/daemon/DaemonClient.ts | TS client |
packages/sdk-typescript/src/daemon/sse.ts | EventSource frame parser |
integration-tests/cli/qwen-serve-routes.test.ts | 18 cases, no LLM |
integration-tests/cli/qwen-serve-streaming.test.ts | 3 cases, real qwen --acp child (skipped when SKIP_LLM_TESTS=1) |