docs/users/qwen-serve.md
qwen serve)Run Qwen Code as a local HTTP daemon so multiple clients (IDE plugins, web UIs, CI scripts, custom CLIs) share one agent session over HTTP + Server-Sent Events instead of each spawning their own subprocess.
Status: Stage 1 (experimental). The protocol surface is locked at the §04 routes table from issue #3803. Stage 1.5 (
qwen --serveflag — TUI co-hosts the same HTTP server) and Stage 2 (in-process refactor +mDNS/OpenAPI/WebSocket/Prometheus polish) are immediately downstream.Scope honesty: Stage 1 is sized for developers prototyping clients against the protocol surface and for local single-user / small-team collaboration. Production-grade multi-client / long-running / network-flaky workloads (mobile companions, IM bots reaching 1000+ chats) need Stage 1.5+ guarantees that aren't in this release. See Stage 1.5+ runtime guarantees for the full gap list and #3803 for the convergence roadmap.
sessionScope: 'single', every client connecting to the daemon shares one ACP session. Live cross-client collaboration on the same conversation, the same file diffs, the same permission prompts.Last-Event-ID reconnect lets a client drop and pick up exactly where it left off (within the ring's replay window).qwen serve process binds to exactly one workspace at boot (per #3803 §02). Multi-workspace deployments run one daemon per workspace on separate ports (or behind an orchestrator).POST /session/:id/approval-mode), toggle a tool per workspace (POST /workspace/tools/:name/enable), scaffold an empty QWEN.md (POST /workspace/init, mechanical only — does NOT call the model; for AI-fill, follow up with POST /session/:id/prompt), or restart a single MCP server with a budget pre-check (POST /workspace/mcp/:server/restart). All four are strict-gated — configure --token first.cd your-project/
qwen serve
# → qwen serve listening on http://127.0.0.1:4170 (mode=http-bridge, workspace=/path/to/your-project)
# → qwen serve: bearer auth disabled (loopback default). Set QWEN_SERVER_TOKEN to enable.
The default bind is 127.0.0.1:4170. Bearer auth is off on loopback so local development "just works". The daemon binds to the current working directory; use --workspace /path/to/dir to override.
curl http://127.0.0.1:4170/health
# → {"status":"ok"}
curl http://127.0.0.1:4170/capabilities
# → {"v":1,"mode":"http-bridge","features":["health","capabilities","session_create",...],"workspaceCwd":"/path/to/your-project"}
The workspaceCwd field surfaces the bound workspace so clients can pre-flight check + omit cwd on POST /session.
The daemon also exposes read-only runtime snapshots for client UIs:
GET /workspace/mcp, GET /workspace/skills, GET /workspace/providers,
GET /workspace/env, GET /workspace/preflight,
GET /session/:id/context, and GET /session/:id/supported-commands.
GET /workspace/mcp, GET /workspace/skills, and GET /workspace/providers
report the live ACP runtime and do not start the ACP child when idle; an
idle daemon returns initialized: false with an empty snapshot. Once a
session is alive they switch to initialized: true and surface the real
state.
GET /workspace/env and GET /workspace/preflight always answer with
initialized: true regardless of ACP state. env never consults ACP
(daemon-process info only); preflight answers daemon-level cells from
process.* and emits status: 'not_started' placeholders for ACP-level
cells when the child is idle.
GET /workspace/env reports the daemon process's runtime, platform, sandbox,
proxy, and the presence (never the value) of whitelisted secret env vars
such as OPENAI_API_KEY. Proxy URLs are stripped of credentials and reduced
to host:port before they hit the wire. The route always answers from the
daemon process directly and never spawns an ACP child.
GET /workspace/preflight returns a list of readiness checks. Daemon-level
cells (Node version, CLI entry, workspace directory, ripgrep, git, npm)
always render. ACP-level cells (auth, MCP discovery, skills, providers,
tool registry, egress) require a live ACP child — when the daemon is idle
they emit status: 'not_started' placeholders rather than spawning ACP just
to populate them. Failures map to a closed errorKind enum (missing_binary,
auth_env_error, init_timeout, protocol_error, missing_file,
parse_error, blocked_egress) so client UIs can render structured
remediation.
The daemon also exposes workspace file helpers:
GET /file reads text files and returns a raw-byte sha256:<hex> hash.GET /file/bytes reads bounded raw byte windows and returns base64 content.POST /file/write creates or replaces text files.POST /file/edit applies one exact text replacement.Write/edit are strict mutation routes: even on loopback they require a
configured bearer token, otherwise they return token_required. Replacements
and edits require the latest expectedHash from GET /file (or a full-window
GET /file/bytes). create never overwrites. Explicit writes to ignored paths
are allowed but audited. Binary writes, delete/move/mkdir, and recursive parent
creation are not part of this surface.
curl -X POST http://127.0.0.1:4170/session \
-H 'Content-Type: application/json' \
-d '{}'
# → {"sessionId":"<uuid>","workspaceCwd":"…","attached":false}
cwd may be omitted — the route falls back to the daemon's bound workspace. Posting a cwd that doesn't match the bound workspace returns 400 workspace_mismatch (the daemon is bound to exactly one workspace; start a separate daemon for a different one).
A second client posting to /session (any matching cwd or none) gets "attached": true — they're now sharing the agent.
SESSION_ID="<from step 3>"
curl -N http://127.0.0.1:4170/session/$SESSION_ID/events
# → id: 1
# event: session_update
# data: {"id":1,"v":1,"type":"session_update","data":{"sessionUpdate":"agent_message_chunk","content":{"type":"text","text":"…"}}}
The data: line is the full event envelope — {id?, v, type, data, originatorClientId?} — JSON-stringified on a single line. The ACP payload (the sessionUpdate block in this example) sits under data inside that envelope. The SSE-level id: / event: lines are convenience for EventSource clients; the same values appear inside the JSON envelope so raw-fetch consumers get them too.
Open this before sending the prompt — the SSE replay buffer holds the
last 4000 events so a late subscriber can catch up via Last-Event-ID,
but for the simple "watch a single prompt" case it's easiest to subscribe
first and let it stream live.
The stream emits session_update (LLM chunks, tool calls, usage),
permission_request (tool needs approval), permission_resolved
(someone voted), model_switched, model_switch_failed, and the terminal
frames session_died (agent child crashed — SSE then closes) and
client_evicted (your queue overflowed — SSE then closes).
curl -X POST http://127.0.0.1:4170/session/$SESSION_ID/prompt \
-H 'Content-Type: application/json' \
-d '{"prompt":[{"type":"text","text":"What does src/main.ts do?"}]}'
# → {"stopReason":"end_turn"}
The curl -N from step 4 will print frames as they arrive.
For anything beyond loopback, you must pass a bearer token:
export QWEN_SERVER_TOKEN="$(openssl rand -hex 32)"
qwen serve --hostname 0.0.0.0 --port 4170
# → boot refuses without QWEN_SERVER_TOKEN
Clients then send Authorization: Bearer $QWEN_SERVER_TOKEN on every request. /health is exempted only on loopback binds so k8s/Compose liveness probes inside the pod (where the daemon listens on 127.0.0.1) don't need credentials. On non-loopback binds (--hostname 0.0.0.0 etc.) /health requires the token like every other route — otherwise an attacker can probe arbitrary addresses to confirm the daemon's existence. Use /capabilities to verify your token is correct end-to-end (it always requires auth):
Hardened loopback (
--require-auth). The default loopback no-token behavior is fine for a single-user laptop but unsafe on shared dev hosts, CI runners, or multi-tenant workstations where any local user cancurl 127.0.0.1:4170. Pass--require-authto make the bearer token mandatory on every route — including/healthand/capabilities— even when bound to127.0.0.1. Boot fails without a token. With the flag on, an unauthenticated client can't read/capabilitiesto discover that auth is required; the discovery surface is the 401 response body itself. Once authenticated, thecaps.features.require_authtag is a post-auth confirmation that the deployment is hardened (useful for audit / compliance UIs):bashqwen serve --require-auth --token "$(openssl rand -hex 32)" # → /health, /capabilities, /session, … all require Authorization: Bearer … curl http://127.0.0.1:4170/health # → 401 curl -H "Authorization: Bearer $TOKEN" http://127.0.0.1:4170/capabilities | jq '.features | index("require_auth")' # → 13 (or whatever index — non-null after authenticating means the tag is present)
curl -H "Authorization: Bearer $QWEN_SERVER_TOKEN" http://your-host:4170/capabilities
# → {"v":1,"mode":"http-bridge","features":[...],"modelServices":[],"workspaceCwd":"/path/to/your-project"}
# Wrong token → 401
The token comparison is constant-time (SHA-256 + crypto.timingSafeEqual); 401 responses are uniform across "missing header", "wrong scheme", and "wrong token" so a side-channel can't distinguish.
| Flag | Default | Purpose |
|---|---|---|
--port <n> | 4170 | TCP port. 0 = OS-assigned ephemeral port. |
--hostname <addr> | 127.0.0.1 | Bind interface. Anything beyond loopback requires a token. |
--token <str> | — | Bearer token. Falls back to QWEN_SERVER_TOKEN env var (with leading/trailing whitespace stripped — handy for $(cat token.txt)). |
--require-auth | false | Refuse to start without a bearer token, even on loopback. Hardens the 127.0.0.1 developer default for shared dev hosts / CI runners / multi-tenant workstations where any local user can hit the listener. Boots only with --token or QWEN_SERVER_TOKEN set; gates /health behind the bearer too. |
--max-sessions <n> | 20 | Cap on concurrent live sessions. New POST /session requests that would spawn a fresh child return 503 (with Retry-After: 5) when the cap is hit; attaches to existing sessions are NOT counted. Set to 0 to disable. Sized for single-user / small-team usage; raise it if your deployment has the RAM/FD headroom (~30–50 MB per session). |
--workspace <path> | process.cwd() | Absolute workspace path this daemon binds to (per #3803 §02 — 1 daemon = 1 workspace). POST /session requests with a mismatched cwd return 400 workspace_mismatch. For multi-workspace deployments, run one qwen serve per workspace on separate ports. |
--max-connections <n> | 256 | Listener-level TCP connection cap (server.maxConnections). Bounds raw socket count irrespective of session count — slow / phantom SSE clients get rejected at accept time once full. Raise alongside --max-sessions if your deployment expects many SSE subscribers per session. |
--event-ring-size <n> | 8000 | Per-session SSE replay ring depth (#3803 §02 target). Sets the backlog available to GET /session/:id/events with Last-Event-ID: N. Larger = more reconnect headroom at the cost of a few hundred KB extra RAM per session. SDK clients can additionally request a larger per-subscriber backlog cap on a specific subscription via ?maxQueued=N (range [16, 2048], default 256). Daemons also emit a non-terminal slow_client_warning SSE frame at 75% queue fill so clients can drain / reconnect before getting evicted. Pre-flight caps.features.slow_client_warning. |
--mcp-client-budget <n> | — | Positive integer cap on live MCP clients per ACP session (issue #4175 PR 14 v1; PR 23 graduates this to per-workspace via the shared MCP pool). Combine with --mcp-budget-mode. When unset, no accounting-driven enforcement (but GET /workspace/mcp still reports clientCount). Distinct from claude-code's MCP_SERVER_CONNECTION_BATCH_SIZE which gates startup concurrency, not the total client count. Pre-flight caps.features.mcp_guardrails. |
--mcp-budget-mode <m> | warn / off | How --mcp-client-budget is enforced. warn (default when budget set): no refusal, snapshot's budgets[0].status flips to warning at ≥75% of budget. enforce: connects past the cap are refused, per-server cell shows disabledReason: 'budget', deterministic by mcpServers declaration order. off (default when budget unset): pure observability. Boot rejects enforce without a budget. |
--http-bridge | true | Stage 1 mode: one qwen --acp child per daemon (bound to one workspace at boot, per #3803 §02); N sessions multiplex onto that child via ACP newSession(). Stage 2 native in-process becomes available later. |
Sizing the load knobs.
--max-sessionsis the new-child cap. Three other layers also limit load — when sizing for a high-concurrency deployment, tune them together:
- listener-level:
--max-connections/server.maxConnections=256bounds raw TCP connections (slow-client back-pressure).- per-session subscribers: the EventBus caps SSE subscribers at 64 per session by default; the 65th client gets a terminal
stream_errorand is closed.- per-subscriber backlog: a 256-frame queue per SSE client; an over-capacity client gets a terminal
client_evictedframe and is closed (one slow consumer can't pin the daemon).The four caps interact:
--max-sessions × 64 subscribers × 256 framesis the worst-case in-flight memory at the EventBus layer. Default sizing assumes single-user / small-team load; raise progressively (and watch RSS) for multi-tenant deployments.
MCP client guardrails (issue #4175 PR 14). A workspace declaring 30 MCP servers in
mcpServerswill start 30 clients with no upstream cap unless you set one.--mcp-client-budget=Ncaps the live MCP client count;--mcp-budget-mode={enforce,warn,off}chooses the behavior. Default iswarnwhen a budget is set (snapshot surfaces the warning but no client is refused — useful for measuring real-world fanout before flipping on enforcement). Refused servers underenforcemode getdisabledReason: 'budget'on their per-server cell, and thebudgets[0]cell showsstatus: 'error'+errorKind: 'budget_exhausted'. Slot reservation is by server name and survives reconnects / discovery timeouts — a refused server can't take a slot from a healthy one.⚠️ v1 scope: per-session, not per-workspace. Each ACP session inside the daemon has its own
Config/McpClientManager(created vianewSessionConfigper session). The budget caps live MCP clients per session, not aggregated across all sessions in the workspace. Snapshot atGET /workspace/mcpreflects the bootstrap session's view (the cell carriesscope: 'session'for honesty). If you run 5 concurrent ACP sessions with--mcp-client-budget=10, you may have up to 50 live MCP clients across the daemon — the cap holds per session. Wave 5 PR 23 (shared MCP pool) introduces a workspace-scoped manager and graduates this to true per-workspace enforcement.shqwen serve --mcp-client-budget=10 --mcp-budget-mode=warn # later, after telemetry shows your real-world distribution: qwen serve --mcp-client-budget=10 --mcp-budget-mode=enforceThis is not the same as claude-code's
MCP_SERVER_CONNECTION_BATCH_SIZE(which gates startup concurrency); they're orthogonal. PR 23 will add a real shared MCP pool (ascope: 'workspace'cell inbudgets[]alongside the per-session cell); PR 14 v1 is the in-process counter + soft enforcement on the existing per-session manager.Push events (issue #4175 PR 14b). SDK clients subscribed to
GET /session/:id/eventsreceive typed frames when budget thresholds cross —mcp_budget_warning(synthetic, fires once per upward 75% crossing with hysteresis re-arm at 37.5%, advertised viamcp_guardrail_events) andmcp_child_refused_batch(coalesced once per discovery pass underenforcemode; length-1 fromreadResourcelazy-spawn refusal). The snapshot atGET /workspace/mcpis still the source-of-truth for state-after-reconnect; events are change-edges. Useful when dashboarding in real-time without polling.
--hostname 0.0.0.0 requires a token — boot refuses without one.LOOPBACK_BINDS includes IPv6 — ::1 and [::1] count as loopback for the no-token rule.Host: matches localhost:port / 127.0.0.1:port / [::1]:port / host.docker.internal:port (case-insensitive per RFC 7230 §5.4) to defend against DNS rebinding. Non-loopback binds (--hostname 0.0.0.0) intentionally bypass the Host allowlist — the operator has chosen the surface area, so the bearer-token gate is the sole authentication layer; reverse proxies / SNI / client cert pinning are the operator's responsibility, not the daemon's. If you need Host-based isolation on a non-loopback bind, terminate TLS + check Host at a front proxy.403 JSON. Implication for browser-served webuis (BUy4e): any packages/webui-style frontend that lives on a separate origin will get 403 at the wire. Stage 1 options for browser-style consumption: (a) package the webui as a native shell (Electron/Tauri) so no Origin header is sent, or (b) front the daemon with a same-origin reverse proxy that strips/rewrites Origin for a known frontend. Stage 1.5 will add --allow-origin <pattern> for opt-in named frontends.qwen --acp child inherits the daemon's environment with one explicit scrub: QWEN_SERVER_TOKEN is removed before the child starts (the daemon's own bearer; the agent doesn't need it). Everything else — OPENAI_API_KEY / ANTHROPIC_API_KEY / QWEN_* / DASHSCOPE_API_KEY / your custom modelProviders[].envKey / etc. — passes through, because the agent legitimately needs those to authenticate to the LLM. This is intentional, not a sandbox. The agent runs as the same UID with shell-tool access, so anything in ~/.bashrc / ~/.aws/credentials / ~/.npmrc is reachable by prompt injection regardless. The env passthrough is not the security boundary; the user-as-trust-root is. Don't run qwen serve under an identity that has env-resident credentials you wouldn't trust the agent with.client_evicted terminal frame and is closed; one stuck consumer can't pin the daemon.⚠️ Stage 1 known gap — permissions are daemon-global, not per-session (BUy4H).
pendingPermissionslives at daemon scope; any client holding the bearer token can vote on anyrequestIdfor any session it can see (and SSEpermission_requestevents carry the requestId in their payload). This is acceptable under the single-user / small-team trust model where every authenticated client is the same human or collaborators they trust. Stage 1.5 will move toPOST /session/:id/permission/:requestId+ session-scoped pending map + per-client identity (must-have #3 from the downstream review); until then, don't runqwen servebehind a bearer shared with untrusted parties.⚠️ Stage 1 known gap —
POST /session/:id/promptbody capped at 10 MB (BUy4L). Multimodal prompts containing images / PDFs / audio that exceed 10 MB will fail at body-parse time before route logic runs (no streaming, no mid-upload abort). Workaround: shrink the content client-side, or pass a path reference and let the agent read the file viareadTextFile. Stage 1.5 will acceptmultipart/form-dataor chunked encoding on/promptso large prompts don't hit a cliff.⚠️ Stage 1 known gap — phantom SSE connections behind NAT. The daemon detects dead clients via TCP back-pressure on heartbeats (15s interval). A client that vanishes WITHOUT a TCP RST (e.g. a NAT box silently dropping idle flows) keeps the kernel-level socket "alive" until Node's keepalive probes time out — typically ~2 hours on Linux defaults. On
--hostname 0.0.0.0deployments behind such NATs, phantom SSE connections can accumulate and eventually hit the 256server.maxConnectionsceiling. Stage 2 will add an application-level idle deadline (last-byte-written tracking + per-connection timeout). Until then, operators on networks that swallow RSTs may want to lowerserver.keepAliveTimeoutvia a reverse proxy or accept periodic daemon restarts.
Per #3803 §02, each qwen serve process binds to one workspace at boot. Within that workspace it multiplexes N sessions onto a single qwen --acp child via the agent's native session map — sessions share the child's process / OAuth state / file-read cache / hierarchy-memory parse.
To host multiple workspaces (one user, several repos; or several users on the same host), run multiple daemon processes — one per workspace, each on its own port, supervised by systemd / docker-compose / k8s / a qwen-coordinator reference orchestrator. The trade-off is intentional: one workspace per child means loadSettings(cwd) / OAuth / MCP server scope stay aligned with the bound directory and don't drift across requests.
Subscribe BEFORE posting
modelServiceIdon attach. When a clientPOST /sessionwith amodelServiceIdand the workspace already has a session running a different model, the daemon issues an internalsetSessionModelcall — failures are NOT propagated as an HTTP error (the session stays operational on its current model). The visible failure signal is amodel_switch_failedevent on the session's SSE stream. If you callPOST /sessionand only THEN openGET /session/:id/events, you'll miss the failure event and silently keep talking to the wrong model. Open the SSE stream first, or passLast-Event-ID: 0on subscribe to replay the ring's oldest available event.
To handle multiple users (each with their own quota, audit log, sandbox) or to scale beyond one process's reach (cold-start budget, FD count, RSS), spawn one daemon per workspace per user behind an external orchestrator. That orchestrator (multi-tenancy / OIDC / Quota / Audit / k8s) is out of scope for the qwen-code project — see issue #3803 "External Reference Architecture" for the design pointers.
The daemon exposes ACP's session/load and session/unstable_resumeSession over HTTP via two routes:
| Route | Use when |
|---|---|
POST /session/:id/load | The client has no history rendered (cold reconnect, picker-then-open). The daemon replays every persisted turn through SSE so subscribers see the full transcript. Capability tag: session_load. |
POST /session/:id/resume | The client already has the turns on screen and only needs the daemon-side handle back. Model context is restored on the agent side without UI replay — the SSE stream stays clean. Capability tag: unstable_session_resume. |
The TypeScript SDK exposes both as static factories on DaemonSessionClient:
import { DaemonClient, DaemonSessionClient } from '@qwen-code/sdk';
const client = new DaemonClient({ baseUrl: 'http://127.0.0.1:4170' });
// Cold reconnect — daemon will replay history through SSE.
const session = await DaemonSessionClient.load(client, 'persisted-id');
// Or, if your UI already has the history, skip the replay:
// const session = await DaemonSessionClient.resume(client, 'persisted-id');
for await (const event of session.events()) {
// First the replayed `session_update` frames (load only),
// then live events.
}
Pre-flight caps.features.session_load / caps.features.unstable_session_resume before calling — older daemons return 404. Concurrent same-action requests for the same id coalesce; cross-action races (a load racing a resume) get 409 restore_in_progress with Retry-After: 5. See the protocol reference for the full error envelope.
Note: history replay is bounded by the SSE ring (default 4000 frames). Long histories with chatty turns can exceed that — earliest frames are dropped silently. For very long sessions, prefer resume and rely on the client's local persisted UI.
Sessions are still ephemeral in Stage 1 across daemon restarts, but persisted sessions on disk can be reloaded:
session_died and removes the live session from the daemon's maps. The persisted on-disk session can be reloaded via POST /session/:id/load if a fresh agent child is spawnable.Last-Event-ID reconnect succeeds but state may be incoherent. For mobile / flaky-network clients, plan to re-open SSE on long drops or call POST /session/:id/load to replay from disk.writeTextFile) are atomic across crashes (write-then-rename); they aren't atomic across daemon restarts in the sense of replaying — the file write either landed or it didn't.If your integration needs server-side cross-restart durability beyond what session/load covers (e.g. server-managed retry queues), you still need application-level state recovery. Don't hold long-running, restart-sensitive state inside the daemon's session.
Stage 1's contract is sized for prototyping. Per #3889 chiga0 downstream-consumer review, the following are not in Stage 1 — production-grade integrations need Stage 1.5+ before relying on them:
Blockers for serious downstream use:
loadSession / unstable_resumeSession over HTTP — without this, no integration can survive a child crash or daemon restart, and any orchestrator coordinating the daemon can't recover state either.originatorClientId is client-self-declared rather than daemon-stamped from authenticated identity.Reliability baseline:
POST /session/:id/heartbeat records last-seen timestamps on the daemon (capability tag client_heartbeat); SDK helpers are DaemonClient.heartbeat() / DaemonSessionClient.heartbeat().permission_already_resolved event when a vote loses the first-responder race — currently UIs have to infer state from a 404.slow_client_warning event before client_evicted — soft backpressure so well-behaved slow clients can self-throttle (trim render depth, drop chunks) before being terminated.Integration ergonomics:
POST /session/:id/_meta for IM-style context — per-session key-value attached to subsequent prompts (chat id, sender, thread id) replaces the per-channel improvisation./capabilities actual feature negotiation — protocol_versions: { acp: '0.14.x', daemon_envelope: 1 } so clients can detect drift instead of falling through to "unknown frame, ignore".The full convergence roadmap is tracked on #3803.
Two structural choices are explicit non-goals for the Stage 1 / 1.5 / 2 main-line roadmap. If your use case depends on either, plan around them rather than waiting for us.
The Stage 1.5 plan describes TUI as an in-process EventBus subscriber. In practice TUI UI is strictly larger than the wire protocol:
ModelDialog, MemoryDialog, PermissionsDialog, SessionPicker, WelcomeBackDialog, FolderTrustDialog, …) and the local-jsx slash commands (/ide, /auth, /init, /resume, /rename, /delete, /language, /arena, …) render terminal-specific Ink JSX. Remote clients on HTTP/SSE can't equivalently render Ink, and these flows emit no wire event./approval-mode, /memory add, /mcp add-server, /agents, /tools enable/disable, /auth, /init (writing CLAUDE.md) all change agent behavior, but only /model currently publishes an event (model_switched).Stage 1 choice — option (A) from the review: don't promote these mutations to wire events. The two deployment modes have different consequences.
qwen serve (this PR)No TUI shell runs inside the daemon. The slash commands listed above don't exist in this mode — there's no terminal UI to issue them from. Session state is therefore:
approval-mode / memory / mcp servers / agents / tools allowlist / auth — all loaded from settings + disk when the daemon's qwen --acp child starts; immutable for the session's lifetime.POST /session/:id/model (publishes model_switched). Permission votes (POST /permission/:requestId) are per-request, not per-session-state.Consequence: remote clients in headless mode see the full session state. No TUI hides additional state; no drift is possible. If you want to change approval-mode or add an MCP server, restart the daemon with new settings — the daemon doesn't expose runtime mutation for those today.
qwen --serve co-hosted TUI (not in this PR)When Stage 1.5 lands qwen --serve (TUI process co-hosts the same HTTP server), the TUI does exist alongside remote clients. A local operator typing /approval-mode yolo or /mcp add-server mutates session state, and remote clients on HTTP have no event to observe the change.
In this mode, TUI is a "super-client" — it observes the same agent conversation remote clients see, AND can mutate session state remote clients can't. The asymmetry is:
Consequence in Mode 2: if a remote-client UI tries to mirror session settings, it can drift after any TUI slash command. Remote clients should re-fetch state on attach / reconnect (use Last-Event-ID: 0 to replay the ring's oldest event for things like model_switched); they should NOT rely on incremental events for TUI-side mutations.
session_state_changed event family)(B) is the more ambitious answer but locks Stage 1.5 into a substantially larger wire surface that must also pass cleanly through the planned in-process refactor. We'd rather walk the smaller scope honestly. The session-state-event taxonomy work — enumerating which TUI flows are local-only by design vs. could plausibly graduate to wire under a future opt-in (B)-flavor extension — moves to #3803, not Stage 1.5 code.
qwen --acp childMultiple sessions on the same workspace share one qwen --acp child process via the agent's native multi-session support (packages/cli/src/acp-integration/acpAgent.ts:194: private sessions: Map<string, Session>). The bridge calls connection.newSession({cwd, mcpServers}) for each session — the agent stores them in its sessions map and demultiplexes per-call sessionId.
Concrete cost at N=5 sessions on the same workspace:
| Resource | Per session | At N=5 |
|---|---|---|
| Daemon Node process | one | 30–50 MB (one daemon) |
qwen --acp child | shared | 60–100 MB (one child) |
| MCP server children | per-session | 3×N if configs differ |
FileReadCache (in-child heap) | shared | parsed once |
CLAUDE.md / hierarchy memory parse | shared | parsed once |
| OAuth refresh-token state | shared | one refresh path |
| Auto-memory learned facts | shared | one knowledge base per child |
| Cold start | first only | <200 ms after first session |
The bridge keeps one channel per daemon (one daemon per workspace, per §02). The channel stays alive while at least one session is live; the last killSession (or a channel-level crash) kills the child.
MCP server children are still per-session today — each session's config can specify different servers, so they're independently spawned. Stage 1.5 follow-up: refcount MCP server children by (workspace, config-hash) so identical configs share. Not in scope for this PR.
Peer agents (Cursor / Continue / Claude Code / OpenCode / Gemini CLI) all do single-process multi-session. qwen-code matches them at the agent layer; the Stage 1 bridge in this PR makes the same architecture visible over HTTP.
When the daemon runs on a remote pod (no shared display with you), you can still log in to a Qwen account by triggering an OAuth device flow over HTTP. The daemon polls the IdP itself; your job is just to open a URL on whatever device has a browser.
# 1. Start a flow. The daemon contacts the IdP, returns a code + URL.
curl -X POST http://127.0.0.1:4170/workspace/auth/device-flow \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"providerId":"qwen-oauth"}'
# → 201 {
# "deviceFlowId": "fa07c61b-…",
# "userCode": "USER-1",
# "verificationUri": "https://chat.qwen.ai/api/v1/oauth2/device",
# "verificationUriComplete": "https://chat.qwen.ai/...?user_code=USER-1",
# "expiresAt": 1700000600000,
# "intervalMs": 5000,
# "attached": false
# }
# 2. Visit the URL on your phone / laptop, enter the user code.
# 3. Poll for completion (or subscribe to SSE for the auth_device_flow_authorized event):
curl http://127.0.0.1:4170/workspace/auth/device-flow/fa07c61b-… \
-H "Authorization: Bearer $TOKEN"
# → status transitions: pending → authorized
The TypeScript SDK wraps both steps into a single helper:
import { DaemonClient } from '@qwen-code/sdk';
const client = new DaemonClient({ baseUrl, token });
const flow = await client.auth.start({ providerId: 'qwen-oauth' });
console.log(`Open ${flow.verificationUri}\nCode: ${flow.userCode}`);
const result = await flow.awaitCompletion({ signal: abortCtrl.signal });
// result.status === 'authorized'
The daemon never opens a browser on your behalf. Even when running locally, the daemon stays passive — it returns the URL and lets the SDK / user choose where to open it. This is intentional: a daemon on a headless pod that called xdg-open would silently fail, masking the actual auth surface. Mirror gh auth login's "Press Enter to open browser" UX in your client.
--require-auth and dev convenience. The device-flow routes use the strict mutation gate (PR 15), which means a token-less loopback default returns 401 token_required. Locally, the simplest way around this during development is qwen serve --token=dev-token; you don't need --require-auth unless you're hardening the loopback default.
Cross-daemon limitation. oauth_creds.json is daemon-shared (~/.qwen/oauth_creds.json), so a successful login in daemon A is automatically picked up by daemon B's next token refresh — but daemon B's SDK clients won't receive the auth_device_flow_authorized event (events are per-daemon).
Cross-client take-over. Two SDK clients on the same daemon that both POST /workspace/auth/device-flow for the same provider get the per-provider singleton: the first call starts a fresh IdP request and returns attached: false; the second call returns the EXISTING in-flight entry with attached: true. The take-over is recorded on the audit trail (under the second client's X-Qwen-Client-Id) but does NOT emit a separate event — both clients eventually observe the SAME auth_device_flow_authorized once the user finishes the IdP page. If your UI distinguishes "I started this" from "someone else's flow I joined", branch on the attached field returned by start().
packages/cli/src/serve/; SDK client at packages/sdk-typescript/src/daemon/.