docs/internal/PER_SESSION_BACKENDS_DESIGN.md
Status: design target, partially landed. Realizes the
AUTHORITY_DESIGN.md§"Evolution: per-session authority" direction and closes the gaps tracked in issue #2280. Keep this short; deep mechanics belong inAUTHORITY_DESIGN.md/K8S_AUTHORITY_DESIGN.md.
A session (project / window) should fully own where it runs (local /
container / SSH / Kubernetes), whether it's trusted, and which dev
environment it has activated. Today the live Authority, WorkspaceTrust,
and EnvProvider are effectively process-wide: one is fanned across every
window at boot/restart. Visible consequences (issue #2280): remote sessions
come back local after a restart/relaunch, and trusting/activating one
project bleeds into others.
Already landed: an installed backend no longer leaks onto other windows
when you switch (each window owns resources.authority; background windows
are built local instead of inheriting the active backend). The rest of this
doc is the remaining design.
Give every session a small, declarative SessionProfile — the data
needed to rebuild its world — alongside its live handles:
SessionProfile {
backend: BackendSpec, // Local | Plugin(AuthorityPayload) | RemoteAgent(RemoteAgentSpec)
trust: TrustDecision, // this session's level (+ key into a shared registry)
env: EnvSpec, // activated venv/direnv/mise recipe, or none
}
BackendSpec reuses the existing AuthorityPayload / RemoteAgentSpec
verbatim, so there is no new backend vocabulary and core stays
backend-opaque (AUTHORITY_DESIGN.md principle 3). The profile is set
wherever a backend/trust/env is installed and is the source of truth for
restoration; the live Authority is derived from it.
Bringing a session's backend back is not enough for an agent session
(claude, aider, …): its seed terminal ran a process that is gone, and
re-opening a bare shell loses the agent. This is the what to re-run half of
restore, and it is already shipped — see
agent-resume-design.md. In brief, each terminal
persists two argvs in its workspace entry:
command — the launch argv (the agent / shell the PTY was spawned
with), andagent_resume.argv — how to rejoin the conversation, distinct from
launch and provided by the Orchestrator's agent registry (e.g. launch
claude --session-id <uuid>, resume claude --resume <uuid>).On restore, restore_terminal_from_workspace runs agent_resume → command → shell, gated by terminal.resume_agents. Detection and the per-agent flags
live in TS data (the registry), not core — the same mechanism/policy split
this design uses for backends.
So a session's restore data has two complementary halves: the
SessionProfile (this doc — where the session runs) and the per-terminal
agent-resume (what to re-run in it). They are persisted independently in the
same workspace file and compose at restore. Two constraints make that
composition correct — and both are new work this design owns, because the
agent-resume feature landed assuming a local backend:
The agent runs inside the session's backend, not beside it. Today a
command terminal builds a TerminalWrapper that replaces the
authority's wrapper, so an agent argv runs on the host even under a
container / SSH / k8s authority. The fix is to make the authority own
"run this interactive argv in my backend" —
Authority::terminal_command(argv) returning docker exec … <argv> /
kubectl exec … <argv> / ssh … -- <argv> (bare argv for local). Launch
and resume argvs go through it, so they compose with the backend
instead of bypassing it. (This also fixes a born-attached remote agent's
seed terminal, which has the same bypass today.)
Status: shipped for local + containers (
docker exec -it [-u][-w][-e env] <id> <argv>), the argv staying intact (no shell string). SSH and Kubernetes are stubbed (TODO): the agent must run in the workspace dir, anddocker execexpresses that argv-pure via-w <dir>, butkubectl execandsshhave no cwd flag — doing it there needs a shell hop (sh -lc 'cd <dir> && exec <argv>'), which trades away the argv-slot purity (acceptable for a UUID, but a real decision). Until that lands, ssh/kube command terminals fall back to running on the host (today's behaviour), no regression.
Backend first, then agent. A restored remote session is Dormant
(local placeholder) until reconnect, so its agent must not re-run on
the placeholder. Restore order is: local sessions re-run the agent
immediately (unchanged); remote sessions defer the agent re-run to the
reconnect-on-activate step and then run it via
authority.terminal_command(resume_argv) in the now-live backend.
Each session's authority is in one of two states:
RemoteIndicatorState::Disconnected/Connecting facet. Reads /
terminals that require the real backend are gated until it activates.Only one authority is the active router at a time (principle 2 intact); background live sessions keep their connection warm via their own
session_keepalive, exactly as today.
The profile round-trips through the per-dir workspace file (the session
registry — there is no central windows.json for sessions anymore). Saved
on the same paths that already persist a session (save_all_windows_workspaces,
pre-restart, pre-quit); read back by session discovery at construction. A
missing profile reads as Local (back-compat).
Construction (cold launch and the install_authority restart, which
both rebuild from disk) builds each session's authority from its profile:
Local → local authority.Connecting → Connected / FailedAttach); background sessions stay
dormant until used.This replaces today's "fan one authority onto every window."
Activating a dormant remote session reconnects that session only — the
per-window activation AUTHORITY_DESIGN.md calls for:
connect_ssh_authority / connect_kube_authority
(async, via the existing RemoteAttachReady bridge), then
set_session_authority(id, authority) and park the keepalive in
session_keepalives[id].devcontainer up; fire a
session_reattach_requested { window_id, profile } hook so the
devcontainer plugin re-attaches. Core stays opaque.Once the backend is live again, the session's agent terminals re-run in
it — each terminal's agent_resume → command argv is run through
authority.terminal_command(argv) (the composition seam above), so the agent
rejoins inside its real backend rather than on the host. This is the
"backend first, then agent" order from Restoring agent terminals.
Reconnect is trust-gated (below). A dead container/pod surfaces
FailedAttach, not a crash.
WorkspaceTrust and EnvProvider move from one shared handle to one per
session, each window's authority carrying its own:
WorkspaceTrust scoped to its root
(WorkspaceTrust::for_session), so a trust decision in one project never
changes the live level another open session's spawns are gated against. The
"remember this folder" registry is the existing per-project on-disk
TrustStore (each session adopts its project's recorded level on
construction). Every per-session-authority construction point —
local_session_authority, create_window_at, the set_boot_authority
background fan-out, and the constructor's restored shells — builds a trust
for that window's root; the active session's trust is what drives the pill
and gating. (Auto-prompting on switch is deliberately not wired: the dock
live-switches windows on ↑/↓, so a modal there would spam — the boot prompt
EnvProvider (fresh + inactive
per session), so activating a venv/direnv/mise in one project never
activates it for another open session.Authority is not Clone and is owned by exactly one Window — there
is no editor-wide copy (Editor::authority() returns
active_window().authority). Trust + env are minted as a move-only
SessionScope (SessionScope::for_root) and consumed into the authority.
So a session's backend/trust/env cannot be shared into another window: the
type system rejects it at construction, rather than a runtime assertion
catching it after the fact. Authorities are moved between slots (restore,
set_session_authority, set_boot_authority), never copied.
Switching sessions therefore never changes another session's backend, trust, or env.
Authority is always derivable from the SessionProfile; the
profile, not the live handle, is what persists.AuthorityPayload / RemoteAgentSpec now double as a
persistence format and must stay serde-stable.K8S_WORKSPACE_UX_DESIGN.md). Cost: a switch into a cold remote
session has connect latency (shown via the spinner).devcontainer up), so
core hands off via a hook. Cost: a small plugin contract.SessionProfile.backend + per-window field + workspace-file persistence;
spec-driven Dormant restore (no reconnect yet). Fixes "comes back
local" → "comes back disconnected, profile retained."WorkspaceTrust + trusted-host registry; trust-gate reconnect.EnvProvider.