docs/internal/K8S_WORKSPACE_UX_DESIGN.md
Status: design / product framing. This is the top of the stack. It defines what the cloud-editing feature is for a user and the journeys it must support. The two docs below it are implementation detail:
- K8S_AUTHORITY_DESIGN.md: how the editor attaches (SSH remote-agent stack over a kubectl exec transport). Mechanics.
- K8S_WORKSPACE_PLUGIN_DESIGN.md: how pods come into being (the Provider contract; Terraform/manifest escape hatches; storage = EBS-live + S3-sync). Plumbing.

Storage, transport, and provisioning are settled enough. This doc is about the experience, because that is what determines whether anyone uses the feature.
Open Fresh, pick a workspace, and you're editing in your own cloud — it feels local, it survives interruptions, and it never bills you by surprise.
Everything below serves that sentence.
The single most important UX decision. The user must think in terms of a durable, named Workspace they own — not a Kubernetes pod, not a "connection." A Workspace is:
- Named, e.g. acme-api, my-scratch. Its identity is its storage + its environment definition, not whatever pod is currently running it.
- In one of a small set of states: not-provisioned · starting · running · connected · stopped (suspended, storage kept, compute released) · error.

This abstraction is what lets reconnect-after-reschedule feel like nothing happened, and what lets "stop to save money, resume tomorrow" make sense. Everything the user does is a verb on a Workspace: connect, disconnect, stop, resume, rebuild, resize, destroy.

    not-provisioned ──connect──► starting ──► running ──attach──► connected
           ▲                        │          ▲  │                   │
           │                 (fail) │          │  │ idle/explicit     │ disconnect
        destroy                     ▼          │  ▼                   ▼
           └────────────────────── error      stopped ◄────────── running
                                                 │  (compute released, storage kept)
                                           resume│
                                                 └──► starting …

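One way to read the diagram is as a transition table. The sketch below is illustrative only: the event names are assumptions ("ready" is a hypothetical label for the unlabeled starting → running arrow, "stop" stands for the idle/explicit arrow), and destroy is modelled only from error, as drawn.

```typescript
// Workspace lifecycle states, as named in the diagram.
type WorkspaceState =
  | "not-provisioned" | "starting" | "running"
  | "connected" | "stopped" | "error";

// Event names are illustrative; "ready" and "stop" are hypothetical labels.
type WorkspaceEvent =
  | "connect" | "ready" | "fail" | "attach"
  | "disconnect" | "stop" | "resume" | "destroy";

// Allowed transitions; anything not listed is rejected.
const TRANSITIONS: Record<WorkspaceState, Partial<Record<WorkspaceEvent, WorkspaceState>>> = {
  "not-provisioned": { connect: "starting" },
  starting: { ready: "running", fail: "error" },
  running: { attach: "connected", stop: "stopped" },
  connected: { disconnect: "running" },
  stopped: { resume: "starting" },
  error: { destroy: "not-provisioned" },
};

function transition(state: WorkspaceState, event: WorkspaceEvent): WorkspaceState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new Error(`invalid transition: ${event} from ${state}`);
  }
  return next;
}
```

Keeping the table as data means every verb the UI offers for a workspace can be derived from its current state rather than hard-coded per surface.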
This is the structural keystone. Workspace management is not a new
subsystem bolted onto Fresh; it is the Orchestrator
(orchestrator-sessions-design.md)
with one additive facet. The fit is near-exact:
- A Session already bundles "everything rooted at one project root" (its own file explorer, LSP, quick-open scope, buffers, splits, terminals), and switching sessions retargets the entire editor state atomically while the Orchestrator stays anchored above the swap.
- A cloud workspace is a Session whose Authority is the EKS remote-agent authority instead of local. The durable "Workspace" identity (§mental model) is the durable "Session" identity. Connect = activate that session; its session-swap is the destructive authority transition (same machinery: drop and rebuild editor state around the new backend). The two concepts were the same thing all along.

The hard requirement: great for cloud users, zero cost for local-only users. The mechanism is that the Orchestrator stays backend-opaque (Authority principle 3: core never names "eks"):
- A session may provide an optional remote facet: a state (starting/running/stopped/error), an optional cost/idle hint, and a set of lifecycle actions (stop/resume/rebuild/resize). The Orchestrator renders whatever a session provides and nothing when a session provides nothing.

So the blast radius on the non-cloud experience is: one optional, empty-by-default field on a session row. That is the whole answer to "without hurting people who don't use pods."
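A minimal sketch of what "one optional field on a session row" could look like. The type and field names (`RemoteFacet`, `costHint`, `idleHint`, `renderRemoteFacet`) are hypothetical; only the three ingredients (state, cost/idle hint, lifecycle actions) come from the text.

```typescript
// Hypothetical shape of the optional remote facet.
type RemoteState = "starting" | "running" | "stopped" | "error";
type LifecycleAction = "stop" | "resume" | "rebuild" | "resize";

interface RemoteFacet {
  state: RemoteState;
  costHint?: string; // e.g. "~$0.40/hr"
  idleHint?: string; // e.g. "idle 12m"
  actions: LifecycleAction[];
}

interface SessionRow {
  name: string;
  remote?: RemoteFacet; // absent for local-only sessions
}

// Returns the text the facet contributes to a row, or "" when the session
// provides no facet. Nothing here names a specific backend.
function renderRemoteFacet(row: SessionRow): string {
  const facet = row.remote;
  if (facet === undefined) return "";
  const parts: string[] = [facet.state];
  if (facet.costHint !== undefined) parts.push(facet.costHint);
  if (facet.idleHint !== undefined) parts.push(facet.idleHint);
  return parts.join(" · ");
}
```

A local-only row takes the early return, which is the "nothing when a session provides nothing" guarantee in code form.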
Reassessed against the live Orchestrator dock / Open-modal
(orchestrator-pr-pill-wireframes.md, ORCHESTRATOR_DOCK_NNG_*), the
"remote facet" is not speculative — it slots into patterns that already
ship:
- Row format: `<sym> [ ] <name> <· project / · on-disk>`, where `<sym>` is `*` (working) / `✓` (idle) / blank (on-disk), rendered by renderListItem() in orchestrator.ts. The cloud state (starting/running/stopped/error) is another glyph in the same symbol slot, not new chrome.
- The list already distinguishes an on-disk project (`· on-disk`) from a live session (`✓`/`*`). A stopped cloud workspace (exists, not connected) is the same idea (provisioned identity, no live backend) and reuses that visual language.
- Existing row actions: `[ Visit ] [ Details ] [ Stop ] [ Archive ] [ Delete ]`. Cloud verbs layer here: Resume/Rebuild/Resize join the row, and Stop is reconciled: for a cloud session it means suspend the pod (D2/D4), a superset of today's "stop session activity." Name the overlap deliberately so it isn't two different "Stop"s.
- $/hr + idle timer reuse the same pill mechanism and the same opportunistic-gather plumbing, not a parallel renderer.
- The list widget renders exactly one terminal row per item; the PR-pill design already had to navigate this. The remote facet must ride whatever multi-line pill mechanism that work establishes rather than fighting the renderer.

Upshot of the reassessment: the dock is converging on rich, multi-line, opportunistically-populated session pills anyway (for PR/CI status). The cloud remote facet is one more opportunistic data source feeding the same pill, which makes "management lives in the Orchestrator" (D3) cheaper and more natural than when it was first written.
The Orchestrator exists to run parallel AI agents, each in its own worktree. Cloud Workspaces compose with that directly: run those agents in cloud pods (ephemeral, scoped creds, scale-to-zero) and the Control Room's existing live-preview / parsed-state / collision-radar machinery covers them with no new UI. Cloud + agents is a multiplier, not a parallel track.
command/Terraform escape hatch for platform teams.

The make-or-break flow. Branch by starting point, easiest first:
- attach-existing: pick context → namespace → pod → connected. No provisioning, no config.
- create on pods/exec, not-on-Fargate) with fix-it messages.

Pick a Workspace → Fresh reconciles to connected:
- running → attach immediately.
- stopped → resume (start compute, re-mount storage), then attach.
- not-provisioned → confirm spend → provision → attach.

Progress streams to a log view; status bar shows the phase.

Editing, integrated terminal in the pod, LSP in the pod, run/build/test, git, file explorer: all routed through the authority. Plus the things that break the illusion if we ignore them:
- Port forwarding (kubectl port-forward), surfaced as a first-class "Forward a port" action, auto-detecting listening ports.

Three distinct, clearly-labeled exits; conflating them is a classic footgun:
A Workspaces panel lists all of them across clusters with state + rough cost + idle timer. Switch = disconnect current + connect chosen (authority is modal: one window, one workspace). Open a second workspace = new window.
"What do I have running?" view; bulk stop/destroy; orphan detection ("a pod from this workspace has been running 3 days — stop it?").
Platform eng commits a template/provider config to the repo (or org config). A teammate opens the repo → Fresh detects it (devcontainer-style) → "This project defines a cloud workspace. Connect?" → remembered per project. Zero ceremony for the consumer.
Creds expired → re-auth. Quota hit → message + which quota. Image pull fail → show the pull error. Provision timeout → keep logs, offer retry. Pod evicted-and-gone → "ended; Rebuild?". Unschedulable (AZ/Spot) → "no capacity; try On-Demand / another AZ?".
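The failure list above maps naturally onto a lookup from failure kind to user-facing message plus an offered next step. This is a sketch under assumptions: the identifiers (`FailureKind`, `describeFailure`) and the exact message strings are illustrative, not existing Fresh APIs.

```typescript
// Failure kinds taken from the list above; identifiers are illustrative.
type FailureKind =
  | "creds-expired" | "quota-hit" | "image-pull-failed"
  | "provision-timeout" | "pod-evicted" | "unschedulable";

interface FailureResponse {
  message: string;
  offer?: string; // follow-up action offered to the user, if any
}

// `detail` carries the specific the user needs: which quota, the pull error.
function describeFailure(kind: FailureKind, detail = ""): FailureResponse {
  switch (kind) {
    case "creds-expired":
      return { message: "Credentials expired", offer: "Re-authenticate" };
    case "quota-hit":
      return { message: `Quota hit: ${detail}` };
    case "image-pull-failed":
      return { message: `Image pull failed: ${detail}` };
    case "provision-timeout":
      return { message: "Provisioning timed out; logs kept", offer: "Retry" };
    case "pod-evicted":
      return { message: "Workspace ended", offer: "Rebuild" };
    case "unschedulable":
      return { message: "No capacity", offer: "Try On-Demand / another AZ" };
  }
}
```

Because the switch is exhaustive over the union, adding a new failure kind without a message becomes a compile error rather than a silent generic failure.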
An AI agent or CI spins up an ephemeral workspace, works, tears down. The
same verbs exposed programmatically (the plugin's Provider + the CLI
form). Not a v1 UI, but the primitives shouldn't preclude it.
Each row is a decision; the bold option is my recommendation.
| # | Decision | Options & trade-offs |
|---|---|---|
| 1 | Unit of interaction | Workspace (durable, hides pods; more to build but the only model that makes resume/reconnect/cost coherent) · Pod (k8s-native, leaky, confusing churn) · Session (transient, loses "my durable thing"). |
| 2 | Lifecycle ownership | Pure-attach (Fresh only connects; user runs Terraform themselves — minimal, but "DIY then attach" is a poor daily UX) · Hybrid (Fresh tracks state & drives verbs, the Provider executes) · Full-manage [DECIDED] — Fresh owns an opinionated, zero-config provisioning engine end to end; the command/Terraform provider is the deliberate escape hatch, not the default. |
| 3 | Cold-start strategy | Provision-on-connect (cheapest, slowest — minutes) · Stop/resume as headline (keep volume, release compute — cheap and ~fast; the VDI-style model teams expect) · Warm pool (instant, idle cost — offer via provider for teams who want it). |
| 4 | Primary surface | Command palette only (discoverable-ish, no overview) · A "Remote/Workspaces" panel as home base + palette commands + a fresh eks://…-style CLI form mirroring fresh user@host:path · Status-bar menu only (too small for management). |
| 5 | How much k8s/AWS is shown | Hide everything (magical until it breaks, then opaque) · Show the plumbing (powerful, intimidating) · Progressive disclosure (workspace verbs up front; "Show details / logs / pod" one click away). |
| 6 | Provisioning config | Repo .fresh/k8s.json only · User-global only · Layered: zero-config attach → repo config (shareable) → user-global, and reuse devcontainer.json where present (don't reinvent environment definition). |
| 7 | Connections per window | Multi-root in one session (breaks the modal Authority principle, huge complexity) · One session = one workspace/authority; the Orchestrator holds many sessions and the active one is connected — switching sessions retargets the authority atomically (existing session-swap machinery). |
| 8 | Idle / cost default | Off (simplest, surprise bills) · Conservative long timeout · On by default, sane timeout, visible countdown, one-click "keep awake" (protective without being patronizing). |
| 9 | Failure stance | Always-ask (safe, naggy) · Auto-everything (smooth, scary for destructive ops) · Auto-recover the transient (reconnect, re-resolve pod), always-ask the destructive (rebuild/destroy/resize). |
| 10 | Trust & spend prompts | Per-connect (naggy) · Off (unsafe) · Trust a cluster once (remembered), confirm spend once per workspace (matches WorkspaceTrust + devcontainer's remembered-decision pattern). |
| 11 | Persistent vs. ephemeral workspaces | Force one model · Make it a per-workspace policy: persistent volume + stop/resume = "VDI-style" long-lived; destroy-on-disconnect = throwaway-per-branch — same primitives, a config flag. |
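
Row 8's "on by default, visible countdown, one-click keep awake" can be sketched as a small timer object. This is an assumption-laden illustration: the class name, method names, and the 30-minute timeout used in the usage note are hypothetical, and time is passed in explicitly so the logic stays deterministic.

```typescript
// Minimal idle tracker for one workspace.
class IdleTimer {
  private readonly timeoutMs: number;
  private lastActivityMs: number;
  private keepAwake = false;

  constructor(timeoutMs: number, nowMs: number) {
    this.timeoutMs = timeoutMs;
    this.lastActivityMs = nowMs;
  }

  // Any user or process activity resets the countdown.
  recordActivity(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  setKeepAwake(on: boolean): void {
    this.keepAwake = on;
  }

  // Milliseconds until auto-stop, for the visible countdown;
  // null while keep-awake is on.
  remainingMs(nowMs: number): number | null {
    if (this.keepAwake) return null;
    return Math.max(0, this.timeoutMs - (nowMs - this.lastActivityMs));
  }

  shouldStop(nowMs: number): boolean {
    return this.remainingMs(nowMs) === 0;
  }
}
```

For example, `new IdleTimer(30 * 60_000, 0)` reports 18 minutes remaining at the 12-minute mark, and never reports a stop while keep-awake is set.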
- Palette commands (EKS: Connect, Stop, Resume, Rebuild, Forward Port, Disconnect, Destroy, Show Workspaces).
- Status bar: ● acme-api · running · ~$0.40/hr · idle 12m; click → panel. Color = state. Mirrors today's SSH/devcontainer status.
- CLI form: fresh eks://context/namespace/workspace (and bare fresh picking up a repo's .fresh/k8s.json), paralleling fresh user@host:path.

Three forks are now settled. They define v1's scope and shape.
Fresh owns the whole lifecycle end to end with a batteries-included
default: a solo developer with nothing but an AWS account + a cluster
gets a working workspace with zero config. Fresh's built-in
provisioning knows how to create the EBS-live/S3-sync pod, apply
Karpenter/Spot-friendly + Pod-Identity-scoped specs, and run the full
stop / resume / rebuild / resize / destroy / idle-stop state machine.
This is not a reversal of "bring-your-own-flow." The
command/Terraform Provider remains — it is now the deliberate
override for platform teams (point at your Terraform repo and Fresh
drives it), not the thing every user must configure first. Default path:
Fresh just does it. Power path: hand Fresh your flow.
Consequence — Fresh ships an opinionated provisioning engine, not just
provider plumbing. The plugin doc's "built-in providers" become
real, Fresh-owned default templates + lifecycle logic. See
K8S_WORKSPACE_PLUGIN_DESIGN.md.
Consequence: state is authoritative but can still drift. Because
Fresh provisions, it knows the intended state, but a user's kubectl delete or an out-of-band Spot reclaim can make the real state diverge from it. v1 must
reconcile on connect (query real pod/volume state before acting) and
expose a "refresh state" action, so Fresh never bills against a phantom
stopped or attaches to a phantom running.
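Reconcile-on-connect can be sketched as a pure function from (recorded state, observed reality) to a plan. The names are hypothetical, and the derivation is deliberately simplified: it assumes "pod exists" means running and "volume only" means stopped, ignoring pod phases. How the observation is gathered is out of scope here.

```typescript
type RecordedState = "not-provisioned" | "running" | "stopped";

// What a live query of the cluster found.
interface Observed {
  podExists: boolean;
  volumeExists: boolean;
}

type Plan = "attach" | "resume-then-attach" | "provision-then-attach";

interface Reconciled {
  actual: RecordedState;
  drifted: boolean; // true when the record disagreed with reality
  plan: Plan;
}

// Trust what exists over what was recorded, then pick the connect path.
function reconcile(recorded: RecordedState, seen: Observed): Reconciled {
  const actual: RecordedState = seen.podExists
    ? "running"
    : seen.volumeExists ? "stopped" : "not-provisioned";
  const plan: Plan =
    actual === "running" ? "attach"
    : actual === "stopped" ? "resume-then-attach"
    : "provision-then-attach";
  return { actual, drifted: actual !== recorded, plan };
}
```

A workspace recorded as running whose pod was reclaimed comes back as `stopped` with `drifted: true`, so the connect flow resumes instead of attaching to a phantom.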
Keep the data volume, release compute on stop/idle, resume fast. This is the "VDI-style" model and the cheap-but-quick default. Persistent vs. ephemeral becomes a per-workspace policy flag (row 11): persistent = stop/resume; ephemeral = destroy-on-disconnect. Build the suspend/resume machinery in v1.
A direct consequence of D4. Warm background sessions each hold a live
backend, so the Authority (its filesystem, spawners, keepalive) must be
owned per Session/Window, with exactly one active — not one per
process as today. The active session's authority is still the sole
router; background ones are dormant-but-connected. Switching sessions
activates an authority instead of restarting the process, and
install_authority retargets the active session, not the whole Editor.
Reassessed against the current tree: the per-window authority field
already exists (WindowResources.authority / Window::authority()),
so this isn't a from-scratch move. What's actually missing is (a) live
multi-session (the active session is still pinned to WindowId(1)), (b) a
per-window keepalive so a background window keeps its live backend, and
(c) replacing the destructive install_authority restart with per-window
activation. It also pulls WorkspaceTrust, EnvProvider, and the
daemon's single session_keepalive/startup_authority slots toward
per-session ownership. Full write-up: AUTHORITY_DESIGN.md
§"Evolution: per-session authority".
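As an illustration of "owned per session, exactly one active", here is a minimal registry sketch. It is written in TypeScript for brevity rather than in the editor's own implementation language, and every name in it (`AuthorityRegistry`, `install`, `activate`) is hypothetical; it does not mirror the real install_authority signature.

```typescript
// Stand-in for a session's backend; the real Authority bundles filesystem,
// spawners and keepalive.
interface Authority {
  label: string;
}

class AuthorityRegistry {
  private readonly bySession = new Map<string, Authority>();
  private activeId: string | null = null;

  // Register a session's authority; it stays held while in the background.
  install(sessionId: string, authority: Authority): void {
    this.bySession.set(sessionId, authority);
    if (this.activeId === null) this.activeId = sessionId;
  }

  // Switching retargets the router; no other session's authority is dropped.
  activate(sessionId: string): void {
    if (!this.bySession.has(sessionId)) {
      throw new Error(`unknown session: ${sessionId}`);
    }
    this.activeId = sessionId;
  }

  // The sole router: callers only ever see the active session's authority.
  active(): Authority | undefined {
    return this.activeId === null ? undefined : this.bySession.get(this.activeId);
  }

  heldCount(): number {
    return this.bySession.size;
  }
}
```

The point of the shape is that `activate` is a pointer swap: background authorities stay in the map, so switching back needs no reconnect.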
When a cloud session is not the active one in the Orchestrator, its
kubectl exec channel stays connected (the RemoteKeepalive bundle
— agent, reconnect task, runtime — is held per session). Switching back
is instant; no reconnect, no resume.
Consequences:
- Without a heartbeat, an idle exec stream gets dropped by ELB/NAT timeouts and "instant switch-back" becomes "switch back, then wait for a reconnect." The heartbeat is what makes warm actually warm.
- Warm does not mean running; the workspace-lifecycle idle timer (D2) still measures per-session inactivity and can stop the pod (closing its channel). A backgrounded session that goes idle long enough suspends; switching back to it then resumes. Warm keeps switching snappy; idle-stop keeps the bill honest. They don't fight: warm is about the live connection, idle-stop is about the pod.
- Each warm session holds a kubectl subprocess + an in-pod agent + a tokio task. Fine for a handful; with many sessions it accumulates. Keep all warm by default (the decision), but cap the warm set at a configurable max (suspend the least-recently-active beyond it) as a safety valve.

The original D3 ("ship a Workspaces panel") is superseded: a cloud
workspace is a Session, and the
Orchestrator's session list / Control Room is the management surface.
There is no second panel. See §"Orchestrator integration" — this is the
load-bearing structural decision, so it gets its own section.
The home-base requirements still hold (list, state dots, rough cost, idle countdown, verbs, progressive "show logs/pod/details") — they are met by extending the existing Orchestrator session row with an optional remote facet, not by building new chrome.
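Returning to the warm-set cap from the warm-background-sessions consequences (D4): the "suspend the least-recently-active beyond a configurable max" rule is a plain LRU eviction. A minimal sketch, with hypothetical names (`WarmSet`, `touch`) and session IDs as strings:

```typescript
// Tracks which sessions are warm, least recently active first.
class WarmSet {
  private readonly maxWarm: number;
  private order: string[] = [];

  constructor(maxWarm: number) {
    this.maxWarm = maxWarm;
  }

  // Mark a session active. Returns the sessions that should be suspended
  // so the warm set stays within the cap.
  touch(sessionId: string): string[] {
    this.order = this.order.filter((id) => id !== sessionId);
    this.order.push(sessionId);
    const excess = Math.max(0, this.order.length - this.maxWarm);
    return this.order.splice(0, excess);
  }

  warm(): string[] {
    return [...this.order];
  }
}
```

With a cap of 2, touching a, b, a, c in that order suspends b: it is the least recently active when c arrives. The caller decides what "suspend" means for each returned session.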
~$/hr needs an
instance→price source; can Fresh estimate it itself from the node type,
or does it require a provider hook? Decide before promising a number.