crates/protocol/VERSIONING.md
This explains how microsandbox keeps the program inside a sandbox and the program on your machine talking to each other, even when they were built at different times. You don't need to know the codebase to read this. The precise, implementer-facing details are in the last section.
Two programs talk over this protocol:
- The host runs on your machine. It changes whenever you upgrade microsandbox.
- The runtime (agentd) runs inside the sandbox. It is built into the sandbox when the
sandbox is created, and it never changes after that. A sandbox that has been running for a
week is still using the runtime from a week ago.

So the awkward situation is normal: a freshly upgraded host talks to a sandbox whose runtime is old and frozen. We can't go back and upgrade the runtime inside a running sandbox. The protocol has to cope with that gap.
Our goal: the old runtime never has to understand anything new. The newer host does all the adapting. The connection keeps working, and if the host wants to use a brand-new feature the old runtime simply doesn't have, only that one feature fails, with a clear message. The rest of the session carries on.
Think of two people on a phone call who each speak some version of a language. At the very start of the call they establish the highest version they both know, and they speak that for the rest of the call. If one of them knows a newer word the other doesn't, they just don't use it.
That is exactly how this works:
sequenceDiagram
participant H as Host (just upgraded)
participant S as Sandbox (old runtime)
H->>S: connect
S-->>H: handshake: I speak up to version 3
Note over H,S: both use version 3 (the lower of the two)
H->>S: run command (a version-1 feature) ✓
H-->>H: TCP forward needs version 4 → fail locally, never sent
The single version number is called the generation. It's just a counter that goes up by one every time we add something to the protocol. The handshake agrees on one generation per connection, and everything else is decided by that one number.
This is worth stating plainly because the code has two things that look like separate version numbers but aren't:
One number, agreed once. Hold onto that.
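As a minimal sketch of "one number, agreed once": the host takes the lower of its own generation and the one the sandbox reported, then stores it for the life of the connection. The `Connection` struct and `from_handshake` are hypothetical names for illustration, and the value 4 is only chosen to match the diagram above.

```rust
/// Highest generation this build of the host understands.
/// The value 4 is illustrative only, chosen to match the diagram.
pub const PROTOCOL_VERSION: u32 = 4;

/// The generation both sides use for the whole connection:
/// the lower of what the host speaks and what the sandbox reported.
pub fn negotiate(host_generation: u32, sandbox_generation: u32) -> u32 {
    host_generation.min(sandbox_generation)
}

/// Hypothetical connection state: the agreed number is stored once
/// and never renegotiated.
pub struct Connection {
    pub negotiated_version: u32,
}

impl Connection {
    /// `sandbox_generation` is the number the sandbox sent in its handshake.
    pub fn from_handshake(sandbox_generation: u32) -> Self {
        Self {
            negotiated_version: negotiate(PROTOCOL_VERSION, sandbox_generation),
        }
    }
}
```

A sandbox newer than the host is handled by the same rule: the host's own generation becomes the ceiling.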
There are three kinds of change, from easiest to hardest. Almost everything we ever do is the first kind.
flowchart TD
A[I'm changing the protocol] --> B{What kind of change?}
B -->|Add info to an existing message| C[Make the new field optional]
C --> C1[No version bump.
Old peers ignore it, new peers default it.]
B -->|Add a brand-new message type| D[Bump the generation,
label the type with it]
D --> D1[Host checks the agreed version before sending.
Too old? That one feature errors, cleanly.]
B -->|Change the on-wire format itself| E[Fork the codec at the handshake]
E --> E1[Newer side keeps the old format's code.
Rare and costly, avoid if possible.]
Say a "run this command" message gains an optional timeout field.
The rule: new fields are always optional. Because of that:
- A new host sends timeout to an old runtime. The old runtime has never heard of timeout,
so it simply ignores it. No crash.
- An old runtime sends a message without timeout to a new host. The new host notices it's
missing and fills in a sensible default. No crash.

So adding fields needs no version bump at all. It just works in both directions.
One thing to decide each time: if the old runtime ignores your new field, the feature quietly does
nothing on old sandboxes (an old runtime that ignores timeout just runs the command with no
timeout). If that silent "nothing happens" is fine, you're done. If you'd rather the user get a
loud "your sandbox is too old for this," treat it as the next kind of change instead.
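The two directions can be sketched without any external crates. Here a plain map stands in for a decoded CBOR payload, and the struct names, the `command_id` field, and the 30-second default are all hypothetical; the real payloads are CBOR and the real types may look different.

```rust
use std::collections::BTreeMap;

/// Stand-in for a decoded payload map (the real wire format is CBOR).
type Fields = BTreeMap<&'static str, u64>;

/// Hypothetical default applied when the sender omitted `timeout`.
const DEFAULT_TIMEOUT_SECS: u64 = 30;

/// The old runtime's view of "run this command": it predates `timeout`.
#[derive(Debug, PartialEq)]
struct OldExec {
    command_id: u64,
}

/// The new host's view: `timeout` exists but is optional.
#[derive(Debug, PartialEq)]
struct NewExec {
    command_id: u64,
    timeout: Option<u64>,
}

/// Old decoder: looks up only the keys it knows, so extra keys are ignored.
fn decode_old(fields: &Fields) -> Option<OldExec> {
    Some(OldExec {
        command_id: *fields.get("command_id")?,
    })
}

/// New decoder: a missing `timeout` becomes `None` instead of an error.
fn decode_new(fields: &Fields) -> Option<NewExec> {
    Some(NewExec {
        command_id: *fields.get("command_id")?,
        timeout: fields.get("timeout").copied(),
    })
}

/// The receiver fills in the default when the field was absent.
fn effective_timeout(msg: &NewExec) -> u64 {
    msg.timeout.unwrap_or(DEFAULT_TIMEOUT_SECS)
}
```

Note that `decode_old` succeeding on a map that contains `timeout` is exactly the silent case described above: the command runs, but without the timeout the sender asked for.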
Say we add TCP port forwarding, which needs a new "open a TCP connection" message that older runtimes have never seen.
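Here the host checks the agreed generation before sending:
- If the agreed generation is at least the one that introduced the message, the host sends it.
- If the agreed generation is older, the host fails that one operation locally with a clear error and sends nothing.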
This is the key to "old runtime understands nothing new": the host never sends an old runtime something it can't handle. It knows the runtime's version from the handshake, so it decides up front.
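A sketch of that up-front decision, using names that appear in the implementer notes (MessageType, min_protocol_version, is_available_at, UnsupportedOperation). The variants `FsRead` and `TcpConnect`, the error's fields, and the generation 4 for TCP forwarding are assumptions for illustration, not the crate's actual definitions.

```rust
/// Illustrative subset of message types. `FsRead` and `TcpConnect`
/// are hypothetical variant names.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MessageType {
    ExecRequest,
    FsRead,
    TcpConnect,
}

impl MessageType {
    /// Generation that introduced each type. There is no wildcard arm, so a
    /// new variant does not compile until its generation is assigned.
    pub fn min_protocol_version(self) -> u32 {
        match self {
            MessageType::ExecRequest => 1,
            MessageType::FsRead => 2,
            MessageType::TcpConnect => 4, // assumed, to match the diagram
        }
    }

    /// True when a peer at `peer_generation` understands this type.
    pub fn is_available_at(self, peer_generation: u32) -> bool {
        peer_generation >= self.min_protocol_version()
    }
}

/// The field layout of this error is an assumption.
#[derive(Debug, PartialEq)]
pub enum AgentClientError {
    UnsupportedOperation { needed: u32, negotiated: u32 },
}

pub struct AgentClient {
    pub negotiated_version: u32,
}

impl AgentClient {
    /// Check the agreed generation without sending anything.
    pub fn ensure_version_compat(&self, t: MessageType) -> Result<(), AgentClientError> {
        if t.is_available_at(self.negotiated_version) {
            Ok(())
        } else {
            Err(AgentClientError::UnsupportedOperation {
                needed: t.min_protocol_version(),
                negotiated: self.negotiated_version,
            })
        }
    }

    /// Gate first; a real client would encode and write the message after it.
    pub fn send(&self, t: MessageType) -> Result<(), AgentClientError> {
        self.ensure_version_compat(t)?;
        Ok(())
    }
}
```

Only the gated operation returns an error; the client value stays usable, which is what lets the rest of the session carry on.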
The first two kinds change what's inside messages. This kind changes the shape of the container every message travels in: how messages are framed and laid out on the wire.
This is the hard one, because a program has to understand the container's shape before it can read anything inside it. You can't put "which shape is this?" inside the message, because you'd have to already know the shape to find it. The only place to sort this out is the handshake, before any normal message flows. So:
- The two sides settle the format during the handshake.
- The newer side keeps the old format's code, so it can still speak to a frozen runtime.
Because this is costly, we avoid it. We keep the container as small and stable as possible and put almost all change into kinds 1 and 2, which never touch it.
The pattern across all three kinds is the same: the newer side always adapts to the older side. The old runtime is frozen, so it can't do anything else; all the work lives in the newer host.
The only time a connection is refused outright is when the sandbox is so old that the host no longer carries the code to speak its format at all (because we dropped support for versions that old). That's a clean, upfront error telling you to recreate the sandbox, not a confusing failure mid-session.
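The codec choice and the one refusal case might look roughly like this. The variant names Current and LegacyV1 come from the implementer notes; the two threshold constants, the error type, and the mapping from generation to codec are assumptions made for the sketch.

```rust
/// Wire codec, chosen once at the handshake.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AgentProtocol {
    Current,
    LegacyV1,
}

/// Hypothetical: oldest generation this host still carries a codec for.
pub const OLDEST_SUPPORTED_GENERATION: u32 = 1;
/// Hypothetical: first generation that uses the current codec.
pub const FIRST_CURRENT_GENERATION: u32 = 2;

/// Hypothetical error returned before any normal message is exchanged.
#[derive(Debug, PartialEq)]
pub enum HandshakeError {
    SandboxTooOld { sandbox_generation: u32 },
}

/// Pick the codec for this connection, or refuse up front if the host no
/// longer carries code for the sandbox's format.
pub fn select_codec(sandbox_generation: u32) -> Result<AgentProtocol, HandshakeError> {
    if sandbox_generation < OLDEST_SUPPORTED_GENERATION {
        return Err(HandshakeError::SandboxTooOld { sandbox_generation });
    }
    if sandbox_generation < FIRST_CURRENT_GENERATION {
        Ok(AgentProtocol::LegacyV1)
    } else {
        Ok(AgentProtocol::Current)
    }
}
```

Dropping support for an old format then amounts to raising the oldest supported generation and deleting the matching codec.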
These are the promises that make all of the above hold together:
A fair worry about "just add optional fields forever" is that it gets hard to see what a message used to look like at, say, version 4. We solve that with a generated, checked-in file per version (a schema snapshot) that lists every message's exact shape at that version. It's produced automatically from the code and verified by a test, so it can never drift out of date, and when someone bumps the version the change shows up as a simple diff in review. You get a precise per-version reference without complicating the actual code.
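To make the snapshot idea concrete, here is a small sketch that renders a protocol surface as deterministic JSON and compares it with a checked-in string. It assumes a two-variant `MessageType`, hypothetical wire names, and hand-written JSON; the real test reads and writes files and covers more than message types.

```rust
/// Illustrative subset; `FsRead` and both wire names are hypothetical.
#[derive(Debug, Clone, Copy)]
pub enum MessageType {
    ExecRequest,
    FsRead,
}

/// Illustrative value for this sketch only.
pub const PROTOCOL_VERSION: u32 = 2;

impl MessageType {
    /// Fixed iteration order keeps the rendered output deterministic.
    pub const ALL: &'static [MessageType] = &[MessageType::ExecRequest, MessageType::FsRead];

    pub fn wire_name(self) -> &'static str {
        match self {
            MessageType::ExecRequest => "exec.request",
            MessageType::FsRead => "fs.read",
        }
    }

    pub fn min_protocol_version(self) -> u32 {
        match self {
            MessageType::ExecRequest => 1,
            MessageType::FsRead => 2,
        }
    }
}

/// Render the current surface as compact JSON, one entry per message type.
pub fn render_snapshot() -> String {
    let mut out = format!("{{\"protocol_version\":{},\"types\":[", PROTOCOL_VERSION);
    for (i, t) in MessageType::ALL.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str(&format!(
            "{{\"name\":\"{}\",\"since\":{}}}",
            t.wire_name(),
            t.min_protocol_version()
        ));
    }
    out.push_str("]}");
    out
}

/// Fail when the code and the checked-in snapshot disagree.
pub fn check_snapshot(checked_in: &str) -> Result<(), String> {
    let current = render_snapshot();
    if current == checked_in {
        Ok(())
    } else {
        Err(format!("snapshot drift:\n  code: {current}\n  file: {checked_in}"))
    }
}
```

Because the output is generated from the same enum the code uses, any edit to a type or its generation changes the rendered text and shows up in review.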
A tempting alternative is to make the message type itself carry the version (one Rust type per version). We don't, for a few reasons: it would force an old runtime to fail on the entire message rather than gracefully ignore the new parts; it would spread version-handling into every corner of the code; and it can't even express the hardest change (the format/container one), because that lives at a lower level than the message types. The one place a per-version type genuinely helps is a rare breaking change to a single message (a field's meaning changes, not just a new field added). There, and only there, we introduce a distinct type for the old and new shapes plus a small converter. We apply that narrowly, to the one message that broke, never to the whole protocol up front.
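For the narrow breaking-change case, a sketch of "two shapes plus a small converter" could look like the following. The scenario is invented for illustration: it assumes `timeout` once meant seconds and later meant milliseconds, which is not something the protocol is known to have done.

```rust
/// Old shape (hypothetical): `timeout` counts seconds.
#[derive(Debug, PartialEq)]
pub struct ExecRequestV1 {
    pub command: String,
    pub timeout: Option<u64>,
}

/// New shape (hypothetical): same field name, but it now counts milliseconds.
#[derive(Debug, PartialEq)]
pub struct ExecRequestV2 {
    pub command: String,
    pub timeout: Option<u64>,
}

/// The converter is the only place that knows both meanings.
impl From<ExecRequestV1> for ExecRequestV2 {
    fn from(old: ExecRequestV1) -> Self {
        ExecRequestV2 {
            command: old.command,
            // Seconds to milliseconds; saturate rather than overflow.
            timeout: old.timeout.map(|secs| secs.saturating_mul(1000)),
        }
    }
}
```

The rest of the code would work only with the new shape, so the version handling stays confined to this one conversion.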
The wire layout, in two nested layers:
[ length ][ id ][ flags ] <- fixed binary header, read first, never changes shape
CBOR { v, t, p } <- the envelope: version, message type, payload bytes
p = CBOR { ...fields } <- the payload for that message type (ExecRequest, etc.)
- v is the generation, echoed onto each message. Same number negotiated at the handshake;
not a per-message version. Don't gate behavior by reading it per message.
- MessageType::min_protocol_version() (lib/message.rs) is the per-type label: the
generation that introduced the type. It has no wildcard arm, so adding a MessageType won't
compile until you assign its generation (and bump PROTOCOL_VERSION to match). Core and exec
types are generation 1 (the pre-0.5 legacy runtime handles them); the Fs* types are generation
2, because filesystem streaming did not exist in the legacy protocol.
- The host-side gate lives in crates/microsandbox/lib/agent/client.rs. At
handshake the client computes negotiated_version = min(our PROTOCOL_VERSION, the generation the sandbox echoed in its ready frame). Every typed send checks min_protocol_version() against it
and rejects too-old sandboxes with AgentClientError::UnsupportedOperation. The error's message
advises restarting the sandbox, which re-provisions agentd at the current version (agentd is a
host build artifact, not baked into the sandbox image). The name is direction-neutral so the same
error can later cover the reverse skew (a newer runtime feature an older SDK can't use). Callers
that can't gate by sending (the SSH/SFTP layer, the filesystem fail-fast) consult
AgentClient::supports(MessageType) or AgentClient::ensure_version_compat(MessageType), the single predicate
over the same mechanism, instead of inspecting the protocol generation directly.
- The check itself is MessageType::is_available_at(peer_generation). The guest
can gate a guest-initiated message the same way, because it already receives each peer's generation
on every message (the v field). The send-site enforcement on the guest lands with the first
feature that needs it — reverse port forwarding, where the guest opens a channel to the host — since
no guest-initiated message type is above generation 1 yet.
- AgentProtocol (Current / LegacyV1) selects the wire codec (the container
format). negotiated_version drives the capability gate. These are the two consumers of the
one generation number.
- [length][id][flags] is immutable. The relay routes on id/flags
without parsing the CBOR, and it bridges a host and guest that may be different generations, so
changing the header would force the relay to translate. Keep all change inside the CBOR body.
- flags is a 1-byte field (3 of 8 bits used) carrying lifecycle hints the relay needs without
decoding CBOR. New bits are append-only and must be safe for an old relay to ignore. Anything
that isn't safe to ignore belongs behind the capability gate, not a flag bit.
- A wire-format change is handled by forking a new AgentProtocol generation and
carrying the old one until a support horizon (see TODO(upgrade-0.6)), keeping the binary header
fixed so the relay stays format-agnostic.
- Schema snapshot test (crates/protocol/tests/schema_snapshot.rs): generate the current
protocol surface (the PROTOCOL_VERSION, the frame constants and flag bits, and every
MessageType with its introducing generation, iterated via MessageType::ALL) as deterministic
JSON and diff it against the checked-in crates/protocol/schema/gen-<PROTOCOL_VERSION>.json. Fail
on mismatch. Re-bless an intended change with
UPDATE_PROTOCOL_SCHEMA=1 cargo test -p microsandbox-protocol --test schema_snapshot; the
generator only ever writes the current generation's file, so prior-generation files stay frozen,
and a generation bump shows up as a reviewable diff.
- Tests (message.rs, client.rs): an unknown extra field decodes via serde(default) in
both directions; a too-new message type is rejected on send with the typed UnsupportedOperation
error; the negotiated generation is the lower of the two sides; every type is sendable to a current
peer; wire strings are unique and round-trip.
- Planned: an additive-only check (gen-N.json may only add message types
versus gen-(N-1).json) once a second generation exists to compare against.
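The additive-only rule reduces to a set comparison once the message type names have been read from the two snapshot files. The sketch below assumes that extraction has already happened and takes plain string slices; the function names and the example wire names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Message types present in `previous` but missing from `next`, in the
/// order they appear in `previous`. Empty means `next` only added types.
pub fn removed_types<'a>(previous: &[&'a str], next: &[&str]) -> Vec<&'a str> {
    let next_set: BTreeSet<&str> = next.iter().copied().collect();
    previous
        .iter()
        .copied()
        .filter(|name| !next_set.contains(*name))
        .collect()
}

/// True when every type in the older snapshot still exists in the newer one.
pub fn is_additive(previous: &[&str], next: &[&str]) -> bool {
    removed_types(previous, next).is_empty()
}
```

Returning the removed names, rather than a bare boolean, lets a failing check say which type disappeared.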