packages/agent/docs/durable-harness.md
Durable AgentHarness / session design notes.
A fully durable AgentHarness is not realistic by itself because important dependencies are runtime JS supplied by the host app:
Tool registries are runtime dependencies. The harness should persist serializable tool configuration, such as active tool names, but not concrete tool implementations.
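A minimal sketch of that split, assuming hypothetical `RuntimeTool` and `PersistedToolConfig` shapes (neither name comes from the harness itself): only tool names are written to the session, and on reload they are resolved against whatever registry the host app supplies.

```typescript
// Hypothetical shapes for illustration; the real harness types may differ.
interface RuntimeTool {
  name: string;
  // Concrete implementation supplied by the host app; never persisted.
  run(input: unknown): Promise<unknown>;
}

// The only part that is written to the session.
interface PersistedToolConfig {
  activeToolNames: string[];
}

function toPersistedToolConfig(active: RuntimeTool[]): PersistedToolConfig {
  return { activeToolNames: active.map((tool) => tool.name) };
}

// Resolve persisted names against the app-supplied registry and report
// any names that no longer have an implementation.
function resolveActiveTools(
  config: PersistedToolConfig,
  registry: Map<string, RuntimeTool>,
): { resolved: RuntimeTool[]; missing: string[] } {
  const resolved: RuntimeTool[] = [];
  const missing: string[] = [];
  for (const name of config.activeToolNames) {
    const tool = registry.get(name);
    if (tool) resolved.push(tool);
    else missing.push(name);
  }
  return { resolved, missing };
}
```

Returning `missing` rather than throwing leaves the policy decision (fail, drop, warn) to the caller.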
The practical target is a semi-durable harness:
Treat the session as all durable agent state, not just transcript history.
Existing session state already includes harness state:
That suggests continuing with one durable session log rather than adding harness sidecars. Sidecars may still be useful for large blobs, but the session entry should remain the source-of-truth reference.
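One way the blob case could look, as a sketch only: the session entry either inlines its payload or carries a reference to a sidecar blob, so the log stays the source of truth. `BlobStore`, `EntryPayload`, and the size threshold are all assumptions made for illustration.

```typescript
// Assumed threshold above which a payload moves to a sidecar blob.
const INLINE_LIMIT_CHARS = 16384;

// Hypothetical blob store supplied by the host app.
interface BlobStore {
  put(data: string): string; // returns an opaque blob id
  get(blobId: string): string;
}

// The session entry always holds either the data or a reference to it.
type EntryPayload =
  | { kind: "inline"; data: string }
  | { kind: "blob_ref"; blobId: string; length: number };

function toPayload(data: string, blobs: BlobStore): EntryPayload {
  if (data.length <= INLINE_LIMIT_CHARS) return { kind: "inline", data };
  return { kind: "blob_ref", blobId: blobs.put(data), length: data.length };
}

function readPayload(payload: EntryPayload, blobs: BlobStore): string {
  return payload.kind === "inline" ? payload.data : blobs.get(payload.blobId);
}
```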
The app must recreate compatible runtime dependencies:
Harness can validate stable IDs/versions/hashes when available, but it cannot serialize these dependencies itself.
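A sketch of such a check, assuming a hypothetical `DependencyFingerprint` record stored in the session. Version and hash are compared only when both sides provide them, matching the "when available" caveat.

```typescript
// Hypothetical fingerprint recorded in the session for one runtime dependency.
interface DependencyFingerprint {
  id: string;
  version?: string;
  hash?: string;
}

// Compare what the session recorded with what the app supplied at startup.
// Returns a list of human-readable problems; empty means compatible.
function validateDependency(
  recorded: DependencyFingerprint,
  supplied: DependencyFingerprint,
): string[] {
  const problems: string[] = [];
  if (recorded.id !== supplied.id) {
    problems.push(`id mismatch: expected ${recorded.id}, got ${supplied.id}`);
  }
  for (const field of ["version", "hash"] as const) {
    const want = recorded[field];
    const have = supplied[field];
    // Skip fields that either side does not provide.
    if (want !== undefined && have !== undefined && want !== have) {
      problems.push(`${field} mismatch for ${recorded.id}: expected ${want}, got ${have}`);
    }
  }
  return problems;
}
```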
Constructor options remain explicit runtime configuration and do not read session state. Hidden async restore in a constructor would make failure handling ambiguous.
A future async builder/factory should own durable restore:
const harness = await AgentHarness.builder()
  .env(env)
  .session(session)
  .model(defaultModel)
  .tools(runtimeTools)
  .defaultActiveTools(["read", "edit"])
  .restore({ missingActiveTools: "fail" });
restore() should read the active branch, reduce durable harness configuration, apply defaults for missing entries, validate against app-supplied runtime dependencies, construct the harness, and optionally emit source: "restore" update events after construction.
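The reduce, default, and validate steps can be sketched for a single piece of configuration, the active tool set. This is not the builder's implementation. The entry shape, the `RestoreInput` type, and the `"drop"` policy value are assumptions; only `active_tools_change` and `missingActiveTools: "fail"` appear in these notes.

```typescript
// Hypothetical entry shapes; only the "active_tools_change" type name is from the notes.
interface ActiveToolsChangeEntry {
  type: "active_tools_change";
  toolNames: string[];
}
type BranchEntry = ActiveToolsChangeEntry | { type: "message"; text: string };

interface RestoreInput {
  branch: BranchEntry[]; // entries on the active branch, oldest first
  registeredTools: string[]; // names from the app-supplied runtime registry
  defaultActiveTools?: string[]; // builder default, if one was supplied
  missingActiveTools: "fail" | "drop"; // "drop" is an assumed alternative to "fail"
}

function restoreActiveTools(input: RestoreInput): string[] {
  // Reduce: the last active_tools_change on the branch wins.
  let fromLog: string[] | undefined;
  for (const entry of input.branch) {
    if (entry.type === "active_tools_change") fromLog = entry.toolNames;
  }

  // Defaults: builder default, else every registered tool.
  const wanted = fromLog ?? input.defaultActiveTools ?? input.registeredTools;

  // Validate against the runtime dependencies the app supplied.
  const registered = new Set(input.registeredTools);
  const missing = wanted.filter((name) => !registered.has(name));
  if (missing.length > 0 && input.missingActiveTools === "fail") {
    throw new Error(`restore failed, missing tools: ${missing.join(", ")}`);
  }
  return wanted.filter((name) => registered.has(name));
}
```

Because the function throws before any harness is constructed, a failed restore leaves nothing half-initialized, which is the reason for keeping restore out of the constructor.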
For active tools:
active_tools_change entries are branch-scoped durable config.
If no active_tools_change exists on the branch, restore uses builder defaults, or all registered tools if no default active names were supplied.
Minimum useful durability entries:
Potential entries:
type DurableHarnessEntry =
  | QueueEnqueuedEntry
  | QueueConsumedEntry
  | PendingWriteEnqueuedEntry
  | PendingWriteAppliedEntry
  | OperationStartedEntry
  | OperationFinishedEntry
  | OperationInterruptedEntry
  | TurnStartedEntry
  | TurnFinishedEntry
  | ProviderRequestStartedEntry
  | ProviderRequestFinishedEntry
  | ToolCallStartedEntry
  | ToolCallFinishedEntry;
Every accepted mutation must be durable before the public API resolves.
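In code, this rule amounts to awaiting the log append before touching in-memory state. A minimal sketch, assuming a hypothetical `SessionLog` interface and `DurableQueue` class (neither is part of the current harness):

```typescript
// Hypothetical append-only log; a real one would write to disk or a database
// and resolve only once the write is durable.
interface SessionLog {
  append(entry: { type: string; payload: unknown }): Promise<void>;
}

class DurableQueue {
  private readonly items: string[] = [];

  constructor(private readonly log: SessionLog) {}

  // Resolves only after the entry is durable. If append rejects, the
  // in-memory queue is untouched and the caller sees the failure.
  async enqueue(message: string): Promise<void> {
    await this.log.append({ type: "queue_enqueued", payload: { message } });
    this.items.push(message);
  }

  pending(): readonly string[] {
    return this.items;
  }
}
```

Ordering the append before the in-memory push means a caller never observes an accepted message that a restart would lose.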
On startup:
Provider streams are not resumable. Recovery can only retry from a durable boundary or mark the operation interrupted.
Default conservative policy:
Optional policy:
recovery: "mark_interrupted" | "retry_unfinished"
retry_unfinished must be guarded around non-idempotent tool calls.
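A sketch of how the two policies might be applied to one unfinished operation found at startup. The `UnfinishedOperation` summary and the per-tool `idempotent` flag are assumptions; the notes do not say how idempotency would be declared.

```typescript
type RecoveryPolicy = "mark_interrupted" | "retry_unfinished";

// Hypothetical summary of an operation with a started entry but no finished entry.
interface UnfinishedOperation {
  operationId: string;
  // Tool calls that started but have no matching finished entry.
  openToolCalls: { toolName: string; idempotent: boolean }[];
}

type RecoveryAction =
  | { kind: "mark_interrupted"; operationId: string; reason: string }
  | { kind: "retry"; operationId: string };

function planRecovery(op: UnfinishedOperation, policy: RecoveryPolicy): RecoveryAction {
  if (policy === "mark_interrupted") {
    return { kind: "mark_interrupted", operationId: op.operationId, reason: "policy" };
  }
  // retry_unfinished: a non-idempotent tool call that may already have run
  // cannot be safely repeated, so fall back to interruption.
  const unsafe = op.openToolCalls.find((call) => !call.idempotent);
  if (unsafe) {
    return {
      kind: "mark_interrupted",
      operationId: op.operationId,
      reason: `non-idempotent tool call in flight: ${unsafe.toolName}`,
    };
  }
  return { kind: "retry", operationId: op.operationId };
}
```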
Crash before queue_enqueued: message was not accepted.
Crash after queue_enqueued: message is restored.
Queued messages need turn_started or equivalent before they are considered consumed.
Crash before pending_write_enqueued: write was not accepted.
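The queue semantics above can be sketched as a replay over the log: a message is pending if it has a `queue_enqueued` entry and no later consumption marker. The `queue_consumed` type name and the `messageId` field are assumptions standing in for "`turn_started` or equivalent".

```typescript
// Hypothetical entry shapes; "queue_consumed" stands in for whatever marker
// the harness uses to record that a queued message was picked up.
type QueueEntry =
  | { type: "queue_enqueued"; messageId: string; text: string }
  | { type: "queue_consumed"; messageId: string };

// Replay the log in order and return messages that are still pending.
function restoreQueue(log: QueueEntry[]): { messageId: string; text: string }[] {
  const pending = new Map<string, string>(); // Map keeps insertion order
  for (const entry of log) {
    if (entry.type === "queue_enqueued") {
      pending.set(entry.messageId, entry.text);
    } else {
      pending.delete(entry.messageId);
    }
  }
  return Array.from(pending, ([messageId, text]) => ({ messageId, text }));
}
```

A message whose `queue_enqueued` entry never reached the log simply does not appear in the replay, which matches the "was not accepted" case.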