packages/agent/docs/agent-harness.md
AgentHarness is the orchestration layer above the low-level agent loop. It owns session persistence, runtime configuration, resource resolution, operation locking, and extension-facing mutation semantics.
This document describes the current direction and implemented behavior. Some extension/session-facade details are planned and called out explicitly.
Harness listeners and hooks should be able to close over the AgentHarness instance and call public harness APIs from any event where those APIs are documented as allowed. Those calls must not corrupt in-flight turn snapshots, reorder persisted transcript entries, lose pending writes, deadlock settlement, or leave the harness in the wrong phase.
The intended rule is:
- waitForIdle() during the active run, they can deadlock. A future facade should expose runWhenIdle() instead.

A final lifecycle hardening pass should prove these guarantees with a broad listener/hook reentrancy test suite.
The current split is:
- Result<TValue, TError> where expected failures are contained and must not throw, such as ExecutionEnv, filesystem/shell operations, shell-output capture, resource loading, and compaction helpers
- Session and AgentHarness reject/throw instead of returning bare results that can be ignored
- AgentHarness failures are normalized to AgentHarnessError where practical; subsystem errors are preserved as cause

Harness events observe committed state. Public mutators validate required input and persistence before committing when practical, then await notifications. If a hook or subscriber fails after commit, the state change is not rolled back and the public method rejects with AgentHarnessError code "hook".
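The split between result-returning helpers and throwing orchestration APIs can be sketched roughly as follows. This is a minimal illustration, not the package's code: the `Result` shape, the `ok`/`err` helpers, `parsePort`, `requirePort`, and the `HarnessError` class are all hypothetical stand-ins.

```typescript
// Hypothetical Result shape; the real Result<TValue, TError> may differ.
type Result<TValue, TError> =
  | { ok: true; value: TValue }
  | { ok: false; error: TError };

const ok = <TValue>(value: TValue): Result<TValue, never> => ({ ok: true, value });
const err = <TError>(error: TError): Result<never, TError> => ({ ok: false, error });

// Stand-in for AgentHarnessError: a code plus the preserved subsystem error.
class HarnessError extends Error {
  readonly code: string;
  readonly subsystemError: unknown;

  constructor(code: string, subsystemError: unknown) {
    super(`harness error: ${code}`);
    this.code = code;
    this.subsystemError = subsystemError;
  }
}

// Low-level helper: an expected failure is returned, never thrown.
function parsePort(text: string): Result<number, string> {
  const port = Number(text);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    return err(`invalid port: ${text}`);
  }
  return ok(port);
}

// Orchestration layer: failures throw, so callers cannot silently ignore them.
function requirePort(text: string): number {
  const result = parsePort(text);
  if (!result.ok) {
    throw new HarnessError("invalid_config", result.error);
  }
  return result.value;
}
```

The low-level caller must inspect `ok` before touching `value`, while the orchestration caller either gets a value or an error carrying a stable `code`.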
The harness separates state into four categories.
Harness config is the latest runtime configuration set by the application or extensions:
Getters return harness config. They do not return the snapshot used by an in-flight provider request.
Setters update harness config immediately, including while a turn is in flight. Changes affect the next turn snapshot, not the currently running provider request.
setResources() accepts concrete resources and emits resources_update on every call with shallow-copied current and previous resources. Applications own loading/reloading resources from disk or other sources and should call setResources() with new values.
getResources() returns shallow-copied current resources. It is a live config read, not the last turn snapshot.
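The copy and notification behavior described above can be pictured with a small stand-in. This is a sketch under assumptions: `MiniHarnessConfig`, the `Resources` shape, and the event field names `resources` and `previousResources` are hypothetical, and notification is synchronous here for brevity.

```typescript
// Hypothetical, simplified resource shape; the real harness types are richer.
interface Resources {
  skills: { name: string }[];
  promptTemplates: { name: string }[];
}

// Assumed event payload: field names are illustrative only.
interface ResourcesUpdateEvent {
  type: "resources_update";
  resources: Resources;
  previousResources: Resources;
}

// Copy the arrays but not the objects inside them.
function shallowCopyResources(resources: Resources): Resources {
  return {
    skills: [...resources.skills],
    promptTemplates: [...resources.promptTemplates],
  };
}

class MiniHarnessConfig {
  private resources: Resources = { skills: [], promptTemplates: [] };
  private listeners: Array<(event: ResourcesUpdateEvent) => void> = [];

  subscribe(listener: (event: ResourcesUpdateEvent) => void): void {
    this.listeners.push(listener);
  }

  // Commits the new value first, then notifies on every call,
  // even when the new value equals the old one.
  setResources(next: Resources): void {
    const previous = this.resources;
    this.resources = shallowCopyResources(next);
    const event: ResourcesUpdateEvent = {
      type: "resources_update",
      resources: shallowCopyResources(this.resources),
      previousResources: shallowCopyResources(previous),
    };
    for (const listener of this.listeners) listener(event);
  }

  // Live config read: callers get copied arrays, so mutating them
  // does not change harness state.
  getResources(): Resources {
    return shallowCopyResources(this.resources);
  }
}
```

Because the copy is shallow, pushing to a returned array is harmless, but mutating a skill object itself would still be visible through the harness.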
A turn snapshot is the concrete state used for one LLM turn. It is created by createTurnState() and contains:
Static option values are used directly. System-prompt provider callbacks are invoked once per createTurnState() call. All logic for that turn uses the same snapshot.
Resource arrays are shallow-copied when a snapshot is created. Individual skill and prompt-template objects are not deep-copied.
Stream options are shallow-copied when a snapshot is created. headers and metadata maps are shallow-copied; their values are not deep-copied. Credentials from getApiKeyAndHeaders() are resolved per provider request so expiring tokens can refresh, but the configured stream options and derived session id come from the current turn snapshot.
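A simplified, synchronous sketch of the snapshot rules may help. The types below are hypothetical and much smaller than the real ones, and this `createTurnState` only mirrors the documented copy semantics; it is not the harness implementation.

```typescript
// Hypothetical, reduced types for illustration.
interface StreamOptions {
  headers: Record<string, string>;
  metadata: Record<string, unknown>;
}

interface HarnessConfig {
  model: string;
  systemPrompt: string | (() => string);
  streamOptions: StreamOptions;
}

interface TurnState {
  model: string;
  systemPrompt: string;
  streamOptions: StreamOptions;
}

function createTurnState(config: HarnessConfig): TurnState {
  const { systemPrompt, streamOptions } = config;
  return {
    model: config.model,
    // A provider callback is invoked once per snapshot; static values are used directly.
    systemPrompt: typeof systemPrompt === "function" ? systemPrompt() : systemPrompt,
    // Options and the two maps are shallow-copied; map values are shared.
    streamOptions: {
      ...streamOptions,
      headers: { ...streamOptions.headers },
      metadata: { ...streamOptions.metadata },
    },
  };
}
```

After a snapshot is taken, changing `config.model` or adding a header to the live config leaves the snapshot untouched, which is what keeps an in-flight request stable.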
The session contains persisted entries only. Session reads return persisted state and do not include queued writes.
Session storage implementations must persist leaf changes as leaf entries. setLeafId() is not an in-memory-only cursor update; it appends a durable entry whose targetId is the active tree leaf or null for root. Reopening storage must reconstruct the current leaf from the latest persisted leaf-affecting entry.
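One way to picture leaf reconstruction on reopen is a fold over the persisted log. The entry shapes and `reconstructLeaf` below are hypothetical, and the sketch assumes that an appended message entry also moves the leaf; the real storage format may differ.

```typescript
// Hypothetical persisted entry shapes.
type StoredEntry =
  | { type: "message"; id: string; parentId: string | null; text: string }
  | { type: "leaf"; id: string; targetId: string | null };

// Replay the log in order; the latest leaf-affecting entry wins.
function reconstructLeaf(entries: StoredEntry[]): string | null {
  let leaf: string | null = null;
  for (const entry of entries) {
    if (entry.type === "leaf") {
      // Durable leaf entry: targetId is the active leaf, or null for root.
      leaf = entry.targetId;
    } else {
      // Assumption: appending a message makes it the current leaf.
      leaf = entry.id;
    }
  }
  return leaf;
}
```

Because the leaf change is itself an entry in the log, nothing has to be remembered in memory between sessions.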
Session writes requested while an operation is active are queued as pending session writes. Pending writes are based on session-entry shapes without generated fields (id, parentId, timestamp).
Pending session writes are always persisted. They are flushed at save points, at operation settlement, and in failure cleanup.
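The queue-then-flush behavior can be sketched with a small in-memory stand-in. `MiniSession`, its method names, and the entry fields other than id, parentId, and timestamp are hypothetical; the sketch only mirrors the ordering rules described above.

```typescript
// Hypothetical entry shape; real session entries carry more fields.
interface SessionEntry {
  id: string;
  parentId: string | null;
  timestamp: number;
  type: string;
  text: string;
}

// A pending write is an entry without the generated fields.
type PendingSessionWrite = Omit<SessionEntry, "id" | "parentId" | "timestamp">;

class MiniSession {
  private entries: SessionEntry[] = [];
  private pending: PendingSessionWrite[] = [];
  private operationActive = false;
  private nextId = 1;

  beginOperation(): void {
    this.operationActive = true;
  }

  // While an operation is active, writes are queued instead of persisted.
  write(entry: PendingSessionWrite): void {
    if (this.operationActive) this.pending.push(entry);
    else this.persist(entry);
  }

  // Save points flush queued writes in request order.
  savePoint(): void {
    this.flush();
  }

  // Settlement (success or failure cleanup) flushes whatever is left.
  settle(): void {
    this.flush();
    this.operationActive = false;
  }

  // Reads return persisted entries only, never queued writes.
  read(): SessionEntry[] {
    return [...this.entries];
  }

  private flush(): void {
    for (const entry of this.pending) this.persist(entry);
    this.pending = [];
  }

  // Generated fields are filled in only at persistence time.
  private persist(entry: PendingSessionWrite): void {
    const last = this.entries[this.entries.length - 1];
    this.entries.push({
      ...entry,
      id: String(this.nextId++),
      parentId: last ? last.id : null,
      timestamp: Date.now(),
    });
  }
}
```

Deferring id and parentId generation to flush time is what lets queued writes land after agent-emitted messages without rewriting the chain.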
A public pending-writes/session-facade API is planned but not implemented yet.
The harness has an explicit phase:
```ts
type AgentHarnessPhase = "idle" | "turn" | "compaction" | "branch_summary" | "retry";
```
Structural operations require phase === "idle" and synchronously set the phase before the first await:
- prompt
- skill
- promptFromTemplate
- compact
- navigateTree

Starting another structural operation while the harness is not idle rejects with AgentHarnessError code "busy".
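The reason for setting the phase before the first await can be shown with a small sketch. `MiniHarness` and `HarnessBusyError` are hypothetical stand-ins, and the `run` callback stands in for the real turn execution; only the phase type comes from this document.

```typescript
type AgentHarnessPhase = "idle" | "turn" | "compaction" | "branch_summary" | "retry";

// Stand-in for AgentHarnessError with code "busy".
class HarnessBusyError extends Error {
  readonly code = "busy";

  constructor() {
    super("harness is busy");
  }
}

class MiniHarness {
  phase: AgentHarnessPhase = "idle";

  // An async function runs synchronously until its first await, so the
  // check and the phase assignment cannot interleave with another caller.
  async prompt(run: () => Promise<void>): Promise<void> {
    if (this.phase !== "idle") throw new HarnessBusyError();
    this.phase = "turn";
    try {
      await run();
    } finally {
      // Return to idle whether the run succeeded or failed.
      this.phase = "idle";
    }
  }
}
```

If the phase were set after an await instead, two callers could both observe "idle" and start overlapping structural operations.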
The following operations are allowed during a turn where appropriate:
- steer
- followUp
- nextTurn
- abort

Phase/settlement semantics are still provisional and need a full lifecycle pass.
prompt, skill, and promptFromTemplate follow the same flow:

- "turn".
- createTurnState().
- executeTurn().

skill and promptFromTemplate resolve their resource from the same snapshot that is passed to the turn. They do not resolve resources separately.
steer, followUp, and nextTurn accept text plus optional images and create user messages internally. nextTurn messages are inserted before the new user message on the next user-initiated turn.
Queue modes are live, not turn-snapshotted:
- getSteeringMode() / setSteeringMode()
- getFollowUpMode() / setFollowUpMode()

Changing a queue mode during a run affects the next queue drain. Queue drains happen at safe points.
A save point occurs after an assistant turn and its tool-result messages have completed.
At a save point the harness:
This lets model, thinking level, tool, resource, stream option, and system prompt changes made during a turn affect the next turn in the same run, while never mutating an in-flight provider request. The loop callbacks are not recreated at save points.
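The effect of refreshing the snapshot at each save point can be sketched with a synchronous loop. `runWithSavePoints`, `LiveConfig`, and the `onTurn` callback are hypothetical; the real loop is asynchronous and snapshots far more than a model name.

```typescript
// Hypothetical, reduced live config.
interface LiveConfig {
  model: string;
}

// Runs `turns` turns and returns the model each turn actually used.
function runWithSavePoints(
  config: LiveConfig,
  turns: number,
  onTurn: (turnIndex: number) => void,
): string[] {
  const usedModels: string[] = [];
  // Snapshot for the first turn.
  let snapshot: LiveConfig = { ...config };
  for (let i = 0; i < turns; i++) {
    // The turn may change live config (for example via a setter in a listener).
    onTurn(i);
    // The in-flight turn keeps using its own snapshot.
    usedModels.push(snapshot.model);
    // Save point: the next turn picks up the latest config.
    snapshot = { ...config };
  }
  return usedModels;
}
```

A model change made during the first turn shows up in the second turn's snapshot, while the first turn finishes with the model it started with.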
The low-level loop converts harness ThinkingLevel to provider reasoning at the provider boundary:

- "off" -> undefined

No state refresh is needed on agent_end except flushing leftover pending session writes and clearing the operation phase. The exact settled event timing is still under review.
If the system-prompt callback throws while starting prompt, skill, or promptFromTemplate, the operation rejects with AgentHarnessError and the harness returns to idle. If it throws from the save-point snapshot created by prepareNextTurn, the low-level agent run records an assistant error message.
The target hook system is described in hooks.md.
Summary:
- AgentHarness emits typed hook events and consumes typed results.
- on() API; the event result type determines whether a handler may return a result.

Event payloads describe what is happening. Harness getters describe latest config for future snapshots.
Extensions should eventually interact with a harness-scoped HarnessSession facade rather than the raw session. The facade should wrap the internal session and enforce harness pending-write ordering semantics. Once this exists, hooks and event listeners can receive a context that exposes the full AgentHarness plus the session facade without giving direct access to unordered raw session writes.
Planned read semantics:
Planned write semantics:
A planned diagnostics API may expose pending writes explicitly:
```ts
getPendingWrites(): readonly PendingSessionWrite[]
```
Agent-emitted messages are persisted on message_end to preserve transcript ordering. Pending extension/session writes flush after those messages at save points.
Abort is allowed during a turn. It aborts the low-level run and clears steering/follow-up queues.
Abort does not clear nextTurn messages. Messages queued with nextTurn() survive abort and are inserted before the user message on the next user-initiated turn.
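The different lifetimes of the three queues can be sketched as follows. `MiniQueues`, `startUserTurn`, and `queuedCounts` are hypothetical names, and queued messages are reduced to plain text; the sketch only mirrors the abort and insertion rules stated above.

```typescript
interface UserMessage {
  role: "user";
  text: string;
}

const userMessage = (text: string): UserMessage => ({ role: "user", text });

class MiniQueues {
  private steering: UserMessage[] = [];
  private followUps: UserMessage[] = [];
  private nextTurnMessages: UserMessage[] = [];

  steer(text: string): void {
    this.steering.push(userMessage(text));
  }

  followUp(text: string): void {
    this.followUps.push(userMessage(text));
  }

  nextTurn(text: string): void {
    this.nextTurnMessages.push(userMessage(text));
  }

  // Abort clears the steering and follow-up queues but keeps nextTurn messages.
  abort(): void {
    this.steering = [];
    this.followUps = [];
  }

  // On a user-initiated turn, queued nextTurn messages go before the new user message.
  startUserTurn(text: string): UserMessage[] {
    const messages = [...this.nextTurnMessages, userMessage(text)];
    this.nextTurnMessages = [];
    return messages;
  }

  queuedCounts(): { steering: number; followUps: number; nextTurn: number } {
    return {
      steering: this.steering.length,
      followUps: this.followUps.length,
      nextTurn: this.nextTurnMessages.length,
    };
  }
}
```

Keeping nextTurn messages across abort means context an extension queued for the next prompt is not lost when the user cancels the current run.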
Abort does not discard pending session writes. Pending writes flush at the next save point if reached, at agent_end, or in operation failure cleanup.
Abort barrier semantics still need an audit.
Compaction and tree navigation are structural session mutations.
They are allowed only while idle and are not queued. They operate on persisted session state. The next prompt creates a fresh turn snapshot.
Branch summary generation is part of the tree navigation operation.
Auto-compaction and retry decision points are not implemented in AgentHarness yet.
Harness tests should stay focused by area instead of growing one large catch-all file.
Current structure:
- packages/agent/test/harness/agent-harness.test.ts: core lifecycle and public API behavior.
- packages/agent/test/harness/agent-harness-stream.test.ts: stream options and provider hook semantics.

Preferred future structure:

- agent-harness-resources.test.ts: resource snapshot/loading semantics.
- agent-harness-tools.test.ts: tool registry getters, active-tool semantics, and update events.
- agent-harness-lifecycle.test.ts: phase/save-point/settled/reentrancy behavior.

Use the pi-ai faux provider (registerFauxProvider, fauxAssistantMessage) for deterministic harness/provider tests. Faux response factories can inspect StreamOptions, invoke options.onPayload, and return scripted assistant messages without real provider APIs or network access.
Harness coverage is configured separately from the default package test run:
```bash
npm run test:harness
npm run coverage:harness
```
coverage:harness runs test/harness/**/*.test.ts and reports coverage for src/harness/**/*.ts plus the non-harness runtime files it directly exercises (src/agent.ts and src/agent-loop.ts) into coverage/harness. Type-only dependencies such as src/types.ts are not included because they have no meaningful runtime coverage.
This list tracks the remaining work before treating AgentHarness as migration-ready. Active/planned items are ordered from easiest to hardest. Completed items are archived at the bottom.
Status: In progress
Done:
- setTools(tools, activeToolNames?).
- setActiveTools(toolNames).
- AgentHarnessError.
- AgentHarness<TSkill, TPromptTemplate, TTool>.
- QueueMode from core types.
- AgentHarnessOptions.steeringMode and followUpMode.
- getSteeringMode() / setSteeringMode() and getFollowUpMode() / setFollowUpMode().

Remaining:

- getTools() semantics.
- getActiveTools() semantics.

Notes:

AgentHarness model registry

Status: Planned
Done:
- setModel() behavior is preserved.

Remaining:

- Model objects, model references, or both.

AgentHarness lifecycle/state pass

Status: In progress
Done:
- void syncFromTree(), syncFromTree(), liveOperationId, and shell().
- createTurnState(), applyTurnState(), and executeTurn().
- phase in place of boolean idle state.
- steer, followUp, and nextTurn create user messages from text plus optional images.
- nextTurn messages are inserted before the new user prompt.
- finally.
- AgentHarnessError.
- message_end persistence happens before subscriber notification.
- abort() signals cancellation before notifications and still waits for idle through notification errors.
- setLeafId() persists durable leaf entries so tree navigation survives storage reopen.

Remaining:

- settled can fire too early.
- settled callbacks deterministic.
- agent_end.
- before_agent_start hook semantics against coding-agent.
- before_agent_start needs more turn info such as tools/tool snippets.
- abort() barrier semantics.

Status: Designed in hooks.md, not implemented
Done:
- AgentHarnessContext.emitHook(event) derives the hook type from event.type.

Remaining:

- HookEvent, ResultOf, registration options with generic source metadata, and the single AgentHarnessHooks implementation.
- AgentHarness into reducer functions.
- AgentHarnessEvent has reducer semantics.
- AgentHarness accept and expose the concrete hooks instance with constructor inference for app-specific hooks.

Notes:
Status: Planned
Done:
Remaining:
Notes:
Status: Planned
Done:
Remaining:
- settled writes.

Status: Planned
Done:
Remaining:
Agent dependency from AgentHarness

Status: Done
Done:
- AgentHarness calls runAgentLoop() directly.

Remaining:
Notes:
Status: Done
Done:
- AgentHarnessOptions.streamOptions, getStreamOptions(), and setStreamOptions().
- streamSimple() and keeps lifecycle-owned signal and reasoning from the low-level loop.
- getApiKeyAndHeaders() resolves credentials per provider request.
- before_provider_request, before_provider_payload, and after_provider_response hooks are implemented.
- agent-harness-stream.test.ts covers forwarding, auth merge, hook patching/deletion/chaining, payload hooks, and busy/save-point snapshot behavior.

Remaining:

Result cleanup

Status: Done
Done:
- Result<TValue, TError> plus helpers.
- ExecutionEnv and NodeExecutionEnv to return typed results for filesystem/process operations.
- ExecutionEnv.appendFile() for streaming append use cases.
- ExecutionEnv results.
- ExecutionEnv, including full-output spill via appendFile().
- NodeExecutionEnv from browser-safe root exports.
- Buffer usage in generic truncation utilities with runtime-neutral UTF-8 handling.
- readTextLines() so JSONL metadata loading reads only the header line.
- SessionError.
- not_found filesystem failures.
- NodeExecutionEnv tests for file operations, exec errors, aborts, callbacks, timeouts, and shell-output spill.

Remaining:
Notes:
- Result.
- SessionError.
- AgentHarnessError.
- src/harness/env/nodejs.ts, Node-backed storage/session implementations, or explicit Node-only entry points.
- ExecutionEnv and shell-output contract tests as APIs evolve.