docs/agent-runner-details.md
Implementation-level details for the agent-runner inside the container. See architecture.md for the high-level design.
The agent-runner has two layers:
Agent-runner core — owns the poll loop, message formatting, DB reads/writes, MCP tool implementations, routing, status management, media handling. This is NanoClaw-specific and shared across all providers.
Agent provider — owns the SDK interaction. Takes formatted prompts, pushes them to the SDK, yields events back. Trunk ships the claude provider; additional providers (OpenCode, Codex, etc.) are installed by /add-<provider> skills from the providers branch.
The boundary: the agent-runner decides what to send and what to do with results. The provider decides how to talk to the SDK.
interface AgentProvider {
/** Start a new query. Returns a handle for streaming input and output. */
query(input: QueryInput): AgentQuery;
}
interface QueryInput {
/** Initial prompt (already formatted by agent-runner).
* String for text-only. ContentBlock[] for multimodal (images, PDFs, audio). */
prompt: string | ContentBlock[];
/** Session ID to resume, if any */
sessionId?: string;
/** Resume from a specific point in the session (provider-specific, may be ignored) */
resumeAt?: string;
/** Working directory inside the container */
cwd: string;
/** MCP server configurations (normalized format — provider translates) */
mcpServers: Record<string, McpServerConfig>;
/** System prompt / developer instructions */
systemPrompt?: string;
/** Environment variables for the SDK process */
env: Record<string, string | undefined>;
/** Additional directories the agent can access */
additionalDirectories?: string[];
}
interface McpServerConfig {
command: string;
args: string[];
env: Record<string, string>;
}
interface AgentQuery {
/** Push a follow-up message into the active query */
push(message: string): void;
/** Signal that no more input will be sent */
end(): void;
/** Output event stream */
events: AsyncIterable<ProviderEvent>;
/** Force-stop the query (e.g., container shutting down) */
abort(): void;
}
type ProviderEvent =
| { type: 'init'; sessionId: string }
| { type: 'result'; text: string | null }
| { type: 'error'; message: string; retryable: boolean; classification?: string }
| { type: 'progress'; message: string };
Permission configuration stays inside each provider: Claude uses allowedTools, Codex uses approvalPolicy, and OpenCode uses permission. Each provider configures this internally based on the same intent: "allow everything, no prompting."
Session state crosses the interface only as sessionId and resumeAt.
Event types:
- init: emitted once per query when the provider establishes or resumes a session. The agent-runner captures sessionId for future resume.
- result: emitted when the agent produces a complete response. May be emitted multiple times per query (e.g., Claude's multi-turn with subagents). The agent-runner writes each result to messages_out.
- error: emitted on failure. retryable indicates whether the agent-runner should retry. classification is optional detail (e.g., 'quota', 'auth', 'transport').
- progress: optional, for logging. The agent-runner logs these but doesn't act on them.
Only the claude provider ships in trunk. The Codex and OpenCode sections below document the provider interface for reference and for skills that install additional providers. They are not baked into the core image.
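The event semantics above can be sketched as a small reducer. This is a hedged illustration, not the agent-runner's actual code: QueryState, applyEvent, and the sample event list are hypothetical names introduced here, and real handling would write to messages_out rather than collect strings.

```typescript
type ProviderEvent =
  | { type: 'init'; sessionId: string }
  | { type: 'result'; text: string | null }
  | { type: 'error'; message: string; retryable: boolean; classification?: string }
  | { type: 'progress'; message: string };

// Hypothetical per-query bookkeeping (not part of the documented interface).
interface QueryState {
  sessionId?: string;   // captured from 'init' for future resume
  results: string[];    // texts that would be written to messages_out
  logs: string[];       // progress messages, logged only
  failed: boolean;      // set when an 'error' event is seen
  shouldRetry: boolean; // mirrors the error's retryable flag
}

function applyEvent(state: QueryState, event: ProviderEvent): QueryState {
  switch (event.type) {
    case 'init':
      return { ...state, sessionId: event.sessionId };
    case 'result':
      // Assumption: a null text means there is nothing to write.
      return event.text === null
        ? state
        : { ...state, results: [...state.results, event.text] };
    case 'error':
      return { ...state, failed: true, shouldRetry: event.retryable };
    case 'progress':
      return { ...state, logs: [...state.logs, event.message] };
  }
}

const initialState: QueryState = { results: [], logs: [], failed: false, shouldRetry: false };

const sampleEvents: ProviderEvent[] = [
  { type: 'init', sessionId: 'sess-1' },
  { type: 'progress', message: 'subagent started' },
  { type: 'result', text: 'first' },
  { type: 'result', text: null },
  { type: 'result', text: 'second' },
];

const finalState = sampleEvents.reduce(applyEvent, initialState);
```

Keeping the handler pure makes the "multiple results per query" case easy to reason about: each result event appends one entry, independent of the others.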
Wraps @anthropic-ai/claude-agent-sdk's query().
class ClaudeProvider implements AgentProvider {
query(input: QueryInput): AgentQuery {
const stream = new MessageStream(); // AsyncIterable<SDKUserMessage>
stream.push(input.prompt);
const sdkQuery = query({
prompt: stream,
options: {
cwd: input.cwd,
resume: input.sessionId,
resumeSessionAt: input.resumeAt,
systemPrompt: input.systemPrompt
? { type: 'preset', preset: 'claude_code', append: input.systemPrompt }
: undefined,
mcpServers: input.mcpServers, // already the right shape
additionalDirectories: input.additionalDirectories,
env: input.env,
allowedTools: NANOCLAW_TOOL_ALLOWLIST,
permissionMode: 'bypassPermissions',
allowDangerouslySkipPermissions: true,
hooks: {
PreCompact: [{ hooks: [preCompactHook] }],
PreToolUse: [{ matcher: 'Bash', hooks: [sanitizeBashHook] }],
},
},
});
return {
push: (msg) => stream.push(msg),
end: () => stream.end(),
abort: () => sdkQuery.close(),
events: translateClaudeEvents(sdkQuery),
};
}
}
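MessageStream is used above but not defined in this document. A minimal sketch of a push-based async iterable follows, assuming a single consumer; the class body here is hypothetical and the real implementation may differ.

```typescript
// Hypothetical push-based async iterable (single consumer assumed).
class MessageStream<T> implements AsyncIterable<T> {
  private queue: T[] = [];
  private waiting: (() => void) | null = null;
  private done = false;

  /** Queue an item; ignored after end(). */
  push(item: T): void {
    if (this.done) return;
    this.queue.push(item);
    this.wake();
  }

  /** Signal that no more items will arrive. */
  end(): void {
    this.done = true;
    this.wake();
  }

  private wake(): void {
    const resolve = this.waiting;
    this.waiting = null;
    resolve?.();
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    while (true) {
      // Drain everything queued so far, in order.
      while (this.queue.length > 0) yield this.queue.shift() as T;
      if (this.done) return;
      // Park until push() or end() wakes the consumer.
      await new Promise<void>((resolve) => {
        this.waiting = resolve;
      });
    }
  }
}
```

Items pushed before iteration starts are buffered, so the initial prompt can be pushed synchronously before the SDK begins reading.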
translateClaudeEvents is an async generator that maps SDK messages to ProviderEvent:
- message.type === 'system' && message.subtype === 'init' → { type: 'init', sessionId }
- message.type === 'result' → { type: 'result', text }
- message.type === 'system' && message.subtype === 'api_retry' → { type: 'error', retryable: true }
- message.type === 'system' && message.subtype === 'rate_limit_event' → { type: 'error', retryable: false, classification: 'quota' }
- message.type === 'system' && message.subtype === 'task_notification' → { type: 'progress', message }
Claude-specific features preserved inside the provider:
- MessageStream for async iterable input (push-based)
- resumeSessionAt for resume at specific message UUID
- additionalDirectories for multi-directory access
Wraps @openai/codex-sdk.
class CodexProvider implements AgentProvider {
  query(input: QueryInput): AgentQuery {
    const codex = new Codex(this.buildOptions(input));
    const thread = input.sessionId
      ? codex.resumeThread(input.sessionId, this.threadOptions(input))
      : codex.startThread(this.threadOptions(input));
    // Shared mutable state. run() installs a fresh AbortController for each
    // turn, so push() and abort() always target the turn currently running.
    const state = {
      abortController: new AbortController(),
      pendingFollowUp: null as string | null,
      stopped: false,
    };
    return {
      push: (msg) => {
        // Codex doesn't support streaming input.
        // Store the follow-up and abort the current turn.
        state.pendingFollowUp = msg;
        state.abortController.abort();
      },
      end: () => { /* no-op: Codex turns end naturally */ },
      abort: () => {
        state.stopped = true;
        state.abortController.abort();
      },
      events: this.run(thread, input.prompt, state),
    };
  }
  private async *run(thread, prompt, state): AsyncIterable<ProviderEvent> {
    let currentPrompt = prompt;
    while (true) {
      try {
        const streamed = await thread.runStreamed(currentPrompt, {
          signal: state.abortController.signal,
        });
        let resultText = '';
        for await (const event of streamed.events) {
          if (event.type === 'thread.started') {
            yield { type: 'init', sessionId: event.thread_id };
          }
          if (event.type === 'item.completed' && event.item.type === 'agent_message') {
            resultText = event.item.text || resultText;
          }
          if (event.type === 'turn.failed') {
            yield { type: 'error', message: event.error.message, retryable: false };
            return;
          }
        }
        yield { type: 'result', text: resultText || null };
      } catch (err) {
        // Only an abort caused by a queued follow-up is expected; rethrow anything else.
        const restart =
          state.abortController.signal.aborted && state.pendingFollowUp !== null && !state.stopped;
        if (!restart) throw err;
      }
      // Take and clear the queued follow-up so it is sent exactly once.
      const followUp = state.pendingFollowUp;
      state.pendingFollowUp = null;
      if (followUp === null || state.stopped) return;
      // Restart with the follow-up as the next prompt, using a fresh controller.
      currentPrompt = followUp;
      state.abortController = new AbortController();
    }
  }
}
Codex-specific behavior inside the provider:
- developer_instructions for system prompt (loaded from CLAUDE.md)
- git init in workspace (Codex requires a git repo)
- sandboxMode, approvalPolicy, networkAccessEnabled from env vars
Wraps @opencode-ai/sdk.
class OpenCodeProvider implements AgentProvider {
  query(input: QueryInput): AgentQuery {
    // query() must return synchronously, so the local OpenCode server is
    // started lazily inside run() and shared with push()/abort() through state.
    const state = {
      server: null as { close(): void } | null,
      pendingFollowUp: null as string | null,
      aborted: false,
    };
    return {
      push: (msg) => {
        state.pendingFollowUp = msg;
        state.server?.close(); // interrupt current query
      },
      end: () => { /* no-op */ },
      abort: () => { state.aborted = true; state.server?.close(); },
      events: this.run(input, state),
    };
  }
  private async *run(input, state): AsyncIterable<ProviderEvent> {
    const { client, server } = await createOpencode({ config: this.buildConfig(input) });
    state.server = server;
    const { stream } = await client.event.subscribe();
    const session = await client.session.create();
    yield { type: 'init', sessionId: session.data.id };
    await client.session.promptAsync({
      path: { id: session.data.id },
      body: { parts: [{ type: 'text', text: input.prompt }] },
    });
    for await (const event of stream) {
      if (state.aborted) return;
      if (event.type === 'session.idle') {
        // Collect result text from accumulated message parts
        const resultText = this.extractResult(event);
        yield { type: 'result', text: resultText };
        // Take and clear the follow-up so it is sent exactly once.
        const followUp = state.pendingFollowUp;
        state.pendingFollowUp = null;
        if (followUp !== null) {
          await client.session.promptAsync({
            path: { id: session.data.id },
            body: { parts: [{ type: 'text', text: followUp }] },
          });
          continue;
        }
        return;
      }
      if (event.type === 'session.error') {
        // ProviderEvent requires a string message, so fall back to a generic one.
        const message = event.properties?.error?.data?.message ?? 'OpenCode session error';
        yield { type: 'error', message, retryable: false };
        return;
      }
    }
  }
}
OpenCode-specific behavior inside the provider:
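- Interrupting the current query by closing the local server (server.close())
- Provider and model selection from env vars (OPENCODE_PROVIDER, OPENCODE_MODEL)
- MCP server config translated to OpenCode's format (type: 'local', command: [cmd, ...args], environment)
Everything below is handled by the agent-runner, not the provider.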
┌─────────────────────────────────────────┐
│ │
│ 1. Query messages_in for pending rows │
│ WHERE status = 'pending' │
│ AND (process_after IS NULL │
│ OR process_after <= now()) │
│ │
│ 2. If rows found: │
│ a. Set status = 'processing' │
│ b. Format messages by kind │
│ c. Strip routing fields │
│ d. Call provider.query(prompt) │
│ e. Process provider events │
│ f. Write results to messages_out │
│ g. Set status = 'completed' │
│ │
│ 3. While query is active: │
│ - Continue polling messages_in │
│ - New messages → provider.push() │
│ │
│ 4. When query finishes: │
│ - Back to step 1 │
│ - If no messages, sleep + re-poll │
│ │
└─────────────────────────────────────────┘
Concurrent polling during active query: While the provider is running a query, the agent-runner continues polling messages_in on a short interval (~500ms). New pending messages are formatted and pushed into the active query via provider.push(). This lets follow-up messages arrive while the agent is processing — Claude handles this natively, Codex/OpenCode handle it via abort+restart internally.
Idle behavior: When no messages are pending and no query is active, the agent-runner sleeps briefly (1s) and re-polls. The container stays warm until the host kills it (idle timeout).
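The polling decisions above can be sketched as two pure functions. This is an illustrative sketch only: MessageInRow, selectDue, nextAction, and PollAction are hypothetical names, and the rows would come from the session DB rather than an in-memory array. The 500 ms and 1 s intervals are the ones stated above.

```typescript
interface MessageInRow {
  id: string;
  status: 'pending' | 'processing' | 'completed' | 'failed' | 'paused';
  processAfter: number | null; // epoch ms; null means "due immediately"
}

// Mirrors: WHERE status = 'pending' AND (process_after IS NULL OR process_after <= now())
function selectDue(rows: MessageInRow[], now: number): MessageInRow[] {
  return rows.filter(
    (r) => r.status === 'pending' && (r.processAfter === null || r.processAfter <= now),
  );
}

type PollAction =
  | { kind: 'start-query'; ids: string[] } // no query running: call provider.query()
  | { kind: 'push'; ids: string[] }        // query running: call provider.push()
  | { kind: 'sleep'; ms: number };         // nothing due: wait, then re-poll

function nextAction(rows: MessageInRow[], now: number, queryActive: boolean): PollAction {
  const due = selectDue(rows, now);
  if (due.length === 0) {
    // Short interval while a query is active, longer when idle.
    return { kind: 'sleep', ms: queryActive ? 500 : 1000 };
  }
  const ids = due.map((r) => r.id);
  return queryActive ? { kind: 'push', ids } : { kind: 'start-query', ids };
}
```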
Idle detection exceptions: The container should NOT be considered idle when:
- An ask_user_question tool call is pending (waiting for user response in messages_in)
The agent-runner signals "busy" status to the host. The mechanism for this is provider-specific. For Claude, the query AsyncGenerator is still yielding events. For others, the agent-runner can write a heartbeat or status indicator to the session DB that the host checks before killing.
The agent-runner transforms messages_in rows into a prompt string. The provider receives a ready-to-send string — it doesn't know about message kinds or routing.
Routing field stripping: platform_id, channel_type, thread_id are never included in the prompt. They're stored as context for writing messages_out.
Single message formatting by kind:
chat — format into message XML:
<message sender="John" time="2024-01-01 10:00">
Check this PR
</message>
chat-sdk — extract fields from serialized Chat SDK message:
<message sender="John (john@slack)" time="2024-01-01 10:00">
Check this PR
[image: screenshot.png — https://signed-url...]
</message>
Attachments are listed inline. Images/PDFs that Claude handles natively are passed as content blocks (see Media Handling below).
task — task prompt, optionally with script output:
[SCHEDULED TASK]
Script output:
{"data": ...}
Instructions:
Review open PRs
webhook — webhook payload:
[WEBHOOK: github/pull_request]
{"action": "opened", "pull_request": {...}}
system — host action result (response to an earlier system request):
[SYSTEM RESPONSE]
Action: register_agent_group
Status: success
Result: {"agent_group_id": "ag-456"}
Batch formatting: Multiple pending messages are combined into one prompt:
<context timezone="America/Los_Angeles" />
<messages>
<message sender="John" time="10:00">Check this PR</message>
<message sender="Jane" time="10:01">Already on it</message>
</messages>
Mixed kinds (e.g., a chat message + a system response) are combined with clear delimiters. Each section is labeled by kind.
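A minimal sketch of the chat and batch formats shown above. The function names are hypothetical, and two details are assumptions not stated in this document: attribute and body text are XML-escaped, and the context element is written self-closing.

```typescript
interface ChatMessage {
  sender: string;
  time: string;
  text: string;
}

// Assumption: user-controlled text is escaped so it cannot break the XML framing.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function formatMessage(m: ChatMessage): string {
  return `<message sender="${escapeXml(m.sender)}" time="${escapeXml(m.time)}">${escapeXml(m.text)}</message>`;
}

// Combines several pending chat messages into one prompt string.
function formatBatch(messages: ChatMessage[], timezone: string): string {
  return [
    `<context timezone="${escapeXml(timezone)}" />`,
    '<messages>',
    ...messages.map(formatMessage),
    '</messages>',
  ].join('\n');
}

const batchPrompt = formatBatch(
  [
    { sender: 'John', time: '10:00', text: 'Check this PR' },
    { sender: 'Jane', time: '10:01', text: 'Already on it' },
  ],
  'America/Los_Angeles',
);
```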
Command detection: Messages starting with / are checked against a command list. Recognized commands bypass formatting and are passed raw to the provider (for Claude's slash command handling) or intercepted by the agent-runner (for NanoClaw-level commands like session reset).
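Command detection could look roughly like the following. The command names in both sets are placeholders chosen for illustration; this document does not list the actual commands, and treating unrecognized slash-prefixed text as ordinary chat is an assumption.

```typescript
type CommandRoute =
  | 'format'       // ordinary message: goes through normal formatting
  | 'passthrough'  // passed raw to the provider
  | 'intercept';   // handled by the agent-runner itself

// Hypothetical command lists; the real names are not given in this document.
const PROVIDER_COMMANDS = new Set<string>(['/compact']);
const RUNNER_COMMANDS = new Set<string>(['/reset']);

function routeMessage(text: string): CommandRoute {
  const trimmed = text.trimStart();
  if (!trimmed.startsWith('/')) return 'format';
  // The command name is the first whitespace-delimited token.
  const name = trimmed.split(/\s+/, 1)[0] ?? '';
  if (RUNNER_COMMANDS.has(name)) return 'intercept';
  if (PROVIDER_COMMANDS.has(name)) return 'passthrough';
  // Assumption: unrecognized commands are treated as ordinary chat text.
  return 'format';
}
```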
When the agent-runner picks up messages_in rows, it captures the routing fields from the batch:
interface RoutingContext {
platformId: string | null;
channelType: string | null;
threadId: string | null;
inReplyTo: string | null; // messages_in.id of the triggering message
}
When writing messages_out (either from provider results or MCP tool calls), the agent-runner copies this routing context by default. The agent never sees routing fields — it just produces text. The routing is implicit: "respond to whoever sent the message."
MCP tools that target a different destination (e.g., send_to_agent, send_message with explicit channel) override the routing context for that specific messages_out row.
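The default-versus-override rule can be sketched as follows. resolveRouting and RoutingOverride are hypothetical names. Two choices here are assumptions: an explicit destination replaces all three routing fields rather than merging per field, and inReplyTo is carried over unchanged.

```typescript
interface RoutingContext {
  platformId: string | null;
  channelType: string | null;
  threadId: string | null;
  inReplyTo: string | null; // messages_in.id of the triggering message
}

type RoutingOverride = Partial<Pick<RoutingContext, 'platformId' | 'channelType' | 'threadId'>>;

function resolveRouting(ctx: RoutingContext, override: RoutingOverride = {}): RoutingContext {
  const hasOverride =
    override.platformId !== undefined ||
    override.channelType !== undefined ||
    override.threadId !== undefined;
  // Default: respond to whoever sent the message.
  if (!hasOverride) return { ...ctx };
  // Assumption: an explicit destination replaces the origin wholesale, so an
  // omitted field becomes null instead of leaking the origin's value.
  return {
    platformId: override.platformId ?? null,
    channelType: override.channelType ?? null,
    threadId: override.threadId ?? null,
    inReplyTo: ctx.inReplyTo,
  };
}
```

Replacing wholesale matters for send_to_agent: if sessionId is omitted, the origin's thread ID should not be reused as the target session.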
The agent-runner manages the status and status_changed fields on messages_in:
pending → processing → completed
→ failed (if provider returns error and max retries exhausted)
- On pickup: UPDATE messages_in SET status = 'processing', status_changed = now(), tries = tries + 1 WHERE id IN (...)
- On completion: UPDATE messages_in SET status = 'completed', status_changed = now() WHERE id IN (...)
- On provider error: the agent-runner does not set failed; it leaves the message as processing. The host detects stale processing via status_changed and handles retry logic (reset to pending with backoff). This keeps retry policy on the host side.
The agent-runner runs an MCP server that exposes NanoClaw tools to the agent. All tools write to the session DB.
DB path: The MCP server receives the session DB path via environment variable. It opens a second connection to the same SQLite file (WAL mode allows concurrent access).
Send a chat message to the current conversation (or a specified destination).
{
name: 'send_message',
params: {
text: string, // message content
channel?: string, // optional: target channel type (default: reply to origin)
platformId?: string, // optional: target platform ID
threadId?: string, // optional: target thread ID
}
}
Implementation: write a messages_out row with kind: 'chat'. If channel/platformId/threadId are provided, use those as routing. Otherwise, copy from the current routing context.
Send a file to the current conversation.
{
name: 'send_file',
params: {
path: string, // file path (relative to /workspace/agent/ or absolute)
text?: string, // optional accompanying message
filename?: string, // display name (default: basename of path)
}
}
Implementation:
1. Copy the file into the outbox/{messageId}/ directory
2. Write a messages_out row with files: [filename] in the content
Send a structured card (interactive or display-only).
{
name: 'send_card',
params: {
card: CardElement, // card structure (title, children, actions)
fallbackText?: string, // text fallback for platforms without card support
}
}
Implementation: write a messages_out row with kind: 'chat-sdk' and the card structure in content.
Send an interactive question and wait for the user's response. This is a blocking tool call — the tool doesn't return until the user responds.
{
name: 'ask_user_question',
params: {
title: string, // short card title, e.g. "Confirm deletion"
question: string,
options: (string | { label: string; selectedLabel?: string; value?: string })[],
timeout?: number, // seconds (default: 300)
}
}
Implementation:
1. Generate a questionId
2. Write a messages_out row with operation: 'ask_question', the question, options, and questionId
3. Poll messages_in for a row with matching questionId in content
4. Return selectedOption as the tool result
The agent's execution is paused at this tool call. The provider's query keeps running (Claude holds the tool call open). The agent-runner polls for the response in a separate loop.
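The response-polling step can be sketched as below. This assumes the response row's content is JSON containing questionId and selectedOption, which is an inference from the description above. InboundRow, findAnswer, waitForAnswer, and the fetchRows callback are hypothetical, and the 500 ms interval is an arbitrary choice. The 300-second default matches the tool's documented timeout.

```typescript
interface InboundRow {
  id: string;
  content: string; // JSON text (assumed)
}

// Returns the selected option for the given question, or undefined if no row answers it yet.
function findAnswer(rows: InboundRow[], questionId: string): string | undefined {
  for (const row of rows) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(row.content);
    } catch {
      continue; // not JSON: cannot be a question response
    }
    if (typeof parsed === 'object' && parsed !== null) {
      const rec = parsed as Record<string, unknown>;
      if (rec.questionId === questionId && typeof rec.selectedOption === 'string') {
        return rec.selectedOption;
      }
    }
  }
  return undefined;
}

// Polls until an answer arrives or the timeout elapses; resolves to null on timeout.
async function waitForAnswer(
  fetchRows: () => Promise<InboundRow[]>, // hypothetical DB read
  questionId: string,
  timeoutMs = 300_000,
  intervalMs = 500,
): Promise<string | null> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const answer = findAnswer(await fetchRows(), questionId);
    if (answer !== undefined) return answer;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return null;
}
```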
Edit a previously sent message.
{
name: 'edit_message',
params: {
messageId: string, // integer ID as shown to the agent
text: string, // new content
}
}
Implementation: write a messages_out row with operation: 'edit', the message ID, and new text.
Add an emoji reaction to a message.
{
name: 'add_reaction',
params: {
messageId: string, // integer ID as shown to the agent
emoji: string, // emoji name (e.g., 'thumbs_up')
}
}
Implementation: write a messages_out row with operation: 'reaction'.
Send a message to another agent group.
{
name: 'send_to_agent',
params: {
agentGroupId: string, // target agent group
text: string, // message content
sessionId?: string, // optional: target specific session
}
}
Implementation: write a messages_out row with channel_type: 'agent', platform_id: agentGroupId, thread_id: sessionId.
Schedule a one-shot or recurring task.
{
name: 'schedule_task',
params: {
prompt: string, // task prompt
processAfter: string, // ISO timestamp for first run
recurrence?: string, // cron expression (optional)
script?: string, // pre-agent script (optional)
}
}
Implementation: write a messages_in row (to self) with kind: 'task', process_after, and optionally recurrence. The host sweep picks it up when due.
List active scheduled/recurring tasks.
{
name: 'list_tasks',
params: {}
}
Implementation: query messages_in WHERE recurrence IS NOT NULL AND status != 'failed'.
Modify a scheduled task.
{
name: 'cancel_task',
params: { taskId: string }
}
// pause_task: set status = 'paused' (new status value for recurring tasks)
// resume_task: set status = 'pending'
// update_task: merge { prompt?, recurrence?, processAfter?, script? } into the live row
Implementation: cancel/pause/resume update the live row(s) directly. update_task is sent as a system action — the host reads current content, merges supplied fields, and writes back. All four match by (id = ? OR series_id = ?) AND kind='task' AND status IN ('pending','paused'), so they reach the live next occurrence of a recurring task even when the agent passes the original (now-completed) id.
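The matching predicate above, restated as a pure function over in-memory rows to show why passing the original ID still reaches the live occurrence. TaskRow and matchLiveTasks are hypothetical names used only for this sketch.

```typescript
interface TaskRow {
  id: string;
  seriesId: string | null;
  kind: string;
  status: string;
}

// Mirrors: (id = ? OR series_id = ?) AND kind = 'task' AND status IN ('pending','paused')
function matchLiveTasks(rows: TaskRow[], taskId: string): TaskRow[] {
  return rows.filter(
    (r) =>
      (r.id === taskId || r.seriesId === taskId) &&
      r.kind === 'task' &&
      (r.status === 'pending' || r.status === 'paused'),
  );
}

// A recurring task whose first occurrence already ran: the agent still knows it as 't1'.
const taskRows: TaskRow[] = [
  { id: 't1', seriesId: 't1', kind: 'task', status: 'completed' },
  { id: 't2', seriesId: 't1', kind: 'task', status: 'pending' },
  { id: 'm9', seriesId: null, kind: 'chat', status: 'pending' },
];

const liveTasks = matchLiveTasks(taskRows, 't1');
```

Here liveTasks contains only the row with id 't2': the completed original is filtered out by status, and the next occurrence is found through series_id.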
Register a new agent group (admin only).
{
name: 'register_agent_group',
params: {
name: string,
folder: string,
platformId: string, // messaging group to wire to
channelType: string,
triggerRules?: object,
sessionMode?: 'shared' | 'per-thread',
}
}
Implementation: write a messages_out row with kind: 'system', action: 'register_agent_group'. The host reads, validates admin permission, creates the entity rows in the central DB, and writes a system messages_in response.
The agent-runner inspects attachments in chat/chat-sdk messages and handles them based on type and provider capability:
Provider-native content blocks:
| Type | Claude | Codex / OpenCode |
|---|---|---|
| Images (JPEG, PNG, GIF, WebP) | Native image content block | Save to disk |
| PDFs | Native document content block | Save to disk |
| Audio | Native audio content block | Save to disk |
| Other files (code, data, video, archives) | Save to disk | Save to disk |
"Save to disk" means: download to /workspace/downloads/{messageId}/, reference in the prompt text:
<message sender="John" time="10:00">
Check this spreadsheet
[file available at: /workspace/downloads/msg-123/data.xlsx]
</message>
The agent can use tools (Read, Bash) to access saved files.
For channels where direct download isn't possible (e.g., WhatsApp buffered streams), the channel adapter serves the media via a local URL. The agent-runner downloads from that URL.
Content block construction (Claude): The agent-runner builds multi-part MessageParam content: [{ type: 'image', source: { type: 'base64', media_type, data } }, { type: 'text', text: '...' }]. In this case QueryInput.prompt is a ContentBlock[] rather than a plain string, which is why the field is typed string | ContentBlock[]. The provider's query() method handles the format-specific construction.
Content block construction (Codex/OpenCode): Everything is text. File references are inlined in the prompt string. The provider receives a plain string prompt.
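The table above can be read as a small decision function. The sketch below is illustrative: planAttachment, diskReference, and the Handling labels are hypothetical, and the MIME-type matching is an assumption about how attachment types would be detected.

```typescript
type Provider = 'claude' | 'codex' | 'opencode';
type Handling = 'image-block' | 'document-block' | 'audio-block' | 'save-to-disk';

// Image formats listed in the table as natively supported by Claude.
const NATIVE_IMAGE_TYPES = new Set<string>(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);

function planAttachment(provider: Provider, mimeType: string): Handling {
  // Codex and OpenCode receive everything as files on disk.
  if (provider !== 'claude') return 'save-to-disk';
  if (NATIVE_IMAGE_TYPES.has(mimeType)) return 'image-block';
  if (mimeType === 'application/pdf') return 'document-block';
  if (mimeType.startsWith('audio/')) return 'audio-block';
  return 'save-to-disk';
}

// Prompt-text reference for a file saved under /workspace/downloads/{messageId}/.
function diskReference(messageId: string, filename: string): string {
  return `[file available at: /workspace/downloads/${messageId}/${filename}]`;
}
```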
Handled via the send_file MCP tool (see above). The agent explicitly decides to send a file — the agent-runner doesn't scan output for file references.
For task kind messages with a script field in the content:
1. Run the script with bash (30s timeout)
2. Parse the script's output as { wakeAgent: boolean, data?: unknown }
3. If wakeAgent === false: mark message as completed, don't invoke the provider
4. If wakeAgent === true: enrich the prompt with script output, then invoke the provider
The agent-runner archives conversation transcripts before context compaction. For Claude, this is handled via the PreCompact hook (provider-internal). For other providers that don't have hooks, the agent-runner archives after each query completes based on the provider's output.
Archive location: /workspace/agent/conversations/{date}-{summary}.md
The agent-runner tracks sessionId and resumeAt across queries:
- sessionId: captured from ProviderEvent { type: 'init' }. Passed back to QueryInput.sessionId on the next query.
- resumeAt: Claude-specific (last assistant message UUID). Stored by the agent-runner, passed to QueryInput.resumeAt. Providers that don't support this ignore it.
These are ephemeral to the container's lifetime. When the container is killed and restarted, the host passes the stored sessionId from the central DB's sessions table. resumeAt is lost on container restart (the provider resumes from the end of the session).
The agent-runner receives configuration via:
- Environment variables: AGENT_PROVIDER (claude/codex/opencode), NANOCLAW_ADMIN_USER_ID, provider-specific vars (API keys, model overrides), TZ
- Fixed paths: session DB at /workspace/session.db. Agent group folder at /workspace/agent/. System prompt from /workspace/agent/CLAUDE.md and /workspace/global/CLAUDE.md.
- A config file (/workspace/config.json) for things like the session ID to resume, assistant name, and admin user ID. This avoids overloading environment variables.
The agent-runner reads config, creates the provider, and enters the poll loop. No stdin, no initial prompt: messages are already in the session DB.
type ProviderName = 'claude' | string;
function createProvider(name: ProviderName, config: ProviderConfig): AgentProvider {
// Trunk registers 'claude'; additional providers self-register when installed via skills.
const factory = providerRegistry.get(name);
if (!factory) throw new Error(`Unknown provider: ${name}`);
return factory(config);
}
The provider name comes from the container's environment (AGENT_PROVIDER env var), set by the host based on agent_groups.agent_provider or sessions.agent_provider.
ProviderConfig contains provider-specific settings (API keys, model overrides, etc.) passed via environment variables — not via the interface. Each provider reads what it needs from env.
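The registry that createProvider reads from is not shown in this document. One possible shape is sketched below; registerProvider, the simplified AgentProvider and ProviderConfig types, and the stub factory are all hypothetical.

```typescript
// Simplified stand-ins for the real interfaces, for illustration only.
interface AgentProvider {
  query(input: unknown): unknown;
}
type ProviderConfig = Record<string, string | undefined>;
type ProviderFactory = (config: ProviderConfig) => AgentProvider;

const providerRegistry = new Map<string, ProviderFactory>();

// Hypothetical self-registration hook a provider module would call on import.
function registerProvider(name: string, factory: ProviderFactory): void {
  if (providerRegistry.has(name)) {
    throw new Error(`Provider already registered: ${name}`);
  }
  providerRegistry.set(name, factory);
}

function createProvider(name: string, config: ProviderConfig): AgentProvider {
  const factory = providerRegistry.get(name);
  if (!factory) throw new Error(`Unknown provider: ${name}`);
  return factory(config);
}

// Trunk would register 'claude'; this stub factory stands in for the real ClaudeProvider.
registerProvider('claude', () => ({
  query: () => ({ provider: 'claude' }),
}));
```

With this shape, a skill-installed provider only needs to call registerProvider from its own module; createProvider itself does not change.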
- MCP servers (mcpServers config)
- systemPrompt
- Additional directories (/workspace/extra/*)
- Log prefix ([agent-runner] ...)