docs/architecture.md
Each agent session has a mounted SQLite DB. The DB is the one and only IO mechanism between host and container. No IPC files, no stdin piping. Two tables: messages_in (host → agent-runner) and messages_out (agent-runner → host). Everything is a message.
Central DB (host process):
Per-session DB (mounted into container):
An agent group has its own filesystem — folder, CLAUDE.md, skills, container config. Multiple sessions can share the same agent group (same filesystem, same skills) but each session gets its own DB mounted at a known path. Each session = a separate container with the same agent group's filesystem but a different session DB.
Platform event
→ Channel adapter (trigger check, ID extraction)
→ Returns: { platformChannelId, platformThreadId, triggered }
→ Host maps platformChannelId + platformThreadId → agent group + session
→ Host writes message to session's DB
→ Host calls wakeUpAgent(session)
→ Container spins up (or is already running)
→ Agent-runner polls its session DB, finds new messages
→ Agent-runner processes with Claude
→ Agent-runner writes response to session DB
→ Host polls active session DBs for responses
→ Host reads response, looks up conversation, delivers through channel adapter
Channel adapters are responsible for:
The channel adapter does NOT know about agent group IDs or session IDs. It returns platform-level identifiers. The host maps those to the entity model.
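The mapping step can be sketched as a pure lookup. This is a sketch only — the map keys, the Registration shape, and the function name resolveSession are illustrative; the real host queries the central DB tables described later in this document:

```typescript
// Hypothetical sketch of the host's platform-ID → entity mapping.
// In-memory maps stand in for the central DB tables.
type Registration = { agentGroupId: string; sessionMode: 'shared' | 'per-thread' };

const registrations = new Map<string, Registration>(); // key: channelType:platformChannelId

function resolveSession(
  channelType: string,
  platformChannelId: string,
  platformThreadId: string | null,
): { agentGroupId: string; sessionKey: string } | null {
  const reg = registrations.get(`${channelType}:${platformChannelId}`);
  if (!reg) return null; // unknown group — host decides whether to auto-register
  // Shared mode collapses all threads onto one session key.
  const threadKey = reg.sessionMode === 'per-thread' ? platformThreadId ?? 'default' : 'shared';
  return { agentGroupId: reg.agentGroupId, sessionKey: `${reg.agentGroupId}:${threadKey}` };
}

registrations.set('discord:chan-1', { agentGroupId: 'pr-worker', sessionMode: 'per-thread' });
```

The adapter never sees any of this — it hands over platform identifiers and the host owns the translation.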
The two-level ID scheme (channel ID + thread ID) gives flexibility:
Adapters are stateless — they receive config from the host at setup time, not from the DB directly.
What lives in code (per channel type, doesn't change at runtime):
These are decisions made when setting up the channel adapter. Change them = change the code.
What lives in the DB (per group, varies group to group):
The host reads per-group config from the DB and passes it to the adapter at setup. If config changes at runtime (admin agent registers a new group, changes a trigger), the host calls the adapter's update method.
When the adapter forwards a message from an unknown group, the host needs to decide whether to create the group and a session for it.
The adapter controls whether to forward unknown messages — based on its code-level auto-registration rules (sender allowlist, group-add detection, etc.). If the adapter forwards it, the host creates the group + session.
Session creation for known groups:
The code-level rules are channel-specific:
No channel_configs table — channel-type-level behavior is baked into the adapter code.
Chat SDK adapters are wrapped per-channel:
Chat SDK's subscription model:
Chat SDK has its own thread-level subscription concept (distinct from NanoClaw's channel-level registration):
- onNewMention / onNewMessage(regex) — fires on first contact (e.g., @mention in a Slack thread)
- thread.subscribe() — opts into all future messages in that thread
- onSubscribedMessage — fires for all messages in subscribed threads

This is sub-channel granularity. NanoClaw registers at the channel level ("listen to this Discord channel"). Chat SDK subscribes at the thread level ("track this specific Slack thread"). The bridge lets Chat SDK manage its own subscriptions internally — NanoClaw doesn't interfere with or replicate this.
Platform capability differences:
Capabilities vary significantly across adapters (see Chat SDK adapter docs):
The host/bridge handles graceful degradation — if an agent posts a card on a platform that doesn't support cards, it falls back to text.
Non-Chat-SDK channels (WhatsApp via Baileys, Gmail, custom integrations) implement the NanoClaw channel interface directly — no bridge, no Chat SDK types.
The host is an orchestrator:
When a container spins up, the agent-runner immediately starts polling its session DB. Messages are already there waiting.
Media is not downloaded by the host. Instead:
Native content blocks (provider-dependent):
The agent-runner detects file types and passes supported types as native content blocks where the provider supports it:
| Type | Claude | Codex | OpenCode |
|---|---|---|---|
| Images (JPEG, PNG, GIF, WebP) | Native image content block | Save to disk, reference in prompt | Save to disk, reference in prompt |
| PDFs | Native document content block | Save to disk | Save to disk |
| Audio | Native audio content block | Save to disk | Save to disk |
| Other files (code, data, video, archives) | Save to disk | Save to disk | Save to disk |
"Save to disk" means downloaded to /workspace/downloads/{messageId}/ and referenced in the prompt text as an available file path. The agent can use tools (Read, Bash) to access it.
The agent-runner builds the prompt differently per provider. For Claude, it constructs multi-part MessageParam content with image/document blocks. For Codex/OpenCode, everything is text with file path references.
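The split can be sketched as a single prompt-builder. The block and attachment shapes here are illustrative stand-ins, not the actual SDK types:

```typescript
// Sketch of provider-dependent prompt assembly (types are illustrative).
type Attachment = { kind: 'image' | 'pdf' | 'other'; path: string; base64?: string };
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image'; source: { type: 'base64'; data: string } };

function buildPrompt(provider: 'claude' | 'codex' | 'opencode', text: string, files: Attachment[]) {
  if (provider === 'claude') {
    // Claude: multi-part content with native image blocks; other files by path.
    const blocks: ContentBlock[] = [{ type: 'text', text }];
    for (const f of files) {
      if (f.kind === 'image' && f.base64) {
        blocks.push({ type: 'image', source: { type: 'base64', data: f.base64 } });
      } else {
        blocks.push({ type: 'text', text: `Attached file: ${f.path}` });
      }
    }
    return blocks;
  }
  // Codex/OpenCode: everything is text with file path references.
  const refs = files.map((f) => `Attached file: ${f.path}`).join('\n');
  return [{ type: 'text', text: refs ? `${text}\n${refs}` : text }] as ContentBlock[];
}
```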
Outbound file delivery is tool-based. The agent calls a tool (e.g., send_file) with a file path. The agent-runner moves the file to the outbox and writes the messages_out row.
/workspace/
outbox/
{message_id}/ ← one dir per messages_out row
chart.png
report.pdf
messages_out content references filenames only:
{ "text": "Here's the chart", "files": ["chart.png", "report.pdf"] }
No paths in the DB — the convention is the contract. The host reads files from outbox/{message_id}/ in the mounted session folder and delivers them via the adapter (Chat SDK FileUpload with buffer data, or platform-specific upload for native channels). Host cleans up the outbox directory after successful delivery.
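Because the convention is the contract, the host only needs the message ID and the filenames to reconstruct full paths. A minimal sketch (sessionDir and the helper name are assumptions):

```typescript
import * as path from 'node:path';

// Sketch of the outbox path convention: filenames live in the DB,
// full paths are derived by convention on the host side.
function outboxFilePaths(sessionDir: string, messageId: string, files: string[]): string[] {
  return files.map((name) => path.join(sessionDir, 'outbox', messageId, name));
}
```

The host reads each path, wraps the bytes for the adapter, and removes the outbox/{message_id}/ directory once delivery succeeds.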
Outbound files use a dedicated send_file MCP tool (separate from send_message). See agent-runner-details.md for the tool interface.
Dedup is the channel adapter's responsibility. Chat SDK handles this internally. Native adapters track platform message IDs as needed. The host does not deduplicate — if the adapter forwards it, the host writes it.
Two tables. JSON blobs for content — schema-free, format varies by kind.
-- Host writes, agent-runner reads
CREATE TABLE messages_in (
id TEXT PRIMARY KEY,
kind TEXT NOT NULL, -- 'chat' | 'chat-sdk' | 'task' | 'webhook' | 'system'
timestamp TEXT NOT NULL,
status TEXT DEFAULT 'pending', -- 'pending' | 'processing' | 'completed' | 'failed'
status_changed TEXT, -- ISO timestamp of last status change
process_after TEXT, -- ISO timestamp. NULL = process immediately.
recurrence TEXT, -- cron expression. NULL = one-shot.
tries INTEGER DEFAULT 0, -- number of processing attempts
-- routing (agent-runner copies to messages_out; agent never sees these)
platform_id TEXT,
channel_type TEXT,
thread_id TEXT,
-- payload (structure depends on kind)
content TEXT NOT NULL -- JSON blob
);
-- Agent-runner writes, host reads
CREATE TABLE messages_out (
id TEXT PRIMARY KEY,
in_reply_to TEXT, -- references messages_in.id (optional)
timestamp TEXT NOT NULL,
delivered INTEGER DEFAULT 0,
deliver_after TEXT, -- ISO timestamp. NULL = deliver immediately.
recurrence TEXT, -- cron expression. NULL = one-shot.
-- routing (default: copied from messages_in by agent-runner)
kind TEXT NOT NULL, -- 'chat' | 'chat-sdk' | 'task' | 'webhook' | 'system'
platform_id TEXT,
channel_type TEXT,
thread_id TEXT,
-- payload (format matches kind)
content TEXT NOT NULL -- JSON blob
);
One-shot and recurring tasks use the same tables — no separate scheduler.
One-shot: process_after (inbound) or deliver_after (outbound) with recurrence = NULL.
Recurring: Same, plus a recurrence cron expression. After the host marks a row as handled/delivered, if recurrence is set, it inserts a new row with process_after/deliver_after advanced to the next cron occurrence. Next time is computed from the scheduled time (not wall clock) to prevent drift.
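The drift-prevention rule can be shown with a fixed interval standing in for the cron expression (the real implementation evaluates the cron expression; nextOccurrence is an illustrative name):

```typescript
// Drift-free rescheduling sketch. The key point: the next occurrence is
// computed from the *scheduled* time, not from now().
function nextOccurrence(scheduledIso: string, intervalMs: number, nowMs: number): string {
  let next = Date.parse(scheduledIso) + intervalMs;
  // If processing was delayed past one or more occurrences, skip to the
  // first occurrence still in the future — still anchored to the schedule.
  while (next <= nowMs) next += intervalMs;
  return new Date(next).toISOString();
}
```

A message scheduled for :00 but handled at :00:30 still fires next at :01:00, not :01:30.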
Host sweep (every ~60s across all session DBs):
- messages_in WHERE status = 'pending' AND (process_after IS NULL OR process_after <= now()) → wake agent
- messages_in WHERE status = 'processing' AND status_changed < (now - stale_threshold) → stale detection: increment tries, reset to pending with backoff
- messages_out WHERE delivered = 0 AND (deliver_after IS NULL OR deliver_after <= now()) → deliver
- handled/delivered rows with recurrence set → insert next occurrence

Active container poll (~1s) checks the same conditions but only for sessions with running containers.
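The sweep conditions can be expressed as plain predicates — a sketch only; the real sweep runs these as SQL against each session DB, and it assumes timestamps share one ISO format so string comparison is valid:

```typescript
// Host-sweep predicates matching the sweep queries (field names from the schema).
type InRow = { status: string; process_after: string | null; status_changed: string | null };

function isDueForWake(row: InRow, nowIso: string): boolean {
  // ISO-8601 strings in one format compare correctly as strings.
  return row.status === 'pending' && (row.process_after === null || row.process_after <= nowIso);
}

function isStale(row: InRow, nowIso: string, staleMs: number): boolean {
  return row.status === 'processing'
    && row.status_changed !== null
    && Date.parse(nowIso) - Date.parse(row.status_changed) > staleMs;
}
```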
Agent-runner creates schedules by writing messages_in (to itself) or messages_out (reminders/notifications) with process_after and optionally recurrence.
chat — simple NanoClaw format. Any channel can produce this.
{
"sender": "John",
"senderId": "user123",
"text": "Check this PR",
"attachments": [{ "type": "image", "url": "https://signed-url..." }],
"isFromMe": false
}
chat-sdk — full Chat SDK SerializedMessage, passed through from bridge adapter. Includes author, text, formatted (mdast AST), attachments, isMention, links, metadata.
task — scheduled task firing.
{ "prompt": "Review open PRs", "script": "scripts/review.sh" }
webhook — raw webhook payload.
{ "source": "github", "event": "pull_request", "payload": { ... } }
system — host action result (response to a system action the agent requested).
{ "action": "register_group", "status": "success", "result": { "agent_group_id": "ag-456" } }
Output kind determines the format and delivery adapter. Default: agent-runner copies kind and routing fields from the messages_in row it's responding to.
chat — simple NanoClaw format. NanoClaw channel delivers via sendMessage(text).
{ "text": "LGTM, merging now" }
chat-sdk — Chat SDK AdapterPostableMessage. Bridge adapter delivers via thread.post(). Can be markdown, card, or raw — adapter handles platform conversion.
{ "markdown": "## Review\n**LGTM**", "attachments": [...] }
{ "card": { "type": "card", "title": "Review", "children": [...] }, "fallbackText": "..." }
task — task result. Host logs and optionally notifies.
{ "result": "3 PRs reviewed", "status": "success" }
webhook — webhook response. Host sends HTTP response or notifies.
{ "response": { "status": 200, "body": { ... } } }
system — host action request (register group, reset session, etc.). Host reads, validates permissions, executes, writes result back as a system messages_in row.
{ "action": "reset_session", "payload": { "session_id": "sess-123" } }
All interactive operations flow through messages_in/out — the DB is the only IO boundary for the container. The agent uses MCP tools; the agent-runner translates tool calls into structured messages_out rows; the host delivers through the appropriate adapter method.
Cards with user interaction (e.g., "Ask User Question"):
The agent calls an ask_user_question tool with a question + options. The agent-runner holds the tool call open while waiting for the user's response in messages_in. The round-trip goes: agent → messages_out → host → platform → user clicks → platform → host → messages_in → agent-runner → agent.
Approvals:
Two patterns, both handled at the host level:
In both cases, the approval and action execution happen on the host side, not the agent side.
Approval routing: Privilege is a user-level concept. user_roles records owner (global only — first user to pair becomes owner) and admin (global or scoped to a specific agent_group_id). When an action requires approval, pickApprover(agentGroupId) returns candidates in order: scoped admins for that agent group → global admins → owners (deduplicated). pickApprovalDelivery then takes the first candidate reachable via ensureUserDm (with a same-channel-kind tie-break so a Discord approval request prefers a Discord-using approver). The approval card lands in the approver's DM messaging group, not the origin chat. Delivery is resolved through the Chat SDK's openDM for resolution-required channels (Discord/Slack/…) or the user's handle directly for direct-addressable channels (Telegram/WhatsApp/…), and the mapping is cached in user_dms for subsequent requests. See src/access.ts, src/user-dm.ts.
Editing a sent message:
Agent calls an edit_message tool with the message ID and new content. Agent-runner writes messages_out with an edit operation. Host calls adapter.editMessage(). Messages in the agent's context include integer IDs so the agent can reference them.
Reactions:
Agent calls add_reaction tool with message ID and emoji. Agent-runner writes messages_out with a reaction operation. Host calls adapter.addReaction().
Operations in messages_out content:
// Normal message (default)
{ "text": "LGTM" }
// Interactive card
{ "operation": "ask_question", "title": "Deploy", "question": "Approve deployment?", "options": ["Yes", "No", "Defer"] }
// Edit existing message
{ "operation": "edit", "messageId": "3", "text": "Updated: LGTM with minor comments" }
// Reaction
{ "operation": "reaction", "messageId": "5", "emoji": "thumbs_up" }
The host reads the operation field (if present) and calls the right adapter method. No operation field = normal message delivery. Platform capabilities vary — the host/bridge handles graceful degradation (e.g., reaction on a platform that doesn't support it → skip or send as text).
Sending a message to another agent uses the same routing fields as channel delivery. The agent-runner sets channel_type: 'agent' and platform_id to the target agent group ID. Optionally, thread_id can target a specific session (null = find or create the default session).
From the sending agent's perspective, it's the same mechanism as sending to Slack or WhatsApp — just a messages_out row with different routing. The host reads it, checks that this agent group has permission to message the target, resolves the target session, and writes a messages_in row to that session's DB.
// messages_out routing fields
{ "kind": "chat", "channel_type": "agent", "platform_id": "pr-worker", "thread_id": null }
// messages_out content
{ "text": "Reset your session and re-review", "sender": "Supervisor", "senderId": "agent:pr-admin" }
The receiving agent gets a normal chat message. It doesn't need to know the source is another agent unless that's relevant context.
Default behavior: Agent-runner copies routing fields (kind, platform_id, channel_type, thread_id) from the messages_in row to messages_out. Response goes back where it came from.
Host validation: Before delivering, the host checks that this agent group is permitted to send to the destination. The agent-runner copies routing; the host validates.
Multi-destination pattern (customization): An agent may need to send to a different channel than the origin (e.g., a webhook triggers a Slack notification). This is supported via custom code, not built into the core:
- Add a destinations table to the session DB mapping logical names to routing fields.

This is documented as a pattern, not a built-in feature.
- All IO goes through the two session DB tables (messages_in / messages_out) — no stdin piping, no IPC files
- System actions are structured messages_out rows with kind: 'system'
- Outbound operations (files, edits, reactions) are also messages_out rows
- Scheduling uses process_after / deliver_after + recurrence on the same message tables
- Channels are added as self-registering modules (via /add-<channel> skills)

Session DB location: Not in the agent group folder. Separate directory (e.g., sessions/{session_id}/). Each session gets its own folder containing session.db and the Claude SDK's .claude/ directory. The session identity IS the folder — no need to track Claude SDK session IDs.
Container mount structure:
/workspace/ ← mount: session folder (read-write)
.claude/ ← Claude SDK session data (auto-created)
session.db ← session SQLite DB
outbox/ ← agent-runner writes outbound files here
agent/ ← mount: agent group folder (nested, read-write)
CLAUDE.md ← agent instructions
skills/ ← agent skills
... working files
Two directory mounts: session folder at /workspace, agent group folder at /workspace/agent/. The agent-runner CDs into /workspace/agent/ to run the agent. Claude SDK writes .claude/ at /workspace/.claude/ (root of the workspace). The session DB is at /workspace/session.db.
This works on both Docker (nested bind mounts) and Apple Container (directory mounts only — no file-level mounts, but nested directory mounts are supported).
Session DB concurrent access: The host writes messages_in, the agent-runner writes messages_out. Both access the same SQLite file simultaneously. WAL mode handles this — SQLite allows concurrent readers, and the two sides write to different tables so writer contention is minimal. The host enables WAL mode when creating the session DB.
Session management: Host-managed. The host creates session folders and mounts them. The container only sees its own session folder.
Session creation (no race condition):
The central DB session row creation is the serialization point. No Claude SDK session ID to coordinate — the SDK discovers its own session data in .claude/ when the agent runs.
System actions: The agent uses MCP tools (register group, reset session, schedule task, etc.). The agent-runner handles these tool calls and writes a structured, deterministic messages_out row with kind: 'system'. This is not natural language — it's a programmatic, structured payload that the host processes deterministically. Host validates permissions, executes, and writes the result back as a system messages_in row.
Container lifecycle: No warm pool. Containers are spawned on demand (wakeUpAgent) and torn down from the outside by the host when idle. Existing idle detection + teardown mechanism carries over.
NanoClaw does not stream tokens to users. The Claude Agent SDK's query() yields complete results. The agent-runner writes one complete message to messages_out per result. The host delivers complete messages to channels.
Message editing is supported as an explicit operation (agent calls an edit_message tool), not as a streaming mechanism.
Typing indicators: host sets typing when a container is active for a session, clears when the container exits or a response appears in messages_out.
When multiple messages arrive while the container is down, they accumulate as status = 'pending' rows in messages_in. When the container wakes up, the agent-runner queries all pending messages and processes them as a batch — multiple messages are formatted into a single <messages> XML block.
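The batch formatting can be sketched as a pure string builder. The exact XML shape and attribute names are assumptions — the doc only specifies a single <messages> block:

```typescript
// Sketch: fold accumulated inbound messages into one <messages> block.
type InMsg = { sender: string; text: string; timestamp: string };

function formatBatch(msgs: InMsg[]): string {
  const body = msgs
    .map((m) => `  <message sender="${m.sender}" timestamp="${m.timestamp}">${m.text}</message>`)
    .join('\n');
  return `<messages>\n${body}\n</messages>`;
}
```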
pending → processing → completed
→ failed (after max retries)
- Pickup: the host selects pending rows that are due (process_after is null or past).
- Claim: on pickup, status moves to processing and status_changed is set to now. Prevents other polls from re-picking the same message.

Stale detection: If a message is processing but status_changed is too old (e.g., >10 minutes), the host assumes the container crashed. It resets the message to pending, increments tries, and sets process_after with exponential backoff.
Retries use process_after with exponential backoff. Each retry increments tries and pushes process_after further out:
Once tries exceeds the maximum, status moves to failed.

The host computes this — not the agent-runner. When the host detects a stale processing message or the container exits with an error, it increments tries, computes the next process_after, and resets status to pending.
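The backoff computation can be sketched as follows. The base delay and cap are illustrative values — the doc does not specify them:

```typescript
// Backoff sketch: each retry pushes process_after further out.
// Assumed schedule: 1m, 2m, 4m, ... doubling, capped at 1h.
function nextProcessAfter(tries: number, nowMs: number, baseMs = 60_000, maxMs = 3_600_000): string {
  const delay = Math.min(baseMs * 2 ** (tries - 1), maxMs);
  return new Date(nowMs + delay).toISOString();
}
```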
Output-sent protection: If messages_out already has delivered rows for a batch, don't retry (prevents duplicate messages to user).
Two tiers:
- Slow host sweep (~60s, all session DBs): check process_after / deliver_after timestamps, handle recurrence
- Fast poll (~1s): same conditions, only for sessions with running containers

The architecture is flexible for code changes, not configurable for everything. Advanced setups (like the PR Factory below) use custom routing logic and host-side hooks — not database config columns.
NanoClaw is customized via skills — branches that get merged into the user's installation. Different skills add different capabilities (channels, integrations, behaviors). The code must be structured so that:
Different customizations don't conflict. Adding Slack and adding Telegram should not produce merge conflicts. Adding a new MCP tool should not conflict with adding a channel. Each type of customization should touch its own file(s).
Core blocks of functionality are in separate files. Channel registration, message formatting, MCP tools, routing logic, container management — each in its own file. A skill that changes how messages are formatted doesn't touch the file that handles container spawning.
The index file is thin. It wires things together (init DB, start adapters, start poll loops) but contains no business logic. All logic lives in purpose-specific modules that skills can modify independently.
Don't over-split. A simple change (e.g., adding a new message kind) shouldn't require edits across 5 files. Group related logic together. The goal is that each skill touches 1-2 files for its core change.
Registration patterns over switch statements. Channels, MCP tools, and providers should use registration/plugin patterns. A skill adds a channel by adding a file and a registration call — not by editing a central switch statement alongside every other channel.
Practical example: Adding a new channel via skill should require:
- A new channel adapter file (a self-registering module)
- One import line in the barrel (channels/index.ts) to import the self-registering module

Analysis of 33 skill branches shows these files cause the most merge conflicts:
| Hotspot | Why it conflicts | Solution |
|---|---|---|
| src/index.ts (2000 LOC) | Every skill patches the main loop, imports, init logic | Thin index that wires modules. Logic lives in purpose-specific files (router, delivery, session-manager, host-sweep). |
| src/config.ts | Every skill adds env vars to a central file | Config declared where it's used. Each module reads its own env vars. No central config registry that every skill edits. |
| src/container-runner.ts | Channel skills add mounts, env vars, credential setup | Declarative mount registration. Channels declare their mounts in their own file. Container runner reads from a registry, not a hardcoded list. |
| src/db.ts (750 LOC) | Schema, migrations, and all CRUD in one file | Split by entity. Numbered migrations. Skills add a migration file + edit one entity file. |
| container/agent-runner/src/index.ts | Agent protocol, IPC handling, formatting all in one file | Split into poll-loop, formatter, providers/, mcp-tools/. Session DB replaces IPC. |
| src/ipc.ts | Every MCP tool addition patches one file | mcp-tools/ directory with barrel. Skills add a tool file + barrel line. |
| src/channels/index.ts | Every channel adds an import line at the same location | Barrel file with comment slots per channel (current pattern works, keep it). |
Mount registration pattern: Instead of every channel skill editing buildVolumeMounts(), channels declare mounts that the container runner collects:
// channels/gmail.ts
registerChannel('gmail', {
factory: createGmailAdapter,
mounts: [
{ hostPath: '~/.gmail-mcp', containerPath: '/home/node/.gmail-mcp', readonly: false }
],
env: ['GMAIL_OAUTH_TOKEN'],
});
The container runner reads registered mounts from the channel registry — no need to edit container-runner.ts.
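The registry side can be sketched like this. registerChannel follows the example above; the ChannelSpec shape and collectMounts are illustrative (the real spec also carries the adapter factory):

```typescript
// Sketch of the declarative mount registry backing registerChannel.
type Mount = { hostPath: string; containerPath: string; readonly: boolean };
type ChannelSpec = { mounts?: Mount[]; env?: string[] };

const channelRegistry = new Map<string, ChannelSpec>();

function registerChannel(name: string, spec: ChannelSpec) {
  channelRegistry.set(name, spec);
}

// Container runner asks the registry instead of hardcoding a mount list.
function collectMounts(activeChannels: string[]): Mount[] {
  return activeChannels.flatMap((c) => channelRegistry.get(c)?.mounts ?? []);
}
```

A channel skill adds a file that calls registerChannel; container-runner.ts never changes.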
Config pattern: Skills don't patch config.ts or .env.example. Skill-specific env vars are documented in the skill's SKILL.md — the setup process reads those instructions. Each module reads its own env vars directly:
// channels/discord.ts
const DISCORD_TOKEN = process.env.DISCORD_BOT_TOKEN;
// channels/gmail.ts
const GMAIL_CREDS = process.env.GMAIL_CREDENTIALS_PATH;
Shared config (DATA_DIR, TIMEZONE, MAX_CONCURRENT_CONTAINERS) stays in config.ts. Channel/skill-specific config stays in the module that uses it.
Line width: 120 characters. Most statements fit on one line without sacrificing readability.
Concise logging. A thin wrapper keeps every log call on one line:
log.info('IPC message sent', { chatJid, sourceGroup });
log.warn('Unauthorized IPC attempt', { chatJid });
log.error('Error processing', { file, err });
The DB layer is split by entity rather than kept in one monolithic file:
src/db/
connection.ts ← singleton, init, WAL mode
schema.ts ← CREATE TABLE statements (current state, for reference)
migrations/
index.ts ← runner: checks version, applies pending
001-initial.ts ← initial schema
002-pending-questions.ts ← example: adds pending_questions table
... ← skills append new numbered files
agent-groups.ts ← CRUD for agent_groups
messaging-groups.ts ← CRUD for messaging_groups + messaging_group_agents
sessions.ts ← CRUD for sessions + pending_questions
index.ts ← barrel: re-exports everything
Principles:
- Each entity file owns its own CRUD — a skill that changes messaging groups edits messaging-groups.ts and doesn't touch sessions or agent groups.
- schema.ts documents what the DB looks like now (read this to understand the schema). Migrations are append-only numbered files that describe how we got here.
- A schema_version table replaces try { ALTER TABLE } catch { /* exists */ } blocks. On startup, the runner checks the current version and applies pending migrations in order. Each migration is a function: (db: Database) => void.

The agent-runner session DB uses the same pattern but lighter — no migrations needed since session DBs are created fresh by the host:
container/agent-runner/src/db/
connection.ts ← open session.db at fixed path, WAL mode
messages-in.ts ← read pending, update status
messages-out.ts ← write results, outbox queries
index.ts ← barrel
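The numbered-migration runner can be sketched as follows. The Database interface is a minimal stand-in for the real SQLite driver, with getVersion/setVersion standing in for reads and writes of the schema_version table:

```typescript
// Migration-runner sketch: apply pending numbered migrations in order.
interface Database { getVersion(): number; setVersion(v: number): void }

type Migration = { version: number; up: (db: Database) => void };

function runMigrations(db: Database, migrations: Migration[]): number {
  const pending = migrations
    .filter((m) => m.version > db.getVersion())
    .sort((a, b) => a.version - b.version); // numbered order, regardless of file discovery order
  for (const m of pending) {
    m.up(db);
    db.setVersion(m.version); // each migration commits its version
  }
  return db.getVersion();
}
```

A skill appends a new numbered file exporting one Migration — no edits to existing migrations.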
These are the building blocks. None require special abstractions — they fall out of per-session DBs, host-managed routing, and messages_out with kind: 'system':
Multiple agent groups on the same channel with content-based routing. Different messages in the same thread can route to different agent groups based on content (e.g., @mention routes to supervisor, normal messages route to worker). The channel adapter's routing logic — custom code — decides.
Per-thread sessions from a shared agent group. Multiple sessions share the same agent group (filesystem, skills, CLAUDE.md) but each gets its own session DB. Standard for worker pools.
Session reset and replay. Create a new session for the same thread. Mark old messages as unhandled so the poll picks them up again. Old output stays visible in the platform (e.g., Discord thread) for comparison. This is an action an agent can request — not automatic.
Cross-session read access. Some agents can query other sessions' data. Different access levels: manager sees messages_in/messages_out (review content). Supervisor sees full internals (agent logs, tool calls, debug traces). This is just filesystem/DB access — mount or query the right paths.
Context duplication into new sessions. When a supervisor is invoked in a worker's thread, a new session is created with relevant messages copied in. Custom host-side code handles this.
Agent-initiated host actions. The agent uses MCP tools (reset session, update skills, etc.). The agent-runner handles the tool call and writes a structured system messages_out row. The host reads and executes with permission checks. The agent can request, but the host decides.
Three agent groups, one Discord channel (PR Factory), plus an admin channel:
| Role | Agent Group | Where | Session model |
|---|---|---|---|
| Worker | pr-worker | PR Factory threads | One session per thread (per PR) |
| Manager | pr-manager | PR Factory channel | Single session, queries across worker sessions |
| Supervisor | pr-admin | Admin channel + PR Factory (when @tagged) | Main session in admin channel; per-thread session when invoked in worker threads |
Worker flow: GitHub PR → Discord thread → worker agent reviews (triage, review, test plan). Each thread gets a session from the shared pr-worker group.
Feedback flow: User @tags supervisor in worker threads → custom routing sends to supervisor with a new session containing the thread's messages (duplicated). Supervisor collects feedback to filesystem. Worker doesn't see supervisor messages.
Iteration flow: User discusses feedback with supervisor in admin channel → supervisor suggests skill changes (shown as rich card with diff) → user approves → supervisor applies changes via host action → supervisor requests session reset + replay → workers re-review same PRs with updated skills in same threads but fresh sessions → user compares reviews side by side.
Manager flow: User talks to manager in PR Factory main channel (not in threads). Manager can search across all worker session DBs (messages_in/messages_out) to answer questions like "how many PRs today?" or "what topics are trending?" Can request actions (close PR, re-open).
What's custom code vs. base architecture:
| Capability | Base architecture | Custom code (PR Factory) |
|---|---|---|
| Per-thread sessions | ✓ platformThreadId → session | |
| Shared agent group across sessions | ✓ Multiple sessions, one group | |
| Writing messages to session DB | ✓ Standard flow | |
| @mention routing to different agent | ✓ Channel adapter routing logic | |
| Context duplication into supervisor session | ✓ Host-side hook on supervisor invocation | |
| Session reset + replay | ✓ Primitives (new session, mark unhandled) | ✓ Supervisor action triggers it |
| Skill updates | ✓ Filesystem writes | ✓ Supervisor action applies changes |
| Cross-session queries | ✓ DB/filesystem access | ✓ Manager's tools know where to look |
| Rich card output | ✓ Structured output in messages_out | |
The central DB handles routing and entity management. All content and execution state lives in per-session DBs.
-- Agent workspaces: folder, skills, CLAUDE.md, container config
CREATE TABLE agent_groups (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
folder TEXT NOT NULL UNIQUE,
agent_provider TEXT, -- default for sessions (null = system default)
container_config TEXT, -- JSON: { additionalMounts, timeout }
created_at TEXT NOT NULL
);
-- Platform groups/channels (WhatsApp group, Slack channel, Discord channel, email thread, etc.)
CREATE TABLE messaging_groups (
id TEXT PRIMARY KEY,
channel_type TEXT NOT NULL, -- 'whatsapp', 'slack', 'discord', 'telegram', 'email'
platform_id TEXT NOT NULL, -- platform-specific ID (JID, channel ID, etc.)
name TEXT,
is_group INTEGER DEFAULT 0,
unknown_sender_policy TEXT NOT NULL DEFAULT 'strict', -- 'strict' | 'request_approval' | 'public'
created_at TEXT NOT NULL,
UNIQUE(channel_type, platform_id)
);
-- Users (messaging platform identities, namespaced "<channel_type>:<handle>")
CREATE TABLE users (
id TEXT PRIMARY KEY, -- e.g. 'telegram:123456', 'discord:1470...'
kind TEXT NOT NULL, -- mirrors the channel_type prefix
display_name TEXT,
created_at TEXT NOT NULL
);
-- Roles (owner is global only; admin can be global or scoped to an agent_group)
CREATE TABLE user_roles (
user_id TEXT NOT NULL REFERENCES users(id),
role TEXT NOT NULL, -- 'owner' | 'admin'
agent_group_id TEXT REFERENCES agent_groups(id), -- NULL for global
granted_by TEXT,
granted_at TEXT NOT NULL,
PRIMARY KEY (user_id, role, agent_group_id)
);
-- owner rows must have agent_group_id = NULL (enforced in db/user-roles.ts)
-- Membership (explicit non-privileged access; admin/owner imply membership)
CREATE TABLE agent_group_members (
user_id TEXT NOT NULL REFERENCES users(id),
agent_group_id TEXT NOT NULL REFERENCES agent_groups(id),
added_by TEXT,
added_at TEXT NOT NULL,
PRIMARY KEY (user_id, agent_group_id)
);
-- DM resolution cache (so cold DMs aren't re-resolved every time)
CREATE TABLE user_dms (
user_id TEXT NOT NULL REFERENCES users(id),
channel_type TEXT NOT NULL,
messaging_group_id TEXT NOT NULL REFERENCES messaging_groups(id),
resolved_at TEXT NOT NULL,
PRIMARY KEY (user_id, channel_type)
);
-- Which agent groups handle which messaging groups, with what rules
CREATE TABLE messaging_group_agents (
id TEXT PRIMARY KEY,
messaging_group_id TEXT NOT NULL REFERENCES messaging_groups(id),
agent_group_id TEXT NOT NULL REFERENCES agent_groups(id),
trigger_rules TEXT, -- JSON: { pattern, mentionOnly, excludeSenders, includeSenders }
response_scope TEXT DEFAULT 'all', -- 'all' | 'triggered' | 'allowlisted'
session_mode TEXT DEFAULT 'shared', -- 'shared' | 'per-thread'
priority INTEGER DEFAULT 0, -- higher = checked first when multiple agents match
created_at TEXT NOT NULL,
UNIQUE(messaging_group_id, agent_group_id)
);
-- Sessions: one folder = one session = one container when running
-- Folder path is derived: sessions/{agent_group_id}/{session_id}/
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
agent_group_id TEXT NOT NULL REFERENCES agent_groups(id),
messaging_group_id TEXT REFERENCES messaging_groups(id), -- null for internal/spawned sessions
thread_id TEXT, -- platform thread ID (null for shared session mode)
agent_provider TEXT, -- override per session (null = inherit from agent_group)
status TEXT DEFAULT 'active', -- 'active' | 'closed'
container_status TEXT DEFAULT 'stopped', -- 'running' | 'idle' | 'stopped'
last_active TEXT, -- last message activity timestamp
created_at TEXT NOT NULL
);
CREATE INDEX idx_sessions_agent_group ON sessions(agent_group_id);
CREATE INDEX idx_sessions_lookup ON sessions(messaging_group_id, thread_id);
-- Pending interactive questions (cards waiting for user response)
-- Host writes when delivering a question card, deletes when response received
CREATE TABLE pending_questions (
question_id TEXT PRIMARY KEY,
session_id TEXT NOT NULL REFERENCES sessions(id),
message_out_id TEXT NOT NULL, -- the messages_out row that sent the card
platform_id TEXT, -- where the card was delivered
channel_type TEXT,
thread_id TEXT,
created_at TEXT NOT NULL
);
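The idx_sessions_lookup index above, over (messaging_group_id, thread_id), drives the host's routing step. A minimal sketch, assuming in-memory rows as a stand-in for the central SQLite DB; resolveSessionId and its shape are illustrative, not the real API:

```typescript
// Rows mirror the sessions schema fields used for routing.
type SessionRow = {
  id: string;
  agent_group_id: string;
  messaging_group_id: string | null;
  thread_id: string | null;
  status: "active" | "closed";
};

// session_mode comes from messaging_group_agents: 'shared' ignores the
// platform thread, 'per-thread' keys each thread to its own session.
function resolveSessionId(
  rows: SessionRow[],
  messagingGroupId: string,
  threadId: string | null,
  sessionMode: "shared" | "per-thread",
): string | undefined {
  const wantThread = sessionMode === "per-thread" ? threadId : null;
  return rows.find(
    (r) =>
      r.status === "active" &&
      r.messaging_group_id === messagingGroupId &&
      r.thread_id === wantThread,
  )?.id;
}

const exampleRows: SessionRow[] = [
  { id: "s-shared", agent_group_id: "g1", messaging_group_id: "mg1", thread_id: null, status: "active" },
  { id: "s-t42", agent_group_id: "g1", messaging_group_id: "mg1", thread_id: "42", status: "active" },
];
```

In shared mode the thread ID is deliberately discarded, so every thread in a messaging group lands in the same session.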
When the host delivers a messages_out row with operation: 'ask_question':
- Host writes a pending_questions row mapping question_id → session_id.

When a Chat SDK ActionEvent (button click) arrives:
- Extract the actionId from the event.
- Look up pending_questions by question_id (derived from actionId — the bridge maintains the mapping).
- Write a messages_in row with questionId + selectedOption.
- Delete the pending_questions row.

This avoids scanning session DBs. The central DB is the routing lookup — same pattern as message routing.
Also used for host-generated approval cards: when the host sends an approval request to the admin's DM, it writes a pending_questions row. The admin's response is routed back to the originating session.
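The ActionEvent handling described above can be sketched as follows, with in-memory maps standing in for the central DB; the 1:1 actionId-to-questionId mapping is an assumed simplification of the bridge's mapping:

```typescript
type PendingQuestion = { questionId: string; sessionId: string };
type InboundMessage = { sessionId: string; content: string };

const pendingQuestions = new Map<string, PendingQuestion>();
const messagesIn: InboundMessage[] = []; // stand-in for the session's messages_in

function handleActionEvent(actionId: string, selectedOption: string): boolean {
  const questionId = actionId; // assumption: bridge maps actionId 1:1 to questionId
  const pending = pendingQuestions.get(questionId);
  if (!pending) return false; // stale click: no pending card for this question
  // Route the response into the owning session's messages_in ...
  messagesIn.push({
    sessionId: pending.sessionId,
    content: JSON.stringify({ questionId, selectedOption }),
  });
  // ... and delete the pending row so repeat clicks are ignored.
  pendingQuestions.delete(questionId);
  return true;
}

pendingQuestions.set("q-1", { questionId: "q-1", sessionId: "sess-9" });
```

Deleting the row on first response doubles as idempotency: a second click on the same card finds no pending entry and is dropped.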
stopped → running → idle → stopped
idle → running (new message while warm)
The agent-runner is the process inside the container. It mediates between the session DB and the Claude SDK — polling for work, formatting messages for the agent, translating tool calls into DB rows, and managing the agent lifecycle.
All IO goes through the session DB. No stdin, no stdout markers, no IPC files.
The agent-runner reads messages_in and writes messages_out rows. Poll loop:
- Query messages_in WHERE status = 'pending' AND (process_after IS NULL OR process_after <= now())
- Set status = 'processing', status_changed = now() on each row
- Process the batch with Claude, writing messages_out rows
- Mark each row status = 'completed'

Agent-runner strips routing fields (platform_id, channel_type, thread_id) before formatting. The agent never sees routing info — it only sees content.
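The poll cycle can be sketched as below, with an in-memory array standing in for the session DB; processBatch is a hypothetical stand-in for the format-and-invoke-Claude step:

```typescript
type MessageIn = {
  id: string;
  status: "pending" | "processing" | "completed";
  process_after: number | null; // epoch ms, null = process immediately
  status_changed: number;
  content: string;
};

function pollOnce(
  db: MessageIn[],
  now: number,
  processBatch: (batch: MessageIn[]) => void,
): number {
  // SELECT ... WHERE status = 'pending' AND (process_after IS NULL OR process_after <= now())
  const batch = db.filter(
    (m) => m.status === "pending" && (m.process_after === null || m.process_after <= now),
  );
  for (const m of batch) {
    m.status = "processing"; // claim each row before handing it to the agent
    m.status_changed = now;
  }
  processBatch(batch); // format, invoke Claude, write messages_out rows
  for (const m of batch) m.status = "completed";
  return batch.length;
}
```

Claiming rows as 'processing' before the Claude call means a crashed run leaves evidence in the DB rather than silently re-delivering.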
Message kinds and how they are formatted:
- chat — format into <messages> XML block
- chat-sdk — extract text, author, attachments from serialized message; format into <messages> XML
- task — format as [SCHEDULED TASK] prefix + prompt. Run pre-script if present.
- webhook — format as [WEBHOOK: source/event] + JSON payload
- system — host action results (e.g., "register_group succeeded"). Format as system context, not chat.

Mixed batches (e.g., a chat message + a system result both pending) are combined into one prompt with clear delimiters.
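A sketch of the per-kind formatting, with shapes simplified; chat-sdk extraction and pre-script handling are omitted, and the delimiter string is an assumption:

```typescript
type Kind = "chat" | "task" | "webhook" | "system";

function formatForPrompt(
  kind: Kind,
  content: string,
  meta: { source?: string; event?: string } = {},
): string {
  const formatters: Record<Kind, () => string> = {
    chat: () => `<messages>\n${content}\n</messages>`,
    task: () => `[SCHEDULED TASK] ${content}`,
    webhook: () => `[WEBHOOK: ${meta.source}/${meta.event}] ${content}`,
    system: () => `System: ${content}`, // system context, not chat
  };
  return formatters[kind]();
}

// Mixed batches are joined into one prompt with a clear delimiter.
function combineBatch(parts: string[]): string {
  return parts.join("\n\n---\n\n");
}
```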
MCP tools write directly to the session DB.
Core tools:
| Tool | What it does |
|---|---|
send_message | Write messages_out row, kind: 'chat' |
send_file | Move file to outbox/{msg_id}/, write messages_out with filenames |
schedule_task | Write messages_in row (to self) with process_after + recurrence. Or messages_out with deliver_after for outbound reminders. |
list_tasks | Query messages_in WHERE recurrence IS NOT NULL |
pause_task / resume_task / cancel_task | Modify messages_in rows (update status, clear/set recurrence) |
register_agent_group | Write messages_out, kind: 'system', action: 'register_agent_group' |
New tools:
| Tool | What it does |
|---|---|
ask_user_question | Write messages_out with question card. Hold tool call open, poll messages_in for response matching questionId. Return selection as tool result. |
edit_message | Write messages_out with operation: 'edit' |
add_reaction | Write messages_out with operation: 'reaction' |
send_to_agent | Write messages_out with channel_type: 'agent', platform_id: '{target}' |
send_card | Write messages_out with card structure |
See agent-runner-details.md for full MCP tool parameter definitions.
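The hold-open behaviour of ask_user_question might look like the following sketch; the inbox array and pollMs are illustrative stand-ins for polling the session DB:

```typescript
type QuestionInbox = { questionId: string; selectedOption: string }[];

// (In the real runner this would first write a messages_out question card.)
async function askUserQuestion(
  inbox: QuestionInbox,
  questionId: string,
  pollMs = 10,
): Promise<string> {
  for (;;) {
    const i = inbox.findIndex((m) => m.questionId === questionId);
    if (i >= 0) {
      const [hit] = inbox.splice(i, 1); // consume the matching response row
      return hit.selectedOption; // returned as the tool result
    }
    // No response yet: keep the tool call open and poll again.
    await new Promise((r) => setTimeout(r, pollMs));
  }
}
```

The key point is that the tool call itself blocks, so from the agent's perspective asking a question and receiving the answer is a single synchronous tool invocation.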
Agent-initiated (outbound): Tool-based. Agent calls ask_user_question (interactive card with options) or send_card (structured card). Agent-runner writes the card structure to messages_out. Host/adapter handles platform-specific rendering (Slack Block Kit, Discord embeds, Telegram inline keyboard, text fallback).
Host-initiated (approval cards): When an action requires approval, the host generates a standardized approval card and sends it to the admin's DM. These are not agent-initiated — the agent doesn't know about the approval step. The card format is fixed (action description + approve/deny buttons).
Inbound (card responses): Not a card — it's a messages_in row with questionId + selectedOption in the content. Agent-runner matches to the pending ask_user_question tool call and returns the selection as the tool result.
Messages starting with / are checked against three lists:
Whitelisted commands (pass-through to agent):
Forwarded to the agent without <messages> XML wrapping.

Admin-only commands (require admin sender):
- /remote-control — remote control session
- /clear — clear session context
- /compact — force context compaction

Filtered commands (dropped entirely):
The command lists are hardcoded in the agent-runner. Admin verification happens host-side before the message ever reaches the container: src/command-gate.ts queries user_roles (owner / global admin / scoped-admin-of-this-agent-group) and either passes the message through, drops it, or routes it elsewhere. The container has no notion of admin identity — no env var, no DB query, no per-message check.
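A sketch of the gate's classification step; the list entries below are placeholders, not the real hardcoded sets, and the verdict names are illustrative:

```typescript
type Verdict = "pass" | "admin-only" | "drop" | "not-a-command";

const WHITELIST = new Set(["/help"]); // placeholder entry
const ADMIN_ONLY = new Set(["/remote-control", "/clear", "/compact"]);
const FILTERED = new Set(["/login"]); // placeholder entry

function gateCommand(text: string, senderIsAdmin: boolean): Verdict {
  if (!text.startsWith("/")) return "not-a-command";
  const cmd = text.split(/\s+/)[0]; // command token only; arguments ignored
  if (WHITELIST.has(cmd)) return "pass";
  if (ADMIN_ONLY.has(cmd)) return senderIsAdmin ? "pass" : "admin-only";
  if (FILTERED.has(cmd)) return "drop";
  return "pass"; // unknown slash text: treat as ordinary chat content
}
```

The senderIsAdmin flag models the host-side user_roles lookup; as the section notes, the container never performs this check itself.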
The agent-runner processes recurring task messages like any other messages_in row. After the agent-runner marks a recurring message as completed, the host handles inserting the next occurrence (new messages_in row with process_after advanced to next cron time). The agent-runner doesn't manage recurrence — it just processes what it finds.
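A simplified sketch of the host's recurrence step, with recurrence reduced to a fixed interval in milliseconds so the sketch stays dependency-free (the real schema stores cron-style recurrence, and the ID scheme here is a placeholder):

```typescript
type TaskRow = {
  id: string;
  status: "pending" | "completed";
  process_after: number; // epoch ms
  recurrence: number | null; // simplified: interval in ms (cron in the real schema)
  content: string;
};

// Called by the host after it observes a recurring row marked completed:
// insert a fresh pending row with process_after advanced one interval.
function advanceRecurrence(db: TaskRow[], completed: TaskRow, now: number): TaskRow | null {
  if (completed.recurrence === null) return null; // one-shot task: nothing to insert
  const next: TaskRow = {
    ...completed,
    id: `${completed.id}+`, // placeholder ID scheme
    status: "pending",
    // Never schedule into the past if processing ran long.
    process_after: Math.max(completed.process_after + completed.recurrence, now),
  };
  db.push(next);
  return next;
}
```

Because the host, not the agent-runner, inserts the next occurrence, a crashed or slow container cannot silently kill a recurring schedule.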
Pre-scripts: if a task message has a script field, run it first. If wakeAgent = false, mark completed without invoking Claude.
Outbound: Agent calls send_to_agent tool → agent-runner writes messages_out with channel_type: 'agent', platform_id = target agent group ID. Host validates permissions and writes to target session's messages_in.
Inbound: Messages from other agents arrive as normal chat messages_in rows. The content includes sender and senderId (e.g., "senderId": "agent:pr-admin"). No special formatting — the agent sees it as a chat message.
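The forwarding step above might be sketched as follows; the allow-map and helper name are hypothetical, and in the real host the permission check runs against the central DB:

```typescript
type OutRow = { channel_type: string; platform_id: string; content: string };
type InRow = { sender: string; senderId: string; content: string };

function forwardAgentMessage(
  out: OutRow,
  fromGroup: string,
  allowed: Map<string, Set<string>>, // fromGroup -> reachable target groups
  targetInbox: InRow[], // stand-in for the target session's messages_in
): boolean {
  if (out.channel_type !== "agent") return false; // not an inter-agent message
  if (!allowed.get(fromGroup)?.has(out.platform_id)) return false; // permission denied
  targetInbox.push({
    sender: fromGroup,
    senderId: `agent:${fromGroup}`, // arrives as a normal chat message
    content: out.content,
  });
  return true;
}
```

Note that the receiving agent gets no special framing: the agent: prefix on senderId is the only marker that the peer is another agent.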
- Agent providers: the default is the claude provider; additional providers like OpenCode install via /add-<provider> skills.
- Idle detection: while ask_user_question is waiting for a response, the container should not be considered idle. The host also needs to detect when the agent is still working (active tool calls, subagents) and avoid killing the container even if no messages_out rows have been written recently.