website/docs/developer-guide/agent-loop.md
The core orchestration engine is the `AIAgent` class in `run_agent.py`, a large file that handles everything from prompt assembly to tool dispatch to provider failover.
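`AIAgent` is responsible for:

- Assembling the system prompt (`prompt_builder.py`)
- Selecting the API mode and formatting messages for the provider
- Dispatching tool calls, including agent-level tools
- Interrupt handling, iteration budgets, and provider failover
- Context compression and session persistence

It exposes two entry points:

```python
# Assumes `agent` is an existing AIAgent instance (constructor arguments omitted here).

# Simple interface: returns the final response string
response = agent.chat("Fix the bug in main.py")

# Full interface: returns a dict with messages, metadata, and usage stats
result = agent.run_conversation(
    user_message="Fix the bug in main.py",
    system_message=None,        # auto-built if omitted
    conversation_history=None,  # auto-loaded from session if omitted
    task_id="task_abc123",
)
```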
chat() is a thin wrapper around run_conversation() that extracts the final_response field from the result dict.
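The wrapper relationship can be illustrated with a minimal sketch. `SketchAgent` is a hypothetical stand-in, not the real `AIAgent`: its `run_conversation()` simply echoes the input instead of calling a model, and only the `final_response` key is taken from the description above (the `messages` key is an assumption based on "returns dict with messages").

```python
from typing import Any, Dict, List, Optional


class SketchAgent:
    """Hypothetical stand-in that mirrors only the chat()/run_conversation() split."""

    def run_conversation(
        self,
        user_message: str,
        system_message: Optional[str] = None,
        conversation_history: Optional[List[Dict[str, Any]]] = None,
        task_id: Optional[str] = None,
    ) -> Dict[str, Any]:
        # Placeholder: a real agent would run the full tool-calling loop here.
        messages = list(conversation_history or [])
        messages.append({"role": "user", "content": user_message})
        reply = f"echo: {user_message}"
        messages.append({"role": "assistant", "content": reply})
        return {"final_response": reply, "messages": messages}

    def chat(self, message: str) -> str:
        # Thin wrapper: delegate to run_conversation() and keep only the final text.
        return self.run_conversation(user_message=message)["final_response"]
```

Callers that only need the answer use `chat()`; anything that needs the transcript or usage data calls `run_conversation()` directly.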
Hermes supports three API execution modes, resolved from provider selection, explicit args, and base URL heuristics:
| API mode | Used for | Client type |
|---|---|---|
| `chat_completions` | OpenAI-compatible endpoints (OpenRouter, custom, most providers) | `openai.OpenAI` |
| `codex_responses` | OpenAI Codex / Responses API | `openai.OpenAI` with Responses format |
| `anthropic_messages` | Native Anthropic Messages API | `anthropic.Anthropic` via adapter |
The mode determines how messages are formatted, how tool calls are structured, how responses are parsed, and how caching/streaming works. All three converge on the same internal message format (OpenAI-style role/content/tool_calls dicts) before and after API calls.
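To illustrate what converging on one internal format implies for a non-OpenAI provider, the sketch below converts the internal OpenAI-style history into the shape the Anthropic Messages API expects: the system prompt becomes a separate parameter, tool calls become `tool_use` content blocks, and tool results become `tool_result` blocks inside a user turn. `to_anthropic` is a hypothetical name, and this is a simplified assumption of what such a converter has to do (no images, reasoning, or cache markers), not the contents of `anthropic_adapter.py`.

```python
import json
from typing import Any, Dict, List, Tuple

Message = Dict[str, Any]


def _is_tool_result_turn(msg: Message) -> bool:
    content = msg.get("content")
    return (msg["role"] == "user" and isinstance(content, list)
            and bool(content) and content[0].get("type") == "tool_result")


def to_anthropic(messages: List[Message]) -> Tuple[str, List[Message]]:
    """Split out the system prompt and convert the rest to Anthropic-style messages."""
    system_parts: List[str] = []
    out: List[Message] = []
    for msg in messages:
        role = msg["role"]
        if role == "system":
            # Anthropic takes the system prompt as a top-level parameter, not a message.
            system_parts.append(msg["content"])
        elif role == "user":
            out.append({"role": "user", "content": msg["content"]})
        elif role == "assistant":
            blocks: List[Dict[str, Any]] = []
            if msg.get("content"):
                blocks.append({"type": "text", "text": msg["content"]})
            for call in msg.get("tool_calls") or []:
                # OpenAI-style arguments are a JSON string; tool_use.input is an object.
                blocks.append({
                    "type": "tool_use",
                    "id": call["id"],
                    "name": call["function"]["name"],
                    "input": json.loads(call["function"]["arguments"] or "{}"),
                })
            out.append({"role": "assistant", "content": blocks})
        elif role == "tool":
            block = {"type": "tool_result", "tool_use_id": msg["tool_call_id"],
                     "content": msg["content"]}
            # Tool results ride in a user turn; consecutive results share one turn.
            if out and _is_tool_result_turn(out[-1]):
                out[-1]["content"].append(block)
            else:
                out.append({"role": "user", "content": [block]})
    return "\n\n".join(system_parts), out
```

Merging consecutive tool results into a single user turn keeps the converted history alternating between user and assistant, which the Anthropic API requires.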
Mode resolution order:

1. Explicit `api_mode` constructor arg (highest priority)
2. Provider selection (e.g., `anthropic` provider → `anthropic_messages`)
3. Base URL heuristics (e.g., `api.anthropic.com` → `anthropic_messages`)
4. Default: `chat_completions`

Each iteration of the agent loop follows this sequence:
```text
run_conversation()
  1. Generate task_id if not provided
  2. Append user message to conversation history
  3. Build or reuse cached system prompt (prompt_builder.py)
  4. Check if preflight compression is needed (>50% context)
  5. Build API messages from conversation history
     - chat_completions: OpenAI format as-is
     - codex_responses: convert to Responses API input items
     - anthropic_messages: convert via anthropic_adapter.py
  6. Inject ephemeral prompt layers (budget warnings, context pressure)
  7. Apply prompt caching markers if on Anthropic
  8. Make interruptible API call (_interruptible_api_call)
  9. Parse response:
     - If tool_calls: execute them, append results, loop back to step 5
     - If text response: persist session, flush memory if needed, return
```
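Stripped of prompt building, compression, caching, and interrupts, the control flow of steps 2 and 5 to 9 reduces to a loop like the following. This is a minimal sketch: `run_loop`, the model callable, and the tool table are hypothetical stand-ins, and the `max_iterations` guard only loosely mirrors the iteration budget.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
# Hypothetical model interface: takes the full history, returns one assistant message.
ModelFn = Callable[[List[Message]], Message]
# Hypothetical tool interface: takes parsed arguments, returns the tool's text output.
ToolFn = Callable[[Dict[str, Any]], str]


def run_loop(model: ModelFn, tools: Dict[str, ToolFn], history: List[Message],
             user_message: str, max_iterations: int = 10) -> str:
    history.append({"role": "user", "content": user_message})  # step 2
    for _ in range(max_iterations):
        reply = model(history)                                  # steps 5-8, collapsed
        history.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""                   # step 9: text reply ends the turn
        for call in tool_calls:                                 # step 9: run tools, then loop back
            args = json.loads(call["function"]["arguments"] or "{}")
            output = tools[call["function"]["name"]](args)
            history.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    raise RuntimeError(f"no final answer within {max_iterations} iterations")
```

A scripted model that first requests a tool and then answers in text produces the history `user → assistant → tool → assistant`, which is exactly the shape the alternation rules permit.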
All messages use OpenAI-compatible format internally:
```python
{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [...]}
{"role": "tool", "tool_call_id": "...", "content": "..."}
```
Reasoning content (from models that support extended thinking) is stored in assistant_msg["reasoning"] and optionally displayed via the reasoning_callback.
The agent loop enforces strict message role alternation:
- `User → Assistant → User → Assistant → ...`
- During tool calling: `Assistant (with tool_calls) → Tool → Tool → ... → Assistant`
- Only the `tool` role can have consecutive entries (parallel tool results)

Providers validate these sequences and will reject malformed histories.
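A local check for these rules could look like the sketch below. `check_alternation` is hypothetical (the document does not say whether the agent validates histories itself), and it additionally assumes that every tool message must answer a still-open `tool_call_id` from the preceding assistant message, which the OpenAI Chat Completions API enforces.

```python
from typing import Any, Dict, List, Optional, Set


def check_alternation(messages: List[Dict[str, Any]]) -> None:
    """Raise ValueError if the history violates the role rules (simplified sketch)."""
    prev: Optional[str] = None
    pending: Set[str] = set()  # tool_call ids still waiting for a tool result
    for i, msg in enumerate(messages):
        role = msg["role"]
        if role == "system":
            if i != 0:
                raise ValueError(f"message {i}: system prompt must come first")
            continue
        if role == "tool":
            # Consecutive tool messages are allowed, but each must answer an open call.
            call_id = msg.get("tool_call_id")
            if call_id not in pending:
                raise ValueError(f"message {i}: unexpected tool_call_id {call_id!r}")
            pending.discard(call_id)
        else:
            if role == prev:
                raise ValueError(f"message {i}: two consecutive {role} messages")
            if pending:
                raise ValueError(f"message {i}: missing results for {sorted(pending)}")
            if role == "assistant":
                pending = {c["id"] for c in msg.get("tool_calls") or []}
        prev = role
```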
API requests are wrapped in `_interruptible_api_call()`, which runs the actual HTTP call in a background thread while monitoring an interrupt event:
```text
┌────────────────────────────────────────────────────┐
│  Main thread                  API thread           │
│                                                    │
│  wait on:                     HTTP POST            │
│    - response ready  ───▶     to provider          │
│    - interrupt event                               │
│    - timeout                                       │
└────────────────────────────────────────────────────┘
```
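The same wait-on-three-things pattern can be sketched with the standard library alone: run the blocking call on a daemon thread and have the caller poll a completion event, an interrupt `threading.Event`, and a deadline. `interruptible_call` and `Interrupted` are hypothetical names; this is not the actual body of `_interruptible_api_call()`.

```python
import threading
import time
from typing import Any, Callable, Dict


class Interrupted(Exception):
    """Raised when the interrupt event fires before the call finishes."""


def interruptible_call(fn: Callable[[], Any], interrupt: threading.Event,
                       timeout: float, poll_interval: float = 0.05) -> Any:
    done = threading.Event()
    outcome: Dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["result"] = fn()
        except BaseException as exc:  # hand any failure back to the waiting thread
            outcome["error"] = exc
        finally:
            done.set()

    # Daemon thread: an abandoned request must not keep the interpreter alive.
    threading.Thread(target=worker, daemon=True).start()
    deadline = time.monotonic() + timeout
    # Main thread waits on three things: response ready, interrupt event, timeout.
    while not done.wait(poll_interval):
        if interrupt.is_set():
            raise Interrupted("API call interrupted")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"API call exceeded {timeout}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["result"]
```

Python threads cannot be killed, so in this sketch an interrupted request keeps running in the background and its result is simply discarded.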
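When interrupted (the user sends a new message, issues the `/stop` command, or a signal arrives), the main thread stops waiting on the in-flight request instead of blocking until it completes.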
When the model returns tool calls:

- Multiple tool calls in one response can be executed concurrently via a `ThreadPoolExecutor`
- Interactive tools (e.g., `clarify`) force sequential execution

```text
for each tool_call in response.tool_calls:
    1. Resolve handler from tools/registry.py
    2. Fire pre_tool_call plugin hook
    3. Check if dangerous command (tools/approval.py)
       - If dangerous: invoke approval_callback, wait for user
    4. Execute handler with args + task_id
    5. Fire post_tool_call plugin hook
    6. Append {"role": "tool", "tool_call_id": ..., "content": result} to history
```
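The split between concurrent execution on a `ThreadPoolExecutor` and sequential execution when an interactive tool such as `clarify` is involved can be sketched as follows. The batching rule, `execute_tool_calls`, `SEQUENTIAL_TOOLS`, and the worker cap of 8 are assumptions for illustration; the plugin hooks and approval check are omitted.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Handler = Callable[[Dict[str, Any]], str]

# Hypothetical: tools that need the user must not run alongside other calls.
SEQUENTIAL_TOOLS = {"clarify"}


def execute_tool_calls(tool_calls: List[Message],
                       handlers: Dict[str, Handler]) -> List[Message]:
    def run(call: Message) -> Message:
        args = json.loads(call["function"]["arguments"] or "{}")
        result = handlers[call["function"]["name"]](args)
        # Each result carries the id of the call it answers.
        return {"role": "tool", "tool_call_id": call["id"], "content": result}

    names = {c["function"]["name"] for c in tool_calls}
    if len(tool_calls) > 1 and not names & SEQUENTIAL_TOOLS:
        with ThreadPoolExecutor(max_workers=min(8, len(tool_calls))) as pool:
            # map() yields results in input order, so history order matches the calls.
            return list(pool.map(run, tool_calls))
    return [run(c) for c in tool_calls]
```

Keeping results in call order means the appended `tool` messages line up with the assistant's `tool_calls` list regardless of which thread finished first.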
Some tools are intercepted by run_agent.py before reaching handle_function_call():
| Tool | Why intercepted |
|---|---|
| `todo` | Reads/writes agent-local task state |
| `memory` | Writes to persistent memory files with character limits |
| `session_search` | Queries session history via the agent's session DB |
| `delegate_task` | Spawns subagent(s) with isolated context |
These tools modify agent state directly and return synthetic tool results without going through the registry.
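The interception amounts to a name check in front of the registry path. In this sketch `ToyAgent`, `AGENT_LEVEL_TOOLS`, and the toy `todo` behavior are hypothetical; `registry_dispatch` stands in for `handle_function_call()`, whose real signature is not shown here.

```python
import json
from typing import Any, Callable, Dict, List

# Tool names taken from the table above; the handling below is a toy stand-in.
AGENT_LEVEL_TOOLS = {"todo", "memory", "session_search", "delegate_task"}

Dispatch = Callable[[str, Dict[str, Any]], str]


class ToyAgent:
    def __init__(self, registry_dispatch: Dispatch) -> None:
        self.todos: List[str] = []  # agent-local state the registry never sees
        self.registry_dispatch = registry_dispatch

    def _handle_agent_tool(self, name: str, args: Dict[str, Any]) -> str:
        if name == "todo":
            self.todos.extend(args.get("items", []))
            return json.dumps({"todos": self.todos})
        # memory, session_search and delegate_task are left out of this sketch.
        return json.dumps({"error": f"{name} is not implemented in this sketch"})

    def dispatch(self, name: str, args: Dict[str, Any]) -> str:
        # Intercept agent-level tools before they reach the generic registry path.
        if name in AGENT_LEVEL_TOOLS:
            return self._handle_agent_tool(name, args)
        return self.registry_dispatch(name, args)
```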
AIAgent supports platform-specific callbacks that enable real-time progress in the CLI, gateway, and ACP integrations:
| Callback | When fired | Used by |
|---|---|---|
| `tool_progress_callback` | Before/after each tool execution | CLI spinner, gateway progress messages |
| `thinking_callback` | When model starts/stops thinking | CLI "thinking..." indicator |
| `reasoning_callback` | When model returns reasoning content | CLI reasoning display, gateway reasoning blocks |
| `clarify_callback` | When `clarify` tool is called | CLI input prompt, gateway interactive message |
| `step_callback` | After each complete agent turn | Gateway step tracking, ACP progress |
| `stream_delta_callback` | Each streaming token (when enabled) | CLI streaming display |
| `tool_gen_callback` | When tool call is parsed from stream | CLI tool preview in spinner |
| `status_callback` | State changes (thinking, executing, etc.) | ACP status updates |
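How a frontend plugs in can be sketched as optional callables that the loop invokes at fixed points. `CallbackAgent` reuses two parameter names from the table, but its constructor, the callback signatures, and the policy of swallowing callback errors are assumptions for illustration, not `AIAgent`'s actual API.

```python
from typing import Any, Callable, List, Optional


class CallbackAgent:
    """Hypothetical agent showing optional, platform-supplied callbacks."""

    def __init__(self,
                 tool_progress_callback: Optional[Callable[[str, str], None]] = None,
                 step_callback: Optional[Callable[[int], None]] = None) -> None:
        self.tool_progress_callback = tool_progress_callback
        self.step_callback = step_callback

    def _emit(self, cb: Optional[Callable[..., None]], *args: Any) -> None:
        # Callbacks are optional; in this sketch a failing UI hook is ignored
        # so that it cannot break the agent loop.
        if cb is None:
            return
        try:
            cb(*args)
        except Exception:
            pass

    def run_tools(self, turn: int, tool_names: List[str]) -> None:
        for name in tool_names:
            self._emit(self.tool_progress_callback, name, "start")
            # ... the tool itself would execute here ...
            self._emit(self.tool_progress_callback, name, "done")
        self._emit(self.step_callback, turn)
```

The CLI, gateway, and ACP integrations would each pass different callables into the same loop, which is what keeps the loop itself platform-agnostic.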
The agent tracks iterations via `IterationBudget`:

- The parent agent's cap is configurable via `agent.max_turns`
- Each subagent runs under its own budget, capped by `delegation.max_iterations` (default 50), so total iterations across parent and subagents can exceed the parent's cap

When the primary model fails (429 rate limit, 5xx server error, 401/403 auth error), the agent can fail over to the providers listed in `fallback_providers` in config.

The fallback system also covers auxiliary tasks independently: vision, compression, and web extraction each have their own fallback chain, configurable via the `auxiliary.*` config section.
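The failover decision can be sketched as walking an ordered provider list and moving on only for the error classes listed above (429, 5xx, 401/403). `call_with_fallback`, `ProviderError`, and the rule that any other error (for example a 400) is re-raised immediately are assumptions for illustration; retries and backoff are not modeled.

```python
from typing import Callable, Optional, Sequence, Tuple

# Status codes named above as failover triggers (5xx is handled as a range).
FAILOVER_STATUS = {429, 401, 403}


class ProviderError(Exception):
    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def should_fail_over(status: int) -> bool:
    return status in FAILOVER_STATUS or 500 <= status < 600


def call_with_fallback(providers: Sequence[Tuple[str, Callable[[str], str]]],
                       prompt: str) -> Tuple[str, str]:
    """Return (provider_name, response) from the first provider that succeeds."""
    last_error: Optional[ProviderError] = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            if not should_fail_over(exc.status):
                raise  # e.g. a malformed request will fail on every provider
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```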
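Context compression leaves the most recent messages untouched (`compression.protect_last_n`, default: 20).

After each turn:

- The conversation is persisted to the session store (`hermes_state.py`)
- Memory updates are written to `MEMORY.md` / `USER.md`
- Sessions can later be resumed with `/resume` or `hermes chat --resume`

| File | Purpose |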
|---|---|
| `run_agent.py` | `AIAgent` class (the complete agent loop) |
| `agent/prompt_builder.py` | System prompt assembly from memory, skills, context files, personality |
| `agent/context_engine.py` | `ContextEngine` ABC (pluggable context management) |
| `agent/context_compressor.py` | Default engine: lossy summarization algorithm |
| `agent/prompt_caching.py` | Anthropic prompt caching markers and cache metrics |
| `agent/auxiliary_client.py` | Auxiliary LLM client for side tasks (vision, summarization) |
| `model_tools.py` | Tool schema collection, `handle_function_call()` dispatch |