website/docs/user-guide/features/tool-search.md
When you have many MCP servers or non-core plugin tools attached to a session, their JSON schemas can consume a substantial fraction of the context window on every turn — even when only a few of them are relevant to what the user actually asked for.
Tool Search is Hermes' opt-in progressive-disclosure layer for that problem. When activated, MCP and plugin tools are replaced in the model-visible tools array by three bridge tools, and the model loads each specific tool's schema on demand.
:::info Built-in Hermes tools never defer
The tools that make up Hermes' core capability set (`terminal`,
`read_file`, `write_file`, `patch`, `search_files`, `todo`, `memory`,
`browser_*`, `web_search`, `web_extract`, `clarify`, `execute_code`,
`delegate_task`, `session_search`, `send_message`, and the rest of
`_HERMES_CORE_TOOLS`) are always loaded directly. Only MCP tools and
non-core plugin tools are eligible for deferral.
:::
When Tool Search activates for a turn, the model sees three new tools in place of the deferred ones:
- `tool_search(query, limit?)`: search the deferred-tool catalog
- `tool_describe(name)`: load the full schema for one tool
- `tool_call(name, arguments)`: invoke a deferred tool
A typical interaction looks like:
Model: tool_search("create a github issue")
→ { matches: [{ name: "mcp_github_create_issue", ... }, ...] }
Model: tool_describe("mcp_github_create_issue")
→ { parameters: { type: "object", properties: { ... } } }
Model: tool_call("mcp_github_create_issue", { title: "...", body: "..." })
→ { ok: true, issue_number: 42 }
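
The search, describe, call sequence above can be sketched with a small in-memory catalog. This is only an illustration, not the Hermes implementation: the `CATALOG` contents, the handlers, and the word-overlap ranking are all assumptions made up for the example.

```python
from typing import Any

# Hypothetical deferred-tool catalog: name -> description, schema, handler.
CATALOG: dict[str, dict[str, Any]] = {
    "mcp_github_create_issue": {
        "description": "Create a GitHub issue in a repository",
        "parameters": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "body": {"type": "string"}},
        },
        "handler": lambda args: {"ok": True, "title": args["title"]},
    },
    "mcp_slack_post_message": {
        "description": "Post a message to a Slack channel",
        "parameters": {
            "type": "object",
            "properties": {"channel": {"type": "string"}, "text": {"type": "string"}},
        },
        "handler": lambda args: {"ok": True, "channel": args["channel"]},
    },
}


def tool_search(query: str, limit: int = 5) -> dict[str, Any]:
    """Rank tools by how many query words occur in the name or description."""
    words = query.lower().split()
    scored = []
    for name, entry in CATALOG.items():
        haystack = f"{name} {entry['description']}".lower()
        score = sum(1 for word in words if word in haystack)
        if score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {
        "matches": [
            {"name": name, "description": CATALOG[name]["description"]}
            for _, name in scored[:limit]
        ]
    }


def tool_describe(name: str) -> dict[str, Any]:
    """Return the full schema for one tool, loaded only when asked for."""
    return {"name": name, "parameters": CATALOG[name]["parameters"]}


def tool_call(name: str, arguments: dict[str, Any]) -> Any:
    """Invoke the underlying tool by name."""
    return CATALOG[name]["handler"](arguments)
```

With this toy catalog, `tool_search("create a github issue")` ranks `mcp_github_create_issue` first, and the model only pays for that one schema when it calls `tool_describe`.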
When the model invokes tool_call, Hermes unwraps the bridge and
dispatches the underlying tool exactly as if the model had called it
directly. Pre-tool-call hooks, guardrails, approval prompts, and
post-tool-call hooks all run against the real tool name — not against
tool_call. The activity feed in the CLI and gateway also unwraps so you
see the underlying tool, not the bridge.
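
As a rough sketch of that unwrapping step, assume a dispatcher that receives a tool name and an arguments dict. The `dispatch` function, the `registry` mapping, and the hook signatures below are hypothetical and only show the ordering: unwrap first, then run hooks against the real tool name.

```python
from typing import Any, Callable

BRIDGE_TOOL = "tool_call"

PreHook = Callable[[str, dict[str, Any]], None]
PostHook = Callable[[str, dict[str, Any], Any], None]


def dispatch(
    name: str,
    arguments: dict[str, Any],
    registry: dict[str, Callable[[dict[str, Any]], Any]],
    pre_hooks: list[PreHook],
    post_hooks: list[PostHook],
) -> Any:
    """Dispatch a tool call, unwrapping the bridge before any hook runs."""
    if name == BRIDGE_TOOL:
        # Replace the bridge call with the tool it wraps.
        name, arguments = arguments["name"], arguments.get("arguments", {})
    for hook in pre_hooks:
        hook(name, arguments)  # hooks see the real tool name
    result = registry[name](arguments)
    for hook in post_hooks:
        hook(name, arguments, result)
    return result
```

Because the unwrap happens before the hooks, a guardrail keyed on `mcp_github_create_issue` fires the same way whether the model called the tool directly or through the bridge.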
By default Tool Search runs in auto mode: it activates only when the
deferrable tool schemas would consume at least 10% of the active model's
context window. Below that, the tools-array assembly is a pure
pass-through and you pay no overhead.
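
The auto-mode check amounts to a simple comparison. The sketch below assumes the token count of the deferrable schemas is already known; how Hermes estimates it is not specified here, and the function name is hypothetical.

```python
def auto_mode_activates(
    deferrable_schema_tokens: int,
    context_length: int,
    threshold_pct: float = 10.0,
) -> bool:
    """Return True when deferrable schemas reach the threshold share of context."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    # Compare as products to avoid a division and keep the boundary exact.
    return deferrable_schema_tokens * 100 >= threshold_pct * context_length
```

For a 200,000-token context window and the default threshold, 20,000 tokens of deferrable schemas is exactly the activation boundary; 19,999 tokens stays in pass-through.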
This decision is re-evaluated every time the tools array is built. The behavior is controlled by the following configuration keys:
tools:
  tool_search:
    enabled: auto            # auto (default), on, or off
    threshold_pct: 10        # percentage of context; only used in auto mode
    search_default_limit: 5
    max_search_limit: 20
| Key | Default | Meaning |
|---|---|---|
| `enabled` | `auto` | `auto` activates above threshold; `on` always activates if there's at least one deferrable tool; `off` disables entirely. |
| `threshold_pct` | `10` | Percentage of context length at which auto mode kicks in. Range 0–100. |
| `search_default_limit` | `5` | Hits returned when the model calls `tool_search` without a `limit`. |
| `max_search_limit` | `20` | Hard upper bound the model can request via `limit`. Range 1–50. |
You can also use the legacy boolean shape:
tools:
  tool_search: true   # equivalent to {enabled: auto}
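
One way to picture how both shapes could resolve to the same settings is a small normalizer over the already-parsed `tool_search` value. This is a sketch under assumptions, not Hermes' loader: the function name is hypothetical, mapping `false` to `off` is a guess (only `true` is documented), and clamping out-of-range values rather than rejecting them is a choice made for the example. It also accepts booleans for `enabled`, since YAML 1.1 parsers such as PyYAML read unquoted `on` and `off` as booleans.

```python
from typing import Any

DEFAULTS: dict[str, Any] = {
    "enabled": "auto",
    "threshold_pct": 10,
    "search_default_limit": 5,
    "max_search_limit": 20,
}


def normalize_tool_search_config(raw: Any) -> dict[str, Any]:
    """Turn the parsed `tool_search` value into a full settings dict."""
    if raw is None:
        raw = {}
    if isinstance(raw, bool):
        # Legacy shape: true -> auto (documented); false -> off (assumption).
        raw = {"enabled": "auto" if raw else "off"}
    cfg = {**DEFAULTS, **raw}

    enabled = cfg["enabled"]
    if isinstance(enabled, bool):
        # Unquoted on/off arrive as True/False from YAML 1.1 parsers.
        enabled = "on" if enabled else "off"
    enabled = str(enabled).lower()
    if enabled not in {"auto", "on", "off"}:
        raise ValueError(f"invalid enabled value: {enabled!r}")
    cfg["enabled"] = enabled

    # Clamp to the documented ranges (clamping is an assumption).
    cfg["threshold_pct"] = min(100, max(0, cfg["threshold_pct"]))
    cfg["max_search_limit"] = min(50, max(1, int(cfg["max_search_limit"])))
    cfg["search_default_limit"] = min(
        cfg["max_search_limit"], max(1, int(cfg["search_default_limit"]))
    )
    return cfg
```

Under these assumptions, `normalize_tool_search_config(True)` and `normalize_tool_search_config({})` both produce the default settings with `enabled` set to `auto`.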
Tool Search trades a fixed per-turn token cost (the three bridge tool schemas, ~300 tokens) and at least one extra round trip (search → describe → call) for the savings on the deferred schemas. It's a clear win when you have many tools and use few per turn; it's overhead when you have few tools total.
The auto default handles this for you. If you set enabled: on
unconditionally, expect a slight per-turn cost on small toolsets.
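
The trade-off can be made concrete with a back-of-the-envelope estimate. The function below is a deliberate simplification: it uses the ~300-token bridge cost quoted above, treats each schema loaded via `tool_describe` as a per-turn cost, and ignores the latency of the extra round trips. The toolset sizes in the usage note are invented for illustration.

```python
BRIDGE_OVERHEAD_TOKENS = 300  # approximate cost of the three bridge schemas


def net_tokens_saved_per_turn(
    deferred_schema_tokens: int,
    described_schema_tokens: int = 0,
    bridge_overhead_tokens: int = BRIDGE_OVERHEAD_TOKENS,
) -> int:
    """Estimate tokens saved per turn; a negative result means net overhead."""
    return deferred_schema_tokens - bridge_overhead_tokens - described_schema_tokens
```

With 40 deferred tools at roughly 400 tokens each (16,000 tokens) and two of them described in a turn (800 tokens), the estimate is 14,900 tokens saved. With a single 250-token tool that the model does use, the estimate is -300, which is the small-toolset overhead mentioned above.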
These come from the prompt-cache integrity invariant; they are inherent to any progressive-disclosure design, not specific to this implementation:

- A `tool_describe` result enters the conversation history (so it does get cached on subsequent turns) but it never benefits from the system-prompt cache prefix.
- … `"github"` against a catalog where every tool name contains `"github"`).
- … `Map`. This avoids the class of bug where a stored catalog drifts out of sync with the live tool registry.
- `tool_search`, `tool_describe`, and `tool_call` only ever see and invoke tools the session was actually granted. A subagent, kanban worker, or gateway session restricted to a subset of toolsets cannot use the bridge to discover or call a tool outside that subset: the deferred catalog is the deferrable slice of the session's own enabled/disabled toolsets, not the whole process registry.

- `tools/tool_search.py`: the implementation
- `tests/tools/test_tool_search.py`: the regression suite
- `openclaw-tool-search-report` PDF in the original implementation PR for the research that shaped the design