v3/docs/adr/ADR-112-mcp-tool-discoverability.md
Status: Accepted (2026-05-11) Date: 2026-05-11 Authors: claude (drafted with rUv) Related: #1748, #1896, AlphaSignal external audit Supersedes: nothing
Ruflo ships 285 MCP tools across the CLI package. Claude Code already exposes a native toolset (Read, Write, Edit, Bash, Grep, Glob, Task, TodoWrite, WebFetch, WebSearch, etc.) that overlaps with many of ours. When Claude picks between two ways to do the same thing, the deciding signal is the tool description it sees in its system prompt — there is no other context.
Measurement on current main (2026-05-11):
$ node scripts/audit-tool-descriptions.mjs
Total MCP tools scanned: 285
With Use-when guidance: 9
Without guidance: 276
Worse than the AlphaSignal article reported (it estimated 237/300 lack guidance). 97% of our tool descriptions are essentially "here is what this thing does" rather than "here is when to use this instead of <native equivalent>".
The downstream effect: Claude defaults to native tools for anything where they could plausibly work. Memory tools lose to file Read. agent_spawn loses to Task. workflow_execute loses to sequential Bash. Ruflo's value-add (cost attribution, learning loops, swarm coordination, witness chain) never fires because Claude doesn't know it should call us.
Every MCP tool description MUST include "Use when … is wrong because …" guidance. The format:
<one-line what it does>. Use when <native equivalent> is wrong because
<concrete value-add: cost tracking | learning persistence | coordination
| witness chain | sandbox isolation | …>. For <inverse case>, native
<tool> is fine. <Optional: Pair with X first.>
For each of the 285 tools, before merging any new description we ask:
| Class | Tool count | Native overlap | Priority |
|---|---|---|---|
agent_* | 8 | Task | P1 — Claude defaults to Task |
memory_* | 11 | Read/Write (file paths) | P1 — memory loses to files |
agentdb_* | 14 | memory_* (internal overlap!) | P1 — disambiguate the two |
workflow_* | 6 | sequential Bash + TodoWrite | P1 |
hooks_* | 17 | mostly no overlap | P2 |
swarm_* | 6 | Task (multi-agent) | P2 |
embeddings_* | 9 | none direct | P3 |
claims_* | 4 | none direct | P3 |
task_* | 6 | TodoWrite | P2 |
| everything else | ~204 | varies | P3 |
P1 is ~39 tools — the ones where Claude's choice happens most often.
❌ "Spawn an agent"
❌ "Store data in memory"
❌ "Search the audit log"
❌ "Manage WASM sandboxes"
❌ "Configure routing"
These tell Claude what the tool does, never when to call it over a native alternative.
✅ "Spawn a Ruflo-tracked agent with cost attribution + memory persistence + swarm coordination. Use when native Task tool is wrong because you need (a) cost tracking per agent in the cost-tracking namespace, (b) cross-session learning via the patterns namespace, or (c) coordination with other agents in a swarm topology (hierarchical / mesh / consensus). For one-shot subtasks with no learning loop, native Task is fine. Pair with hooks_route to pick the right model first."
That description exists in tree today (it's agent_spawn). It's the template every other description should match.
39 tool descriptions edited to follow the template. One file at a time:
agent-tools.ts (8 tools)memory-tools.ts (11 tools)agentdb-tools.ts (14 tools)workflow-tools.ts (6 tools)~30 tools across hooks-*, swarm-*, task-*, coordination-*.
~216 long-tail tools. Bulk pattern-match + ML-suggested descriptions reviewed by human before commit.
scripts/audit-tool-descriptions.mjs + a tool-descriptions-audit CI job that:
MCPTool definition in src/mcp-tools/*.ts./Use when/i, /Prefer .* over/i, /Pair with/i, /fall back/i, /use over native/i.verification/mcp-tool-baseline.json — auditable, monotone-decreasing.This guard prevents new tools from shipping without guidance and prevents accidental regression of edited descriptions.
This ADR closes when:
scripts/.Subsequent phases tracked in their own PRs, but the CI baseline is what guarantees forward progress.
agent_spawn description in v3/@claude-flow/cli/src/mcp-tools/agent-tools.ts:183 — the template every other description should match