Back to Ruflo

ADR-099: Dossier-Investigator — Recursive Parallel Multi-Source Research for `ruflo-goals`

v3/docs/adr/ADR-099-dossier-investigator-recursive-parallel-research.md

3.6.307.9 KB
Original Source

ADR-099: Dossier-Investigator — Recursive Parallel Multi-Source Research for ruflo-goals

Status: Proposed Date: 2026-05-03 Version: target v3.6.x (additive — plugins/ruflo-goals minor bump) Supersedes: nothing Related: ADR-098 (plugin capability sync), plugins/ruflo-goals/agents/deep-researcher.md, plugins/ruflo-knowledge-graph, plugins/ruflo-rag-memory

Context

The ruflo-goals plugin currently ships three agents:

AgentPatternOutput
goal-plannerGOAP / A* over state spaceAction plan
deep-researcherLinear multi-source synthesis with evidence gradingGraded findings document
horizon-trackerLong-horizon objective tracking with drift detectionMilestone state

Inspired by maigret (3,000+ source parallel username enumeration with recursive expansion and structured dossier reporting), we identified a structurally distinct research pattern that none of the three existing agents implements as a first-class loop:

  1. Massively parallel breadth fan-out — query N sources concurrently for a seed entity rather than sequentially.
  2. Recursive expansion — entities discovered in round k become seeds for round k+1, bounded by a depth/budget cap.
  3. Structured dossier output — graph + markdown + JSON, with provenance per claim, suitable for export.

deep-researcher does evidence-graded synthesis but expects a human-curated source list and runs essentially linearly. It has no recursive seeding loop and no parallel fan-out primitive.

Ruflo already ships every primitive needed to assemble a maigret-style investigator without adding new external dependencies:

CapabilityExisting tool
Hybrid sparse+dense semantic searchmcp__claude-flow__memory_search_unified, ruflo-rag-memory:memory-search
Vector search (HNSW, RaBitQ)mcp__claude-flow__embeddings_search, embeddings_rabitq_search
Pattern recallmcp__claude-flow__agentdb_pattern-search, agentdb_hierarchical-recall
Knowledge-graph traversal + extractionruflo-knowledge-graph:kg-traverse, kg-extract
Web search & fetchWebSearch, WebFetch
Codebase queriesGrep, Glob, Read
ADR index lookupruflo-adr:adr-index
Git intelligenceruflo-jujutsu:diff-analyze
Parallel agent fan-outruflo-swarm:swarm-init (mesh topology)
Trajectory recordingmcp__claude-flow__hooks_intelligence_trajectory-*

Decision

Add a new agent dossier-investigator and a companion skill dossier-collect to plugins/ruflo-goals.

Agent: dossier-investigator

  • Model: sonnet (matches sibling agents; structured orchestration, not creative writing)
  • Role: Orchestrate parallel + recursive multi-source intelligence gathering on a seed entity, producing a graph-structured dossier with provenance.
  • Inputs: { seed: string, sources?: string[], maxDepth?: number=2, maxBreadth?: number=8, budget?: { tokens?, usd? } }
  • Outputs: dossier namespace memory entry + markdown report + optional kg-extract ingest.

Skill: dossier-collect

User-facing slash skill that drives the agent. Steps:

  1. Seed validation — normalize seed (entity type detection: file path / symbol / username / URL / ADR-id / concept).
  2. Source plan — pick which of the available tools apply to the seed type; user can override via --sources.
  3. Round 0 fan-out — issue all source queries in parallel via a single batched message (one Task call per source where useful, otherwise direct MCP calls).
  4. Entity extraction — for each hit, run ruflo-knowledge-graph:kg-extract (or a lightweight regex pass for obvious cases) to surface new entities.
  5. Recursive expansion — for each new entity not already in the dossier, schedule round k+1 until maxDepth or budget is hit. Apply de-duplication via embedding similarity (threshold 0.92).
  6. Aggregation — collapse hits into a graph: nodes = entities, edges = "discovered-via" with source provenance.
  7. Reporting — render markdown + JSON; optionally export to KG via kg-extract ingest.
  8. Persist — store in dossier namespace and record trajectory.

Rejected alternatives

OptionWhy rejected
Extend deep-researcherWould couple two structurally different loops (linear-graded vs parallel-recursive) into one prompt; would push the prompt past the 80-line guideline flagged in ADR-098.
Bundle Python maigret as an MCP wrapperAdds a Python runtime dependency, network egress to 3,000+ sites, and a privacy/abuse posture that's out of scope for a developer-research tool. We want maigret's pattern, not its target list.
Build into ruflo-knowledge-graphKG plugin is about graph operations on already-extracted data; investigator is about acquisition. Keeping it in ruflo-goals puts it next to its peers (deep-researcher, goal-planner).

Tradeoff: ~40% overlap with deep-researcher

deep-researcher and dossier-investigator will share the "query memory + KG + web" surface area. We accept this redundancy because:

  • The loops are different (linear evidence-grading vs parallel-recursive expansion).
  • Output formats differ (synthesis document vs entity graph + dossier).
  • Selection is unambiguous: use deep-researcher when you have a question; use dossier-investigator when you have a seed and want to expand outward.

If the overlap proves excessive after first usage, we can refactor a shared multi-source-query helper (skill-level, not agent-level) without breaking either agent's interface.

Consequences

Positive

  • Fills a real gap (recursive parallel investigation) without external deps.
  • Reuses the full ruflo tool surface — no new MCP servers.
  • Unblocks a class of tasks (investigate this symbol / module / dependency / ADR) that today require manual fan-out.
  • Trajectory recording feeds the SONA pattern store, so future investigations get faster routing.

Negative

  • Plugin agent count goes 3 → 4. ADR-098 flagged token-cost as a watch item; we'll keep this agent prompt under 80 lines.
  • Recursive expansion can blow up cost without a hard budget. The agent MUST honor budget.tokens and budget.usd and abort cleanly when hit, not silently truncate.
  • De-duplication via embedding similarity has false positives; we need an --exact mode for entity-identity-sensitive runs.

Neutral

  • No CLI surface change (/ruflo-goals:dossier-collect is the only new entry point).
  • No persistence schema change — uses existing dossier namespace under AgentDB memory.

Implementation plan

StepFileOwner
1. Agent promptplugins/ruflo-goals/agents/dossier-investigator.mdcoder
2. Skill markdownplugins/ruflo-goals/skills/dossier-collect/SKILL.mdcoder
3. Slash commandplugins/ruflo-goals/commands/goals.md (add dossier subcommand)coder
4. Plugin manifest bumpplugins/ruflo-goals/.claude-plugin/plugin.json (0.1.0 → 0.2.0)coder
5. README updateplugins/ruflo-goals/README.mdcoder
6. Smoke testtests/plugins/ruflo-goals/dossier.spec.tstester
7. Ship behind a flagdossierInvestigator.enabled defaulting true for first releasecoder

Acceptance criteria:

  • Agent prompt ≤ 80 lines (per ADR-098 guidance).
  • Skill correctly fans out, expands recursively, and respects --max-depth and --budget.
  • Output written to dossier namespace with valid JSON schema.
  • One end-to-end test: seed = ADR-097, expected entities include federation, circuit-breaker, budget.
  • Trajectory recorded; pattern stored on success.

References

  • maigret — https://github.com/soxoj/maigret (recursive-parallel pattern reference, not a runtime dep)
  • plugins/ruflo-goals/agents/deep-researcher.md (sibling agent)
  • ADR-098 (plugin token-cost guidelines)
  • ADR-026 (3-tier model routing — sonnet justified for orchestration)