Documentation/triage_system.md
Maps to `rag_system/agent/loop.Agent._should_use_rag`, `_route_using_overviews()`, and the fast-path router in `backend/server.py`.

The triage system determines, for every incoming query, whether it should be answered by the retrieval (RAG) pipeline or by direct LLM generation. It combines the following signals:
| Signal | Source | Notes |
|---|---|---|
| Keyword/regex check | `backend/server.py` (fast path) | Hard-coded quick wins ("what time", "define", etc.). |
| Index presence | SQLite (session → indexes) | If no indexes are linked, go direct to the LLM. |
| Overview routing | `_route_using_overviews()` | Uses document overviews and the enrichment model to predict relevance. |
| LLM router prompt | `agent/loop.py` lines 648-665 | Final arbitrator (Ollama call, JSON output). |
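For illustration, the fast-path keyword check can be sketched as follows. The pattern list here is hypothetical; the actual hard-coded patterns live in `backend/server.py` and may differ.

```python
import re

# Hypothetical quick-win patterns; the real list is in backend/server.py.
FAST_PATH_PATTERNS = [
    re.compile(r"^\s*what time\b", re.IGNORECASE),
    re.compile(r"^\s*define\b", re.IGNORECASE),
]


def fast_path_match(query: str) -> bool:
    """Return True when the query matches a hard-coded quick-win pattern,
    meaning it can skip retrieval and go straight to the LLM."""
    return any(p.search(query) for p in FAST_PATH_PATTERNS)
```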
```mermaid
flowchart TD
    Q["Incoming Query"] --> S1{Session\nHas Indexes?}
    S1 -- no --> LLM["Direct LLM Generation"]
    S1 -- yes --> S2{Fast Regex\nHeuristics}
    S2 -- match --> LLM
    S2 -- no --> S3{Overview\nRelevance > τ?}
    S3 -- low --> LLM
    S3 -- high --> S4["LLM Router\n(prompt @648)"]
    S4 -- "route: RAG" --> RAG["Retrieval Pipeline"]
    S4 -- "route: DIRECT" --> LLM
```
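The router's JSON contract (`{ "route": "RAG" | "DIRECT" }`) is worth consuming defensively. A minimal sketch of such a parser is below; this helper is hypothetical (the real parsing lives in `agent/loop.py`) and defaults to RAG on malformed output, so a bad router response degrades to retrieval rather than a wrong direct answer.

```python
import json


def parse_router_decision(raw: str) -> str:
    """Parse the router LLM's JSON output into "RAG" or "DIRECT".

    Hypothetical helper: falls back to "RAG" whenever the model emits
    malformed JSON or an unexpected route value.
    """
    try:
        route = json.loads(raw).get("route", "").upper()
    except (json.JSONDecodeError, AttributeError):
        return "RAG"
    return route if route in ("RAG", "DIRECT") else "RAG"
```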
Key implementation points:

- `handle_session_chat()` builds `router_prompt` (line ~435) and makes a first-pass decision before calling the heavy agent code.
- `_route_using_overviews()` asks the enrichment model (`qwen3:0.6b`) "Does this overview mention … ?" and receives a yes/no answer.
- The LLM router returns JSON: `{ "route": "RAG" | "DIRECT" }`.

| Component | Calls / Data |
|---|---|
| SQLite `chat_sessions` | Reads the `indexes` column to know linked index IDs. |
| LanceDB Overviews | Reads `index_store/overviews/<idx>.jsonl`. |
| `OllamaClient` | Generates the LLM router decision. |
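A minimal sketch of reading one index's overview file is shown below. The reader is hypothetical: only the path convention `index_store/overviews/<idx>.jsonl` comes from this document; the record schema is defined by the enrichment step and is not assumed here.

```python
import json
from pathlib import Path
from typing import Iterator


def load_overviews(index_id: str,
                   base: Path = Path("index_store/overviews")) -> Iterator[dict]:
    """Yield one record per non-empty line of <base>/<index_id>.jsonl.

    Hypothetical reader; missing files yield nothing rather than raising,
    matching the "no indexes linked -> direct LLM" behavior.
    """
    path = base / f"{index_id}.jsonl"
    if not path.exists():
        return
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```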
Configuration:

- `PIPELINE_CONFIGS.triage.enabled` – global toggle.
- `TRIAGE_OVERVIEW_THRESHOLD` – minimum similarity score to prefer RAG (default 0.35).

Keep this document updated whenever routing heuristics, thresholds, or prompt wording change.
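For illustration, the threshold gate can be sketched as below. The function name is hypothetical; only the default value of 0.35 comes from this document.

```python
# Default mirrors the documented TRIAGE_OVERVIEW_THRESHOLD value.
TRIAGE_OVERVIEW_THRESHOLD = 0.35


def prefer_rag(best_overview_score: float,
               threshold: float = TRIAGE_OVERVIEW_THRESHOLD) -> bool:
    """Route to retrieval only when at least one linked index's overview
    similarity clears the threshold; otherwise answer directly."""
    return best_overview_score >= threshold
```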