RAG_POC/architecture-runtime-retrieval.md
This document describes how ProxySQL becomes a RAG retrieval engine at runtime. The companion document (Data Model & Ingestion) explains how content enters the SQLite index. This document explains how content is queried, how results are returned to agents/applications, and how hybrid retrieval works in practice.
It is written as an implementation blueprint for ProxySQL (and its MCP server) and assumes the SQLite schema contains:
rag_sources (control plane)rag_documents (canonical docs)rag_chunks (retrieval units)rag_fts_chunks (FTS5)rag_vec_chunks (sqlite3-vec vectors)ProxySQL becomes a RAG runtime by providing four capabilities in one bounded service:
Retrieval Index Host
Orchestration Layer
Stable API Surface (MCP-first)
Authoritative Row Refetch Gateway
doc_id / pk_json, ProxySQL can refetch the authoritative row from the source DB on-demand (optional).In production terms, this is not “ProxySQL as a general search engine.” It is a bounded retrieval service colocated with database access logic.
A typical RAG flow has two phases:
chunk_id, doc_id, score, and small metadata.pk_json.Canonical flow
rag.search_hybrid(query, filters, k) → returns top chunk ids and scoresrag.get_chunks(chunk_ids) → returns chunk text for prompt grounding/citationsrag.fetch_from_source(doc_id) → returns full row or selected columnsProxySQL should support two “consumption modes”:
Principle
FTS5 is used for:
Typical output
chunk_id, score_fts, optional highlights/snippetsRanking
bm25(rag_fts_chunks) is the default. It is fast and effective for term queries.Vector search is used for:
Typical output
chunk_id, score_vec (distance/similarity), plus join metadataImportant
source_id and joins to rag_chunks / rag_documents.Hybrid retrieval combines FTS and vector search for better quality than either alone. Two concrete modes should be implemented because they solve different problems.
Use when
Flow
chunk_idWhy RRF
RRF formula
score = w_fts/(k0 + rank_fts) + w_vec/(k0 + rank_vec)k0=60, w_fts=1.0, w_vec=1.0Use when
Flow
This mode behaves like a two-stage retrieval pipeline:
A RAG retrieval engine must be bounded. ProxySQL should enforce limits at the MCP layer and ideally also at SQL helper functions.
k returned: 50Filters should be applied consistently across retrieval modes.
Common filters:
source_id or source_nameImplementation note:
A retrieval response must be designed for downstream LLM usage:
Return a compact list of “evidence candidates”:
chunk_iddoc_idscores (fts, vec, fused)titleDo not return full bodies by default; that is what rag.get_chunks is for.
rag.get_chunks(chunk_ids) returns:
chunk_id, doc_idtitlebody (chunk text)rag.fetch_from_source(doc_id) returns:
This is the “authoritative fetch” boundary that prevents stale/partial index usage from being a correctness problem.
These are not the preferred agent interface, but they are crucial for debugging and for SQL-native apps.
SELECT
f.chunk_id,
bm25(rag_fts_chunks) AS score_fts
FROM rag_fts_chunks f
WHERE rag_fts_chunks MATCH 'json_extract mysql'
ORDER BY score_fts
LIMIT 10;
Join to fetch text:
SELECT
f.chunk_id,
bm25(rag_fts_chunks) AS score_fts,
c.doc_id,
c.body
FROM rag_fts_chunks f
JOIN rag_chunks c ON c.chunk_id = f.chunk_id
WHERE rag_fts_chunks MATCH 'json_extract mysql'
ORDER BY score_fts
LIMIT 10;
Vector syntax depends on how you expose query vectors. A typical pattern is:
rag_vec_chunks to return nearest neighborsExample shape (conceptual):
-- Pseudocode: nearest neighbors for :query_embedding
SELECT
v.chunk_id,
v.distance
FROM rag_vec_chunks v
WHERE v.embedding MATCH :query_embedding
ORDER BY v.distance
LIMIT 10;
In production, ProxySQL MCP will typically compute the query embedding and call SQL internally with a bound parameter.
This document does not define full schemas (that is in mcp-tools.md), but it defines what each tool must do.
rag.search_fts(query, filters, k)rag.search_vector(query_text | query_embedding, filters, k)rag.search_hybrid(query, mode, filters, k, params)
rag.get_chunks(chunk_ids)rag.get_docs(doc_ids)rag.fetch_from_source(doc_ids | pk_json, columns?, limits?)MCP-first principle
Run GenAI retrieval in a dedicated ProxySQL instance to reduce blast radius:
k and candidate sizesrag_sync_state)At runtime, ProxySQL RAG retrieval is implemented as: