plugins/ruflo-agentdb/skills/vector-search/SKILL.md
Two distinct vector-search paths live in this plugin. Pick the right one — they're not interchangeable.
| Path | Tool family | Backing | Capacity | Latency |
|---|---|---|---|---|
| Large-scale corpus | embeddings_* | @claude-flow/memory HNSW (Rust/Native) | up to millions of vectors | ~1.9× at N=20k, ~3.2×–4.7× at N=5k vs brute-force (measured; recall@10 ≈ 0.99). ANN wins above the crossover |
| Hot-path router | ruvllm_hnsw_* | WASM-backed router (v2.0.1) | ~11 patterns max (ruvllm-tools.ts:58) | sub-ms; designed for high-priority routing, not corpus search |
The retired "12,500×" headline referred to the large-scale embeddings_search path. The WASM router was never that path.
| Need | Path |
|---|---|
| Search a corpus of N ≥ 500 documents | embeddings_search |
| Memory-constrained corpus (≥5,000 vectors) | RaBitQ quantized — see "Quantized search" below |
| Compare two strings | embeddings_compare |
| Hierarchical / taxonomic data | embeddings_hyperbolic (Poincaré ball) |
| Route a query to one of ≤11 hot patterns | ruvllm_hnsw_route |
| Cross-namespace search | memory_search_unified |
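The size thresholds in the table above can be sketched as a small selection function. This is an illustrative helper only: `choosePath`, `Workload`, and the constant names are hypothetical and not part of the plugin. The thresholds (about 11 router patterns, 5,000 vectors for RaBitQ) are taken from the text.

```typescript
// Hypothetical helper that encodes the size-based rows of the decision table.
// Nothing here is a plugin API; the returned strings are the tool names from the text.
type SearchPath =
  | "embeddings_search"
  | "embeddings_rabitq_search"
  | "ruvllm_hnsw_route";

interface Workload {
  itemCount: number;          // vectors in the corpus, or patterns to route over
  memoryConstrained: boolean; // true when index memory is the bottleneck
  hotPathRouting: boolean;    // true when routing to a curated pattern set
}

const ROUTER_CAP = 11;   // the WASM router holds roughly 11 patterns
const RABITQ_MIN = 5000; // below this, the rebuild cost outweighs the savings

function choosePath(w: Workload): SearchPath {
  // The router only fits a small, curated pattern set.
  if (w.hotPathRouting && w.itemCount <= ROUTER_CAP) {
    return "ruvllm_hnsw_route";
  }
  // Quantized search pays off only for large, memory-constrained corpora.
  if (w.memoryConstrained && w.itemCount >= RABITQ_MIN) {
    return "embeddings_rabitq_search";
  }
  // Everything else goes through the standard corpus path.
  return "embeddings_search";
}
```

A routing workload that exceeds the router cap falls through to the corpus paths, which matches the advice that the router is not a corpus index.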
1. mcp__claude-flow__embeddings_status to verify the embedding engine.
2. mcp__claude-flow__embeddings_init if not active.
3. mcp__claude-flow__embeddings_generate for text input.
4. mcp__claude-flow__embeddings_search with the query.
5. mcp__claude-flow__embeddings_compare to measure similarity.
6. mcp__claude-flow__memory_search_unified for cross-namespace.

For corpora ≥5,000 vectors and/or memory-constrained environments, use the RaBitQ 1-bit quantization workflow. Below 5,000 vectors the rebuild cost outweighs the savings, so use the standard path instead.
| Step | Tool | Purpose |
|---|---|---|
| 1 | embeddings_init | Engine warm |
| 2 | embeddings_rabitq_build | One-time build of the 1-bit index after corpus is loaded |
| 3 | embeddings_rabitq_search | Hamming-prefilter returns top-N candidate IDs (cheap) |
| 4 | embeddings_search | Optional exact rerank on the candidate set (full-precision) |
| 5 | embeddings_rabitq_status | Index health, memory footprint, build time |
Note: embeddings_rabitq_search returns candidate IDs only; the rerank in step 4 is the user's responsibility (mirrors the docstring at embeddings-tools.ts:911). Without rerank, results are approximate; with rerank, you get full-precision quality at 32× lower memory.
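The prefilter-then-rerank pattern from steps 3 and 4 can be sketched as follows. This is a simplified illustration, not the plugin's implementation: plain sign-bit quantization stands in for RaBitQ (which does more than take signs), cosine similarity is assumed for the exact rerank, and all function names are hypothetical.

```typescript
// Sketch only: sign-bit quantization stands in for RaBitQ here.

// Pack the sign of each component into one bit per dimension.
function signBits(v: number[]): Uint8Array {
  const out = new Uint8Array(Math.ceil(v.length / 8));
  v.forEach((x, i) => {
    if (x > 0) out[i >> 3] = (out[i >> 3] ?? 0) | (1 << (i & 7));
  });
  return out;
}

// Count differing bits between two packed codes.
function hamming(a: Uint8Array, b: Uint8Array): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    let x = (a[i] ?? 0) ^ (b[i] ?? 0);
    while (x) {
      d += x & 1;
      x >>= 1;
    }
  }
  return d;
}

// Full-precision cosine similarity (assumed metric for the rerank step).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0, y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Cheap Hamming prefilter, then exact rerank on the surviving candidates.
// A real index would build the bit codes once, not on every query.
function searchWithRerank(
  query: number[],
  corpus: number[][],
  candidateCount: number,
  k: number,
): number[] {
  const q = signBits(query);
  const candidates = corpus
    .map((v, id) => ({ id, dist: hamming(q, signBits(v)) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, candidateCount);
  return candidates
    .map(({ id }) => ({ id, score: cosine(query, corpus[id] ?? []) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.id);
}
```

The rerank can reorder candidates that look identical to the prefilter: two vectors with the same sign pattern have Hamming distance 0 from each other even when their full-precision directions differ considerably.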
HNSW exposes three knobs (efSearch, M, and efConstruction) that trade recall against latency. The retired "12,500×" headline assumed defaults; tune deliberately for your workload:
| Profile | efSearch | M | When to use |
|---|---|---|---|
| recall-first | 200 | 32 | Pattern recall during planning; quality matters more than ms |
| balanced (default) | 64 | 16 | General-purpose semantic recall |
| latency-first | 16 | 8 | Hot-path routing where p99 latency matters |
efSearch is passed via ruvllm_hnsw_create (ruvllm-tools.ts:64). M is registry-level today; raise as a follow-up if it should be MCP-tunable. efConstruction defaults to 200 in the lite index (hnsw-index.ts:537).
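The three profiles can be kept as a typed lookup so callers pick a named trade-off instead of raw numbers. This is a sketch: `HNSW_PROFILES` and `profileFor` are hypothetical names, and only the efSearch and M values come from the table above.

```typescript
// Hypothetical lookup for the tuning profiles; values copied from the table.
type ProfileName = "recall-first" | "balanced" | "latency-first";

interface HnswProfile {
  efSearch: number; // search-time candidate list size
  M: number;        // graph connectivity (registry-level today, per the text)
}

const HNSW_PROFILES: Record<ProfileName, HnswProfile> = {
  "recall-first": { efSearch: 200, M: 32 },
  "balanced": { efSearch: 64, M: 16 },
  "latency-first": { efSearch: 16, M: 8 },
};

// Defaults to the balanced profile, matching the documented default.
function profileFor(name: ProfileName = "balanced"): HnswProfile {
  return HNSW_PROFILES[name];
}
```

Since M is not MCP-tunable yet, only the efSearch field of a profile would actually be passed along when creating an index.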
For routing a small number of high-priority patterns:
1. mcp__claude-flow__ruvllm_hnsw_create: create the WASM index (cap ~11)
2. mcp__claude-flow__ruvllm_hnsw_add: add a pattern
3. mcp__claude-flow__ruvllm_hnsw_route: route an incoming query

This is not a corpus index. Treat it as a fast classifier over a curated set of patterns.
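To make the "fast classifier" framing concrete, here is a minimal in-memory sketch of a capped pattern router. It is not the WASM implementation: `createRouter` is a hypothetical name, it scans every pattern with cosine similarity (an assumed metric), and the default capacity of 11 mirrors the cap quoted above.

```typescript
// Sketch of a capped pattern router; a linear scan is fine at this size.
interface Pattern {
  name: string;
  vector: number[];
}

function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0, y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

function createRouter(capacity = 11) {
  const patterns: Pattern[] = [];
  return {
    // Reject additions beyond the cap instead of silently evicting.
    add(name: string, vector: number[]): void {
      if (patterns.length >= capacity) {
        throw new Error(`router is full (${capacity} patterns)`);
      }
      patterns.push({ name, vector });
    },
    // Return the name of the most similar pattern, or null when empty.
    route(query: number[]): string | null {
      let best: string | null = null;
      let bestScore = -Infinity;
      for (const p of patterns) {
        const score = cosineSim(query, p.vector);
        if (score > bestScore) {
          bestScore = score;
          best = p.name;
        }
      }
      return best;
    },
  };
}
```

With so few patterns, every route call costs the same handful of comparisons, which is consistent with the fixed sub-millisecond cost described for this path.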
For hierarchical data (code trees, org charts), use mcp__claude-flow__embeddings_hyperbolic, which maps embeddings into the Poincaré ball. Distance is geodesic, not cosine.
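For reference, the standard geodesic distance in the Poincaré ball model is d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The sketch below implements that textbook formula; whether embeddings_hyperbolic uses exactly this form or a curvature-scaled variant is an assumption, and `poincareDistance` is a hypothetical name.

```typescript
// Textbook Poincaré-ball geodesic distance (unit curvature assumed).
function poincareDistance(u: number[], v: number[]): number {
  let diffSq = 0, normUSq = 0, normVSq = 0;
  for (let i = 0; i < u.length; i++) {
    const a = u[i] ?? 0, b = v[i] ?? 0;
    diffSq += (a - b) ** 2;
    normUSq += a * a;
    normVSq += b * b;
  }
  // Points must lie strictly inside the unit ball.
  if (normUSq >= 1 || normVSq >= 1) {
    throw new Error("points must lie strictly inside the unit ball");
  }
  return Math.acosh(1 + (2 * diffSq) / ((1 - normUSq) * (1 - normVSq)));
}
```

The same Euclidean gap costs more near the boundary than near the origin, which is what lets deep levels of a tree spread out: the pair (0.8, 0) and (0.9, 0) is farther apart under this metric than (0, 0) and (0.1, 0).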
npx @claude-flow/cli@latest embeddings search --query "authentication patterns"
npx @claude-flow/cli@latest embeddings init
npx @claude-flow/cli@latest memory search --query "your query"
Measured numbers (source: scripts/benchmark-intelligence.mjs, ruvector NAPI backend; recall@10 ≈ 0.99). The older "150×–12,500×" figures were brute-force-fallback artifacts and have been retired — see project CLAUDE.md "V3 Performance Targets".
| Method | Measured speedup vs brute-force |
|---|---|
| Brute-force scan | Baseline |
| HNSW (N=5,000) | ~3.2×–4.7× faster |
| HNSW (N=20,000) | ~1.9× faster |
| HNSW (below crossover, small N) | ties/loses vs brute-force |
| RaBitQ quantization | 32× memory reduction; 0.60 ms/query at N≈14.7k |
| ruvllm_hnsw_route (n≤11) | sub-ms per route, fixed cost |
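The 32× memory figure follows from storing 1 bit per component instead of a 32-bit float. A rough sizing sketch, assuming float32 source vectors and a hypothetical dimension of 384 (the source does not state the dimension), and ignoring any per-vector metadata the real index keeps:

```typescript
// Raw storage for vector components only; index overhead is ignored.
function indexBytes(vectorCount: number, dim: number, bitsPerComponent: number): number {
  return Math.ceil((vectorCount * dim * bitsPerComponent) / 8);
}

const VECTOR_COUNT = 14_700; // N ≈ 14.7k from the table above
const ASSUMED_DIM = 384;     // assumption for illustration only

const fullPrecisionBytes = indexBytes(VECTOR_COUNT, ASSUMED_DIM, 32); // float32
const oneBitBytes = indexBytes(VECTOR_COUNT, ASSUMED_DIM, 1);         // 1-bit codes
const reductionFactor = fullPrecisionBytes / oneBitBytes;
```

Under these assumptions the full-precision vectors take about 22.6 MB and the 1-bit codes about 0.7 MB. The ratio is 32 regardless of the dimension chosen.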