Vector Search

plugins/ruflo-agentdb/skills/vector-search/SKILL.md

Two distinct vector-search paths live in this plugin. Pick the right one — they're not interchangeable.

| Path | Tool family | Backing | Capacity | Latency |
|---|---|---|---|---|
| Large-scale corpus | embeddings_* | @claude-flow/memory HNSW (Rust/Native) | up to millions of vectors | 150×–12,500× faster than brute-force, depending on N and parameters |
| Hot-path router | ruvllm_hnsw_* | WASM-backed router (v2.0.1) | ~11 patterns max (ruvllm-tools.ts:58) | sub-ms; designed for high-priority routing, not corpus search |

The "12,500×" headline applies to the large-scale embeddings_search path. The WASM router is not that path.

When to use

| Need | Path |
|---|---|
| Search a corpus of N ≥ 500 documents | embeddings_search |
| Memory-constrained corpus (≥5,000 vectors) | RaBitQ quantized; see "Quantized search" below |
| Compare two strings | embeddings_compare |
| Hierarchical / taxonomic data | embeddings_hyperbolic (Poincaré ball) |
| Route a query to one of ≤11 hot patterns | ruvllm_hnsw_route |
| Cross-namespace search | memory_search_unified |

  1. Check status: mcp__claude-flow__embeddings_status to verify the embedding engine.
  2. Initialize: mcp__claude-flow__embeddings_init if not active.
  3. Generate: mcp__claude-flow__embeddings_generate for text input.
  4. Search: mcp__claude-flow__embeddings_search with the query.
  5. Compare: mcp__claude-flow__embeddings_compare to measure similarity.
  6. Unified search: mcp__claude-flow__memory_search_unified for cross-namespace.
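
The ordering of the numbered steps above (check status, initialize only if needed, then search) can be sketched as follows. This is a minimal sketch under stated assumptions: `ToolCall` is a hypothetical stand-in for whatever mechanism invokes MCP tools in your environment, and the `initialized` status field and the `query` argument key are guesses, not the plugin's documented schema. Only the tool names come from this document.

```typescript
// Hypothetical transport: any function that invokes an MCP tool by name.
// Argument and result shapes are assumptions, not the plugin's schema.
type ToolCall = (tool: string, args: Record<string, unknown>) => Promise<unknown>;

const STATUS_TOOL = "mcp__claude-flow__embeddings_status";
const INIT_TOOL = "mcp__claude-flow__embeddings_init";
const SEARCH_TOOL = "mcp__claude-flow__embeddings_search";

async function searchCorpus(callTool: ToolCall, query: string): Promise<unknown> {
  // Step 1: ask whether the embedding engine is up.
  const engineStatus = await callTool(STATUS_TOOL, {});
  const active =
    typeof engineStatus === "object" &&
    engineStatus !== null &&
    (engineStatus as { initialized?: boolean }).initialized === true; // assumed field

  // Step 2: initialize only when the engine is not already active.
  if (!active) {
    await callTool(INIT_TOOL, {});
  }

  // Step 4: run the search. The `query` key is an assumption.
  return callTool(SEARCH_TOOL, { query });
}
```

Passing the transport in as a parameter keeps the sketch independent of any particular MCP client and makes the call order easy to check with a stub.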

Quantized search (32× memory reduction)

For corpora ≥5,000 vectors and/or memory-constrained environments, use the RaBitQ 1-bit quantization workflow. Below 5,000 vectors the rebuild cost outweighs the savings — use the standard path instead.
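
The 32× figure is what you get when each 32-bit float is replaced by a single bit. A rough worked example, assuming float32 storage and a hypothetical corpus of 10,000 vectors with 384 dimensions (both numbers are illustrative, and per-vector metadata is ignored):

```typescript
// Bytes for full-precision storage: 4 bytes (float32) per dimension.
function float32Bytes(count: number, dims: number): number {
  return count * dims * 4;
}

// Bytes for 1-bit codes: one bit per dimension, rounded up to whole bytes.
function oneBitBytes(count: number, dims: number): number {
  return count * Math.ceil(dims / 8);
}

// Hypothetical corpus: 10,000 vectors of 384 dimensions.
const fullBytes = float32Bytes(10000, 384); // 15,360,000 bytes
const quantBytes = oneBitBytes(10000, 384); // 480,000 bytes
const reduction = fullBytes / quantBytes;   // 32
```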

| Step | Tool | Purpose |
|---|---|---|
| 1 | embeddings_init | Engine warm |
| 2 | embeddings_rabitq_build | One-time build of the 1-bit index after corpus is loaded |
| 3 | embeddings_rabitq_search | Hamming-prefilter returns top-N candidate IDs (cheap) |
| 4 | embeddings_search | Optional exact rerank on the candidate set (full-precision) |
| 5 | embeddings_rabitq_status | Index health, memory footprint, build time |

Note: embeddings_rabitq_search returns candidate IDs only — the rerank in step 4 is the user's responsibility (mirrors the docstring at embeddings-tools.ts:911). Without rerank, results are approximate; with rerank, you get full-precision quality at 32× lower memory.
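
To make the two-stage shape concrete, here is a self-contained sketch of "cheap Hamming prefilter, then exact rerank". It uses plain sign-bit quantization as a simplified stand-in; RaBitQ itself is more involved, and none of the function names below belong to the plugin.

```typescript
// Simplified 1-bit code: keep only the sign of each dimension.
function signBits(v: number[]): boolean[] {
  return v.map((x) => x >= 0);
}

// Number of positions where two codes differ.
function hamming(a: boolean[], b: boolean[]): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) d++;
  return d;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Stage 1 (the role embeddings_rabitq_search plays): candidate IDs only.
function prefilter(codes: boolean[][], queryVec: number[], n: number): number[] {
  const q = signBits(queryVec);
  return codes
    .map((c, id) => ({ id, d: hamming(c, q) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, n)
    .map((x) => x.id);
}

// Stage 2 (the caller's responsibility): exact rerank at full precision.
function rerank(vectors: number[][], ids: number[], queryVec: number[], k: number): number[] {
  return ids
    .map((id) => ({ id, s: cosine(vectors[id], queryVec) }))
    .sort((x, y) => y.s - x.s)
    .slice(0, k)
    .map((x) => x.id);
}

// Toy data: three vectors share the query's sign pattern, one does not.
const corpus = [
  [0.9, 0.1, -0.2],
  [0.2, 0.8, -0.1],
  [-0.7, -0.3, 0.5],
  [0.1, 0.1, -0.9],
];
const codes = corpus.map(signBits);
const queryVec = [0.3, 0.9, -0.1];
const candidates = prefilter(codes, queryVec, 3); // excludes vector 2
const best = rerank(corpus, candidates, queryVec, 1); // [1]
```

In the toy data the 1-bit codes cannot tell vectors 0, 1, and 3 apart, so the prefilter alone returns them in arbitrary order; the full-precision rerank is what puts vector 1 first.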

Tuning

HNSW exposes three knobs that trade recall against latency: efSearch, M, and efConstruction. The "12,500×" headline assumes defaults; tune deliberately for your workload:

| Profile | efSearch | M | When to use |
|---|---|---|---|
| recall-first | 200 | 32 | Pattern recall during planning; quality matters more than ms |
| balanced (default) | 64 | 16 | General-purpose semantic recall |
| latency-first | 16 | 8 | Hot-path routing where p99 latency matters |

efSearch is passed via ruvllm_hnsw_create (ruvllm-tools.ts:64). M is currently set at the registry level and is not exposed as an MCP parameter; file a follow-up if it should be MCP-tunable. efConstruction defaults to 200 in the lite index (hnsw-index.ts:537).
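
If you want the three profiles available in code, one option is a small typed lookup. The values are copied from the tuning table; the type, constant, and function names are hypothetical and not part of the plugin.

```typescript
type ProfileName = "recall-first" | "balanced" | "latency-first";

interface HnswProfile {
  efSearch: number; // size of the candidate list explored at query time
  M: number;        // maximum neighbors kept per graph node
}

// Values taken from the tuning table in this document.
const HNSW_PROFILES: Record<ProfileName, HnswProfile> = {
  "recall-first": { efSearch: 200, M: 32 },
  balanced: { efSearch: 64, M: 16 },
  "latency-first": { efSearch: 16, M: 8 },
};

// Fall back to the documented default when no profile is requested.
function resolveProfile(profile?: ProfileName): HnswProfile {
  return HNSW_PROFILES[profile ?? "balanced"];
}
```

Since only efSearch is described as settable through ruvllm_hnsw_create today, the M field here is informational until it becomes tunable.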

HNSW pattern router (WASM, ≤11 patterns)

For routing a small number of high-priority patterns:

  • mcp__claude-flow__ruvllm_hnsw_create — create the WASM index (cap ~11)
  • mcp__claude-flow__ruvllm_hnsw_add — add a pattern
  • mcp__claude-flow__ruvllm_hnsw_route — route an incoming query

This is not a corpus index. Treat it as a fast classifier over a curated set of patterns.
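
The "fast classifier over a curated set" idea can be illustrated with a few lines of plain TypeScript. This is only a conceptual stand-in: it scans every pattern with cosine similarity, and it does not reproduce the WASM router's API or internals. The class name and the hard cap of 11 (the document gives the limit as approximately 11) are assumptions for the sketch.

```typescript
const MAX_PATTERNS = 11; // approximate cap stated in this document

interface Pattern {
  label: string;
  vector: number[];
}

function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class TinyRouter {
  private patterns: Pattern[] = [];

  // Register a pattern; refuse once the cap is reached.
  add(label: string, vector: number[]): void {
    if (this.patterns.length >= MAX_PATTERNS) {
      throw new Error("router holds at most " + MAX_PATTERNS + " patterns");
    }
    this.patterns.push({ label, vector });
  }

  // Return the label of the most similar pattern, or null if none are registered.
  route(queryVec: number[]): string | null {
    let bestLabel: string | null = null;
    let bestScore = -Infinity;
    for (const p of this.patterns) {
      const score = cosineSim(p.vector, queryVec);
      if (score > bestScore) {
        bestScore = score;
        bestLabel = p.label;
      }
    }
    return bestLabel;
  }
}

// Two toy patterns with made-up 2-dimensional vectors.
const router = new TinyRouter();
router.add("auth", [1, 0]);
router.add("billing", [0, 1]);
const routed = router.route([0.9, 0.2]); // "auth"
```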

Hyperbolic embeddings

For hierarchical data (code trees, org charts), use mcp__claude-flow__embeddings_hyperbolic, which maps inputs into Poincaré ball space. Distance is geodesic, not cosine.
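
For intuition about what "geodesic" means here, the standard Poincaré ball distance (curvature −1) is d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The sketch below implements that textbook formula; whether the tool uses the same curvature or normalization is an assumption, and the function names are hypothetical.

```typescript
// Squared Euclidean norm.
function sqNorm(v: number[]): number {
  return v.reduce((s, x) => s + x * x, 0);
}

// Geodesic distance between two points strictly inside the unit ball.
function poincareDistance(u: number[], v: number[]): number {
  const nu = sqNorm(u);
  const nv = sqNorm(v);
  if (nu >= 1 || nv >= 1) {
    throw new Error("points must lie strictly inside the unit ball");
  }
  const diff = sqNorm(u.map((x, i) => x - v[i]));
  const arg = 1 + (2 * diff) / ((1 - nu) * (1 - nv));
  // arcosh(arg), written out as log(arg + sqrt(arg^2 - 1)).
  return Math.log(arg + Math.sqrt(arg * arg - 1));
}

const origin = [0, 0];
const nearCenter = poincareDistance(origin, [0.5, 0]); // ln 3, about 1.0986
const nearEdge = poincareDistance(origin, [0.9, 0]);   // ln 19, about 2.9444
```

Distances grow quickly toward the boundary of the ball, which is what lets deep levels of a hierarchy spread out without crowding.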

CLI alternative

```bash
npx @claude-flow/cli@latest embeddings search --query "authentication patterns"
npx @claude-flow/cli@latest embeddings init
npx @claude-flow/cli@latest memory search --query "your query"
```

Performance

| Method | Speed |
|---|---|
| Brute-force scan | Baseline |
| HNSW (n=500, balanced) | ~150× faster |
| HNSW (n=10,000, balanced) | ~12,500× faster |
| RaBitQ + rerank (n=10,000) | ~12,500× search speed at 32× lower memory |
| ruvllm_hnsw_route (n≤11) | sub-ms per route, fixed cost |