v3/docs/adr/ADR-082-grid-search-retrieval-defaults.md
Status: Accepted — Implemented in ruflo 3.10.22 Date: 2026-05-30 Tracking: continuation of self-learning hardening cluster (ADR-077 → 078 → 079 → 080 → 081 → 082) Related: ADR-078 (hybrid retrieval), ADR-079 (multi-field BM25), ADR-080 (cross-encoder), ADR-081 (labelled corpus)
ADR-079's tuning (α=0.6, subjectWeight=3.0, mmrLambda=0.5) and ADR-080's tuning (hybridWeight=0.5, ceWeight=0.5) were both selected against the regex relevance proxy that ADR-081 then revealed was misleading. The defaults were never validated against ground truth.
ADR-081 shipped the labelled corpus. Now we can re-tune properly.
Built a grid-search harness (scripts/grid-search-retrieval.mjs) that sweeps the retrieval hyperparameter space against the ADR-081 labelled corpus and reports label nDCG@3, top-1, top-3, precision@3, MRR@3 per configuration.
Grid:
alpha ∈ {0.3, 0.5, 0.7}subjectWeight ∈ {2.0, 3.0, 5.0}mmrLambda ∈ {0.3, 0.5, 0.7}hybridWeight × ceWeight ∈ {(0.3, 0.7), (0.4, 0.6), (0.5, 0.5), (0.6, 0.4), (0.7, 0.3)} (rerank-only)32 configs total: 27 hybrid + 5 rerank.
The grid revealed:
subjectWeight × mmrLambda combinations. The BM25 signal carries more discriminating power than the bi-encoder cosine on this corpus.| Parameter | Old (ADRs 079-080) | New (ADR-082) | Why |
|---|---|---|---|
alpha | 0.6 | 0.5 | Grid: nDCG@3 0.900 → 0.963 |
subjectWeight | 3.0 | 2.0 | Grid: sw=2 dominates the row |
mmrLambda | 0.5 | 0.7 | Grid: mmr=0.7 beats 0.5 by ~0.02 nDCG |
bodyWeight | 1.0 | 1.0 | unchanged |
typePenaltyFactor | 1.0 | 1.0 | unchanged (opt-in) |
hybridWeight | 0.5 | 0.5 | unchanged pending joint re-grid |
ceWeight | 0.5 | 0.5 | unchanged pending joint re-grid |
Hybrid path (default, no opt-in rerank):
| Metric | 3.10.21 (old defaults) | 3.10.22 (ADR-082) | Δ |
|---|---|---|---|
| Label top-1 hit rate | 90% | 90% | tied |
| Label top-3 hit rate | 90% | 100% | +10pp |
| Label MRR@3 | 0.900 | 0.950 | +0.050 |
| Label precision@3 | 0.400 | 0.533 | +0.133 |
| Label nDCG@3 | 0.900 | 0.963 | +0.063 (+7%) |
| Label nDCG@5 | 0.875 | 0.938 | +0.063 |
| Avg query latency | 42 ms | 55 ms | +13 ms (still <100 ms) |
Rerank path (opt-in {rerank: true}, with new hybrid defaults underneath):
| Metric | 3.10.21 (old defaults) | 3.10.22 (ADR-082) | Δ |
|---|---|---|---|
| Label top-1 hit rate | 80% | 90% | +10pp |
| Label top-3 hit rate | 100% | 90% | -10pp |
| Label MRR@3 | 0.883 | 0.925 | +0.042 |
| Label precision@3 | 0.667 | 0.700 | +0.033 |
| Label nDCG@3 | 0.913 | 0.900 | -0.013 |
Rerank's trade-off: top-1/MRR/P3 up, nDCG@3/top-3 marginally down. Net: positive but not unambiguous. A joint re-grid (including hybridWeight/ceWeight × new α/sw) is tracked.
| Metric (labelled, canonical) | 3.10.17 | 3.10.19 | 3.10.20 | 3.10.22 |
|---|---|---|---|---|
| Label top-1 (hybrid) | 0% | 90% | 90% | 90% |
| Label top-3 (hybrid) | 0% | 90% | 90% | 100% |
| Label nDCG@3 (hybrid) | 0.000 | 0.900 | 0.900 | 0.963 |
| Label precision@3 (hybrid) | 0.000 | 0.400 | 0.400 | 0.533 |
| Label top-1 (rerank) | — | — | 80% | 90% |
| Label nDCG@3 (rerank) | — | — | 0.913 | 0.900 |
| Label precision@3 (rerank) | — | — | 0.667 | 0.700 |
scripts/grid-search-retrieval.mjs — sweeps the hyperparameter space, reports per-config metrics, picks winners by nDCG/top-1/precision@3. Re-runnable on any pretrained store. Includes --quick mode for fast iteration.src/mcp-tools/neural-tools.ts + schema descriptions.docs/benchmarks/runs/grid-search-retrieval-{ts,latest}.json with full config × metrics matrix.git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )
# Pretrain
node v3/@claude-flow/cli/scripts/pretrain-from-github.mjs
# Grid-search (full grid, ~1 min)
cd v3/@claude-flow/cli && node scripts/grid-search-retrieval.mjs
# Confirm new defaults on canonical bench
BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs # hybrid → nDCG@3 0.963
RERANK=1 BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs # rerank → nDCG@3 0.900