v3/docs/adr/ADR-083-joint-rerank-grid.md
- Status: Accepted (implemented in ruflo 3.10.23)
- Date: 2026-05-30
- Tracking: continuation of self-learning hardening cluster (ADR-077 → 078 → 079 → 080 → 081 → 082 → 083)
- Related: ADR-082 (single-axis grid-search)
ADR-082 grid-searched the hybrid (non-rerank) hyperparameter space against the ADR-081 labelled corpus and shipped α=0.5, sw=2, mmr=0.7 — pushing hybrid nDCG@3 from 0.900 to 0.963. The rerank path was left at hw=0.5/cw=0.5 because the original rerank grid had been tested against the OLD α=0.6/sw=3.0 baseline; with α/sw shifting underneath, the joint optimum needed measuring.
ADR-082 flagged this explicitly: "Rerank's trade-off: top-1/MRR/P3 up, nDCG@3/top-3 marginally down. Net: positive but not unambiguous. A joint re-grid (including hybridWeight/ceWeight × new α/sw) is tracked."
This is that re-grid.
Extended scripts/grid-search-retrieval.mjs with a joint rerank sweep:
- hybridWeight × ceWeight ∈ {(0.2,0.8), (0.3,0.7), (0.4,0.6), (0.5,0.5), (0.6,0.4), (0.7,0.3), (0.8,0.2)}
- alpha ∈ {0.3, 0.5} (for the underlying hybrid stage when rerank is on)
- subjectWeight ∈ {2.0, 3.0}

7 × 2 × 2 = 28 rerank configs, ~25 min wall-clock with the cross-encoder.
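As a rough illustration of how the 28 configurations arise, the sweep can be sketched as a Cartesian product over the three axes. This is a minimal sketch, not the code in scripts/grid-search-retrieval.mjs: the function name `buildRerankGrid` and the shape of the config objects are assumptions made for the example.

```javascript
// Hypothetical sketch of the joint rerank grid; names are illustrative only
// and are not taken from scripts/grid-search-retrieval.mjs.
const weightPairs = [
  [0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.5, 0.5],
  [0.6, 0.4], [0.7, 0.3], [0.8, 0.2],
];
const alphas = [0.3, 0.5];
const subjectWeights = [2.0, 3.0];

function buildRerankGrid() {
  const configs = [];
  // hybridWeight and ceWeight move together as a pair, so they form one axis.
  for (const [hybridWeight, ceWeight] of weightPairs) {
    for (const alpha of alphas) {
      for (const subjectWeight of subjectWeights) {
        configs.push({ useRerank: true, hybridWeight, ceWeight, alpha, subjectWeight });
      }
    }
  }
  return configs;
}

const grid = buildRerankGrid();
console.log(grid.length); // 28
```

Treating (hybridWeight, ceWeight) as a single paired axis is what keeps the grid at 7 × 2 × 2 rather than 7 × 7 × 2 × 2.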
The rerank path wants different hybrid sub-params than the non-rerank path:
| Path | Best α | Best sw | Best mmr | Best hw/cw | nDCG@3 |
|---|---|---|---|---|---|
| Non-rerank (hybrid only) | 0.5 | 2.0 | 0.7 | — | 0.963 |
| Rerank | 0.5 | 3.0 | 0.7 | hw=0.7 cw=0.3 | 0.963 |
This makes intuitive sense: when the cross-encoder is doing the semantic understanding downstream, the hybrid stage can be more keyword-focused (higher subject weight). When hybrid is the final stage, lower subject weight gives the body tokens room to contribute relevance signal.
Implementation: subjectWeight default is now conditional on useRerank (3.0 if reranking, 2.0 otherwise). Explicit subjectWeight param still overrides.
| Parameter | Non-rerank | Rerank | Change vs 3.10.22 |
|---|---|---|---|
| alpha | 0.5 | 0.5 | unchanged |
| subjectWeight | 2.0 | 3.0 | rerank +1.0 (conditional default) |
| mmrLambda | 0.7 | 0.7 | unchanged |
| hybridWeight | — | 0.7 | rerank: 0.5 → 0.7 |
| ceWeight | — | 0.3 | rerank: 0.5 → 0.3 |
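The conditional default could be resolved along the following lines. This is a sketch under assumptions, not the code in src/mcp-tools/neural-tools.ts: the helper name `resolveRetrievalParams`, the flat params object, and `useRerank` defaulting to false are all hypothetical. Only the numeric defaults come from the table above.

```javascript
// Hypothetical helper; the real logic lives in src/mcp-tools/neural-tools.ts
// and may be structured differently.
function resolveRetrievalParams(params = {}) {
  // Assumption: rerank is off unless the caller asks for it.
  const useRerank = params.useRerank ?? false;
  const resolved = {
    useRerank,
    alpha: params.alpha ?? 0.5,
    // Conditional default: 3.0 when reranking, 2.0 otherwise.
    // An explicit subjectWeight always wins over either default.
    subjectWeight: params.subjectWeight ?? (useRerank ? 3.0 : 2.0),
    mmrLambda: params.mmrLambda ?? 0.7,
  };
  if (useRerank) {
    // Blend weights only apply on the rerank path.
    resolved.hybridWeight = params.hybridWeight ?? 0.7;
    resolved.ceWeight = params.ceWeight ?? 0.3;
  }
  return resolved;
}

console.log(resolveRetrievalParams({}).subjectWeight);                   // 2
console.log(resolveRetrievalParams({ useRerank: true }).subjectWeight);  // 3
console.log(resolveRetrievalParams({ useRerank: true, subjectWeight: 2.5 }).subjectWeight); // 2.5
```

Using `??` rather than `||` means an explicit value of 0 would still be honoured instead of silently falling back to the default.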
Rerank path:
| Metric | 3.10.22 (old joint defaults) | 3.10.23 (ADR-083) | Δ |
|---|---|---|---|
| Label top-1 hit rate | 90% | 90% | tied |
| Label top-3 hit rate | 90% | 100% | +10pp |
| Label MRR@3 | 0.925 | 0.950 | +0.025 |
| Label precision@3 | 0.700 | 0.700 | tied |
| Label nDCG@3 | 0.900 | 0.963 | +0.063 (+7%) |
| Label nDCG@5 | 0.904 | 0.944 | +0.040 |
Hybrid (non-rerank) path: unchanged from 3.10.22 (nDCG@3 = 0.963).
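For readers unfamiliar with the metrics in the table, nDCG@k and MRR@k can be computed per query roughly as follows. This is a generic sketch, not the benchmark script's code: it assumes labels are a map from document id to a non-negative relevance gain, and it uses the common exponential-gain form of DCG. The ADR does not say which DCG variant the benchmark uses. The function names and sample ids are hypothetical.

```javascript
// Generic metric sketch; not taken from benchmark-pretrained-retrieval.mjs.
// DCG with exponential gain (2^g - 1) and a log2 position discount.
function dcgAtK(gains, k) {
  return gains
    .slice(0, k)
    .reduce((sum, g, i) => sum + (2 ** g - 1) / Math.log2(i + 2), 0);
}

// nDCG@k: DCG of the actual ranking divided by DCG of the ideal ranking.
function ndcgAtK(rankedIds, relevance, k) {
  const gains = rankedIds.map((id) => relevance[id] ?? 0);
  const idealGains = Object.values(relevance).sort((a, b) => b - a);
  const idealDcg = dcgAtK(idealGains, k);
  return idealDcg === 0 ? 0 : dcgAtK(gains, k) / idealDcg;
}

// MRR@k for one query: reciprocal rank of the first relevant hit in the top k.
function mrrAtK(rankedIds, relevance, k) {
  const idx = rankedIds.slice(0, k).findIndex((id) => (relevance[id] ?? 0) > 0);
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// Hypothetical query with two relevant documents, "a" and "b".
const relevance = { a: 1, b: 1 };
console.log(ndcgAtK(["a", "b", "x"], relevance, 3)); // 1 (ideal ordering)
console.log(mrrAtK(["x", "a", "b"], relevance, 3));  // 0.5 (first hit at rank 2)
```

The corpus-level figures are then the mean of these per-query values, which is why a single query moving its relevant document into the top 3 can shift the aggregate noticeably on a small labelled set.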
| Metric (labelled) | 3.10.17 cosine | 3.10.19 hybrid | 3.10.20 +rerank | 3.10.22 | 3.10.23 |
|---|---|---|---|---|---|
| Hybrid nDCG@3 | 0.000 | 0.900 | 0.900 | 0.963 | 0.963 |
| Rerank nDCG@3 | — | — | 0.913 | 0.900 | 0.963 |
| Hybrid top-3 | 0% | 90% | 90% | 100% | 100% |
| Rerank top-3 | — | — | 100% | 90% | 100% |
| Rerank precision@3 | — | — | 0.667 | 0.700 | 0.700 |
Both retrieval paths now hit nDCG@3 = 0.963 on this corpus. The choice between them is now purely cost vs richness:
| Rank | Config | top-1 | top-3 | nDCG | P3 | MRR |
|---|---|---|---|---|---|---|
| 1 | hybrid α=0.5 sw=2 mmr=0.7 | 90% | 100% | 0.963 | 0.533 | 0.950 |
| 1 | rerank hw=0.7 cw=0.3 α=0.5 sw=3 | 90% | 100% | 0.963 | 0.700 | 0.950 |
| 1 | rerank hw=0.8 cw=0.2 α=0.3 sw=3 | 90% | 100% | 0.963 | 0.767 | 0.950 |
| 4 | hybrid α=0.5 sw=3 mmr=0.7 | 90% | 100% | 0.955 | 0.533 | 0.950 |
| 4 | rerank hw=0.8 cw=0.2 α=0.5 sw=3 | 90% | 100% | 0.955 | 0.700 | 0.950 |
Three configs tied at the corpus ceiling. Picked hw=0.7 cw=0.3 over hw=0.8 cw=0.2 because the latter underweights the cross-encoder's contribution (cw=0.2 leaves the CE doing almost nothing).
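To make the tie-break reasoning concrete, the sketch below assumes the final rerank score is a linear blend, hybridWeight × hybridScore + ceWeight × ceScore, with both scores normalised to [0, 1]. The ADR does not spell out the combination formula, so treat the blend, the function name `blendScores`, and the sample scores as assumptions made purely for illustration.

```javascript
// Assumed linear blend of hybrid and cross-encoder scores (not confirmed by the ADR).
function blendScores(candidates, hybridWeight, ceWeight) {
  return candidates
    .map((c) => ({ ...c, score: hybridWeight * c.hybridScore + ceWeight * c.ceScore }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical candidates: the hybrid stage prefers A, the cross-encoder strongly prefers B.
const candidates = [
  { id: "A", hybridScore: 0.9, ceScore: 0.1 },
  { id: "B", hybridScore: 0.6, ceScore: 0.9 },
];

console.log(blendScores(candidates, 0.8, 0.2)[0].id); // "A": the CE preference is outvoted
console.log(blendScores(candidates, 0.7, 0.3)[0].id); // "B": the CE can overturn the hybrid order
```

Under this assumed blend, cw=0.2 lets the cross-encoder reorder only candidates whose hybrid scores are already close, while cw=0.3 gives it enough weight to override a moderate hybrid gap.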
- scripts/grid-search-retrieval.mjs: extended with joint rerank sweep (28 configs across hw/cw × α × sw). Re-runnable on any corpus.
- src/mcp-tools/neural-tools.ts: subjectWeight default depends on useRerank.
- docs/benchmarks/runs/grid-search-retrieval-latest.json with the full 48-config matrix.

Callers passing an explicit subjectWeight get consistent behaviour, but callers relying on defaults see different values depending on the rerank flag. This is documented in the schema but is a minor surprise.

git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )
node v3/@claude-flow/cli/scripts/pretrain-from-github.mjs
# Joint grid (~25 min)
cd v3/@claude-flow/cli && node scripts/grid-search-retrieval.mjs
# Verify new defaults
BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs # hybrid → nDCG@3 0.963
RERANK=1 BENCH_NO_WRITE=1 node scripts/benchmark-pretrained-retrieval.mjs # rerank → nDCG@3 0.963 (was 0.900)