ADR-090 — BGE Query Prefix: Mixed Results (Per-Dataset Win, Not a Default)

v3/docs/adr/ADR-090-bge-query-prefix-mixed.md

Status: Accepted, implemented in ruflo 3.10.29 (opt-in via BGE_QUERY_PREFIX=1)
Date: 2026-05-30
Related: ADR-085 (BEIR harness), ADR-089 (3-dataset summary)

Context

BAAI's BGE-en-v1.5 documentation recommends prepending the instruction "Represent this sentence for searching relevant passages:" to queries only, never to documents, before embedding. Our BGE-large NFCorpus measurement (0.350) sits 0.030 below BAAI's published number (0.380), which made the missing prefix a likely partial cause of the gap.

Decision

Add an embedQuery() method to the BGE embedder, export a BGE_QUERY_PREFIX constant, and wire a BGE_QUERY_PREFIX=1 env flag through both BEIR runners. The prefix is off by default and remains opt-in, given the per-dataset data below.
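A minimal sketch of what the embedder-side change could look like. The `DenseEmbedder` interface and the `withQueryPrefix` helper are hypothetical names for illustration, and the trailing space in the prefix is an assumption; the real code in bge-embedder.ts may be shaped differently.

```typescript
// Prefix text as quoted in the Context section. The trailing space is an
// assumption. (The real module exports this constant, per this ADR.)
const BGE_QUERY_PREFIX =
  "Represent this sentence for searching relevant passages: ";

// Hypothetical embedder shape; the actual type in bge-embedder.ts may differ.
interface DenseEmbedder {
  embed(text: string): Promise<number[]>;
  embedQuery(text: string): Promise<number[]>;
}

// Wraps a plain embed function and adds a query-only variant.
function withQueryPrefix(
  embed: (text: string) => Promise<number[]>,
): DenseEmbedder {
  return {
    // Documents are embedded unchanged.
    embed,
    // Queries get the instruction prefix prepended before embedding.
    embedQuery: (text) => embed(BGE_QUERY_PREFIX + text),
  };
}
```

Keeping the prefix in a separate method rather than a global switch means document indexing cannot pick it up by accident.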

Measured proof (3 BEIR datasets, dense-alone path, same BGE-base-en-v1.5)

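| Dataset | nDCG@10, no prefix | nDCG@10, with prefix | Δ | Direction |
|---|---|---|---|---|
| NFCorpus | 0.3517 | 0.3604 | +0.0087 | ✓ helps |
| SciFact | 0.6256 | 0.6186 | -0.0070 | ✗ hurts |
| ArguAna | 0.4311 | 0.4345 | +0.0034 | ~ noise |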

Mixed result. The prefix is not a free win. Likely reason: NFCorpus queries are question-shaped ("Do cholesterol statin drugs cause breast cancer?"), which fits the prefix's "searching relevant passages" framing. SciFact queries are claim-shaped ("Statin use lowers cancer mortality") — the prefix's question-framing may mis-cue the encoder. ArguAna queries are argument-shaped (counter-arguments) — the prefix is neutral.
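The Δ and Direction columns can be recomputed from the two score columns. In the sketch below, the scores are copied from the measurements above, while `NOISE_BAND` is a hypothetical threshold chosen only because it separates the row labeled noise (+0.0034) from the rows labeled helps and hurts; it is not a measured run-to-run variance.

```typescript
// Score pairs copied from the measurements above: [no prefix, with prefix].
const beirScores: Record<string, [number, number]> = {
  NFCorpus: [0.3517, 0.3604],
  SciFact: [0.6256, 0.6186],
  ArguAna: [0.4311, 0.4345],
};

// Hypothetical noise band, NOT a measured variance (see lead-in).
const NOISE_BAND = 0.005;

type Direction = "helps" | "hurts" | "noise";

function classifyDelta(
  base: number,
  prefixed: number,
): { delta: number; direction: Direction } {
  // Round to 4 decimals to drop floating-point residue from the subtraction.
  const delta = Math.round((prefixed - base) * 1e4) / 1e4;
  const direction: Direction =
    Math.abs(delta) < NOISE_BAND ? "noise" : delta > 0 ? "helps" : "hurts";
  return { delta, direction };
}

for (const name of Object.keys(beirScores)) {
  const [base, prefixed] = beirScores[name];
  const { delta, direction } = classifyDelta(base, prefixed);
  const sign = delta >= 0 ? "+" : "";
  console.log(`${name}: ${sign}${delta.toFixed(4)} (${direction})`);
}
```

A defensible noise band would come from repeated runs or a per-query significance test, which this ADR does not report.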

Decision: opt-in, not default

Because the prefix hurts SciFact (and SciFact is a major BEIR dataset where BM25 dominates dense), we cannot ship it as a default. Callers can enable it per deployment:

```bash
BGE_QUERY_PREFIX=1 node scripts/run-beir-bge.mjs
```

The flag flows through run-beir-bge.mjs and run-beir-hybrid.mjs. The embedQuery() method is wired into the embedder type so future callers can use it programmatically.
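A programmatic caller might select the query-side function once, based on the flag, and leave document embedding untouched. This is a sketch under assumptions: `queryPrefixEnabled`, `pickQueryEmbed`, and the `QueryCapableEmbedder` shape are hypothetical, and treating only the exact value "1" as enabled is a guess at how the runners parse the flag.

```typescript
// Hypothetical flag check: only the exact string "1" enables the prefix,
// so an unset variable keeps the default-off behavior.
function queryPrefixEnabled(env: Record<string, string | undefined>): boolean {
  return env["BGE_QUERY_PREFIX"] === "1";
}

// Minimal embedder shape assumed for illustration.
interface QueryCapableEmbedder {
  embed(text: string): Promise<number[]>;
  embedQuery(text: string): Promise<number[]>;
}

// Chooses the function used for queries; documents always go through embed().
function pickQueryEmbed(
  embedder: QueryCapableEmbedder,
  env: Record<string, string | undefined>,
): (text: string) => Promise<number[]> {
  return queryPrefixEnabled(env)
    ? (text) => embedder.embedQuery(text)
    : (text) => embedder.embed(text);
}
```

In a Node script the second argument would typically be `process.env`. Passing the environment in explicitly keeps the helper easy to test.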

Honest limits

  • N=3 datasets. A larger BEIR sweep would tighten the per-dataset characterization.
  • Question-vs-claim hypothesis is hand-waved. Real analysis would cluster queries by syntactic shape and measure the prefix effect within each cluster.
  • The prefix was designed for BAAI's published unquantized model. Our Xenova int8-quantized model may respond differently.
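The per-shape analysis named in the second limit could start from something like the sketch below. Everything here is hypothetical: the regex is a crude stand-in for real syntactic clustering, and `PerQueryScore` assumes per-query nDCG@10 values that the current runners are not known to emit.

```typescript
type QueryShape = "question" | "other";

// Crude heuristic (an assumption): a trailing "?" or a leading
// question/auxiliary word marks a query as question-shaped.
const QUESTION_LEAD =
  /^(who|what|when|where|why|how|which|do|does|did|is|are|can|could|should|will)\b/i;

function queryShape(query: string): QueryShape {
  const q = query.trim();
  return q.endsWith("?") || QUESTION_LEAD.test(q) ? "question" : "other";
}

interface PerQueryScore {
  query: string;
  base: number; // per-query nDCG@10 without the prefix
  prefixed: number; // per-query nDCG@10 with the prefix
}

// Mean of (prefixed - base) within each shape bucket.
function meanDeltaByShape(rows: PerQueryScore[]): Map<QueryShape, number> {
  const sums = new Map<QueryShape, { total: number; n: number }>();
  for (const row of rows) {
    const shape = queryShape(row.query);
    const acc = sums.get(shape) ?? { total: 0, n: 0 };
    acc.total += row.prefixed - row.base;
    acc.n += 1;
    sums.set(shape, acc);
  }
  const means = new Map<QueryShape, number>();
  sums.forEach((acc, shape) => means.set(shape, acc.total / acc.n));
  return means;
}
```

With this heuristic, the NFCorpus example quoted earlier ("Do cholesterol statin drugs cause breast cancer?") lands in the question bucket and the SciFact example ("Statin use lowers cancer mortality") in the other bucket. If the hypothesis holds, the mean delta should be positive for questions and near zero or negative elsewhere.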

What ships

  • src/memory/bge-embedder.ts — adds embedQuery(text) method + exports BGE_QUERY_PREFIX
  • scripts/run-beir-bge.mjs — BGE_QUERY_PREFIX=1 env flag
  • scripts/run-beir-hybrid.mjs — same flag

Verification

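```bash
# Reproduce all three numbers (expects each dataset unpacked under /tmp/beir-<name>)
for ds in nfcorpus scifact arguana; do
  # Skip a dataset whose directory is missing instead of running in the wrong place
  cd "/tmp/beir-$ds" || continue
  echo "=== $ds NO prefix ==="
  node /path/to/scripts/run-beir-bge.mjs | grep -E "^  nDCG@10"
  echo "=== $ds WITH prefix ==="
  BGE_QUERY_PREFIX=1 node /path/to/scripts/run-beir-bge.mjs | grep -E "^  nDCG@10"
done
```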