rust/index/src/sparse/maxscore.md
This document walks through the query algorithm implemented in maxscore.rs.
It is a windowed, block-max variant of the classic MaxScore algorithm
(Turtle & Flood, 1995) adapted for our blocked posting-list layout.
A sparse vector query computes the dot-product between a query vector and every stored document vector. Naively this means touching every document for every query dimension — far too slow at 100M+ documents. MaxScore avoids this by proving that large swaths of documents cannot make the top-k, and skipping them entirely.
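For scale, the naive computation being avoided looks roughly like the following throwaway sketch (not code from the crate; the sparse vectors are modeled as hash maps purely for illustration):

```rust
use std::collections::HashMap;

/// Brute-force top-k scoring: dot-product of the query against every document.
/// This is the O(num_docs × query_dims) baseline that MaxScore avoids.
fn brute_force_top_k(
    query: &HashMap<u32, f32>,      // dimension -> query weight
    docs: &[HashMap<u32, f32>],     // one sparse vector per document
    k: usize,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(doc, dvec)| {
            let score: f32 = query
                .iter()
                .filter_map(|(dim, qw)| dvec.get(dim).map(|v| qw * v))
                .sum();
            (doc, score)
        })
        .collect();
    // Sort by score descending and keep the best k.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```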
Each dimension's posting list is split into fixed-size blocks of
(offset, weight) entries (default 1024 per block), sorted by document offset.
Alongside the blocks a directory block stores per-block metadata:
Dimension 42
├── Block 0: [(off=0, w=0.3), (off=5, w=0.9), ...] max_offset=127, max_weight=0.9
├── Block 1: [(off=130, w=0.1), ...] max_offset=255, max_weight=0.6
└── Directory: max_offsets=[127, 255], max_weights=[0.9, 0.6]
The max_weight per block is the key to skipping: it lets us compute a tight
upper bound on how much any document in that block can contribute to a score,
without decompressing the block.
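A rough sketch of how that metadata is used; the struct and method names here are illustrative stand-ins, not the actual types in maxscore.rs:

```rust
/// Illustrative per-dimension block directory (hypothetical names).
struct BlockDirectory {
    max_offsets: Vec<u32>, // highest doc offset stored in each block
    max_weights: Vec<f32>, // largest weight stored in each block
}

impl BlockDirectory {
    /// Upper bound on what any single document in block `b` can contribute
    /// to the query score on this dimension, computed without touching the
    /// block's payload.
    fn block_upper_bound(&self, b: usize, query_weight: f32) -> f32 {
        query_weight * self.max_weights[b]
    }

    /// Bound over a doc-offset window: the max over all blocks that overlap
    /// [window_start, window_end]. Blocks are sorted by offset, so block b
    /// starts just past block b-1's max_offset.
    fn window_upper_bound(&self, window_start: u32, window_end: u32, query_weight: f32) -> f32 {
        let mut prev_max = None;
        let mut bound: f32 = 0.0;
        for (b, &max_off) in self.max_offsets.iter().enumerate() {
            let block_start = prev_max.map_or(0, |m| m + 1);
            if block_start <= window_end && max_off >= window_start {
                bound = bound.max(self.block_upper_bound(b, query_weight));
            }
            prev_max = Some(max_off);
        }
        bound
    }
}
```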
for each window of 4096 doc-IDs:
1. Partition query terms into ESSENTIAL vs NON-ESSENTIAL
2. Drain essential terms into an accumulator (Phase 1)
3. Merge-join non-essential terms with candidates (Phase 2)
4. Push surviving candidates into a top-k min-heap (Phase 3)
Open a PostingCursor for each query dimension that exists in the index.
Sort terms by max_score = query_weight × dimension_max (ascending).
Initialize a min-heap of size k and threshold = -∞.
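Putting the setup and the outer loop together, the overall shape is roughly as follows. This is a simplified sketch with hypothetical types; the real cursor, accumulator, and heap plumbing lives in maxscore.rs:

```rust
/// Hypothetical per-term state; the real struct wraps a PostingCursor.
struct TermState {
    query_weight: f32,
    dim_max: f32,      // dimension-wide max weight from the index metadata
    window_score: f32, // recomputed per window from block maxima
}

const WINDOW_SIZE: u32 = 4096;

fn search_outline(mut terms: Vec<TermState>, num_docs: u32, _k: usize) {
    // Setup: sort ascending by max_score = query_weight × dim_max, so the
    // weakest terms come first for the prefix-sum split below.
    terms.sort_by(|a, b| {
        (a.query_weight * a.dim_max)
            .partial_cmp(&(b.query_weight * b.dim_max))
            .unwrap()
    });
    let threshold = f32::NEG_INFINITY; // raised as the top-k heap fills

    let mut window_start = 0u32;
    while window_start < num_docs {
        // Phase 0: recompute window_score per term, re-sort, split into
        //          non-essential / essential around `threshold`.
        // Phase 1: drain essential cursors into the 4096-slot accumulator.
        // Phase 2: merge-join non-essential cursors with the candidates.
        // Phase 3: push survivors into the top-k min-heap; its minimum
        //          becomes the new `threshold` once the heap holds k.
        // Cleanup: bitmap-guided reset of the touched accumulator slots.
        window_start += WINDOW_SIZE;
    }
    let _ = threshold; // placeholder: the real code threads this through the phases
}
```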
For the current window [start, start+4095], recompute each term's
window upper bound — the max block-level weight across blocks overlapping
this window, multiplied by the query weight.
Re-sort terms by window score (ascending). Walk from smallest to largest,
accumulating a prefix sum. The first term whose prefix sum ≥ threshold
becomes the split point:
terms (sorted by window_score):
[ t_A=0.1, t_B=0.2, t_C=0.4, t_D=0.8 ]
prefix sums: 0.1 0.3 0.7 1.5
^
threshold=0.6
──────────── ──────────
non-essential essential
Essential terms (right of the split) might push documents into the top-k on their own, so we must score every document they contain. Non-essential terms (left of the split) are too weak — even their maximum possible contribution combined can't promote a zero-score document above the threshold.
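The split itself is a small prefix-sum walk. A sketch under the same assumptions, operating on bare scores rather than the real cursor structs:

```rust
/// Split terms (already sorted ascending by window_score) into
/// non-essential [0..split) and essential [split..): the split is the first
/// index where the running prefix sum of window scores reaches the current
/// top-k threshold.
fn find_split(window_scores: &[f32], threshold: f32) -> usize {
    let mut prefix = 0.0f32;
    for (i, &s) in window_scores.iter().enumerate() {
        prefix += s;
        if prefix >= threshold {
            return i; // this term and everything after it is essential
        }
    }
    window_scores.len() // even all terms combined cannot reach the threshold
}

// Example from the diagram above: threshold = 0.6,
// window_scores = [0.1, 0.2, 0.4, 0.8] → prefix sums 0.1, 0.3, 0.7, 1.5,
// so find_split returns 2: {t_A, t_B} non-essential, {t_C, t_D} essential.
```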
Each essential term's cursor walks its blocks within the window, writing
accum[doc - window_start] += query_weight × value into a flat 4096-slot
accumulator array. A companion 64-word bitmap ([u64; 64], one bit per
slot) tracks which slots were touched.
Why a bitmap? After draining, we need to collect the touched slots
into sorted candidate arrays and later reset only those slots to zero.
Scanning the bitmap instead of the whole window makes both operations
proportional to the number of candidates rather than the window width.
window_start = 8192 window_end = 12287
┌─────────────────────────────────────────────────────────────────┐
│ accum[0..4095] (f32 × 4096) │
│ [0.0] [0.0] [0.72] [0.0] [0.0] [1.05] [0.0] [0.0] [0.31] ...│
└───┬──────┬─────┬──────┬─────┬──────┬─────┬──────┬─────┬────────┘
│ │ │ │ │ │ │ │ │
0 1 2 3 4 5 6 7 8 ← slot index
(doc = slot + 8192)
┌──────────────────────────────────────────────────────┐
│ bitmap[0] (u64, bits 0..63) │
│ ...0 0 1 0 0 1 0 0 1 0 0 ... │
│ ↑ ↑ ↑ │
│ bit 8 bit 5 bit 2 │
└──────────────────────────────────────────────────────┘
drain_essential (term "cat", qw=1.0):
cursor has doc 8194 (val 0.36) → slot 2: accum[2] += 1.0 × 0.36
bitmap[0] |= (1 << 2)
cursor has doc 8197 (val 0.53) → slot 5: accum[5] += 1.0 × 0.53
bitmap[0] |= (1 << 5)
drain_essential (term "food", qw=0.5):
cursor has doc 8194 (val 0.72) → slot 2: accum[2] += 0.5 × 0.72 (now 0.72)
cursor has doc 8200 (val 0.62) → slot 8: accum[8] += 0.5 × 0.62
bitmap[0] |= (1 << 8)
candidate extraction (bitmap walk):
word 0 = ...100100100₂
trailing_zeros() → bit 2 → cand_docs=[8194], cand_scores=[0.72]
clear bit, trailing_zeros() → bit 5 → cand_docs=[8194,8197], ...
clear bit, trailing_zeros() → bit 8 → cand_docs=[8194,8197,8200], ...
word 0 = 0 → next word ...
selective reset:
same walk zeros accum[2], accum[5], accum[8]; clears bitmap words
Each bitmap word is scanned as a u64; trailing_zeros() pops set bits in constant time, yielding only the occupied slots in doc-id order. The selective reset zeros just those accum entries and clears the bitmap words, avoiding a full 4096-entry memset. When the window is densely filled (many essential terms, dense posting lists) the bitmap scan approaches a linear sweep, no worse than brute force. When it's sparsely filled (common for high-dimensional SPLADE vectors, where each term covers a small fraction of documents) the savings are significant.
This is a pure sequential scan — no random access, very cache-friendly.
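A sketch of the accumulator, the bitmap walk, and the selective reset; field and method names are illustrative, not the exact ones in maxscore.rs:

```rust
/// Window accumulator sketch: a flat 4096-slot score array plus a 64-word
/// occupancy bitmap (bit slot % 64 of word slot / 64).
struct WindowAccumulator {
    accum: [f32; 4096],
    bitmap: [u64; 64],
}

impl WindowAccumulator {
    fn new() -> Self {
        WindowAccumulator { accum: [0.0; 4096], bitmap: [0; 64] }
    }

    /// Phase 1: add one posting hit from an essential term.
    fn add(&mut self, slot: usize, query_weight: f32, value: f32) {
        self.accum[slot] += query_weight * value;
        self.bitmap[slot / 64] |= 1u64 << (slot % 64);
    }

    /// Extract touched slots, in doc-id order, into candidate arrays.
    fn extract(&self, window_start: u32, docs: &mut Vec<u32>, scores: &mut Vec<f32>) {
        for (w, &word) in self.bitmap.iter().enumerate() {
            let mut bits = word;
            while bits != 0 {
                let bit = bits.trailing_zeros() as usize; // lowest set bit
                let slot = w * 64 + bit;
                docs.push(window_start + slot as u32);
                scores.push(self.accum[slot]);
                bits &= bits - 1; // clear that bit
            }
        }
    }

    /// Selective reset: zero only the touched slots, then clear the bitmap,
    /// instead of a full 4096-entry memset.
    fn reset(&mut self) {
        for w in 0..self.bitmap.len() {
            let mut bits = self.bitmap[w];
            while bits != 0 {
                let slot = w * 64 + bits.trailing_zeros() as usize;
                self.accum[slot] = 0.0;
                bits &= bits - 1;
            }
            self.bitmap[w] = 0;
        }
    }
}
```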
Extract candidates from the bitmap into sorted cand_docs[] / cand_scores[]
arrays. Then process non-essential terms from strongest to weakest:
filter_competitive first prunes cand_docs, dropping candidates whose accumulated
score plus the remaining non-essential budget cannot reach the threshold; each
term's cursor is then merge-joined with the surviving candidates, adding
query_weight × value to matching entries. Terms with window_score = 0 are
skipped. If all candidates are pruned, the remaining non-essential terms are
skipped entirely.
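A sketch of the per-term merge-join, assuming the term's postings for the window are available as a sorted (doc, value) slice (in the real code they come from the cursor):

```rust
/// Phase 2 sketch: merge-join one non-essential term with the sorted
/// candidate arrays, adding its contribution to matching docs.
fn merge_join(
    cand_docs: &[u32],
    cand_scores: &mut [f32],
    postings: &[(u32, f32)], // (doc, value), sorted by doc
    query_weight: f32,
) {
    let mut p = 0;
    for (i, &doc) in cand_docs.iter().enumerate() {
        // Advance the posting pointer to the first entry >= doc.
        while p < postings.len() && postings[p].0 < doc {
            p += 1;
        }
        if p == postings.len() {
            break; // term has no more postings in this window
        }
        if postings[p].0 == doc {
            cand_scores[i] += query_weight * postings[p].1;
        }
    }
}
```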
Walk the surviving cand_docs / cand_scores. Push any candidate that beats
the threshold (or the heap isn't full yet) into the min-heap. If the heap
overflows past k, pop the minimum. Update the threshold from the new minimum.
Finally, bitmap-guided zeroing resets only the touched accumulator slots (not all 4096), keeping per-window cleanup O(touched) instead of O(window).
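A sketch of the heap update; it assumes scores are non-negative so their bit patterns can serve as the heap key, which is a simplification of this sketch, not necessarily how maxscore.rs orders floats:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Phase 3 sketch: a min-heap of (score, doc) keeps the best k seen so far;
/// the heap minimum is the pruning threshold.
fn push_candidates(
    heap: &mut BinaryHeap<Reverse<(u32, u32)>>, // (score bits, doc)
    threshold: &mut f32,
    k: usize,
    cand_docs: &[u32],
    cand_scores: &[f32],
) {
    for (&doc, &score) in cand_docs.iter().zip(cand_scores) {
        if heap.len() < k {
            heap.push(Reverse((score.to_bits(), doc)));
        } else if score > *threshold {
            heap.push(Reverse((score.to_bits(), doc)));
            heap.pop(); // drop the old minimum
        } else {
            continue; // cannot make the top-k
        }
        if heap.len() == k {
            if let Some(Reverse((bits, _))) = heap.peek() {
                *threshold = f32::from_bits(*bits); // new heap minimum
            }
        }
    }
}
```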
Advance window_start by 4096 and loop. After all windows, drain the heap
and sort descending by score.
| Technique | Effect |
|---|---|
| Window accumulator | Dense array + bitmap avoids hash-map overhead |
| Essential/non-essential split | Weak terms skip most documents entirely |
| Per-window repartition | Split adapts as threshold tightens |
| Budget pruning | Candidates are eliminated before scoring weak terms |
| Block-max upper bounds | Entire blocks are skipped when their max is too low |
| Bitmap-guided cleanup | Only touched slots are zeroed per window |
A small end-to-end trace showing every step. We search for
query = {cat: 1.0, food: 0.5, cute: 0.3} with k = 2.
doc 0: {cat: 0.9, cute: 0.4}
doc 1: {food: 0.8}
doc 2: {cat: 0.5, food: 0.6, cute: 0.7}
doc 3: {cat: 0.2, cute: 0.1}
doc 4: {food: 0.3}
Posting lists (one block each, all fit in a single window):
dim "cat": [(0, 0.9), (2, 0.5), (3, 0.2)] dim_max = 0.9
dim "food": [(1, 0.8), (2, 0.6), (4, 0.3)] dim_max = 0.8
dim "cute": [(0, 0.4), (2, 0.7), (3, 0.1)] dim_max = 0.7
Open cursors. Compute max_score = query_weight × dim_max:
term qw dim_max max_score
cute 0.3 0.7 0.21
food 0.5 0.8 0.40
cat 1.0 0.9 0.90
Sort ascending by max_score: [cute=0.21, food=0.40, cat=0.90].
Heap is empty, threshold = -∞.
All blocks fall in this window, so window_score = max_score for each
term. Prefix sums:
cute food cat
score 0.21 0.40 0.90
prefix 0.21 0.61 1.51
↑
threshold = -∞, so prefix ≥ threshold immediately at cute
Threshold is -∞, so every prefix sum ≥ threshold — the split is at
index 0. All three terms are essential (this is typical for the first
window before any candidates enter the heap).
Process each essential term, accumulating into accum[]:
drain cute (qw=0.3):
doc 0, val 0.4 → slot 0: accum[0] += 0.3 × 0.4 = 0.12 bitmap set bit 0
doc 2, val 0.7 → slot 2: accum[2] += 0.3 × 0.7 = 0.21 bitmap set bit 2
doc 3, val 0.1 → slot 3: accum[3] += 0.3 × 0.1 = 0.03 bitmap set bit 3
drain food (qw=0.5):
doc 1, val 0.8 → slot 1: accum[1] += 0.5 × 0.8 = 0.40 bitmap set bit 1
doc 2, val 0.6 → slot 2: accum[2] += 0.5 × 0.6 = 0.30 (now 0.51)
doc 4, val 0.3 → slot 4: accum[4] += 0.5 × 0.3 = 0.15 bitmap set bit 4
drain cat (qw=1.0):
doc 0, val 0.9 → slot 0: accum[0] += 1.0 × 0.9 = 0.90 (now 1.02)
doc 2, val 0.5 → slot 2: accum[2] += 1.0 × 0.5 = 0.50 (now 1.01)
doc 3, val 0.2 → slot 3: accum[3] += 1.0 × 0.2 = 0.20 (now 0.23)
Accumulator state:
slot: 0 1 2 3 4
accum: 1.02 0.40 1.01 0.23 0.15
bitmap: 1 1 1 1 1 (bits 0-4 set)
Walk set bits → build sorted candidate arrays:
cand_docs = [0, 1, 2, 3, 4 ]
cand_scores = [1.02, 0.40, 1.01, 0.23, 0.15]
No non-essential terms this window (all were essential), so this phase is skipped.
Walk candidates, pushing into the min-heap:
doc 0, score 1.02 → heap [1.02] (heap not full)
doc 1, score 0.40 → heap [0.40, 1.02] (heap full, threshold = 0.40)
doc 2, score 1.01 → 1.01 > 0.40 → push, pop min
heap [1.01, 1.02] (threshold = 1.01)
doc 3, score 0.23 → 0.23 < 1.01 → skip
doc 4, score 0.15 → 0.15 < 1.01 → skip
Bitmap walk zeros accum[0..4], clears bitmap. Ready for next window
(none remain).
Drain heap, sort descending:
rank 1: doc 0, score 1.02
rank 2: doc 2, score 1.01
Verify by brute force:
doc 0: 1.0×0.9 + 0.3×0.4 = 1.02 ✓
doc 1: 0.5×0.8 = 0.40
doc 2: 1.0×0.5 + 0.5×0.6 + 0.3×0.7 = 1.01 ✓
doc 3: 1.0×0.2 + 0.3×0.1 = 0.23
doc 4: 0.5×0.3 = 0.15
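The brute-force check is small enough to run directly; a throwaway verification sketch:

```rust
fn main() {
    // query = {cat: 1.0, food: 0.5, cute: 0.3}
    let query: Vec<(&str, f32)> = vec![("cat", 1.0), ("food", 0.5), ("cute", 0.3)];
    let docs: Vec<Vec<(&str, f32)>> = vec![
        vec![("cat", 0.9), ("cute", 0.4)],
        vec![("food", 0.8)],
        vec![("cat", 0.5), ("food", 0.6), ("cute", 0.7)],
        vec![("cat", 0.2), ("cute", 0.1)],
        vec![("food", 0.3)],
    ];
    for (doc, dvec) in docs.iter().enumerate() {
        let score: f32 = query
            .iter()
            .filter_map(|(dim, qw)| dvec.iter().find(|(d, _)| d == dim).map(|(_, v)| qw * v))
            .sum();
        println!("doc {doc}: {score:.2}");
    }
    // Prints 1.02, 0.40, 1.01, 0.23, 0.15, matching the trace above.
}
```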