pg_search/src/postgres/customscan/joinscan/README.md
JoinScan intercepts PostgreSQL join planning and replaces the standard executor with a DataFusion-based pipeline that operates entirely on Tantivy's columnar fast fields. The core strategy is late materialization: execute the join using only index data, apply sorting and limits, then access the PostgreSQL heap only for the final K result rows.
For a typical SELECT ... FROM files JOIN documents ... ORDER BY title LIMIT K:
ProjectionExec
TantivyLookupExec ← materializes deferred strings for final K rows only
SegmentedTopKExec ← global threshold pruning + final sort + LIMIT K
HashJoinExec ← join on fast fields
PgSearchScan (documents) ← BM25 search
PgSearchScan (files) ← lazy scan, deferred columns, receives dynamic filters
SegmentedTopKExec publishes dynamic filter thresholds that are pushed down through the join to the probe-side scan, pruning rows at the scanner level. It also performs the final materialized sort and LIMIT, so TantivyLookupExec only decodes K rows (not K×segments).
JoinScan fires when all conditions are met: LIMIT present, equi-join keys exist, all columns are fast fields, all tables have BM25 indexes, and at least one @@@ predicate. See create_custom_path() for the full checklist.
The planner hook builds a JoinCSClause — a serializable IR capturing the RelNode join tree, predicates, ORDER BY, and LIMIT. This is stored in CustomScan.custom_private and deserialized at execution time.
build.rs — RelNode, JoinCSClause, JoinSourceplanning.rs — cost estimation, field validationpredicate.rs — Postgres expression translationprivdat.rs — serializationscan_state.rs builds a DataFusion logical plan from the JoinCSClause, then runs physical optimization:
SortMergeJoinEnforcer — converts HashJoin to SortMergeJoin when inputs are pre-sortedLateMaterializationRule — injects TantivyLookupExec to defer string materializationSegmentedTopKRule — injects SegmentedTopKExec for Top K on deferred columns, removes the now-redundant SortExec(TopK), wraps blocking nodes with FilterPassthroughExecSegmentedTopKExec's DynamicFilterPhysicalExpr down to the scanString columns are emitted as a 3-way UnionArray (doc_address | term_ordinal | materialized) so intermediate nodes work with cheap integer ordinals instead of decoded strings. The decision to defer is made in configure_deferred_outputs().
There are two primary pruning mechanisms for dynamic filters that are pushed down to the scan:
Query-Time Pushdown (Inverted Index): Filters that are static and known at the start of the scan (such as InList predicates generated from a HashJoin build side) are intercepted during the first poll_next of the scan stream. They are converted into native Tantivy queries (e.g., TermSetQuery) and ANDed into the main search query via try_dynamic_filter_pushdown. This allows Tantivy to use its inverted index to filter documents while executing the search, providing the highest possible pruning performance. The DataFusion expressions are then rewritten to lit(true) so they are not evaluated again.
Pre-Filter Pushdown (Fast Fields): For evolving thresholds, such as the global threshold from SegmentedTopKExec, the threshold is pushed down to the scan via filter pushdown. This works because SegmentedTopKExec and PgSearchScan share an Arc<DynamicFilterPhysicalExpr>. The scanner reads current() on every batch and applies the filter after the search but before Arrow column materialization. For strings, it translates literals to per-segment ordinal bounds via try_rewrite_binary and filters directly against the fetched term ordinals.
After all input is consumed, SegmentedTopKExec materializes sort column values, performs the final sort, and emits exactly K rows. TantivyLookupExec decodes deferred strings for those K rows only. JoinScanState extracts CTIDs and fetches heap tuples — the only point where the PostgreSQL heap is accessed.
| File | Purpose |
|---|---|
mod.rs | Lifecycle, activation checks, parallel support |
build.rs | RelNode, JoinCSClause, JoinSource |
scan_state.rs | DataFusion plan building, optimizer registration, result streaming |
planner.rs | SortMergeJoinEnforcer, FilterPassthroughExec usage |
planning.rs | Cost estimation, field validation, ORDER BY extraction |
predicate.rs | Postgres expression → JoinLevelExpr |
translator.rs | Postgres ↔ DataFusion expression mapping |
explain.rs | EXPLAIN output formatting |
Execution-layer files under pg_search/src/scan/:
| File | Purpose |
|---|---|
segmented_topk_exec.rs | SegmentedTopKExec — per-segment heaps, global heap, build_global_filter_expression |
segmented_topk_rule.rs | Optimizer rule, wrap_blocking_nodes |
tantivy_lookup_exec.rs | Dictionary decode + filter passthrough |
filter_passthrough_exec.rs | Transparent wrapper enabling filter pushdown through blocking nodes |
batch_scanner.rs | Scanner::next() — batch iteration, pre-filter, visibility |
execution_plan.rs | PgSearchScanPlan — dynamic filter integration |
pre_filter.rs | try_rewrite_binary, collect_filters |
deferred_encode.rs | 3-way UnionArray construction and unpacking |
| GUC | Default | Effect |
|---|---|---|
paradedb.enable_join_custom_scan | on | Master switch |
paradedb.enable_segmented_topk | true | SegmentedTopKExec injection |
paradedb.enable_columnar_sort | true | Enables SortMergeJoin path |