docs/changelog/0.17.0.mdx
## COUNT

We're making rapid progress on our roadmap to fully push down aggregates to the BM25 index. The first aggregate we've pushed down
is `COUNT`.
This feature is in beta. To test it, first enable the feature flag:
```sql
SET paradedb.enable_aggregate_custom_scan TO ON;
```
With this feature enabled, any `COUNT` query over a single table (no `JOIN`s yet) that uses the `@@@` operator will be pushed down:
```sql
EXPLAIN SELECT COUNT(*) FROM mock_items
WHERE description @@@ 'shoes';
```
More aggregates (SUM, COUNT(DISTINCT), GROUP BY, etc.) are on their way!
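A toy model helps show why pushdown pays off: a pushed-down `COUNT` can be answered from the size of the index's match set, while a non-pushed-down `COUNT` has to visit every matching row. The Python below is a conceptual sketch only, not how ParadeDB or its index implement this; the `InvertedIndex` class, the `count_without_pushdown` function, and the sample rows are all hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Hypothetical toy inverted index mapping terms to sets of row IDs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, row_id: int, text: str) -> None:
        # Tokenize on whitespace and record which rows contain each term.
        for term in text.lower().split():
            self.postings[term].add(row_id)

    def count(self, term: str) -> int:
        # Pushed-down COUNT: read the size of the match set directly,
        # without touching any table rows.
        return len(self.postings.get(term, set()))


def count_without_pushdown(table: dict[int, str], index: InvertedIndex, term: str) -> int:
    # Without pushdown: the index yields matching IDs, then each row is
    # fetched from the table and counted one at a time.
    total = 0
    for row_id in index.postings.get(term, set()):
        _row = table[row_id]  # simulated row fetch
        total += 1
    return total


table = {
    1: "Running shoes for trails",
    2: "Leather dress shoes",
    3: "Wireless headphones",
}
index = InvertedIndex()
for row_id, description in table.items():
    index.add(row_id, description)

print(index.count("shoes"))                           # 2
print(count_without_pushdown(table, index, "shoes"))  # 2
```

Both paths return the same answer; the difference is that the first never fetches a row, which is where the savings come from on large tables.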
Prior to this release, compaction (merging) of the index's LSM tree happened in the foreground, blocking
`INSERT`/`UPDATE` transactions. While this is acceptable for the smaller layers of the LSM tree, merging large layers can block transactions for
long periods of time.
With this release, large layers are merged in the background. This is configurable with the `background_layer_sizes` index option
and significantly improves write throughput for update-heavy tables.
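The difference between foreground and background merging can be illustrated with a toy LSM tree. This Python sketch is hypothetical and does not reflect ParadeDB's actual merge policy or the format of `background_layer_sizes`: segments are represented only by their sizes, `MERGE_FACTOR` and `BACKGROUND_THRESHOLD` are made-up parameters, and a single worker thread stands in for background merging. Merges that produce small segments run inline while the writer waits; larger merges are queued so `insert` returns without waiting for them.

```python
import queue
import threading
from collections import Counter

MERGE_FACTOR = 4           # hypothetical: merge once 4 segments share a size
BACKGROUND_THRESHOLD = 64  # hypothetical: merged outputs this large go to the background


class ToyLSM:
    """Toy LSM tree whose segments are represented only by their sizes."""

    def __init__(self) -> None:
        self.segments: list[int] = []
        self.lock = threading.Lock()
        self.pending: queue.Queue = queue.Queue()
        self.foreground_merges = 0
        self.background_merges = 0
        threading.Thread(target=self._background_worker, daemon=True).start()

    def insert(self, size: int) -> None:
        # Called by a writing transaction; it returns only after any
        # foreground merges triggered by this write have finished.
        with self.lock:
            self.segments.append(size)
            self._schedule_merges()

    def _schedule_merges(self) -> None:
        # Caller must hold self.lock. Loop because one merge can fill the next layer.
        while True:
            full = [s for s, n in Counter(self.segments).items() if n >= MERGE_FACTOR]
            if not full:
                return
            size = full[0]
            for _ in range(MERGE_FACTOR):
                self.segments.remove(size)
            if size * MERGE_FACTOR < BACKGROUND_THRESHOLD:
                # Small merge: done inline, so the writer waits for it.
                self.segments.append(size * MERGE_FACTOR)
                self.foreground_merges += 1
            else:
                # Large merge: hand it off so the writer is not blocked.
                self.pending.put([size] * MERGE_FACTOR)

    def _background_worker(self) -> None:
        while True:
            victims = self.pending.get()
            merged_size = sum(victims)  # stands in for the expensive merge work
            with self.lock:
                self.segments.append(merged_size)
                self.background_merges += 1
                self._schedule_merges()
            self.pending.task_done()


lsm = ToyLSM()
for _ in range(64):
    lsm.insert(1)     # each write adds a size-1 segment
lsm.pending.join()    # wait for queued background merges to finish
print(lsm.segments, lsm.foreground_merges, lsm.background_merges)  # [64] 20 1
```

Here 64 single-unit writes trigger 20 small inline merges, while the one merge that produces a 64-unit segment is handed to the worker instead of blocking the final `insert`.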
Prior to this release, ParadeDB could only efficiently evaluate `WHERE` clauses if all the columns in those clauses were present in the BM25 index.
If any clause referenced a non-indexed column, it was applied as a filter after the index scan. Additionally, BM25 scoring and snippet
generation were skipped if the `WHERE` clause included non-indexed columns.
With this release, ParadeDB can now push down filters on non-indexed columns directly into the custom scan. This means:
Prior to this release, the BM25 index relied on the built-in Postgres free space map (FSM) to reclaim space during compaction. However, the Postgres FSM is not write-ahead logged. This means that if the instance terminates (e.g. during a failover), the FSM can be lost, preventing future writes from reclaiming dead space in the index.
To solve this, we implemented our own write-ahead logged free space map that lives alongside the BM25 index. This FSM is also better optimized than the Postgres FSM for bulk writes, which has improved disk write patterns.
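The benefit of logging the map can be illustrated with a toy model. The Python sketch below is hypothetical and is not ParadeDB's on-disk format: the `LoggedFreeSpaceMap` class, its log record shape, and the lowest-page-first allocation policy are all assumptions. It shows only the core idea that each change is appended to a durable log before it is applied, so the map can be rebuilt by replaying the log after a crash.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoggedFreeSpaceMap:
    """Toy free space map whose changes are write-ahead logged (hypothetical)."""

    wal: list[tuple[str, int]] = field(default_factory=list)  # survives a crash
    free_pages: set[int] = field(default_factory=set)         # lost on a crash

    def release(self, page: int) -> None:
        # Log first, then apply, so the record survives even if memory is lost.
        self.wal.append(("free", page))
        self.free_pages.add(page)

    def allocate(self) -> Optional[int]:
        if not self.free_pages:
            return None  # caller would extend the index with a new page
        page = min(self.free_pages)  # assumed policy: reuse the lowest page first
        self.wal.append(("alloc", page))
        self.free_pages.discard(page)
        return page

    def crash(self) -> None:
        # Simulate losing all state that was not written to the log.
        self.free_pages = set()

    def recover(self) -> None:
        # Replay the log in order to rebuild the map.
        self.free_pages = set()
        for op, page in self.wal:
            if op == "free":
                self.free_pages.add(page)
            else:
                self.free_pages.discard(page)


fsm = LoggedFreeSpaceMap()
for page in (7, 3, 9):
    fsm.release(page)          # compaction frees three pages
print(fsm.allocate())          # 3
fsm.crash()                    # e.g. the instance terminates
fsm.recover()
print(sorted(fsm.free_pages))  # [7, 9]
```

Without the log, `crash()` would leave pages 7 and 9 untracked, so they could never be reused; replaying the log restores them.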
The full changelog is available on the GitHub Release.