docs/documentation/aggregates/limitations.mdx
In order for ParadeDB to push down an aggregate, a ParadeDB text search operator must be present in the query.
<CodeGroup> ```sql SQL -- Not pushed down SELECT COUNT(id) FROM mock_items WHERE rating = 5;-- Pushed down SELECT COUNT(id) FROM mock_items WHERE rating = 5 AND id @@@ pdb.all();
```python Django
from paradedb import All, ParadeDB
# Not pushed down — no ParadeDB operator
MockItem.objects.filter(rating=5).count()
# Pushed down — ParadeDB operator triggers aggregate pushdown
MockItem.objects.filter(rating=5, id=ParadeDB(All())).count()
from sqlalchemy import func, select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search
# Not pushed down.
count_without_operator_stmt = select(func.count(MockItem.id)).where(MockItem.rating == 5)
# Pushed down.
count_with_operator_stmt = select(func.count(MockItem.id)).where(
MockItem.rating == 5,
search.all(MockItem.id),
)
with Session(engine) as session:
{
"count_without_operator": session.execute(count_without_operator_stmt).scalar_one(),
"count_with_operator": session.execute(count_with_operator_stmt).scalar_one(),
}
# Not pushed down — no ParadeDB operator
MockItem.where(rating: 5).count
# Pushed down — ParadeDB operator triggers aggregate pushdown
MockItem.search(:id).match_all.where(rating: 5).count
If your query does not contain a ParadeDB operator, a way to "force" aggregate pushdown is to append the all query to the query's
WHERE clause.
Aggregate pushdown works across joins as well as single tables. When every participating table has a BM25 index and the custom aggregate scan is enabled, ParadeDB computes the result directly from the index's columnar storage, without scanning the underlying table rows.
SET paradedb.enable_aggregate_custom_scan TO on;
The following join shapes are supported:
| Feature | Supported |
|---|---|
| Join types | INNER, LEFT, RIGHT, FULL OUTER |
| Number of tables | Two or more (arbitrary join trees) |
| Aggregate functions | COUNT, COUNT(DISTINCT ...), SUM, SUM(DISTINCT ...), AVG, AVG(DISTINCT ...), MIN, MAX, STDDEV, STDDEV_POP, VARIANCE, VAR_POP, BOOL_AND, BOOL_OR, ARRAY_AGG, STRING_AGG |
GROUP BY | Columns from any table in the join, including JSON sub-fields via metadata->>'key' |
HAVING clause | Comparisons against aggregate results and group columns |
Per-aggregate FILTER (WHERE ...) | Yes |
ORDER BY ... LIMIT K | Pushed down as TopK when there is a single ORDER BY column targeting an aggregate, a group column, or MIN(col) / MAX(col) |
ORDER BY inside STRING_AGG / ARRAY_AGG | Yes (produces deterministic element ordering) |
ParadeDB falls back to native Postgres execution when any of the following are true:
CROSS JOIN)GROUP BY columns, or aggregate arguments are not indexed columnsOVER ...), ROLLUP, CUBE, GROUPING SETS, LATERAL, or DISTINCT ONGROUP BY uses a scalar function like date_trunc(...) or lower(...) (JSON sub-field access via ->> is supported)COALESCE(SUM(...), 0) or a castpdb.agg() (use standard SQL aggregate functions instead)When a fallback happens, the query still runs correctly through Postgres' native planner. ParadeDB simply does not accelerate it.
<Note> Aggregate pushdown across joins is currently single-threaded. Parallel execution is on the [roadmap](/welcome/roadmap). </Note>NUMERIC columns do not support aggregate pushdown. Queries with aggregates on NUMERIC columns will automatically fall back to PostgreSQL for aggregation.
For numeric data that requires aggregate pushdown, use FLOAT or DOUBLE PRECISION instead:
-- Aggregates can be pushed down
CREATE TABLE products (
id SERIAL PRIMARY KEY,
price DOUBLE PRECISION
);
-- Aggregates fall back to PostgreSQL
CREATE TABLE products (
id SERIAL PRIMARY KEY,
price NUMERIC(10,2)
);