docs/documentation/sorting/topk.mdx
ParadeDB is highly optimized for quickly returning the Top K results out of the index. In SQL, this means queries that contain an ORDER BY...LIMIT:
from paradedb import Match, ParadeDB
MockItem.objects.filter(
description=ParadeDB(Match('running shoes', operator='OR'))
).order_by('rating').values('description', 'rating', 'category')[:5]
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search
stmt = (
select(MockItem.description, MockItem.rating, MockItem.category)
.where(search.match_any(MockItem.description, "running shoes"))
.order_by(MockItem.rating)
.limit(5)
)
with Session(engine) as session:
session.execute(stmt).all()
MockItem.search(:description)
.matching_any("running shoes")
.order(:rating)
.select(:description, :rating, :category)
.limit(5)
In order for a Top K query to be executed by ParadeDB vs. vanilla Postgres, all of the following conditions must be met:
ORDER BY fields must be indexed. If they are text fields, they must use the literal tokenizer.ORDER BY...LIMIT.LIMIT.lower, ordering by expressions is not supported -- only the raw fields themselves.To verify that ParadeDB is executing the Top K, look for a Custom Scan with a TopKScanExecState in the EXPLAIN output:
EXPLAIN SELECT description, rating, category
FROM mock_items
WHERE description ||| 'running shoes'
ORDER BY rating
LIMIT 5;
If any of the above conditions are not met, the query cannot be fully optimized and you will not see a TopKScanExecState in the EXPLAIN output.
To guarantee stable sorting in the event of a tie, additional columns can be provided to ORDER BY:
from paradedb import Match, ParadeDB
MockItem.objects.filter(
description=ParadeDB(Match('running shoes', operator='OR'))
).order_by('rating', 'id').values('description', 'rating', 'category')[:5]
from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search
stmt = (
select(MockItem.description, MockItem.rating, MockItem.category)
.where(search.match_any(MockItem.description, "running shoes"))
.order_by(MockItem.rating, MockItem.id)
.limit(5)
)
with Session(engine) as session:
session.execute(stmt).all()
MockItem.search(:description)
.matching_any("running shoes")
.order(:rating, :id)
.select(:description, :rating, :category)
.limit(5)
If a text field is present in the ORDER BY clause, it must be indexed with the literal or
literal normalized tokenizer.
Sorting by lowercase text using lower(<text_field>) is also supported. To enable this, the expression lower(<text_field>) must be indexed
with either the literal or literal normalized tokenizer. See indexing expressions for more information.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (lower(description)::pdb.literal))
WITH (key_field='id');
This allows sorting by lowercase to be optimized.
<CodeGroup> ```sql SQL SELECT description, rating, category FROM mock_items WHERE description ||| 'sleek running shoes' ORDER BY lower(description) LIMIT 5; ```from django.db.models.functions import Lower
from paradedb import Match, ParadeDB
MockItem.objects.filter(
description=ParadeDB(Match('sleek running shoes', operator='OR'))
).order_by(Lower('description')).values('description', 'rating', 'category')[:5]
from sqlalchemy import func, select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import search
stmt = (
select(MockItem.description, MockItem.rating, MockItem.category)
.where(search.match_any(MockItem.description, "sleek running shoes"))
.order_by(func.lower(MockItem.description))
.limit(5)
)
with Session(engine) as session:
session.execute(stmt).all()
description = MockItem.arel_table[:description]
lower_description = Arel::Nodes::NamedFunction.new("LOWER", [description])
MockItem.search(:description)
.matching_any("sleek running shoes")
.order(Arel::Nodes::Ascending.new(lower_description))
.select(:description, :rating, :category)
.limit(5)
Ordering by JSON subfield is on the roadmap but not yet supported. For example, this query will not receive an optimized Top K scan:
SELECT id, description, metadata
FROM mock_items
WHERE description ||| 'sleek running shoes'
ORDER BY metadata->'weight'
LIMIT 5;