doc/user/content/transform-data/idiomatic-materialize-sql/top-k.md
The "Top-K in group" query pattern groups by some key and return the first K elements within each group according to some ordering.
{{< callout >}}
{{< idiomatic-sql/materialize-window-functions >}}
{{</ callout >}}
Idiomatic Materialize SQL: For Top-K queries where K >= 1, use a subquery to SELECT DISTINCT on the grouping key and perform a LATERAL join (by the grouping key) with another subquery that specifies the ordering and the limit K.
<table> <thead> <tr> <th></th> <th></th> </tr> </thead> <tbody> <tr> <td><blue>Idiomatic Materialize SQL</blue></td> <td class="copyableCode">Use a subquery to
SELECT DISTINCT on the grouping key (e.g.,
fieldA), and perform a LATERAL join
(by the grouping key fieldA) with another subquery that specifies the ordering
(e.g., fieldZ [ASC|DESC]) and the limit K.
SELECT fieldA, fieldB, ...
FROM (SELECT DISTINCT fieldA FROM tableA) grp,
LATERAL (SELECT fieldB, ... , fieldZ FROM tableA
WHERE fieldA = grp.fieldA
ORDER BY fieldZ ... LIMIT K) -- K is a number >= 1
ORDER BY fieldA, fieldZ ... ;
<red>Avoid the use of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) for Top-K queries.</red>
-- Anti-pattern. Avoid. --
SELECT fieldA, fieldB, ...
FROM (
SELECT fieldA, fieldB, ... , fieldZ,
ROW_NUMBER() OVER (PARTITION BY fieldA
ORDER BY fieldZ ... ) as rn
FROM tableA)
WHERE rn <= K -- K is a number >= 1
ORDER BY fieldA, fieldZ ...;
To further improve the memory usage of the idiomatic Materialize SQL, you can
specify a LIMIT INPUT GROUP SIZE query hint in the
idiomatic Materialize SQL.
SELECT fieldA, fieldB, ...
FROM (SELECT DISTINCT fieldA FROM tableA) grp,
LATERAL (SELECT fieldB, ... , fieldZ FROM tableA
WHERE fieldA = grp.fieldA
OPTIONS (LIMIT INPUT GROUP SIZE = ...)
ORDER BY fieldZ ... LIMIT K) -- K is a number >= 1
ORDER BY fieldA, fieldZ ... ;
For more information on setting LIMIT INPUT GROUP SIZE, see
Optimization.
Idiomatic Materialize SQL: For K = 1, use a SELECT DISTINCT
ON() on the grouping key (e.g., fieldA) and
order the results first by the DISTINCT ON key and then the Top-K ordering
key (e.g., fieldA, fieldZ [ASC|DESC]).
Alternatively, you can also use the more general Top-K where K >= 1 pattern, specifying 1 as the limit.
<table> <thead> <tr> <th></th> <th></th> </tr> </thead> <tbody> <tr> <td><blue>Idiomatic Materialize SQL</blue></td> <td class="copyableCode">SELECT DISTINCT ON(fieldA) fieldA, fieldB, ...
FROM tableA
ORDER BY fieldA, fieldZ ... ;
<red>Avoid the use of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) for Top-K queries.</red>
-- Anti-pattern. Avoid. --
SELECT fieldA, fieldB, ...
FROM (
SELECT fieldA, fieldB, ... , fieldZ,
ROW_NUMBER() OVER (PARTITION BY fieldA
ORDER BY fieldZ ... ) as rn
FROM tableA)
WHERE rn = 1
ORDER BY fieldA, fieldZ ...;
To further improve the memory usage of the idiomatic Materialize SQL, you can
specify a DISTINCT ON INPUT GROUP SIZE query hint
in the idiomatic Materialize SQL.
SELECT DISTINCT ON(fieldA) fieldA, fieldB, ...
FROM tableA
OPTIONS (DISTINCT ON INPUT GROUP SIZE = ...)
ORDER BY fieldA, fieldZ ... ;
For more information on setting DISTINCT ON INPUT GROUP SIZE, see
EXPLAIN ANALYZE HINTS.
{{< note >}}
The example data can be found in the Appendix.
{{</ note >}}
Using idiomatic Materialize SQL, the following example finds the top 3 items (by
descending subtotal) in each order. The example uses a subquery to SELECT
DISTINCT on the grouping key (order_id), and
performs a LATERAL join (by the grouping
key) with another subquery that specifies the ordering (ORDER BY subtotal DESC) and limits its results to 3 (LIMIT 3).
SELECT order_id, item, subtotal
FROM (SELECT DISTINCT order_id FROM orders_view) grp,
LATERAL (SELECT item, subtotal FROM orders_view
WHERE order_id = grp.order_id
ORDER BY subtotal DESC LIMIT 3)
ORDER BY order_id, subtotal DESC;
<red>Avoid the use of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) for Top-K queries.</red>
-- Anti-pattern --
SELECT order_id, item, subtotal
FROM (
SELECT order_id, item, subtotal,
ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY subtotal DESC) as rn
FROM orders_view)
WHERE rn <= 3
ORDER BY order_id, subtotal DESC;
Using idiomatic Materialize SQL, the following example finds the top 1 item (by
descending subtotal) in each order. The example uses a query to SELECT DISTINCT
ON() on the grouping key (order_id) with an
ORDER BY order_id, subtotal DESC (i.e., ordering first by the DISTINCT ON/grouping key, then the descending subtotal). 1
SELECT DISTINCT ON(order_id) order_id, item, subtotal
FROM orders_view
ORDER BY order_id, subtotal DESC;
<red>Avoid the use of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) for Top-K queries.</red>
-- Anti-pattern --
SELECT order_id, item, subtotal
FROM (
SELECT order_id, item, subtotal,
ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY subtotal DESC) as rn
FROM orders_view)
WHERE rn = 1
ORDER BY order_id, subtotal DESC;
Alternatively, you can also use the idiomatic Materialize SQL for the more general Top K query, specifying 1 as the limit. ↩