docs/sql/window_functions.md
Window functions in Daft SQL allow you to perform calculations across a set of rows that are related to the current row, similar to aggregate functions but without collapsing the result into a single row.
!!! warning "Warning"
Window function support in Daft SQL is currently limited. Full SQL window function support is under development.
The general syntax for window functions in Daft is:
function_name([expr]) OVER (
[PARTITION BY expr_list]
[ORDER BY order_list]
[frame_clause]
)
Where:
function_name is the name of the window functionPARTITION BY divides the result set into partitions to which the window function is appliedORDER BY defines the logical order of rows within each partition
frame_clause defines a subset of rows in the current partition (called the window frame)The following window functions are currently supported:
ROW_NUMBER(): Returns the sequential row number starting from 1 within the partition.
SELECT
category,
value,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY value) as row_num
FROM sales
RANK(): Returns the rank of the current row within a partition, with gaps in the ranking sequence when there are ties.
SELECT
category,
value,
RANK() OVER (PARTITION BY category ORDER BY value) as rank
FROM sales
DENSE_RANK(): Returns the rank of the current row within a partition, without gaps in the ranking sequence when there are ties.
SELECT
category,
value,
DENSE_RANK() OVER (PARTITION BY category ORDER BY value) as dense_rank
FROM sales
LAG(value [, offset [, default]]): Returns the value from a row that is offset rows before the current row. If no such row exists, returns the default value. The offset parameter defaults to 1 if not specified.
SELECT
date,
value,
LAG(value, 1, 0) OVER (ORDER BY date) as previous_value
FROM time_series
LEAD(value [, offset [, default]]): Returns the value from a row that is offset rows after the current row. If no such row exists, returns the default value. The offset parameter defaults to 1 if not specified.
SELECT
date,
value,
LEAD(value, 1, 0) OVER (ORDER BY date) as next_value
FROM time_series
All Daft aggregate functions can be used as window functions. Common examples include:
SUM([expr]): Returns the sum of expression values.AVG([expr]): Returns the average of expression values.COUNT([expr]): Returns the count of non-null expression values.MIN([expr]): Returns the minimum expression value.MAX([expr]): Returns the maximum expression value.Example:
SELECT
category,
value,
SUM(value) OVER (PARTITION BY category) as category_total,
AVG(value) OVER (PARTITION BY category) as category_avg
FROM sales
!!! note "Note"
When using aggregate functions with both PARTITION BY and ORDER BY, the default window frame includes all rows from the start of the partition up to the current row — equivalent to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
When using aggregate functions as window functions, you can specify a window frame to define which rows to include in the aggregation:
function_name([expr]) OVER (
[PARTITION BY expr_list]
[ORDER BY order_list]
[ROWS | RANGE]
BETWEEN frame_start AND frame_end
)
Where:
ROWS indicates that the frame is defined by physical row count
RANGE indicates that the frame is defined by logical value (not fully supported yet)
frame_start and frame_end can be one of:
UNBOUNDED PRECEDING: All rows before the current row (only valid for frame_start)n PRECEDING: n rows before the current rowCURRENT ROW: The current rown FOLLOWING: n rows after the current rowUNBOUNDED FOLLOWING: All rows after the current row (only valid for frame_end)Examples:
-- Running sum (includes all previous rows and current row)
SELECT
category,
value,
SUM(value) OVER (
PARTITION BY category
ORDER BY value
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_sum
FROM sales
-- Moving average of current row and 2 preceding rows
SELECT
date,
value,
AVG(value) OVER (
ORDER BY date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg
FROM time_series
PARTITION BY) are not yet supportedWINDOW clause) are not supportedIGNORE NULLS and RESPECT NULLS options are not supported