diff --git a/proto/substrait/algebra.proto b/proto/substrait/algebra.proto
index 95889db0a..94b8b8ac9 100644
--- a/proto/substrait/algebra.proto
+++ b/proto/substrait/algebra.proto
@@ -1334,10 +1334,13 @@ message Expression {
       // The lower and upper bound specify how many rows before and after the current row
       // the window should extend.
       BOUNDS_TYPE_ROWS = 1;
-      // The lower and upper bound describe a range of values. The window should include all rows
-      // where the value of the ordering column is greater than or equal to (current_value - lower bound)
-      // and less than or equal to (current_value + upper bound). This bounds type is only valid if there
-      // is a single ordering column.
+      // The lower and upper bound describe a range of values. When using numeric offsets (Preceding
+      // or Following with offset > 0), the window includes all rows where the value of the ordering
+      // column is greater than or equal to (current_value - lower bound) and less than or equal to
+      // (current_value + upper bound). When ANY numeric offset is present as a bound, there must be
+      // EXACTLY ONE ordering column.
+      // UNBOUNDED and CURRENT ROW bounds work with 0 or more ordering columns. CURRENT ROW
+      // includes all rows with matching values across all ordering columns (peer rows).
       BOUNDS_TYPE_RANGE = 2;
     }
 
diff --git a/site/docs/expressions/window_functions.md b/site/docs/expressions/window_functions.md
index 260118062..fd4ceb2fc 100644
--- a/site/docs/expressions/window_functions.md
+++ b/site/docs/expressions/window_functions.md
@@ -15,11 +15,19 @@ Window function signatures contain all the properties defined for [aggregate fun
 
 When binding a window function, the binding must include the following additional properties beyond the standard aggregate binding properties:
 
-| Property | Description | Required |
-| ----------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
-| Partition | A list of partitioning expressions. | False, defaults to a single partition for the entire dataset |
-| Lower Bound | Bound Following(int64), Bound Trailing(int64) or CurrentRow. | False, defaults to start of partition |
-| Upper Bound | Bound Following(int64), Bound Trailing(int64) or CurrentRow. | False, defaults to end of partition |
+| Property | Description | Required |
+| ----------- | ------------------------------------------------------------ | -------- |
+| Partition | A list of partitioning expressions. Empty list means a single partition for the entire dataset. | True |
+| Order By | A list of ordering expressions with sort directions. Empty list means unordered. | True |
+| Bounds Type | ROWS or RANGE. ROWS bounds count physical rows. RANGE bounds consider value equivalence based on ordering columns. | True |
+| Lower Bound | Preceding(int64), Following(int64), CurrentRow, or Unbounded. | True |
+| Upper Bound | Preceding(int64), Following(int64), CurrentRow, or Unbounded. | True |
+
+### RANGE Bounds with Multiple Ordering Columns
+
+When using RANGE bounds with numeric offsets (Preceding or Following with offset > 0), only a single ordering column is allowed. This is because numeric offsets require arithmetic on the ordering column values (e.g., current_value - offset), which is ambiguous with multiple columns.
+
+RANGE bounds with UNBOUNDED or CURRENT ROW work with any number of ordering columns. CURRENT ROW includes all rows with matching values across all ordering columns (peer rows).
 
 ## Aggregate Functions as Window Functions
 
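
Reviewer note: the RANGE rule added in this diff can be illustrated with a small Python sketch. It is not Substrait code. `Bound`, `validate_range_bounds`, and `range_frame` are hypothetical names invented for this example, and the frame computation assumes a single numeric ordering column sorted ascending.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bound:
    """Hypothetical stand-in for a window bound; not a Substrait type."""
    kind: str        # "preceding", "following", "current_row", or "unbounded"
    offset: int = 0  # only meaningful for "preceding" and "following"


def has_numeric_offset(bound: Bound) -> bool:
    # The diff treats Preceding/Following with offset > 0 as a numeric offset.
    return bound.kind in ("preceding", "following") and bound.offset > 0


def validate_range_bounds(lower: Bound, upper: Bound, num_order_columns: int) -> None:
    # Any numeric offset requires exactly one ordering column.
    # UNBOUNDED and CURRENT ROW are accepted with zero or more ordering columns.
    if (has_numeric_offset(lower) or has_numeric_offset(upper)) and num_order_columns != 1:
        raise ValueError(
            "RANGE bounds with a numeric offset require exactly one ordering column"
        )


def _edge(bound: Bound, current_value: float, is_lower: bool) -> float:
    # Translate a bound into a value limit on the ordering column.
    if bound.kind == "unbounded":
        return float("-inf") if is_lower else float("inf")
    if bound.kind == "current_row":
        return current_value
    if bound.kind == "preceding":
        return current_value - bound.offset
    if bound.kind == "following":
        return current_value + bound.offset
    raise ValueError(f"unknown bound kind: {bound.kind}")


def range_frame(values: List[float], current: int, lower: Bound, upper: Bound) -> List[int]:
    """Indices of rows in the RANGE frame of row `current` (one ascending column)."""
    validate_range_bounds(lower, upper, num_order_columns=1)
    lo = _edge(lower, values[current], is_lower=True)
    hi = _edge(upper, values[current], is_lower=False)
    # Peer rows (equal ordering value) fall inside the frame automatically.
    return [i for i, v in enumerate(values) if lo <= v <= hi]
```

With ordering values `[1, 2, 2, 4, 7]` and the current row at index 1, `Preceding(1)` to `Following(2)` covers values 1 through 4, while `CurrentRow` to `CurrentRow` selects only the two peer rows with value 2.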