diff --git a/proto/substrait/algebra.proto b/proto/substrait/algebra.proto index 6fa504dc8..0b6a3b20b 100644 --- a/proto/substrait/algebra.proto +++ b/proto/substrait/algebra.proto @@ -271,6 +271,23 @@ message JoinRel { JoinType type = 6; + // When true, the right input is evaluated once per row of the left input + // (lateral join / correlated subquery). The right input may reference fields + // from the current left row using FieldReference with OuterReference as the + // root_type. This JoinRel must have RelCommon.rel_anchor set so the right input can + // use OuterReference with rel_reference pointing to that anchor. + // + // The rel_reference resolves against the left input's output schema. + // + // When false (default), both inputs are independent. + // + // Right-oriented join types (RIGHT, RIGHT_SEMI, RIGHT_ANTI, RIGHT_SINGLE, + // RIGHT_MARK) and OUTER are not valid when lateral is true, because the + // right input has no independent existence — it is defined only in the + // context of a left row. Only INNER, LEFT, LEFT_SEMI, LEFT_ANTI, + // LEFT_SINGLE, and LEFT_MARK are permitted. + bool lateral = 7; + enum JoinType { JOIN_TYPE_UNSPECIFIED = 0; JOIN_TYPE_INNER = 1; @@ -296,6 +313,18 @@ message CrossRel { Rel left = 2; Rel right = 3; + // When true, the right input is evaluated once per row of the left input + // (lateral semantics). The right input may reference fields from the + // current left row using FieldReference with OuterReference as the + // root_type. This CrossRel must have RelCommon.rel_anchor set so the right input + // can use OuterReference with rel_reference pointing to that anchor. + // + // The rel_reference resolves against the left input's output schema. + // + // When false (default), both inputs are independent and the result is the + // standard Cartesian product. + bool lateral = 4; + substrait.extensions.AdvancedExtension advanced_extension = 10; } diff --git a/site/docs/relations/logical_relations.md b/site/docs/relations/logical_relations.md index 9d08e37f3..587f0a0ff 100644 --- a/site/docs/relations/logical_relations.md +++ b/site/docs/relations/logical_relations.md @@ -218,7 +218,8 @@ The cross product operation will combine two separate inputs into a single outpu | Property | Description | Required | | --------------- | ------------------------------------------------------------ | ---------------------------------- | | Left Input | A relational input. | Required | -| Right Input | A relational input. | Required | +| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference` with `rel_reference` pointing to this CrossRel's `RelCommon.rel_anchor`. | Required | +| Lateral | When true, the right input is evaluated once per row of the left input (lateral semantics). This CrossRel must have `RelCommon.rel_anchor` set. The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type`, using `rel_reference` pointing to this CrossRel's `rel_anchor`. The `rel_reference` resolves against the left input's output schema. When false (default), both inputs are independent and the result is the standard Cartesian product. | Optional (default: false) | === "CrossRel Message" @@ -245,10 +246,11 @@ The join operation will combine two separate inputs into a single output, based | Property | Description | Required | | ---------------- | ------------------------------------------------------------ | ---------------------------------- | | Left Input | A relational input. | Required | -| Right Input | A relational input. | Required | +| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference` with `rel_reference` pointing to this JoinRel's `RelCommon.rel_anchor`. | Required | | Join Expression | A boolean condition that describes whether each record from the left set "match" the record from the right set. Field references correspond to the input order of the data. | Required. Can be the literal True. | | Post-Join Filter | A boolean condition to be applied to each result record after the inputs have been joined, yielding only the records that satisfied the condition. | Optional | | Join Type | One of the join types defined below. | Required | +| Lateral | When true, the right input is evaluated once per row of the left input (lateral join / correlated subquery). This JoinRel must have `RelCommon.rel_anchor` set. The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type`, using `rel_reference` pointing to this JoinRel's id. The `rel_reference` resolves against the left input's output schema. When false (default), both inputs are independent. See [Lateral Joins](#lateral-joins) for details.| Optional (default: false) | ### Join Types @@ -274,6 +276,40 @@ The join operation will combine two separate inputs into a single output, based %%% proto.algebra.JoinRel %%% ``` +### Lateral Joins + +When the `lateral` flag is set to true, the join operates as a lateral (correlated) join. The right input is evaluated once per row of the left input, and the right input may reference fields from the current left row using a `FieldReference` with `OuterReference` as the `root_type`. + +This JoinRel must have `RelCommon.rel_anchor` set. The right input should use `OuterReference` with `rel_reference` pointing to this JoinRel's `rel_anchor`. The `rel_reference` resolves against the left input's output schema. See [Field References — Outer References](../expressions/field_references.md#outer-references) for more on outer references in correlated plans. + +For example, the SQL query: + +```sql +SELECT a, (SELECT MAX(b) FROM T2 WHERE T2.x = T1.a) FROM T1 +``` + +can be represented as an inner lateral join where `T1` is the left input and the scalar subquery `SELECT MAX(b) FROM T2 WHERE T2.x = T1.a` is the right input. The JoinRel has `RelCommon.rel_anchor` set, and inside the right input, `T1.a` is referenced via a `FieldReference` with `OuterReference { rel_reference = }` as the root. + +#### Permitted Join Types for Lateral + +Because the right input only exists in the context of a specific left row, only `INNER` and left-oriented join types (`LEFT`, `LEFT_SEMI`, `LEFT_ANTI`, `LEFT_SINGLE`, `LEFT_MARK`) are valid when `lateral` is true. Right-oriented types (`RIGHT`, `RIGHT_SEMI`, `RIGHT_ANTI`, `RIGHT_SINGLE`, `RIGHT_MARK`) and `OUTER` are invalid since the right input has no independent existence outside a left row context. + +#### Nested Lateral Joins + +Lateral joins can introduce multiple levels of correlated subqueries. Each JoinRel with `lateral=true` must have `RelCommon.rel_anchor` set so outer references can name the binding relation via `rel_reference`. The `rel_reference` resolves against the left input's output schema: + +``` + JoinRel (left, lateral=true) [rel_anchor=1] + / \ + Input1(a) JoinRel (inner, lateral=true) [rel_anchor=2] + / \ + Input2(b) Subquery + +OuterReference access within each scope: + Input2 : a [rel_reference=1] + Subquery : a [rel_reference=1], b [rel_reference=2] +``` + ## Set Operation @@ -560,6 +596,3 @@ The operator that defines modifications of a database schema (CREATE/DROP/ALTER %%% proto.algebra.DdlRel %%% ``` -???+ question "Discussion Points" - - * How should correlated operations be handled?