Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,23 @@ message JoinRel {

JoinType type = 6;

// When true, the right input is evaluated once per row of the left input
// (lateral join / correlated subquery). The right input may reference fields
// from the current left row using FieldReference with OuterReference as the
// root_type. This JoinRel must have RelCommon.rel_anchor set so the right input can
// use OuterReference with rel_reference pointing to that anchor.
//
// The rel_reference resolves against the left input's output schema.
//
// When false (default), both inputs are independent.
//
// Right-oriented join types (RIGHT, RIGHT_SEMI, RIGHT_ANTI, RIGHT_SINGLE,
// RIGHT_MARK) and OUTER are not valid when lateral is true, because the
// right input has no independent existence — it is defined only in the
// context of a left row. Only INNER, LEFT, LEFT_SEMI, LEFT_ANTI,
// LEFT_SINGLE, and LEFT_MARK are permitted.
bool lateral = 7;

enum JoinType {
JOIN_TYPE_UNSPECIFIED = 0;
JOIN_TYPE_INNER = 1;
Expand All @@ -296,6 +313,18 @@ message CrossRel {
Rel left = 2;
Rel right = 3;

// When true, the right input is evaluated once per row of the left input
// (lateral semantics). The right input may reference fields from the
// current left row using FieldReference with OuterReference as the
// root_type. This CrossRel must have RelCommon.rel_anchor set so the right input
// can use OuterReference with rel_reference pointing to that anchor.
//
// The rel_reference resolves against the left input's output schema.
//
// When false (default), both inputs are independent and the result is the
// standard Cartesian product.
bool lateral = 4;

substrait.extensions.AdvancedExtension advanced_extension = 10;
}

Expand Down
43 changes: 38 additions & 5 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -218,7 +218,8 @@ The cross product operation will combine two separate inputs into a single outpu
| Property | Description | Required |
| --------------- | ------------------------------------------------------------ | ---------------------------------- |
| Left Input | A relational input. | Required |
| Right Input | A relational input. | Required |
| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference` with `rel_reference` pointing to this CrossRel's `RelCommon.rel_anchor`. | Required |
| Lateral | When true, the right input is evaluated once per row of the left input (lateral semantics). This CrossRel must have `RelCommon.rel_anchor` set. The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type`, using `rel_reference` pointing to this CrossRel's `rel_anchor`. The `rel_reference` resolves against the left input's output schema. When false (default), both inputs are independent and the result is the standard Cartesian product. | Optional (default: false) |


=== "CrossRel Message"
Expand All @@ -245,10 +246,11 @@ The join operation will combine two separate inputs into a single output, based
| Property | Description | Required |
| ---------------- | ------------------------------------------------------------ | ---------------------------------- |
| Left Input | A relational input. | Required |
| Right Input | A relational input. | Required |
| Right Input | A relational input. When `lateral` is true, this input may reference the current left row using `OuterReference` with `rel_reference` pointing to this JoinRel's `RelCommon.rel_anchor`. | Required |
| Join Expression | A boolean condition that describes whether each record from the left set "match" the record from the right set. Field references correspond to the input order of the data. | Required. Can be the literal True. |
| Post-Join Filter | A boolean condition to be applied to each result record after the inputs have been joined, yielding only the records that satisfied the condition. | Optional |
| Join Type | One of the join types defined below. | Required |
| Lateral | When true, the right input is evaluated once per row of the left input (lateral join / correlated subquery). This JoinRel must have `RelCommon.rel_anchor` set. The right input may reference fields from the current left row using `FieldReference` with `OuterReference` as the `root_type`, using `rel_reference` pointing to this JoinRel's id. The `rel_reference` resolves against the left input's output schema. When false (default), both inputs are independent. See [Lateral Joins](#lateral-joins) for details.| Optional (default: false) |

### Join Types

Expand All @@ -274,6 +276,40 @@ The join operation will combine two separate inputs into a single output, based
%%% proto.algebra.JoinRel %%%
```

### Lateral Joins

When the `lateral` flag is set to true, the join operates as a lateral (correlated) join. The right input is evaluated once per row of the left input, and the right input may reference fields from the current left row using a `FieldReference` with `OuterReference` as the `root_type`.

This JoinRel must have `RelCommon.rel_anchor` set. The right input should use `OuterReference` with `rel_reference` pointing to this JoinRel's `rel_anchor`. The `rel_reference` resolves against the left input's output schema. See [Field References — Outer References](../expressions/field_references.md#outer-references) for more on outer references in correlated plans.

For example, the SQL query:

```sql
SELECT a, (SELECT MAX(b) FROM T2 WHERE T2.x = T1.a) FROM T1
```

can be represented as an inner lateral join where `T1` is the left input and the scalar subquery `SELECT MAX(b) FROM T2 WHERE T2.x = T1.a` is the right input. The JoinRel has `RelCommon.rel_anchor` set, and inside the right input, `T1.a` is referenced via a `FieldReference` with `OuterReference { rel_reference = <JoinRel's id> }` as the root.

#### Permitted Join Types for Lateral

Because the right input only exists in the context of a specific left row, only `INNER` and left-oriented join types (`LEFT`, `LEFT_SEMI`, `LEFT_ANTI`, `LEFT_SINGLE`, `LEFT_MARK`) are valid when `lateral` is true. Right-oriented types (`RIGHT`, `RIGHT_SEMI`, `RIGHT_ANTI`, `RIGHT_SINGLE`, `RIGHT_MARK`) and `OUTER` are invalid since the right input has no independent existence outside a left row context.

#### Nested Lateral Joins

Lateral joins can introduce multiple levels of correlated subqueries. Each JoinRel with `lateral=true` must have `RelCommon.rel_anchor` set so outer references can name the binding relation via `rel_reference`. The `rel_reference` resolves against the left input's output schema:

```
JoinRel (left, lateral=true) [rel_anchor=1]
/ \
Input1(a) JoinRel (inner, lateral=true) [rel_anchor=2]
/ \
Input2(b) Subquery

OuterReference access within each scope:
Input2 : a [rel_reference=1]
Subquery : a [rel_reference=1], b [rel_reference=2]
```


## Set Operation

Expand Down Expand Up @@ -560,6 +596,3 @@ The operator that defines modifications of a database schema (CREATE/DROP/ALTER
%%% proto.algebra.DdlRel %%%
```

???+ question "Discussion Points"

* How should correlated operations be handled?