Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 23 additions & 2 deletions proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -1813,8 +1813,29 @@ message AggregateFunction {
}
}

// This rel is used to create references,
// in case we refer to a RelRoot field names will be ignored
// This rel is used to create references to other Rels within the same Plan,
// enabling DAGs of operations and multi-query optimization. When the
// referenced Rel contains outer references (free variables), the
// outer_reference_bindings field must be used to provide field references
// that resolve those references. In case we refer to a RelRoot, field names
// will be ignored.
message ReferenceRel {
// A zero-indexed positional reference to a Rel defined within the same Plan.
int32 subtree_ordinal = 1;

// Bindings for outer references within the referenced Rel subtree.
//
// When the referenced Rel (at the specified subtree_ordinal) contains
// FieldReference expressions with OuterReference root types, these bindings
// resolve those references. Each entry is a FieldReference evaluated in
// the relational context where this ReferenceRel appears.
//
// Entries are positionally matched to the outer_reference_declarations in
// the corresponding PlanRel: outer_reference_bindings[i] provides the
// concrete value for outer_reference_declarations[i]. The number of
// bindings must equal the number of declarations.
//
// If the referenced Rel contains no outer references, this field should
// be empty.
repeated Expression.FieldReference outer_reference_bindings = 2;
}
27 changes: 27 additions & 0 deletions proto/substrait/plan.proto
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,33 @@ message PlanRel {
// The root of a relation tree
RelRoot root = 2;
}

// Declarations of outer reference parameters for this PlanRel. Required
// when the Rel contains FieldReference expressions rooted on OuterReference.
// Any ReferenceRel that references this PlanRel's ordinal position must
// provide outer_reference_bindings whose entries positionally correspond
// to these declarations.
//
// This is used when the Rel within this PlanRel is a parameterized common
// subexpression — i.e., it contains FieldReference expressions rooted on
// OuterReference that act as free variables. Each declaration identifies the
// outer reference slot (via a FieldReference rooted on OuterReference) and
// its expected type.
//
// There must be exactly one declaration per distinct outer-reference-rooted
// FieldReference used within the Rel.
repeated OuterReferenceDeclaration outer_reference_declarations = 3;

// Declares the type of a single outer reference parameter.
message OuterReferenceDeclaration {
// The outer reference being declared. This must be a FieldReference with
// root_type set to OuterReference, matching a FieldReference used inside
// the Rel of this PlanRel.
Expression.FieldReference outer_reference = 1;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could employ more explicit convention like OuterReferencePlaceHolder, which has (rel index, index in the declaration) to reference the declaration in the PlanRel.

The problem of this approach is that it will require rewriting the plan to replace the existing reference with the placeholders.


// The type of the value at this outer reference slot.
Type type = 2;
}
}

// Describe a set of operations to complete.
Expand Down
113 changes: 113 additions & 0 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -425,6 +425,112 @@ We could use the `ReferenceRel` to highlight the shared `A JOIN B` between the t
One expressing `A JOIN B` (in position 0 in the plan), one using reference as follows: `ReferenceRel(0) JOIN C` and a third one
doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B`.

### Parameterized Common Subexpressions (Outer Reference Bindings)

When a common subexpression `Rel` (at some plan position) contains `OuterReference` field references — for example, from
a correlated subquery — those outer references become free variables once the `Rel` is extracted to the plan level.
Each `ReferenceRel` that references such a `Rel` must supply **outer reference bindings** — a list of `FieldReference`
values evaluated in the `ReferenceRel`'s local relational context — to resolve those free variables.

The `PlanRel` holding the common subexpression must include `OuterReferenceDeclaration` entries that declare each
outer-reference-rooted `FieldReference` and its expected type. Bindings in `ReferenceRel` are positionally matched:
`outer_reference_bindings[i]` provides the value for `outer_reference_declarations[i]`.

#### Example: Shared Correlated Subquery with Different Scope Levels and Field Indices

Consider three tables:

- `departments(dept_id: i64, budget_limit: i32)` — fields `[0: dept_id, 1: budget_limit]`
- `employees(emp_id: i64, dept_id: i64, salary: i32)` — fields `[0: emp_id, 1: dept_id, 2: salary]`
- `projects(proj_id: i64, dept_id: i64, cost: i32)` — fields `[0: proj_id, 1: dept_id, 2: cost]`
- `audit_records(record_id: i64, entity_id: i64, amount: i32)` — fields `[0: record_id, 1: entity_id, 2: amount]`

The following single query finds departments where **both** employees and projects have audit records
under the department's budget limit:

```sql
SELECT * FROM departments d
WHERE
-- Part 1: employees with audit records under budget
EXISTS (
SELECT 1 FROM employees e
WHERE e.dept_id = d.dept_id
AND EXISTS (
SELECT 1 FROM audit_records a
WHERE a.entity_id = e.emp_id -- references e.emp_id: 1 scope out, field 0
AND a.amount < d.budget_limit -- references d.budget_limit: 2 scopes out, field 1
)
)
AND
-- Part 2: projects with audit records under budget
EXISTS (
SELECT 1 FROM projects p
WHERE p.dept_id = d.dept_id
AND EXISTS (
SELECT 1 FROM audit_records a
WHERE a.entity_id = p.proj_id -- references p.proj_id: 1 scope out, field 0
AND a.amount < d.budget_limit -- references d.budget_limit: 2 scopes out, field 1
)
)
```

The two innermost `EXISTS` subqueries are structurally identical — both filter `audit_records` using:

- `OuterRef(steps_out=1, field=0)` — an entity id from the immediate parent (1 scope out, field index 0)
- `OuterRef(steps_out=2, field=1)` — a budget limit from the grandparent (2 scopes out, field index 1)

These outer references use **different `steps_out`** values (1 vs 2) and **different `field_index`** values (0 vs 1).

We extract the common subexpression to `PlanRel[0]`:

```
PlanRel[0]:
Rel: FilterRel(
input: ReadRel("audit_records"),
condition: AND(
a.entity_id = OuterRef(steps_out=1, field=0), // i64 — entity id from parent
a.amount < OuterRef(steps_out=2, field=1) // i32 — budget limit from grandparent
)
)
outer_reference_declarations: [
{ outer_reference: FieldRef(outer_reference(steps_out=1), field=0),
type: i64 },
{ outer_reference: FieldRef(outer_reference(steps_out=2), field=1),
type: i32 }
]
```

**Usage 1** — inside the employees subquery, the innermost `EXISTS` is replaced with:

```
ReferenceRel {
subtree_ordinal: 0,
outer_reference_bindings: [
// [0] → e.emp_id: field 0 of the enclosing employees scan
FieldRef(root_reference, field=0),
// [1] → d.budget_limit: field 1 of the departments scan
FieldRef(outer_reference(steps_out=1), field=1)
]
}
```

**Usage 2** — inside the projects subquery, the innermost `EXISTS` is replaced with:

```
ReferenceRel {
subtree_ordinal: 0,
outer_reference_bindings: [
// [0] → p.proj_id: field 0 of the enclosing projects scan
FieldRef(root_reference, field=0),
// [1] → d.budget_limit: field 1 of the departments scan
FieldRef(outer_reference(steps_out=1), field=1)
]
}
```

Both usages bind `OuterRef(steps_out=1).field[0]` to different entity id columns (`e.emp_id` vs `p.proj_id`),
while `OuterRef(steps_out=2).field[1]` resolves to the same `d.budget_limit` in both cases.

| Signature | Value |
| -------------------- |---------------------------------------|
| Inputs | 1 |
Expand All @@ -438,13 +544,20 @@ doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B
| Property | Description | Required |
|-----------------------------|--------------------------------------------------------------------------------| --------------------------- |
| Referred Rel | A zero-indexed positional reference to a `Rel` defined within the same `Plan`. | Required |
| Outer Reference Bindings | A positionally-ordered list of `FieldReference` values that resolve outer references in the referenced `Rel`. Each entry corresponds to the declaration at the same index in the referenced `PlanRel`'s `outer_reference_declarations`. | Required if referenced Rel contains outer references |

=== "ReferenceRel Message"

```proto
%%% proto.algebra.ReferenceRel %%%
```

=== "PlanRel Message"

```proto
%%% proto.plan.PlanRel %%%
```

## Write Operator

The write operator is an operator that consumes one input and writes it to storage. This can range from writing to a Parquet file, to INSERT/DELETE/UPDATE in a database.
Expand Down
Loading