25 changes: 12 additions & 13 deletions .claude/rules/pass-doc-ordering.md
@@ -17,19 +17,18 @@ Developers read pass docs sequentially to understand the compilation pipeline. I
| 02 | `02-ctrl_flow_transform.md` | 2nd pass |
| 03 | `03-convert_to_ssa.md` | 3rd pass |
| 04 | `04-flatten_call_expr.md` | 4th pass |
| 05 | `05-split_chunked_loops.md` | 5th pass |
| 06 | `06-interchange_chunk_loops.md` | 6th pass |
| 07 | `07-outline_incore_scopes.md` | 7th pass |
| 08 | `08-outline_cluster_scopes.md` | 8th pass |
| 09 | `09-convert_tensor_to_tile_ops.md` | 9th pass |
| 10 | `10-optimize_orch_tensors.md` | 10th pass |
| 11 | `11-flatten_tile_nd_to_2d.md` | 11th pass |
| 12 | *(no doc yet)* | 12th pass (`InferTileMemorySpace`) |
| 13 | *(no doc yet)* | 13th pass (`ResolveTransposeLayout`) |
| 14 | `14-expand_mixed_kernel.md` | 14th pass |
| 15 | `15-init_memref.md` | 15th pass |
| 16 | `16-memory_reuse.md` | 16th pass |
| 17 | `17-allocate_memory_addr.md` | 17th pass |
| 05 | `05-outline_hierarchy_scopes.md` | 5th pass (non-CORE_GROUP → `Opaque`) |
| 06 | `06-outline_incore_scopes.md` | 6th pass (CORE_GROUP → `InCore`, promote parent) |
| 07 | `07-outline_cluster_scopes.md` | 7th pass |
| 08 | `08-convert_tensor_to_tile_ops.md` | 8th pass |
| 09 | `09-optimize_orch_tensors.md` | 9th pass |
| 10 | `10-flatten_tile_nd_to_2d.md` | 10th pass |
| 11 | `11-expand_mixed_kernel.md` | 11th pass |
| 12 | `12-init_memref.md` | 12th pass |
| 13 | `13-memory_reuse.md` | 13th pass |
| 14 | `14-allocate_memory_addr.md` | 14th pass |
| 15 | `15-partial_unroll_tile_loops.md` | 15th pass |
| 16 | `16-reorder_unrolled_io.md` | 16th pass |
| 90 | `90-insert_sync.md` | Not in Default strategy |
| 91 | `91-utility_passes.md` | Not in Default strategy |
| 99 | `99-verifier.md` | Infrastructure (not a pipeline pass) |
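
The renumbering above follows one rule: a pipeline doc's two-digit prefix equals its position in the pass order. A minimal sketch of that check, assuming a hypothetical helper `misnumbered` and a hand-copied list of filenames (neither exists in the repository):

```python
import re

# Hypothetical data: a few consecutive pipeline docs copied from the table above.
PIPELINE_DOCS = [
    "05-outline_hierarchy_scopes.md",
    "06-outline_incore_scopes.md",
    "07-outline_cluster_scopes.md",
    "08-convert_tensor_to_tile_ops.md",
]


def misnumbered(docs, first_index):
    """Return (filename, expected_prefix) pairs whose prefix does not match their position."""
    problems = []
    for offset, name in enumerate(docs):
        match = re.match(r"(\d{2})-", name)
        expected = f"{first_index + offset:02d}"
        if match is None or match.group(1) != expected:
            problems.append((name, expected))
    return problems


# An empty list means every doc in the slice is numbered consistently.
print(misnumbered(PIPELINE_DOCS, first_index=5))
```

Docs numbered 90 and above (`90-insert_sync.md`, `99-verifier.md`) are outside the default pipeline, so a check like this would only apply to the contiguous numbered range.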
4 changes: 1 addition & 3 deletions CMakeLists.txt
@@ -147,9 +147,9 @@ set(PYPTO_SOURCES
src/ir/transforms/mutator.cpp
src/ir/transforms/normalize_stmt_structure_pass.cpp
src/ir/transforms/op_conversion_registry.cpp
src/ir/transforms/outline_incore_scopes_pass.cpp
src/ir/transforms/outline_cluster_scopes_pass.cpp
src/ir/transforms/outline_hierarchy_scopes_pass.cpp
src/ir/transforms/outline_incore_scopes_pass.cpp
src/ir/transforms/expand_mixed_kernel_pass.cpp
src/ir/transforms/split_vector_kernel_pass.cpp
src/ir/transforms/flatten_tile_nd_to_2d_pass.cpp
@@ -159,8 +159,6 @@ set(PYPTO_SOURCES
src/ir/transforms/resolve_transpose_layout_pass.cpp
src/ir/transforms/python_printer.cpp
src/ir/transforms/simplify_pass.cpp
src/ir/transforms/split_chunked_loops_pass.cpp
src/ir/transforms/interchange_chunk_loops_pass.cpp
src/ir/transforms/unroll_loops_pass.cpp
src/ir/transforms/partial_unroll_tile_loops_pass.cpp
src/ir/transforms/reorder_unrolled_io_pass.cpp
71 changes: 45 additions & 26 deletions docs/en/dev/ir/01-hierarchy.md
@@ -32,7 +32,12 @@ This document provides a complete reference of all IR node types, organized by c
<return_stmt> ::= "return" [ <var_list> ]
<eval_stmt> ::= <expr>
<seq_stmts> ::= <stmt> { ";" <stmt> }
<scope_stmt> ::= "with" "pl.incore" "(" ")" ":" <stmt_list>
<scope_stmt> ::= "with" "pl.at" "(" "level" "=" <level> [ "," "role" "=" <role> ]
[ "," "optimizations" "=" "[" <optimization_list> "]" ] ")"
":" <stmt_list>
| "with" "pl.cluster" "(" ")" ":" <stmt_list>
| "with" "pl.spmd" "(" "core_num" "=" <expr>
[ "," "sync_start" "=" <expr> ] ")" ":" <stmt_list>
<break_stmt> ::= "break"
<continue_stmt> ::= "continue"

@@ -153,10 +158,8 @@ field from the `Stmt` base class. See [Leading comments on statements](#leading-
| **IfStmt** | `condition_`, `then_stmts_`, `else_stmts_`, `return_vars_` | Conditional branching |
| **ForStmt** | `loop_var_` (DefField), `start_`, `stop_`, `step_`, `iter_args_` (DefField), `body_`, `return_vars_` (DefField), `kind_` | For loop with optional iteration args |
| **WhileStmt** | `condition_`, `iter_args_` (DefField), `body_`, `return_vars_` (DefField) | While loop with condition and iteration args |
| **InCoreScopeStmt** | `name_hint_`, `body_`, `split_` (optional) | InCore region; outlined to `Function(InCore)` |
| **AutoInCoreScopeStmt** | `name_hint_`, `body_`, `split_` (optional) | Auto-InCore region; consumed by `InterchangeChunkLoops` |
| **ClusterScopeStmt** | `name_hint_`, `body_` | Cluster region; outlined to `Function(Group)` |
| **HierarchyScopeStmt** | `name_hint_`, `body_`, `level_`, `role_` (optional) | Pipeline-stage region for a given Level/Role |
| **HierarchyScopeStmt** | `name_hint_`, `body_`, `level_`, `role_` (optional), `split_` (optional) | Pipeline-stage region for a given Level/Role; outlined to `Function(InCore)` when `level_ == CORE_GROUP` and to `Function(Opaque)` otherwise |
| **SpmdScopeStmt** | `name_hint_`, `body_`, `core_num_`, `sync_start_` | SPMD launch region; outlined to `Function(Spmd)` |
| **YieldStmt** | `values_` | Yield values in loop iteration |
| **EvalStmt** | `expr_` | Evaluate expression for side effects |
@@ -252,32 +255,35 @@ while_stmt = ir.WhileStmt(condition, [x_iter], body, [x_final], span)
### ScopeStmt Details

`ScopeStmt` is an **abstract base class** that marks a region with a specific
execution context. The five concrete subclasses below each carry only the
execution context. The three concrete subclasses below each carry only the
fields valid for their kind — invalid combinations are unrepresentable at
construction. Use `s.scope_kind` (or `s.GetScopeKind()` in C++) to recover the
kind from a `ScopeStmt`-typed reference, or `isinstance(s, InCoreScopeStmt)`
kind from a `ScopeStmt`-typed reference, or `isinstance(s, HierarchyScopeStmt)`
to dispatch on the concrete type.

All five share the common base fields `name_hint_: str` and `body_: StmtPtr`.
Note that `pl.at(level=Level.CORE_GROUP)` lowers to `InCoreScopeStmt` /
`AutoInCoreScopeStmt`, not `HierarchyScopeStmt` — the parser rejects `role=`
at `CORE_GROUP`. `HierarchyScopeStmt` is reserved for non-`CORE_GROUP` levels
(host, cluster, global) and is not a general replacement for in-core scopes.
All three share the common base fields `name_hint_: str` and `body_: StmtPtr`.
`pl.at(level=...)` always lowers to `HierarchyScopeStmt` — including the
`level=Level.CORE_GROUP` form, which produces a `HierarchyScopeStmt` with
`level_ == CORE_GROUP` and an optional `split_`. `OutlineIncoreScopes`
later turns that `CORE_GROUP` scope into a `Function(InCore)` and re-types
the parent `Opaque` function as `Orchestration`. Non-`CORE_GROUP`
`HierarchyScopeStmt`s are outlined into `Function(Opaque)` by
`OutlineHierarchyScopes` (which runs immediately before `OutlineIncoreScopes`).

```python
# with pl.incore(): y = pl.add(x, x)
in_core = ir.InCoreScopeStmt(name_hint="", body=body, span=span)

# with pl.auto_incore(): (split is optional)
auto = ir.AutoInCoreScopeStmt(name_hint="", body=body, span=span)

# with pl.cluster():
cluster = ir.ClusterScopeStmt(name_hint="", body=body, span=span)

# with pl.at(level=Level.HOST, role=Role.Worker):
hier = ir.HierarchyScopeStmt(level=ir.Level.HOST, role=ir.Role.Worker,
name_hint="", body=body, span=span)

# with pl.at(level=Level.CORE_GROUP,
# optimizations=[pl.split(pl.SplitMode.UP_DOWN)]):
hier_core = ir.HierarchyScopeStmt(level=ir.Level.CORE_GROUP,
split=ir.SplitMode.UP_DOWN,
name_hint="", body=body, span=span)

# with pl.spmd(core_num=8):
spmd = ir.SpmdScopeStmt(core_num=8, sync_start=False,
name_hint="", body=body, span=span)
@@ -289,20 +295,33 @@ spmd = ir.SpmdScopeStmt(core_num=8, sync_start=False,
are not control flow (execute once, linearly).
- Required fields are enforced at construction: `HierarchyScopeStmt.level_`
is non-optional; `SpmdScopeStmt` rejects `core_num <= 0`.
- `InCoreScopeStmt` / `AutoInCoreScopeStmt` are scheduled for deprecation;
prefer `HierarchyScopeStmt` or other surviving kinds in new code.
- `HierarchyScopeStmt.split_` is optional and is only meaningful at
`Level.CORE_GROUP`. It is copied onto the outlined `InCore` function's
attrs so `ExpandMixedKernel` can read the hint.
- Pass behavior:
- `InterchangeChunkLoops` consumes `AutoInCoreScopeStmt`
- `OutlineIncoreScopes` extracts `InCoreScopeStmt` into `Function(InCore)`
- `OutlineHierarchyScopes` extracts every non-`CORE_GROUP`
`HierarchyScopeStmt` into a dedicated `FunctionType::Opaque` function.
Parent function types are preserved.
- `OutlineIncoreScopes` (runs immediately after) extracts every
`CORE_GROUP` `HierarchyScopeStmt` into a dedicated `FunctionType::InCore`
function. Parents that contained at least one `CORE_GROUP` scope are
re-typed from `Opaque` to `Orchestration`.
- `OutlineClusterScopes` extracts `ClusterScopeStmt` into `Function(Group)`
and standalone `SpmdScopeStmt` into `Function(Spmd)`
- `OutlineHierarchyScopes` extracts `HierarchyScopeStmt`
and standalone `SpmdScopeStmt` into `Function(Spmd)`.

**Transformation:**

```python
# Before: with pl.incore(): y = pl.add(x, x); return y
# After: main_incore_0(x) -> y; main(x): y = main_incore_0(x); return y
# Before:
# def main(x):
# with pl.at(level=pl.Level.CORE_GROUP):
# y = pl.add(x, x)
# return y
# After:
# def main_core_group_0(x) -> y: ... # FunctionType.InCore
# def main(x) -> y: # FunctionType.Orchestration
# y = main_core_group_0(x)
# return y
```
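
The outlining rules described above can be modeled in a few lines. This is a minimal sketch that uses stand-in enums and dataclasses rather than the real `ir` bindings; `Scope`, `Func`, and `outline` are hypothetical names chosen only for illustration, and the sketch folds `OutlineHierarchyScopes` and `OutlineIncoreScopes` into a single step.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Level(Enum):
    HOST = auto()
    CORE_GROUP = auto()


class FunctionType(Enum):
    OPAQUE = auto()
    IN_CORE = auto()
    ORCHESTRATION = auto()


@dataclass
class Scope:
    # Stand-in for HierarchyScopeStmt; only the level matters here.
    level: Level


@dataclass
class Func:
    name: str
    func_type: FunctionType
    scopes: list = field(default_factory=list)


def outline(parent):
    """Return (parent type after outlining, types of the outlined functions)."""
    outlined = [
        FunctionType.IN_CORE if s.level is Level.CORE_GROUP else FunctionType.OPAQUE
        for s in parent.scopes
    ]
    new_type = parent.func_type
    # Only an Opaque parent that held a CORE_GROUP scope is re-typed.
    if FunctionType.IN_CORE in outlined and parent.func_type is FunctionType.OPAQUE:
        new_type = FunctionType.ORCHESTRATION
    return new_type, outlined


main = Func("main", FunctionType.OPAQUE, [Scope(Level.HOST), Scope(Level.CORE_GROUP)])
print(outline(main))
```

A parent that contains only non-`CORE_GROUP` scopes keeps its original type in this model, which matches the "parent function types are preserved" rule for `OutlineHierarchyScopes`.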

**Parallel for loop (ForKind):**
@@ -444,7 +463,7 @@ Functions stored in sorted map for deterministic ordering. GlobalVar names must
| **Unary Ops** | 5 | Abs, Neg, Not, BitNot, Cast |
| **Call/Access** | 2 | Call, TupleGetItemExpr |
| **Operations** | 2 | Op, GlobalVar |
| **Statements** | 15 | AssignStmt, IfStmt, ForStmt, WhileStmt, ReturnStmt, InCoreScopeStmt, AutoInCoreScopeStmt, ClusterScopeStmt, HierarchyScopeStmt, SpmdScopeStmt, YieldStmt, EvalStmt, SeqStmts, BreakStmt, ContinueStmt |
| **Statements** | 13 | AssignStmt, IfStmt, ForStmt, WhileStmt, ReturnStmt, ClusterScopeStmt, HierarchyScopeStmt, SpmdScopeStmt, YieldStmt, EvalStmt, SeqStmts, BreakStmt, ContinueStmt |
| **Types** | 6 | ScalarType, TensorType, TileType, TupleType, PipeType, UnknownType |
| **Functions** | 2 | Function, Program |

21 changes: 8 additions & 13 deletions docs/en/dev/language/00-python_syntax.md
@@ -256,22 +256,17 @@ for i in pl.unroll(12, chunk=4):
body_statements
```

**Key points:** `chunk=C` splits the loop into an outer sequential loop and an inner loop of `C` iterations. The inner loop preserves the original kind (Sequential/Parallel/Unroll). `chunk` cannot be combined with `init_values`, and `chunk=` loops are only valid inside a `with pl.at(level=pl.Level.CORE_GROUP, optimizations=[pl.auto_chunk]):` — outside that scope the parser rejects them with an error. See [SplitChunkedLoops Pass](../passes/05-split_chunked_loops.md).
**Key points:** `chunk=C` splits the loop into an outer sequential loop and an inner loop of `C` iterations. The inner loop preserves the original kind (Sequential/Parallel/Unroll). `chunk` cannot be combined with `init_values`.
Review comment (Contributor, priority: medium):

The documentation for `chunk=C` still claims that it splits the loop into an outer and an inner loop, but this PR removes the `SplitChunkedLoops` pass, so the description is now stale and misleading. Either update it to state that compiler-driven chunking via the `chunk` argument is no longer supported, or remove it if the feature is gone entirely.


### Scope Context Managers

| Form | Scope Kind | Notes |
| ---- | ---------- | ----- |
| `pl.at(level=pl.Level.CORE_GROUP)` | `InCore` | Fixed-boundary outline at CORE_GROUP |
| `pl.at(level=pl.Level.CORE_GROUP, optimizations=[pl.split(MODE)])` | `InCore` | InCore + cross-core split hint |
| `pl.at(level=pl.Level.CORE_GROUP, optimizations=[pl.auto_chunk])` | `AutoInCore` | Compiler-driven chunked loop split |
| `pl.at(level=pl.Level.CORE_GROUP, optimizations=[pl.auto_chunk, pl.split(MODE)])` | `AutoInCore` | AutoInCore + split hint (independent entries) |
| `pl.at(level=pl.Level.HOST)` *(or any non-`CORE_GROUP` level)* | `Hierarchy` | Distributed hierarchy scope |
| `pl.cluster()` | `Cluster` | Co-scheduled AIC+AIV group |
| `pl.incore()` *(deprecated)* | `InCore` | Use `pl.at(level=pl.Level.CORE_GROUP)` instead |
| `pl.auto_incore(split=...)` *(deprecated)* | `AutoInCore` | Use `pl.at(level=pl.Level.CORE_GROUP, optimizations=[pl.auto_chunk, pl.split(...)])` |
| `pl.at(..., optimization=pl.chunked_loop_optimizer[(split=...)])` *(deprecated)* | `AutoInCore` | Use `pl.at(..., optimizations=[pl.auto_chunk, pl.split(...)])` |
| `pl.at(..., split=...)` *(deprecated)* | `InCore` | Use `pl.at(..., optimizations=[pl.split(...)])` |
| Form | Produces | Notes |
| ---- | -------- | ----- |
| `pl.at(level=pl.Level.CORE_GROUP)` | `HierarchyScopeStmt` (level=CORE_GROUP) | Outlined to `Function(InCore)` by `OutlineIncoreScopes`; parent `Opaque` is promoted to `Orchestration` |
| `pl.at(level=pl.Level.CORE_GROUP, optimizations=[pl.split(MODE)])` | `HierarchyScopeStmt` (level=CORE_GROUP, split=MODE) | Same as above; the split hint is carried on the outlined function and consumed by `ExpandMixedKernel` |
| `pl.at(level=pl.Level.HOST)` *(or any non-`CORE_GROUP` level)* | `HierarchyScopeStmt` (level=HOST/...) | Outlined to `Function(Opaque)` by `OutlineHierarchyScopes`; parent type preserved |
| `pl.cluster()` | `ClusterScopeStmt` | Outlined to `Function(Group)` by `OutlineClusterScopes` |
| `pl.spmd(core_num=N[, sync_start=...])` | `SpmdScopeStmt` | Standalone (non-cluster) spmd is outlined to `Function(Spmd)`; inside a cluster the attrs are hoisted onto the Group function |

See [Language Guide](../../user/01-language_guide.md#incore-scopes) for examples.

25 changes: 12 additions & 13 deletions docs/en/dev/passes/00-pass_manager.md
@@ -33,7 +33,7 @@ Framework for organizing and executing IR transformation passes on Programs with
| `NoNestedCalls` | No nested call expressions |
| `NormalizedStmtStructure` | Statement structure normalized |
| `NoRedundantBlocks` | No single-child or nested SeqStmts |
| `SplitIncoreOrch` | InCore scopes outlined into separate functions |
| `HierarchyOutlined` | `HierarchyScopeStmt` regions outlined into functions (`Opaque` for non-CORE_GROUP via `OutlineHierarchyScopes`; `InCore` for `CORE_GROUP` via `OutlineIncoreScopes`); parent re-typed as `Orchestration` when a `CORE_GROUP` scope was outlined. Produced by `OutlineIncoreScopes` (the second of the two outline passes). |
| `ClusterOutlined` | Cluster scopes outlined into Group functions |
| `HasMemRefs` | MemRef objects initialized on variables |
| `IncoreTileOps` | InCore functions use tile ops |
@@ -61,21 +61,20 @@ struct PassProperties {
| UnrollLoops | TypeChecked | TypeChecked | — |
| CtrlFlowTransform | TypeChecked | TypeChecked, StructuredCtrlFlow | — |
| ConvertToSSA | TypeChecked | TypeChecked, SSAForm | NormalizedStmtStructure |
| FlattenCallExpr | SSAForm | SSAForm, NoNestedCalls | NormalizedStmtStructure |
| SplitChunkedLoops | TypeChecked, SSAForm | TypeChecked, SSAForm | — |
| InterchangeChunkLoops | TypeChecked, SSAForm | TypeChecked, SSAForm | — |
| NormalizeStmtStructure | TypeChecked | TypeChecked, NormalizedStmtStructure | — |
| OutlineIncoreScopes | TypeChecked, SSAForm | SplitIncoreOrch | — |
| FlattenCallExpr | SSAForm | SSAForm, NoNestedCalls | NormalizedStmtStructure |
| OutlineHierarchyScopes | SSAForm | SSAForm | — |
| OutlineIncoreScopes | SSAForm | SSAForm, HierarchyOutlined | — |
| OutlineClusterScopes | TypeChecked, SSAForm | ClusterOutlined | — |
| ConvertTensorToTileOps | SplitIncoreOrch | IncoreTileOps | — |
| ConvertTensorToTileOps | HierarchyOutlined | IncoreTileOps | — |
| FlattenTileNdTo2D | SSAForm, IncoreTileOps | SSAForm, TileOps2D | — |
| ResolveBackendOpLayouts | SSAForm, IncoreTileOps, SplitIncoreOrch, TileOps2D | SSAForm, IncoreTileOps, SplitIncoreOrch, TileOps2D | NormalizedStmtStructure |
| ExpandMixedKernel | SSAForm, IncoreTileOps, SplitIncoreOrch, TileOps2D | SSAForm, MixedKernelExpanded | — |
| NormalizeReturnOrder | SplitIncoreOrch, IncoreTileOps | — | — |
| InitMemRef | TypeChecked, SSAForm, SplitIncoreOrch, IncoreTileOps, TileOps2D | HasMemRefs | SSAForm |
| MemoryReuse | TypeChecked, SplitIncoreOrch, IncoreTileOps, HasMemRefs, TileOps2D | — | — |
| InsertSync | TypeChecked, SplitIncoreOrch, IncoreTileOps, HasMemRefs, TileOps2D | — | — |
| AllocateMemoryAddr | TypeChecked, SplitIncoreOrch, IncoreTileOps, HasMemRefs, TileOps2D | AllocatedMemoryAddr | — |
| ResolveBackendOpLayouts | SSAForm, IncoreTileOps, HierarchyOutlined, TileOps2D | SSAForm, IncoreTileOps, HierarchyOutlined, TileOps2D | NormalizedStmtStructure |
| ExpandMixedKernel | SSAForm, IncoreTileOps, HierarchyOutlined, TileOps2D | SSAForm, MixedKernelExpanded | — |
| NormalizeReturnOrder | HierarchyOutlined, IncoreTileOps | — | — |
| InitMemRef | TypeChecked, SSAForm, HierarchyOutlined, IncoreTileOps, TileOps2D | HasMemRefs | SSAForm |
| MemoryReuse | TypeChecked, HierarchyOutlined, IncoreTileOps, HasMemRefs, TileOps2D | — | — |
| InsertSync | TypeChecked, HierarchyOutlined, IncoreTileOps, HasMemRefs, TileOps2D | — | — |
| AllocateMemoryAddr | TypeChecked, HierarchyOutlined, IncoreTileOps, HasMemRefs, TileOps2D | AllocatedMemoryAddr | — |
| FuseCreateAssembleToSlice | — | — | — |
| Simplify | — | — | — |
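
The required/produced/invalidated columns above imply an ordering constraint that can be checked mechanically. A minimal sketch, assuming a hypothetical `first_unsatisfied` helper, a subset of the updated rows copied by hand, and the assumption that a property stays available until some pass invalidates it (the real `PassManager` may track this differently):

```python
# (required, produced, invalidated) per pass, copied from a subset of the table rows.
PASSES = {
    "ConvertToSSA": ({"TypeChecked"}, {"TypeChecked", "SSAForm"}, {"NormalizedStmtStructure"}),
    "OutlineHierarchyScopes": ({"SSAForm"}, {"SSAForm"}, set()),
    "OutlineIncoreScopes": ({"SSAForm"}, {"SSAForm", "HierarchyOutlined"}, set()),
    "ConvertTensorToTileOps": ({"HierarchyOutlined"}, {"IncoreTileOps"}, set()),
}


def first_unsatisfied(pipeline, initial):
    """Return (pass_name, missing_properties) for the first pass whose
    requirements are not met, or None if the whole pipeline is consistent."""
    available = set(initial)
    for name in pipeline:
        required, produced, invalidated = PASSES[name]
        missing = required - available
        if missing:
            return name, missing
        # Assumption: properties persist unless explicitly invalidated.
        available = (available - invalidated) | produced
    return None


order = ["ConvertToSSA", "OutlineHierarchyScopes", "OutlineIncoreScopes", "ConvertTensorToTileOps"]
print(first_unsatisfied(order, initial={"TypeChecked"}))
```

Dropping `OutlineIncoreScopes` from `order` makes the check report `ConvertTensorToTileOps` as missing `HierarchyOutlined`, which is the dependency the renamed property is meant to express.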

4 changes: 2 additions & 2 deletions docs/en/dev/passes/01-unroll_loops.md
@@ -77,10 +77,10 @@ class After:
UnrollLoops runs **once** in `Default` and `DebugTileOptimization`, before control flow structuring:

```text
UnrollLoops → CtrlFlowTransform → ConvertToSSA → FlattenCallExprSplitChunkedLoopsInterchangeChunkLoops → OutlineIncoreScopes → ...
UnrollLoops → CtrlFlowTransform → ConvertToSSA → NormalizeStmtStructureFlattenCallExprOutlineHierarchyScopes → OutlineIncoreScopes → OutlineClusterScopes → ...
```

UnrollLoops expands non-chunked `pl.unroll()` loops (skipping chunked unroll loops which retain `chunk` for `SplitChunkedLoops` to handle later).
UnrollLoops expands `pl.unroll()` loops into their inline body copies.

## Pass Properties

2 changes: 1 addition & 1 deletion docs/en/dev/passes/02-ctrl_flow_transform.md
@@ -163,7 +163,7 @@ while i < n and not __break_0:
CtrlFlowTransform runs after UnrollLoops and before ConvertToSSA:

```text
UnrollLoops -> CtrlFlowTransform -> ConvertToSSA -> FlattenCallExpr -> SplitChunkedLoops -> ...
UnrollLoops -> CtrlFlowTransform -> ConvertToSSA -> NormalizeStmtStructure -> FlattenCallExpr -> OutlineHierarchyScopes -> ...
```

## Pass Properties
2 changes: 1 addition & 1 deletion docs/en/dev/passes/03-convert_to_ssa.md
@@ -13,7 +13,7 @@ This pass transforms IR with multiple assignments to the same variable into SSA

**Requires**: `TypeChecked` property. `TypeChecked` is verified automatically at BASIC level once produced; use a `VerificationInstrument` via `PassContext` to validate required properties before this pass runs.

**When to use**: Run this pass before any optimization or analysis that requires SSA form (e.g., OutlineIncoreScopes, memory optimization passes).
**When to use**: Run this pass before any optimization or analysis that requires SSA form (e.g., OutlineHierarchyScopes, memory optimization passes).

## API
