feat(ir): add is_binary parameter to tile.col_sum#1099
feat(ir): add is_binary parameter to tile.col_sum#1099Little-oil wants to merge 6 commits intohw-native-sys:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthrough
Changes
Sequence DiagramsequenceDiagram
participant DSL as DSL/Language
participant IR as IR Layer
participant Codegen as PTO Codegen
participant Runtime as Backend
DSL->>IR: pl.tile.col_sum(tile) or pl.tile.col_sum(tile, tmp_tile)
IR->>IR: Emit `tile.col_sum` IR call with 1 or 2 operands
Codegen->>IR: Read IR call (args count)
alt 2 args (tmp_tile provided)
Codegen->>Runtime: Emit `pto.tcolsum {isBinary = true}` with 2 operands
else 1 arg (no tmp_tile)
Codegen->>Runtime: Emit `pto.tcolsum` with 1 operand (no isBinary)
end
Runtime->>Runtime: Execute binary-tree or sequential reduction accordingly
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces an is_binary parameter to the col_sum operation, allowing users to choose between binary-tree and sequential reduction algorithms. The changes are implemented across the entire stack, including the Python API, IR definitions, backend codegen, and documentation, with corresponding updates to runtime and unit tests. Review feedback identifies a technical inaccuracy in the docstring regarding reduction complexity and suggests improving documentation consistency within the unified dispatch section.
- Align col_sum unified-dispatch signature with row_sum (use T, add tile-only tag) in both en and zh-cn docs. - Correct is_binary docstring: binary-tree depth is O(log M) along the reduced axis 0, not O(log N).
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/st/runtime/test_col_reduction.py (1)
36-82: LGTM — but naming asymmetry between the two 32x64 FP32 cases is slightly confusing.Pinning the existing
ColSum_32x64_FP32kernel tois_binary=Truecorrectly preserves the prior hardcoded-True behavior, and the newColSum_32x64_FP32_Sequentialexercises the new default (is_binary=False). The referencecompute_expectedusingtorch.sumis appropriate for both FP32 cases.Minor readability nit: the binary-tree variant is unlabeled while the sequential variant carries the
_Sequentialsuffix. Renaming the binary one toColSum_32x64_FP32_Binary(and the test class/method accordingly) would make the two paths symmetric at a glance. Non-blocking.Based on learnings, the FP32
torch.sumreference is intentional and precision-sufficient for ≤32-row reductions.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/st/runtime/test_col_reduction.py` around lines 36 - 82, Rename the unlabeled binary-tree class ColSum_32x64_FP32 to ColSum_32x64_FP32_Binary (and update any corresponding test function/method names that reference ColSum_32x64_FP32) so it is symmetric with ColSum_32x64_FP32_Sequential; update all references to the old class name in the file to the new ColSum_32x64_FP32_Binary identifier to keep intent clear while preserving the is_binary=True behavior in the kernel.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/pypto/ir/op/tile_ops.py`:
- Around line 1796-1812: Update the col_sum function signature to make is_binary
a keyword-only parameter (so calls like col_sum(tile, tmp_tile, span) won't bind
span to is_binary) and keep span as the final optional positional/captured
param; adjust the docstring to state the reduction reduces axis=0 (first axis M)
and that the binary-tree depth is O(log M) not O(log N). Locate the col_sum
function and change its signature accordingly and update the docstring lines
describing axis, output shape, and complexity.
---
Nitpick comments:
In `@tests/st/runtime/test_col_reduction.py`:
- Around line 36-82: Rename the unlabeled binary-tree class ColSum_32x64_FP32 to
ColSum_32x64_FP32_Binary (and update any corresponding test function/method
names that reference ColSum_32x64_FP32) so it is symmetric with
ColSum_32x64_FP32_Sequential; update all references to the old class name in the
file to the new ColSum_32x64_FP32_Binary identifier to keep intent clear while
preserving the is_binary=True behavior in the kernel.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 12c77cc0-b150-4eab-82f3-a29d10930e34
📒 Files selected for processing (9)
docs/en/user/02-operation_reference.mddocs/zh-cn/user/02-operation_reference.mdpython/pypto/ir/op/tile_ops.pypython/pypto/language/op/tile_ops.pypython/pypto/language/op/unified_ops.pysrc/backend/common/pto_ops_common.cppsrc/ir/op/tile_ops/reduction.cpptests/st/runtime/test_col_reduction.pytests/ut/codegen/test_pto_codegen_ops.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/st/runtime/test_col_reduction.py`:
- Around line 71-72: Remove the duplicated unreachable return statement: keep a
single "return pl.store(result, [0, 0], output)" call and delete the second
identical one so the test_col_reduction code only returns once; locate the
duplicate by finding the consecutive "return pl.store(result, [0, 0], output)"
lines referencing result, pl.store and output and remove the extra occurrence.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4d857248-3442-4977-9731-fb96b55932ec
📒 Files selected for processing (9)
docs/en/user/02-operation_reference.mddocs/zh-cn/user/02-operation_reference.mdpython/pypto/ir/op/tile_ops.pypython/pypto/language/op/tile_ops.pypython/pypto/language/op/unified_ops.pysrc/backend/common/pto_ops_common.cppsrc/ir/op/tile_ops/reduction.cpptests/st/runtime/test_col_reduction.pytests/ut/codegen/test_pto_codegen_ops.py
🚧 Files skipped from review as they are similar to previous changes (4)
- python/pypto/language/op/tile_ops.py
- docs/en/user/02-operation_reference.md
- src/ir/op/tile_ops/reduction.cpp
- python/pypto/ir/op/tile_ops.py
| return pl.store(result, [0, 0], output) | ||
| return pl.store(result, [0, 0], output) |
There was a problem hiding this comment.
Remove the duplicated return pl.store(...).
Line 72 is unreachable and duplicates Line 71; keeping a single return avoids confusing the source-based DSL parser and future readers.
Proposed cleanup
tile: pl.Tile[[32, 64], pl.FP32] = pl.load(input_tensor, [0, 0], [32, 64])
result: pl.Tile[[1, 64], pl.FP32] = pl.tile.col_sum(tile)
return pl.store(result, [0, 0], output)
- return pl.store(result, [0, 0], output)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return pl.store(result, [0, 0], output) | |
| return pl.store(result, [0, 0], output) | |
| tile: pl.Tile[[32, 64], pl.FP32] = pl.load(input_tensor, [0, 0], [32, 64]) | |
| result: pl.Tile[[1, 64], pl.FP32] = pl.tile.col_sum(tile) | |
| return pl.store(result, [0, 0], output) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/st/runtime/test_col_reduction.py` around lines 71 - 72, Remove the
duplicated unreachable return statement: keep a single "return pl.store(result,
[0, 0], output)" call and delete the second identical one so the
test_col_reduction code only returns once; locate the duplicate by finding the
consecutive "return pl.store(result, [0, 0], output)" lines referencing result,
pl.store and output and remove the extra occurrence.
|
等PTOAS提供2参TCOLSUM |
Expose the TCOLSUM isBinary knob as an optional kwarg on pl.tile.col_sum (default False, sequential reduction). PTOAS confirmed false is the natural default — sequential path has different precision / latency characteristics than the binary-tree path. Threaded through IR registration, codegen, Python IR wrapper, DSL and unified dispatch. Existing ST tests pass is_binary=True to preserve binary-tree coverage; a new FP32 sequential ST variant exercises the default path on hardware.
- Align col_sum unified-dispatch signature with row_sum (use T, add tile-only tag) in both en and zh-cn docs. - Correct is_binary docstring: binary-tree depth is O(log M) along the reduced axis 0, not O(log N).
Replace the explicit `is_binary` kwarg on `pl.tile.col_sum` with presence-based inference, matching the `pl.tile.rsqrt(tile, tmp=...)` idiom: - `col_sum(tile)` -> sequential reduction (TCOLSUM 2-arg form) - `col_sum(tile, tmp)` -> binary-tree reduction (TCOLSUM 4-arg form) The sequential ST test is skipped for now: PTOAS NPU backends (a2a3/a5) currently only implement the 4-arg `TCOLSUM_IMPL`. The test will be re-enabled once PTOAS adds the 2-arg NPU overload.
Update CI pto-isa commit to the revision that adds the 2-arg TCOLSUM_IMPL overload in the NPU (a2a3/a5) backends. With the sequential path now buildable on hardware, re-enable the `test_32x64_fp32_sequential` ST case. Also drop a stray duplicate `pl.store` line in ColSum_32x64_FP32_Sequential.
c0c24c9 to
6f5f760
Compare
Previous sequential test only covered [32, 64] FP32. Add matching sequential-path cases for [16, 16] FP32, [8, 128] FP32, and [32, 64] FP16 so the no-tmp path matches the binary-tree coverage matrix (shape x dtype).
Summary
isBinaryknob as an optional kwarg onpl.tile.col_sum(defaultFalse, sequential reduction).falseis the natural default — the sequential path has different precision / latency characteristics than the binary-tree path and was hardcoded totruein feat(ir): add tile.col_sum, tile.col_max, tile.col_min operations #1088 only because the attribute was required.Changes
src/ir/op/tile_ops/reduction.cpp— addset_attr<bool>("is_binary")src/backend/common/pto_ops_common.cpp— read kwarg, emit{isBinary = true/false}python/pypto/ir/op/tile_ops.py— forwardis_binarykwargpython/pypto/language/op/tile_ops.pypython/pypto/language/op/unified_ops.pydocs/en/user/02-operation_reference.md,docs/zh-cn/user/02-operation_reference.mdtests/ut/codegen/test_pto_codegen_ops.py— addis_binary=TrueMLIR assertion alongside existing default casetests/st/runtime/test_col_reduction.py— new FP32 sequential program / case; existing four cases pinned tois_binary=Trueto preserve binary-tree coverageTest plan
pytest tests/ut/codegen/test_pto_codegen_ops.py -k col_sum -vTestColSum::test_*cases pass, including the newtest_32x64_fp32_sequential