feat(ir): add is_binary parameter to tile.col_sum by Little-oil · Pull Request #1099 · hw-native-sys/pypto

Little-oil · 2026-04-21T01:59:34Z

Summary

- Expose the TCOLSUM `isBinary` knob as an optional kwarg on `pl.tile.col_sum` (default `False`, sequential reduction).
- PTOAS confirmed that `false` is the natural default. The sequential path has different precision / latency characteristics than the binary-tree path; the flag was hardcoded to `true` in #1088 (feat(ir): add tile.col_sum, tile.col_max, tile.col_min operations) only because the attribute was required.
- Threaded through IR registration, PTO codegen, Python IR wrapper, DSL, and unified dispatch; signature updated in the en / zh-cn operation reference.
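To make the "different precision characteristics" point concrete, here is a minimal pure-Python sketch of the two reduction orders. It is not the TCOLSUM implementation: `f32`, `col_sum_sequential`, and `col_sum_binary_tree` are hypothetical helpers, and FP32 rounding is emulated with `struct`. The hardware's exact pairing order is an assumption here.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def col_sum_sequential(rows):
    """Accumulate rows one after another, rounding to FP32 after every add."""
    acc = [f32(v) for v in rows[0]]
    for row in rows[1:]:
        acc = [f32(a + f32(v)) for a, v in zip(acc, row)]
    return acc


def col_sum_binary_tree(rows):
    """Add rows pairwise, roughly halving the row count per level, in FP32."""
    level = [[f32(v) for v in row] for row in rows]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append([f32(a + b) for a, b in zip(level[i], level[i + 1])])
        if len(level) % 2:  # odd row count: carry the last row up unchanged (assumption)
            nxt.append(level[-1])
        level = nxt
    return level[0]


# A [32, 2] tile whose values are not exactly representable in FP32.
tile = [[0.1 * (r + 1), 1.0 / (r + 3)] for r in range(32)]
seq = col_sum_sequential(tile)
tree = col_sum_binary_tree(tile)
exact = [sum(col) for col in zip(*tile)]  # float64 reference
```

Both orders stay close to the float64 reference, but they round at different points, so their last bits may differ. That is one reason a test might pin the path explicitly rather than rely on the default.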

Changes

| Layer | File |
| --- | --- |
| IR register | `src/ir/op/tile_ops/reduction.cpp`: add `set_attr<bool>("is_binary")` |
| Codegen | `src/backend/common/pto_ops_common.cpp`: read kwarg, emit `{isBinary = true/false}` |
| Python IR | `python/pypto/ir/op/tile_ops.py`: forward `is_binary` kwarg |
| DSL | `python/pypto/language/op/tile_ops.py` |
| Unified | `python/pypto/language/op/unified_ops.py` |
| Docs | `docs/en/user/02-operation_reference.md`, `docs/zh-cn/user/02-operation_reference.md` |
| UT | `tests/ut/codegen/test_pto_codegen_ops.py`: add `is_binary=True` MLIR assertion alongside existing default case |
| ST | `tests/st/runtime/test_col_reduction.py`: new FP32 sequential program / case; existing four cases pinned to `is_binary=True` to preserve binary-tree coverage |

Test plan

UT: pytest tests/ut/codegen/test_pto_codegen_ops.py -k col_sum -v
ST on hardware (a2a3): all five TestColSum::test_* cases pass, including the new test_32x64_fp32_sequential

coderabbitai · 2026-04-21T01:59:50Z

Walkthrough

col_sum was changed to make the scratch tmp_tile optional across DSL, IR, and codegen: when tmp_tile is provided the op uses the binary-tree reduction path; when omitted it uses a sequential reduction path. Documentation, Python wrappers, IR op, codegen emission, and tests were updated accordingly.

Changes

Cohort / File(s)	Summary
Documentation `docs/en/user/02-operation_reference.md`, `docs/zh-cn/user/02-operation_reference.md`	Updated `col_sum` signatures to accept `tmp_tile: Tile
Python IR / DSL `python/pypto/ir/op/tile_ops.py`, `python/pypto/language/op/tile_ops.py`, `python/pypto/language/op/unified_ops.py`	Made `tmp_tile` optional in `col_sum` signatures; wrappers forward `None` when omitted and conditionally include `tmp_tile` in emitted IR call; docstrings updated to describe behavior.
IR / C++ Codegen `src/ir/op/tile_ops/reduction.cpp`, `src/backend/common/pto_ops_common.cpp`	Operator now accepts 1 or 2 args; type-deduction and validation only check `tmp_tile` when present; codegen emits `pto.tcolsum` with `isBinary` attribute only when two operands are present.
Unit & ST Tests `tests/ut/codegen/test_pto_codegen_ops.py`, `tests/st/runtime/test_col_reduction.py`	Codegen tests added/updated to assert presence/absence of `isBinary` depending on arity; added a sequential-path program/test (skipped) and wired new test case classes for the sequential path.
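The wrapper behaviour summarized in the table (forward `None` when omitted, include `tmp_tile` in the emitted call only when present) might look roughly like the sketch below. `make_call` is a hypothetical stand-in for whatever builds an IR call node in pypto, and the dict it returns is purely illustrative.

```python
from typing import Any, Optional


def make_call(op_name: str, args: list) -> dict:
    """Hypothetical stand-in for an IR call builder; not the pypto API."""
    return {"op": op_name, "args": args}


def col_sum(tile: Any, tmp_tile: Optional[Any] = None) -> dict:
    """Build a tile.col_sum call with one or two operands.

    With tmp_tile omitted the call carries only the input tile (sequential
    path); with tmp_tile given it carries both (binary-tree path).
    """
    args = [tile]
    if tmp_tile is not None:
        args.append(tmp_tile)
    return make_call("tile.col_sum", args)


sequential_call = col_sum("tile")
binary_call = col_sum("tile", "tmp")
```

The arity of `args` is the only signal passed downstream, which matches the walkthrough's description of codegen choosing the path from the operand count.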

Sequence Diagram

sequenceDiagram
    participant DSL as DSL/Language
    participant IR as IR Layer
    participant Codegen as PTO Codegen
    participant Runtime as Backend

    DSL->>IR: pl.tile.col_sum(tile) or pl.tile.col_sum(tile, tmp_tile)
    IR->>IR: Emit `tile.col_sum` IR call with 1 or 2 operands
    Codegen->>IR: Read IR call (args count)
    alt 2 args (tmp_tile provided)
        Codegen->>Runtime: Emit `pto.tcolsum {isBinary = true}` with 2 operands
    else 1 arg (no tmp_tile)
        Codegen->>Runtime: Emit `pto.tcolsum` with 1 operand (no isBinary)
    end
    Runtime->>Runtime: Execute binary-tree or sequential reduction accordingly
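The arity-based branch in the diagram can be sketched in Python as follows. The real codegen lives in C++ (`src/backend/common/pto_ops_common.cpp`), so `emit_tcolsum` is a hypothetical helper, and the operand formatting is illustrative only. The only details taken from the walkthrough are that `isBinary = true` appears with two operands and is absent with one.

```python
def emit_tcolsum(operands: list) -> str:
    """Return an illustrative pto.tcolsum line for 1 or 2 operands."""
    if len(operands) == 2:
        # tmp_tile present: binary-tree path, attribute emitted
        return "pto.tcolsum " + ", ".join(operands) + " {isBinary = true}"
    if len(operands) == 1:
        # no tmp_tile: sequential path, attribute omitted entirely
        return "pto.tcolsum " + operands[0]
    raise ValueError(f"tile.col_sum expects 1 or 2 operands, got {len(operands)}")


one_operand = emit_tcolsum(["%tile"])
two_operands = emit_tcolsum(["%tile", "%tmp"])
```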

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat(ops): Expose pto.trsqrt high-precision mode in Python frontend #1070: Also makes a tile op accept an optional scratch tile and updates IR/codegen/tests in the same pattern.
feat(ir): add tile.col_sum, tile.col_max, tile.col_min operations #1088: Prior PR that introduced tile.col_sum with a hardcoded binary flag; this change makes the scratch argument optional and adjusts surrounding layers.
refactor(ir): Rename block to tile across entire codebase #386: Related work on tile reduction operator changes and renaming across reductions.

Suggested reviewers

lyfne123
Hzfengsy

Poem

🐰 I hopped through tiles both wide and thin,
One scratch to split, or none to spin.
Binary or straight, the sums cascade—
The rabbit cheers each path we made! 🎋

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: making the `is_binary` parameter optional on `tile.col_sum` through presence-based inference of the `tmp_tile` argument.
Description check	✅ Passed	The description clearly explains the purpose, changes made across all layers (IR, codegen, Python, DSL, docs, tests), and includes a test plan with verification results.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


gemini-code-assist

Code Review

This pull request introduces an is_binary parameter to the col_sum operation, allowing users to choose between binary-tree and sequential reduction algorithms. The changes are implemented across the entire stack, including the Python API, IR definitions, backend codegen, and documentation, with corresponding updates to runtime and unit tests. Review feedback identifies a technical inaccuracy in the docstring regarding reduction complexity and suggests improving documentation consistency within the unified dispatch section.

- Align col_sum unified-dispatch signature with row_sum (use T, add tile-only tag) in both en and zh-cn docs.
- Correct is_binary docstring: binary-tree depth is O(log M) along the reduced axis 0, not O(log N).
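A quick way to see why the depth depends on M rather than N: a column sum over an [M, N] tile collapses axis 0, so only the row count is halved at each level. The sketch below uses a hypothetical `tree_depth` helper and assumes an odd leftover row is carried to the next level.

```python
import math


def tree_depth(m: int) -> int:
    """Number of pairwise-add levels needed to reduce m rows to one."""
    if m < 1:
        raise ValueError("row count must be positive")
    depth = 0
    while m > 1:
        m = (m + 1) // 2  # pairs collapse; an odd leftover row is carried up
        depth += 1
    return depth


# Shapes from the ST coverage matrix in this PR: [M, N]
shapes = [(32, 64), (16, 16), (8, 128)]
depths = {shape: tree_depth(shape[0]) for shape in shapes}
```

For these shapes the depth follows the first dimension only, so [8, 128] needs fewer levels than [32, 64] even though it has more columns.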

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/st/runtime/test_col_reduction.py (1)
36-82: LGTM — but naming asymmetry between the two 32x64 FP32 cases is slightly confusing.

Pinning the existing ColSum_32x64_FP32 kernel to is_binary=True correctly preserves the prior hardcoded-True behavior, and the new ColSum_32x64_FP32_Sequential exercises the new default (is_binary=False). The reference compute_expected using torch.sum is appropriate for both FP32 cases.

Minor readability nit: the binary-tree variant is unlabeled while the sequential variant carries the _Sequential suffix. Renaming the binary one to ColSum_32x64_FP32_Binary (and the test class/method accordingly) would make the two paths symmetric at a glance. Non-blocking.

Based on learnings, the FP32 torch.sum reference is intentional and precision-sufficient for ≤32-row reductions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/pypto/ir/op/tile_ops.py`:
- Around line 1796-1812: Update the col_sum function signature to make is_binary
a keyword-only parameter (so calls like col_sum(tile, tmp_tile, span) won't bind
span to is_binary) and keep span as the final optional positional/captured
param; adjust the docstring to state the reduction reduces axis=0 (first axis M)
and that the binary-tree depth is O(log M) not O(log N). Locate the col_sum
function and change its signature accordingly and update the docstring lines
describing axis, output shape, and complexity.
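The positional-binding hazard described in this comment can be reproduced with two stand-in functions. The signatures below are hypothetical simplifications, not the actual `col_sum` in `python/pypto/ir/op/tile_ops.py`; they only show how a bare `*` in the parameter list prevents a third positional argument from landing in `is_binary`.

```python
def col_sum_before(tile, tmp_tile, is_binary=False, span=None):
    """is_binary is positional: a third positional argument binds to it."""
    return {"is_binary": is_binary, "span": span}


def col_sum_after(tile, tmp_tile, span=None, *, is_binary=False):
    """is_binary is keyword-only: it can never be bound positionally."""
    return {"is_binary": is_binary, "span": span}


before = col_sum_before("tile", "tmp", "span-obj")  # span silently becomes is_binary
after = col_sum_after("tile", "tmp", "span-obj")    # span lands where intended

try:
    col_sum_after("tile", "tmp", "span-obj", True)  # a 4th positional is rejected
    rejected = False
except TypeError:
    rejected = True
```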

---

Nitpick comments:
In `@tests/st/runtime/test_col_reduction.py`:
- Around line 36-82: Rename the unlabeled binary-tree class ColSum_32x64_FP32 to
ColSum_32x64_FP32_Binary (and update any corresponding test function/method
names that reference ColSum_32x64_FP32) so it is symmetric with
ColSum_32x64_FP32_Sequential; update all references to the old class name in the
file to the new ColSum_32x64_FP32_Binary identifier to keep intent clear while
preserving the is_binary=True behavior in the kernel.

📥 Commits

Reviewing files that changed from the base of the PR and between 4122b0e and ae50e86.

📒 Files selected for processing (9)

docs/en/user/02-operation_reference.md
docs/zh-cn/user/02-operation_reference.md
python/pypto/ir/op/tile_ops.py
python/pypto/language/op/tile_ops.py
python/pypto/language/op/unified_ops.py
src/backend/common/pto_ops_common.cpp
src/ir/op/tile_ops/reduction.cpp
tests/st/runtime/test_col_reduction.py
tests/ut/codegen/test_pto_codegen_ops.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/st/runtime/test_col_reduction.py`:
- Around line 71-72: Remove the duplicated unreachable return statement: keep a
single "return pl.store(result, [0, 0], output)" call and delete the second
identical one so the test_col_reduction code only returns once; locate the
duplicate by finding the consecutive "return pl.store(result, [0, 0], output)"
lines referencing result, pl.store and output and remove the extra occurrence.

📥 Commits

Reviewing files that changed from the base of the PR and between ae50e86 and 47315b1.

📒 Files selected for processing (9)

docs/en/user/02-operation_reference.md
docs/zh-cn/user/02-operation_reference.md
python/pypto/ir/op/tile_ops.py
python/pypto/language/op/tile_ops.py
python/pypto/language/op/unified_ops.py
src/backend/common/pto_ops_common.cpp
src/ir/op/tile_ops/reduction.cpp
tests/st/runtime/test_col_reduction.py
tests/ut/codegen/test_pto_codegen_ops.py

🚧 Files skipped from review as they are similar to previous changes (4)

python/pypto/language/op/tile_ops.py
docs/en/user/02-operation_reference.md
src/ir/op/tile_ops/reduction.cpp
python/pypto/ir/op/tile_ops.py

coderabbitai · 2026-04-21T04:02:28Z

+        return pl.store(result, [0, 0], output)
+        return pl.store(result, [0, 0], output)


⚠️ Potential issue | 🟡 Minor

Remove the duplicated return pl.store(...).

Line 72 is unreachable and duplicates Line 71; keeping a single return avoids confusing the source-based DSL parser and future readers.

Proposed cleanup

         tile: pl.Tile[[32, 64], pl.FP32] = pl.load(input_tensor, [0, 0], [32, 64])
         result: pl.Tile[[1, 64], pl.FP32] = pl.tile.col_sum(tile)
         return pl.store(result, [0, 0], output)
-        return pl.store(result, [0, 0], output)
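A check for this kind of leftover could be sketched with the standard `ast` module. This is not part of pypto or its DSL parser: `unreachable_after_return` is a hypothetical helper, the sample kernel is a plain-Python stand-in, and the walker only inspects `body` lists (not `orelse` or `finalbody`), reporting the first statement that directly follows a `return`.

```python
import ast


def unreachable_after_return(source: str) -> list:
    """Return line numbers of statements that directly follow a return."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):  # e.g. Lambda.body is a single expression
            continue
        for prev, stmt in zip(body, body[1:]):
            if isinstance(prev, ast.Return):
                hits.append(stmt.lineno)
    return hits


# Plain-Python stand-in for the kernel body flagged in the review.
SAMPLE = '''
def kernel(result, output):
    return store(result, [0, 0], output)
    return store(result, [0, 0], output)
'''

flagged = unreachable_after_return(SAMPLE)
```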

Little-oil · 2026-04-21T06:35:34Z

Waiting for PTOAS to provide the 2-argument TCOLSUM.

Expose the TCOLSUM isBinary knob as an optional kwarg on pl.tile.col_sum (default False, sequential reduction). PTOAS confirmed false is the natural default — sequential path has different precision / latency characteristics than the binary-tree path. Threaded through IR registration, codegen, Python IR wrapper, DSL and unified dispatch. Existing ST tests pass is_binary=True to preserve binary-tree coverage; a new FP32 sequential ST variant exercises the default path on hardware.

Replace the explicit `is_binary` kwarg on `pl.tile.col_sum` with presence-based inference, matching the `pl.tile.rsqrt(tile, tmp=...)` idiom:

- `col_sum(tile)` -> sequential reduction (TCOLSUM 2-arg form)
- `col_sum(tile, tmp)` -> binary-tree reduction (TCOLSUM 4-arg form)

The sequential ST test is skipped for now: PTOAS NPU backends (a2a3/a5) currently only implement the 4-arg `TCOLSUM_IMPL`. The test will be re-enabled once PTOAS adds the 2-arg NPU overload.

Update CI pto-isa commit to the revision that adds the 2-arg TCOLSUM_IMPL overload in the NPU (a2a3/a5) backends. With the sequential path now buildable on hardware, re-enable the `test_32x64_fp32_sequential` ST case. Also drop a stray duplicate `pl.store` line in ColSum_32x64_FP32_Sequential.

github-project-automation Bot added this to pto project Apr 21, 2026

gemini-code-assist Bot reviewed Apr 21, 2026

coderabbitai Bot reviewed Apr 21, 2026

coderabbitai Bot reviewed Apr 21, 2026

Youhezhen added 4 commits April 22, 2026 17:35

fix(pr): resolve issues for hw-native-sys#1099

14d611a

- Align col_sum unified-dispatch signature with row_sum (use T, add tile-only tag) in both en and zh-cn docs.
- Correct is_binary docstring: binary-tree depth is O(log M) along the reduced axis 0, not O(log N).

Little-oil force-pushed the issue-881-col-sum-is-binary branch from c0c24c9 to 6f5f760 on April 22, 2026 09:39

Youhezhen added 2 commits April 22, 2026 17:57

test(st): expand col_sum sequential coverage to 4 shapes/dtypes

59b424a

Previous sequential test only covered [32, 64] FP32. Add matching sequential-path cases for [16, 16] FP32, [8, 128] FP32, and [32, 64] FP16 so the no-tmp path matches the binary-tree coverage matrix (shape x dtype).

ci: retrigger after flaky system-tests failure

6935a17

Little-oil wants to merge 6 commits into hw-native-sys:main from Little-oil:issue-881-col-sum-is-binary
