
feat(ir): add is_binary parameter to tile.col_sum #1099

Open
Little-oil wants to merge 6 commits into hw-native-sys:main from Little-oil:issue-881-col-sum-is-binary

Conversation

@Little-oil
Contributor

Summary

  • Expose the TCOLSUM isBinary knob as an optional kwarg on pl.tile.col_sum (default False, sequential reduction).
  • PTOAS confirmed that false is the natural default: the sequential path has different precision / latency characteristics than the binary-tree path, and the attribute was hardcoded to true in #1088 (feat(ir): add tile.col_sum, tile.col_max, tile.col_min operations) only because it was required.
  • Threaded through IR registration, PTO codegen, Python IR wrapper, DSL, and unified dispatch; signature updated in en / zh-cn operation reference.
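
The two reduction orders named in the summary can be illustrated in plain Python. This is only a sketch of the arithmetic, not the TCOLSUM hardware implementation: the function names are hypothetical, a tile is modelled as a non-empty list of rows, and float64 stands in for the device dtypes.

```python
from typing import List, Sequence

Row = List[float]


def _add_rows(a: Sequence[float], b: Sequence[float]) -> Row:
    """Element-wise sum of two rows of equal length."""
    return [x + y for x, y in zip(a, b)]


def col_sum_sequential(tile: Sequence[Sequence[float]]) -> Row:
    """Accumulate rows one after another (assumed shape of the is_binary=False path)."""
    acc = list(tile[0])
    for row in tile[1:]:
        acc = _add_rows(acc, row)
    return acc


def col_sum_binary_tree(tile: Sequence[Sequence[float]]) -> Row:
    """Add neighbouring rows pairwise until one row is left (assumed shape of the is_binary=True path)."""
    rows = [list(r) for r in tile]
    while len(rows) > 1:
        paired = [_add_rows(rows[i], rows[i + 1]) for i in range(0, len(rows) - 1, 2)]
        if len(rows) % 2:  # an odd row is carried to the next level unchanged
            paired.append(rows[-1])
        rows = paired
    return rows[0]


# A single column chosen so that the order of additions changes the float64 result.
tile = [[1e16], [1.0], [1.0], [1.0]]
sequential = col_sum_sequential(tile)
binary_tree = col_sum_binary_tree(tile)
```

With exactly representable inputs such as small integers both orders agree. With floating-point inputs they can differ in the last bits, which is one reason to keep test coverage for both paths.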

Changes

| Layer | File | Change |
|---|---|---|
| IR register | src/ir/op/tile_ops/reduction.cpp | add set_attr<bool>("is_binary") |
| Codegen | src/backend/common/pto_ops_common.cpp | read kwarg, emit {isBinary = true/false} |
| Python IR | python/pypto/ir/op/tile_ops.py | forward is_binary kwarg |
| DSL | python/pypto/language/op/tile_ops.py | |
| Unified | python/pypto/language/op/unified_ops.py | |
| Docs | docs/en/user/02-operation_reference.md, docs/zh-cn/user/02-operation_reference.md | |
| UT | tests/ut/codegen/test_pto_codegen_ops.py | add is_binary=True MLIR assertion alongside existing default case |
| ST | tests/st/runtime/test_col_reduction.py | new FP32 sequential program / case; existing four cases pinned to is_binary=True to preserve binary-tree coverage |

Test plan

  • UT: pytest tests/ut/codegen/test_pto_codegen_ops.py -k col_sum -v
  • ST on hardware (a2a3): all five TestColSum::test_* cases pass, including the new test_32x64_fp32_sequential

@coderabbitai

coderabbitai Bot commented Apr 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

col_sum was changed to make the scratch tmp_tile optional across DSL, IR, and codegen: when tmp_tile is provided the op uses the binary-tree reduction path; when omitted it uses a sequential reduction path. Documentation, Python wrappers, IR op, codegen emission, and tests were updated accordingly.
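
The presence-based behaviour described in this walkthrough can be sketched in plain Python. This is a minimal illustration, not the pypto wrapper itself: `Call`, the string operands, and this `col_sum` signature are hypothetical stand-ins for the real IR types.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for an IR call node."""

    op: str
    args: Tuple[str, ...]


def col_sum(tile: str, tmp_tile: Optional[str] = None) -> Call:
    """Build a tile.col_sum call with one or two operands.

    The scratch tile is appended only when the caller supplied it, so the
    operand count alone tells later stages which reduction path was requested.
    """
    args = (tile,) if tmp_tile is None else (tile, tmp_tile)
    return Call("tile.col_sum", args)


sequential_call = col_sum("tile0")            # one operand: sequential path
binary_call = col_sum("tile0", "scratch0")    # two operands: binary-tree path
```

Encoding the choice in the operand count avoids a separate boolean that could disagree with whether a scratch tile was actually passed.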

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: docs/en/user/02-operation_reference.md, docs/zh-cn/user/02-operation_reference.md | Updated col_sum signatures to accept `tmp_tile: Tile |
| Python IR / DSL: python/pypto/ir/op/tile_ops.py, python/pypto/language/op/tile_ops.py, python/pypto/language/op/unified_ops.py | Made tmp_tile optional in col_sum signatures; wrappers forward None when omitted and conditionally include tmp_tile in emitted IR call; docstrings updated to describe behavior. |
| IR / C++ Codegen: src/ir/op/tile_ops/reduction.cpp, src/backend/common/pto_ops_common.cpp | Operator now accepts 1 or 2 args; type-deduction and validation only check tmp_tile when present; codegen emits pto.tcolsum with isBinary attribute only when two operands are present. |
| Unit & ST Tests: tests/ut/codegen/test_pto_codegen_ops.py, tests/st/runtime/test_col_reduction.py | Codegen tests added/updated to assert presence/absence of isBinary depending on arity; added a sequential-path program/test (skipped) and wired new test case classes for the sequential path. |

Sequence Diagram

sequenceDiagram
    participant DSL as DSL/Language
    participant IR as IR Layer
    participant Codegen as PTO Codegen
    participant Runtime as Backend

    DSL->>IR: pl.tile.col_sum(tile) or pl.tile.col_sum(tile, tmp_tile)
    IR->>IR: Emit `tile.col_sum` IR call with 1 or 2 operands
    Codegen->>IR: Read IR call (args count)
    alt 2 args (tmp_tile provided)
        Codegen->>Runtime: Emit `pto.tcolsum {isBinary = true}` with 2 operands
    else 1 arg (no tmp_tile)
        Codegen->>Runtime: Emit `pto.tcolsum` with 1 operand (no isBinary)
    end
    Runtime->>Runtime: Execute binary-tree or sequential reduction accordingly
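
The codegen branch in the diagram above can be mimicked with a few lines of Python. This is a hedged sketch only: the real emitter is C++ in pto_ops_common.cpp, `emit_tcolsum` is a hypothetical name, and the operand syntax in the returned string is invented for illustration rather than copied from real PTO MLIR.

```python
from typing import Sequence


def emit_tcolsum(operands: Sequence[str]) -> str:
    """Return an illustrative text line for pto.tcolsum based on operand count."""
    if len(operands) == 2:
        # Scratch tile present: request the binary-tree path.
        src, tmp = operands
        return f"pto.tcolsum {src}, {tmp} {{isBinary = true}}"
    if len(operands) == 1:
        # No scratch tile: sequential path, attribute omitted entirely.
        return f"pto.tcolsum {operands[0]}"
    raise ValueError(f"tile.col_sum expects 1 or 2 operands, got {len(operands)}")


with_tmp = emit_tcolsum(["%tile", "%tmp"])
without_tmp = emit_tcolsum(["%tile"])
```

A unit test in the spirit of the ones mentioned in the change table would then assert that `isBinary` appears in `with_tmp` and is absent from `without_tmp`.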

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • lyfne123
  • Hzfengsy

Poem

🐰 I hopped through tiles both wide and thin,
One scratch to split, or none to spin.
Binary or straight, the sums cascade—
The rabbit cheers each path we made! 🎋

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: making the is_binary parameter optional on tile.col_sum through presence-based inference of the tmp_tile argument. |
| Description check | ✅ Passed | The description clearly explains the purpose, changes made across all layers (IR, codegen, Python, DSL, docs, tests), and includes a test plan with verification results. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an is_binary parameter to the col_sum operation, allowing users to choose between binary-tree and sequential reduction algorithms. The changes are implemented across the entire stack, including the Python API, IR definitions, backend codegen, and documentation, with corresponding updates to runtime and unit tests. Review feedback identifies a technical inaccuracy in the docstring regarding reduction complexity and suggests improving documentation consistency within the unified dispatch section.

Comment thread docs/en/user/02-operation_reference.md Outdated
Comment thread docs/zh-cn/user/02-operation_reference.md Outdated
Comment thread python/pypto/ir/op/tile_ops.py Outdated
Little-oil pushed a commit to Little-oil/pypto that referenced this pull request Apr 21, 2026
- Align col_sum unified-dispatch signature with row_sum (use T, add
  tile-only tag) in both en and zh-cn docs.
- Correct is_binary docstring: binary-tree depth is O(log M) along the
  reduced axis 0, not O(log N).
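
The O(log M) correction can be checked with a small calculation. This is a sketch assuming an [M, N] tile reduced along axis 0 by repeated pairwise addition; `tree_depth` is a hypothetical helper, not part of pypto.

```python
def tree_depth(rows: int) -> int:
    """Number of pairwise-add levels needed to reduce `rows` rows to one.

    Equals ceil(log2(rows)), computed with integers to avoid float rounding.
    """
    if rows < 1:
        raise ValueError("a tile needs at least one row")
    return (rows - 1).bit_length()


# For a [32, 64] tile the reduced axis has M = 32 rows; N = 64 is the output width.
M, N = 32, 64
depth_along_reduced_axis = tree_depth(M)
```

For the [32, 64] tile used in the tests this gives 5 levels. Using N = 64 instead would suggest 6, which is why the docstring wording mattered.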

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/st/runtime/test_col_reduction.py (1)

36-82: LGTM — but naming asymmetry between the two 32x64 FP32 cases is slightly confusing.

Pinning the existing ColSum_32x64_FP32 kernel to is_binary=True correctly preserves the prior hardcoded-True behavior, and the new ColSum_32x64_FP32_Sequential exercises the new default (is_binary=False). The reference compute_expected using torch.sum is appropriate for both FP32 cases.

Minor readability nit: the binary-tree variant is unlabeled while the sequential variant carries the _Sequential suffix. Renaming the binary one to ColSum_32x64_FP32_Binary (and the test class/method accordingly) would make the two paths symmetric at a glance. Non-blocking.

Based on learnings, the FP32 torch.sum reference is intentional and precision-sufficient for ≤32-row reductions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/st/runtime/test_col_reduction.py` around lines 36 - 82, Rename the
unlabeled binary-tree class ColSum_32x64_FP32 to ColSum_32x64_FP32_Binary (and
update any corresponding test function/method names that reference
ColSum_32x64_FP32) so it is symmetric with ColSum_32x64_FP32_Sequential; update
all references to the old class name in the file to the new
ColSum_32x64_FP32_Binary identifier to keep intent clear while preserving the
is_binary=True behavior in the kernel.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/pypto/ir/op/tile_ops.py`:
- Around line 1796-1812: Update the col_sum function signature to make is_binary
a keyword-only parameter (so calls like col_sum(tile, tmp_tile, span) won't bind
span to is_binary) and keep span as the final optional positional/captured
param; adjust the docstring to state the reduction reduces axis=0 (first axis M)
and that the binary-tree depth is O(log M) not O(log N). Locate the col_sum
function and change its signature accordingly and update the docstring lines
describing axis, output shape, and complexity.
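
The positional-binding hazard raised in this comment is easy to reproduce in plain Python. The two signatures below are hypothetical reconstructions for illustration, not the actual code in tile_ops.py.

```python
def col_sum_before(tile, tmp_tile, is_binary=False, span=None):
    """Assumed original shape: is_binary sits before span and is positional."""
    return {"is_binary": is_binary, "span": span}


def col_sum_after(tile, tmp_tile, span=None, *, is_binary=False):
    """Assumed fixed shape: span stays positional, is_binary is keyword-only."""
    return {"is_binary": is_binary, "span": span}


# A caller that passes span as the third positional argument.
misbound = col_sum_before("tile", "tmp", "my_span")
bound = col_sum_after("tile", "tmp", "my_span")
explicit = col_sum_after("tile", "tmp", "my_span", is_binary=True)
```

In `misbound` the span object silently lands in `is_binary`, while the keyword-only form binds it to `span` and forces callers to spell out `is_binary=`.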


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 12c77cc0-b150-4eab-82f3-a29d10930e34

📥 Commits

Reviewing files that changed from the base of the PR and between 4122b0e and ae50e86.

📒 Files selected for processing (9)
  • docs/en/user/02-operation_reference.md
  • docs/zh-cn/user/02-operation_reference.md
  • python/pypto/ir/op/tile_ops.py
  • python/pypto/language/op/tile_ops.py
  • python/pypto/language/op/unified_ops.py
  • src/backend/common/pto_ops_common.cpp
  • src/ir/op/tile_ops/reduction.cpp
  • tests/st/runtime/test_col_reduction.py
  • tests/ut/codegen/test_pto_codegen_ops.py

Comment thread python/pypto/ir/op/tile_ops.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/st/runtime/test_col_reduction.py`:
- Around line 71-72: Remove the duplicated unreachable return statement: keep a
single "return pl.store(result, [0, 0], output)" call and delete the second
identical one so the test_col_reduction code only returns once; locate the
duplicate by finding the consecutive "return pl.store(result, [0, 0], output)"
lines referencing result, pl.store and output and remove the extra occurrence.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4d857248-3442-4977-9731-fb96b55932ec

📥 Commits

Reviewing files that changed from the base of the PR and between ae50e86 and 47315b1.

📒 Files selected for processing (9)
  • docs/en/user/02-operation_reference.md
  • docs/zh-cn/user/02-operation_reference.md
  • python/pypto/ir/op/tile_ops.py
  • python/pypto/language/op/tile_ops.py
  • python/pypto/language/op/unified_ops.py
  • src/backend/common/pto_ops_common.cpp
  • src/ir/op/tile_ops/reduction.cpp
  • tests/st/runtime/test_col_reduction.py
  • tests/ut/codegen/test_pto_codegen_ops.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • python/pypto/language/op/tile_ops.py
  • docs/en/user/02-operation_reference.md
  • src/ir/op/tile_ops/reduction.cpp
  • python/pypto/ir/op/tile_ops.py

Comment thread tests/st/runtime/test_col_reduction.py Outdated
Comment on lines +71 to +72
return pl.store(result, [0, 0], output)
return pl.store(result, [0, 0], output)

⚠️ Potential issue | 🟡 Minor

Remove the duplicated return pl.store(...).

Line 72 is unreachable and duplicates Line 71; keeping a single return avoids confusing the source-based DSL parser and future readers.

Proposed cleanup
         tile: pl.Tile[[32, 64], pl.FP32] = pl.load(input_tensor, [0, 0], [32, 64])
         result: pl.Tile[[1, 64], pl.FP32] = pl.tile.col_sum(tile)
         return pl.store(result, [0, 0], output)
-        return pl.store(result, [0, 0], output)
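
Because the DSL parses function source, a stray statement after `return` still shows up in the syntax tree. The following is a minimal sketch of how such lines could be flagged with Python's standard `ast` module; `find_unreachable_after_return` is a hypothetical helper and not part of the pypto parser.

```python
import ast
import textwrap
from typing import List


def find_unreachable_after_return(source: str) -> List[int]:
    """Return line numbers of statements that follow a `return` in the same block."""
    tree = ast.parse(textwrap.dedent(source))
    hits: List[int] = []
    for node in ast.walk(tree):
        # Statement lists live in these fields on functions, loops, if/try blocks, etc.
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda or conditional-expression body is not a list
            seen_return = False
            for stmt in block:
                if seen_return:
                    hits.append(stmt.lineno)
                elif isinstance(stmt, ast.Return):
                    seen_return = True
    return hits


example = """
def kernel(x):
    y = x + 1
    return y
    return y
"""
flagged = find_unreachable_after_return(example)
```

The check only looks at statements in the same block as the `return`, which is enough to catch a duplicated final line like the one in this review comment.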
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- return pl.store(result, [0, 0], output)
- return pl.store(result, [0, 0], output)
+ tile: pl.Tile[[32, 64], pl.FP32] = pl.load(input_tensor, [0, 0], [32, 64])
+ result: pl.Tile[[1, 64], pl.FP32] = pl.tile.col_sum(tile)
+ return pl.store(result, [0, 0], output)

@Little-oil
Contributor Author

Waiting for PTOAS to provide the 2-argument TCOLSUM.

Youhezhen added 4 commits April 22, 2026 17:35
Expose the TCOLSUM isBinary knob as an optional kwarg on pl.tile.col_sum
(default False, sequential reduction). PTOAS confirmed false is the
natural default — sequential path has different precision / latency
characteristics than the binary-tree path.

Threaded through IR registration, codegen, Python IR wrapper, DSL and
unified dispatch. Existing ST tests pass is_binary=True to preserve
binary-tree coverage; a new FP32 sequential ST variant exercises the
default path on hardware.
- Align col_sum unified-dispatch signature with row_sum (use T, add
  tile-only tag) in both en and zh-cn docs.
- Correct is_binary docstring: binary-tree depth is O(log M) along the
  reduced axis 0, not O(log N).
Replace the explicit `is_binary` kwarg on `pl.tile.col_sum` with
presence-based inference, matching the `pl.tile.rsqrt(tile, tmp=...)`
idiom:

- `col_sum(tile)` -> sequential reduction (TCOLSUM 2-arg form)
- `col_sum(tile, tmp)` -> binary-tree reduction (TCOLSUM 4-arg form)

The sequential ST test is skipped for now: PTOAS NPU backends
(a2a3/a5) currently only implement the 4-arg `TCOLSUM_IMPL`. The
test will be re-enabled once PTOAS adds the 2-arg NPU overload.
Update CI pto-isa commit to the revision that adds the 2-arg
TCOLSUM_IMPL overload in the NPU (a2a3/a5) backends. With the
sequential path now buildable on hardware, re-enable the
`test_32x64_fp32_sequential` ST case.

Also drop a stray duplicate `pl.store` line in ColSum_32x64_FP32_Sequential.
@Little-oil Little-oil force-pushed the issue-881-col-sum-is-binary branch from c0c24c9 to 6f5f760 Compare April 22, 2026 09:39
Youhezhen added 2 commits April 22, 2026 17:57
Previous sequential test only covered [32, 64] FP32. Add matching
sequential-path cases for [16, 16] FP32, [8, 128] FP32, and
[32, 64] FP16 so the no-tmp path matches the binary-tree coverage
matrix (shape x dtype).
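
The coverage matrix described in this commit message can be enumerated explicitly. This is a sketch under assumptions: the four shape/dtype pairs are read off the commit messages above, the path labels are hypothetical, and the real ST file defines one program class per case rather than generating them.

```python
from itertools import product
from typing import List, Tuple

Case = Tuple[Tuple[int, int], str, str]

# Shape/dtype pairs named in the commit messages.
SHAPE_DTYPE = [
    ((32, 64), "FP32"),
    ((16, 16), "FP32"),
    ((8, 128), "FP32"),
    ((32, 64), "FP16"),
]

# Hypothetical labels for the two reduction paths.
PATHS = ["binary_tree", "sequential"]


def coverage_matrix() -> List[Case]:
    """Every (shape, dtype, path) combination that should have an ST case."""
    return [(shape, dtype, path) for (shape, dtype), path in product(SHAPE_DTYPE, PATHS)]


cases = coverage_matrix()
```

Listing the combinations this way makes it easy to see when one path has a shape or dtype the other lacks.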