
feat(ir): add tile.ci / tensor.ci ops (arange) — #1095 (#1110)

Open
Little-oil wants to merge 5 commits into hw-native-sys:main from
Little-oil:issue-1095-add-tci-op

Conversation

@Little-oil (Contributor)

Summary

Register TCI (contiguous integer sequence) at the IR / DSL / conversion layers so the already-wired pto.tci codegen actually fires. Before this PR, ir.is_op_registered("tile.ci") == False blocked all usage despite backend codegen being in place.

Fixes #1095

Usage

import pypto.language as pl

# Ascending: dst[k] = start + k
idx: pl.Tile[[1, 32], pl.INT32] = pl.tile.ci(0, [1, 32], dtype=pl.INT32)

# Descending: dst[k] = start - k
idx: pl.Tile[[1, 32], pl.INT32] = pl.tile.ci(31, [1, 32], dtype=pl.INT32, descending=True)

# Tensor variant (lowers to tile.ci via conversion pass)
idx: pl.Tensor[[1, 32], pl.INT32] = pl.tensor.ci(0, [1, 32], dtype=pl.INT32)

# arange alias
idx = pl.arange(0, [1, 32], dtype=pl.INT32)

Primary motivation: supply the idx operand for tile.sort32 / tensor.sort32 without fabricating it via orchestration.
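The sequence semantics above can be captured in a small pure-Python reference model (illustrative only — `ci_reference` and its parameters are not part of the pypto API):

```python
# Hypothetical reference model of the tci semantics:
#   ascending:  dst[k] = start + k
#   descending: dst[k] = start - k
def ci_reference(start: int, n: int, descending: bool = False) -> list[int]:
    """Return the length-n contiguous integer sequence tci would produce."""
    step = -1 if descending else 1
    return [start + step * k for k in range(n)]

# Mirrors the two usage examples: ascending from 0, descending from 31.
assert ci_reference(0, 32) == list(range(32))
assert ci_reference(31, 32, descending=True) == list(range(31, -1, -1))
```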

Inputs / Outputs

| Param | Type | Notes |
| --- | --- | --- |
| `start` | scalar | same dtype as destination |
| `shape` | `Sequence[IntLike]` | innermost dim must be != 1 |
| `dtype` | `DataType` | {INT16, INT32} only (see note below) |
| `descending` | `bool` (default `False`) | ascending vs. descending |

Output: tile/tensor of the given shape / dtype populated per the formula.

Important ISA Constraint — only the first row is populated

Per _deps/pto-isa/docs/isa/tile/ops/irregular-and-complex/tci.md (Constraints section):

The implementation uses dst.GetValidCol() as the sequence length and does not consult dst.GetValidRow().

This means for shape [M, N] with M > 1, only row 0 gets [start, start±1, …, start±(N-1)]; rows 1..M-1 retain stale UB (unified buffer) contents. Typical correct usage is shape=[1, N], which matches sort32's idx operand. Multi-row arange is out of scope for this PR and should be built on top via broadcast (future work).
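A quick simulation of this hazard (illustrative Python/NumPy, not pypto code — `simulate_tci` is a hypothetical stand-in for the hardware behavior):

```python
import numpy as np

# The ISA takes the sequence length from the valid-column count and
# writes only row 0; rows 1..M-1 keep whatever was already in the buffer.
def simulate_tci(dst: np.ndarray, start: int, descending: bool = False) -> np.ndarray:
    n = dst.shape[1]                      # GetValidCol() analogue
    step = -1 if descending else 1
    dst[0, :] = start + step * np.arange(n, dtype=dst.dtype)
    return dst                            # rows 1..M-1 untouched

buf = np.full((3, 8), -7, dtype=np.int32)  # stand-in for stale UB contents
out = simulate_tci(buf, start=0)
assert (out[0] == np.arange(8)).all()      # row 0 is the sequence
assert (out[1:] == -7).all()               # residue survives in rows 1..2
```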

Why dtype is narrowed to {INT16, INT32} (no UINT)

The ISA doc nominally allows {INT16, UINT16, INT32, UINT32} (tci.md:96), but the PTOAS lowering pipeline has a gap on the unsigned path:

  1. MLIR arith.constant requires signless integer types (i32, not ui32).
  2. Bridging signless → unsigned via builtin.unrealized_conversion_cast produces valid MLIR but PTOAS recovers the template parameter T from the SSA def chain, ignoring the cast.
  3. The result: `TCI<Tile<..., uint32_t, ...>, int32_t, 0>(...)` — T=int32_t mismatches TileData::DType=uint32_t, triggering the `static_assert(is_same<DType, T>)` failure in _deps/pto-isa/include/pto/npu/a2a3/TCI.hpp:20.
  4. Official PTOAS ST (_deps/pto-isa/tests/npu/a2a3/src/st/testcase/tci/tci_kernel.cpp) only instantiates int32_t and int16_t — the unsigned path was never validated upstream.
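For reference, the signless → unsigned bridge from step 2 looks roughly like this (a hedged MLIR sketch; SSA names are illustrative). PTOAS recovers T from the def chain of `%c0` (i32), not from the ui32 result type of the cast, which is exactly the mismatch described in step 3:

```mlir
%c0 = arith.constant 5 : i32
%u0 = builtin.unrealized_conversion_cast %c0 : i32 to ui32
```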

Frontend UTs added (test_tile_ci_rejects_uint_dtype, test_tensor_ci_rejects_uint_dtype) guard this restriction. Re-enable UINT16/UINT32 once PTOAS supports it properly.
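The guard the new UTs enforce amounts to a whitelist check of this shape (a hypothetical sketch — `validate_ci_dtype` and the dtype strings are illustrative, not the actual pypto frontend API):

```python
# Dtype whitelist for tile.ci / tensor.ci until the PTOAS unsigned path works.
_ALLOWED_CI_DTYPES = {"INT16", "INT32"}

def validate_ci_dtype(dtype: str) -> str:
    if dtype not in _ALLOWED_CI_DTYPES:
        raise ValueError(
            f"tile.ci/tensor.ci: unsupported dtype {dtype}; "
            "only INT16/INT32 until PTOAS supports the unsigned path"
        )
    return dtype

assert validate_ci_dtype("INT32") == "INT32"
rejected = False
try:
    validate_ci_dtype("UINT32")
except ValueError:
    rejected = True
assert rejected  # mirrors test_tile_ci_rejects_uint_dtype
```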

Changes

  • C++ IR: REGISTER_OP("tile.ci") in src/ir/op/tile_ops/memory.cpp; REGISTER_OP("tensor.ci") in src/ir/op/tensor_ops/memory.cpp.
  • Python IR: ci(...) in python/pypto/ir/op/{tile,tensor}_ops.py.
  • Python DSL: ci(...) + arange alias in python/pypto/language/op/{tile,tensor}_ops.py.
  • Conversion: RegisterSimple("tensor.ci", "tile.ci") in op_conversion_registry.cpp.
  • Backend: small MakeCiCodegenPTO tweak to emit DPS-form type annotations (ins(%v {attr} : type) outs(%d : type)).
  • Docs: en/zh operator table entries.
  • Tests: UTs for both ops, conversion test, ST (tests/st/runtime/test_ci.py).

Test plan

  • Unit tests: pytest tests/ut/ir/operators/ -v -k ci
  • Conversion test: pytest tests/ut/ir/transforms/ -v -k ci
  • ST (PTOAS path on 910B): pytest tests/st/runtime/test_ci.py — ascending start=0, ascending start=10, descending, all passed
  • pre-commit run --all-files passed

…1095

Register TCI (contiguous integer sequence) at IR / DSL / conversion
layers so the already-wired pto.tci codegen actually fires. Expose
`pl.tile.ci` / `pl.tensor.ci` with `pl.arange` alias.

Fixes hw-native-sys#1095

Semantics (per ISA):
- Ascending:  dst[k] = start + k
- Descending: dst[k] = start - k
- Hardware uses dst.GetValidCol() as sequence length and does NOT
  consult GetValidRow(); only the first row of a multi-row tile is
  populated. Typical usage is shape=[1, N] (matches sort32's idx
  operand).

Inputs:
- start: scalar (same dtype as destination)
- shape: output shape, innermost dim != 1
- dtype: destination dtype
- descending: bool kwarg (default False)

dtype whitelist narrowed to {INT16, INT32}.
The ISA doc nominally allows {INT16, UINT16, INT32, UINT32}, but the
PTOAS lowering pipeline only validates signed paths — emitting
`arith.constant` for unsigned dtypes requires a signless + unrealized
conversion cast bridge, and PTOAS recovers T from the def chain,
triggering a `static_assert<is_same<DType, T>>` failure in
`_deps/pto-isa/include/pto/npu/a2a3/TCI.hpp`. Official ST only
instantiates int32_t / int16_t. Re-enable uint once PTOAS supports it.

coderabbitai Bot commented Apr 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a contiguous-integer-sequence operator (ci / alias arange) at tensor and tile IR levels, with C++ registrations and type deduction, Python IR/DSL wrappers, tensor→tile conversion, backend codegen updates, and unit + runtime tests.

Changes

Cohort / File(s) Summary
Documentation
docs/en/dev/ir/05-operators.md, docs/zh-cn/dev/ir/05-operators.md
Documented new tensor.ci/tensor.arange and tile.ci entries; noted dtype constraints and innermost-dim != 1.
IR Operators (C++)
src/ir/op/tensor_ops/memory.cpp, src/ir/op/tile_ops/memory.cpp
Registered tensor.ci and tile.ci with DeduceTensorCiType / DeduceTileCiType enforcing (start, shape) signature, dtype {INT16, INT32}, shape validation, and innermost-dim ≠ 1 constraint.
Python IR Wrappers
python/pypto/ir/op/tensor_ops.py, python/pypto/ir/op/tile_ops.py
Added ci(...) helpers (and arange alias) that normalize start to ConstInt, convert shape to MakeTuple, capture span, and emit IR Call with dtype/descending kwargs.
Python DSL Wrappers
python/pypto/language/op/tensor_ops.py, python/pypto/language/op/tile_ops.py
Exported ci and arange in __all__; DSL helpers unwrap Scalar input, normalize shape elements, and lower to _ir_ops.ci(...).
Backend Codegen
src/backend/common/pto_ops_common.cpp
MakeCiCodegenPTO updated to accept (start, shape) and emit ins(...)/outs(...) with conditional : <type> annotations.
Operator Conversion
src/ir/transforms/op_conversion_registry.cpp
Registered simple 1:1 conversion tensor.ci → tile.ci.
IR Unit Tests
tests/ut/ir/operators/test_tensor_ops.py, tests/ut/ir/operators/test_tile_ops.py
Added tests validating call types, dtype handling, descending propagation, error cases (float/uint dtypes, innermost dim = 1, dtype mismatch), and alias arange.
Conversion Unit Tests
tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py
Added test asserting tensor.ci lowers to tile.ci and preserves descending.
System Tests
tests/st/runtime/test_ci.py
Added runtime tests for Ascend910B covering ascending/descending variants and arange aliases; expected outputs computed via torch.arange.
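The expected outputs in those runtime tests are described as computed via torch.arange; an equivalent NumPy sketch (assumption: NumPy stands in for torch here) for the [1, 32] ascending and descending cases:

```python
import numpy as np

# Expected outputs for the ascending-from-0 and descending-from-31 cases.
expected_asc = np.arange(0, 32, dtype=np.int32).reshape(1, 32)
expected_desc = np.arange(31, -1, -1, dtype=np.int32).reshape(1, 32)

assert expected_asc.shape == (1, 32)
assert expected_asc[0, 0] == 0 and expected_asc[0, -1] == 31
assert expected_desc[0, 0] == 31 and expected_desc[0, -1] == 0
```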

Sequence Diagram

sequenceDiagram
    participant User as User Code
    participant DSL as DSL Layer
    participant IR as IR Layer
    participant TypeDeduction as Type Deduction
    participant Conversion as Conversion Pass
    participant Codegen as Backend Codegen
    participant HW as PTO Hardware

    User->>DSL: pl.tensor.ci(start, shape, dtype, descending)
    DSL->>IR: emit _ir_ops.ci(start, shape, dtype, descending)
    IR->>TypeDeduction: deduce tensor/tile type (validate dtype, shape, start)
    TypeDeduction->>IR: return TensorType/TileType
    IR->>Conversion: runs convert_tensor_to_tile_ops
    Conversion->>IR: replace tensor.ci with tile.ci
    IR->>Codegen: lower tile.ci -> pto.tci emission
    Codegen->>HW: emit pto.tci (with optional type annotations)
    HW->>HW: fill output with contiguous integer sequence

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • lyfne123

Poem

🐰 Hop, hop, integers in a line,

start to end, ascending fine,
or flip the sign and tumble down—
CI fills tiles without a frown,
arange and ci, together shine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.20%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding tile.ci and tensor.ci operators with an arange alias, and directly references issue #1095. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the summary, usage, constraints, and test results. |
| Linked Issues check | ✅ Passed | The PR fully addresses all coding objectives from issue #1095: C++ IR registrations for tile.ci and tensor.ci, Python IR/DSL wrappers with arange alias, tensor-to-tile conversion, backend type-annotation update, comprehensive tests, and documentation updates. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope for implementing TCI operators per issue #1095: IR registration, IR/DSL wrappers, conversion, backend codegen tweaks, documentation, and tests. No unrelated changes detected. |





@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements the ci and arange operators for generating contiguous integer sequences at both the tensor and tile levels. The changes include IR operator registration, Python language bindings, backend codegen updates, and extensive testing. Feedback was provided to strictly enforce that leading dimensions are 1 in the shape deduction logic to align with ISA constraints and prevent uninitialized data in multi-row tiles.

Comment thread src/ir/op/tensor_ops/memory.cpp
Comment thread src/ir/op/tile_ops/memory.cpp

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/en/dev/ir/05-operators.md (1)

222-261: ⚠️ Potential issue | 🟡 Minor

Clarify the arange alias and the first-row tile.ci limitation.

This IR doc should avoid implying tensor.arange is a registered IR op, and the tile.ci entry should mention that hardware only populates row 0; otherwise users may expect multi-row shapes to be fully initialized.

Proposed wording update
-**Operations:** `tensor.add/sub/mul/div` (element-wise with full N-D broadcasting), `tensor.set_validshape` (internal, update valid-shape metadata without data movement — compiler-generated only), `tensor.sort32` / `tensor.mrgsort_format1` / `tensor.mrgsort_format2` (sorting; tensor-level counterparts of `tile.sort32` / `tile.mrgsort` — converted to tile ops by `ConvertTensorToTileOps`), `tensor.gather` (per-dim indexing; MVP supports rank-2 inputs with `dim=-1` and lowers to a per-row `tile.gather` loop via `ConvertTensorToTileOps`), `tensor.gather_mask` (mask-pattern gather; tensor-level counterpart of `tile.gather_mask`, with optional same-bit-width `output_dtype`), `tensor.ci` / `tensor.arange` (contiguous integer sequence generation; lowers to `tile.ci`)
+**Operations:** `tensor.add/sub/mul/div` (element-wise with full N-D broadcasting), `tensor.set_validshape` (internal, update valid-shape metadata without data movement — compiler-generated only), `tensor.sort32` / `tensor.mrgsort_format1` / `tensor.mrgsort_format2` (sorting; tensor-level counterparts of `tile.sort32` / `tile.mrgsort` — converted to tile ops by `ConvertTensorToTileOps`), `tensor.gather` (per-dim indexing; MVP supports rank-2 inputs with `dim=-1` and lowers to a per-row `tile.gather` loop via `ConvertTensorToTileOps`), `tensor.gather_mask` (mask-pattern gather; tensor-level counterpart of `tile.gather_mask`, with optional same-bit-width `output_dtype`), `tensor.ci` (contiguous integer sequence generation; exposed by Python APIs with an `arange` alias; lowers to `tile.ci`)
...
-| - | `tile.ci` | Generate contiguous integer sequence (start + k / start - k); dtype ∈ {INT16, INT32}; innermost dim != 1 |
+| - | `tile.ci` | Generate contiguous integer sequence (start + k / start - k); dtype ∈ {INT16, INT32}; innermost dim != 1; hardware populates only row 0, so prefer shape `[1, N]` |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/dev/ir/05-operators.md` around lines 222 - 261, Update the docs to
clarify that tensor.arange is just a Python alias (not a registered IR op) and
that tensor.ci/arange lower to tile.ci via ConvertTensorToTileOps; also amend
the tile.ci entry to state the hardware limitation that tile.ci only populates
the first/row-0 of multi-row tiles (other rows remain uninitialized) so users
shouldn't expect full multi-row initialization.
docs/zh-cn/dev/ir/05-operators.md (1)

219-258: ⚠️ Potential issue | 🟡 Minor

澄清 arange 是别名,并补充 tile.ci 只写第 0 行的限制。

当前写法容易让读者以为存在已注册的 tensor.arange IR 算子;同时 tile.ci 的硬件行为未说明,用户可能误以为多行 shape 会全部被初始化。

建议文案调整
-**操作:** `tensor.add/sub/mul/div`(逐元素,支持完整 N 维广播),`tensor.set_validshape`(内部 API,更新 valid_shape 元数据,不搬移数据 — 仅供编译器生成代码使用),`tensor.sort32` / `tensor.mrgsort_format1` / `tensor.mrgsort_format2`(排序;分别对应 `tile.sort32` / `tile.mrgsort` 的 tensor 层接口,由 `ConvertTensorToTileOps` 转换为 tile 操作),`tensor.gather`(按维索引;MVP 仅支持 2D 输入 + `dim=-1`,由 `ConvertTensorToTileOps` 按行展开为 `tile.gather` 循环),`tensor.gather_mask`(掩码模式选择;对应 `tile.gather_mask`,支持可选同位宽 `output_dtype`),`tensor.ci` / `tensor.arange`(生成连续整数序列,下层降到 `tile.ci`)
+**操作:** `tensor.add/sub/mul/div`(逐元素,支持完整 N 维广播),`tensor.set_validshape`(内部 API,更新 valid_shape 元数据,不搬移数据 — 仅供编译器生成代码使用),`tensor.sort32` / `tensor.mrgsort_format1` / `tensor.mrgsort_format2`(排序;分别对应 `tile.sort32` / `tile.mrgsort` 的 tensor 层接口,由 `ConvertTensorToTileOps` 转换为 tile 操作),`tensor.gather`(按维索引;MVP 仅支持 2D 输入 + `dim=-1`,由 `ConvertTensorToTileOps` 按行展开为 `tile.gather` 循环),`tensor.gather_mask`(掩码模式选择;对应 `tile.gather_mask`,支持可选同位宽 `output_dtype`),`tensor.ci`(生成连续整数序列;Python API 提供 `arange` 别名;下层降到 `tile.ci`)
...
-| - | `tile.ci` | 生成连续整数序列(升序 start+k 或降序 start-k);dtype ∈ {INT16, INT32};最内维 != 1 |
+| - | `tile.ci` | 生成连续整数序列(升序 start+k 或降序 start-k);dtype ∈ {INT16, INT32};最内维 != 1;硬件只填充第 0 行,通常应使用 shape `[1, N]` |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/zh-cn/dev/ir/05-operators.md` around lines 219 - 258, Update the docs to
clarify that tensor.arange is just an alias (not a separately registered IR op)
for tensor.ci and explicitly state that tile.ci writes only the first row when
the TileType has multiple rows (i.e., it initializes row 0 only, other rows
remain unchanged); edit the descriptions for tensor.arange / tensor.ci and the
tile.ci entry to mention the alias relationship, the single-row write behavior,
the dtype constraints (INT16/INT32) and the requirement that the innermost
dimension != 1 so readers won't assume full-matrix initialization.
🧹 Nitpick comments (3)
tests/ut/ir/operators/test_tensor_ops.py (1)

2124-2130: Assert the exact CI output shape.

This test currently verifies only rank, so a regression that returns the wrong dimensions would still pass.

🧪 Proposed test strengthening
     def test_tensor_ci_ascending(self):
         call = tensor.ci(0, [4, 32], dtype=DataType.INT32)
         t = call.type
         assert isinstance(t, ir.TensorType)
         assert t.dtype == DataType.INT32
         assert len(t.shape) == 2
+        assert [dim.value for dim in t.shape] == [4, 32]
         assert "tensor.ci" in str(call)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/operators/test_tensor_ops.py` around lines 2124 - 2130, The
test_tensor_ci_ascending test only checks tensor rank; update it to assert the
exact CI output shape equals [4,32]. Locate the test function
test_tensor_ci_ascending and the call created via tensor.ci(0, [4, 32],
dtype=DataType.INT32), then replace or add an assertion that call.type.shape (or
t.shape) equals the expected tuple/list [4, 32] (or (4, 32) depending on type)
in addition to the existing dtype and rank checks.
tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py (1)

2513-2520: Use a single-row ci shape in this conversion test.

Line 2514 uses [4, 32], but pto.tci only populates the first row; keeping the fixture at [1, 32] avoids encoding a runtime-unsafe pattern in tests.

Proposed test fixture adjustment
-            def main_incore_0(self, x: pl.Tensor[[4, 32], pl.INT32]) -> pl.Tensor[[4, 32], pl.INT32]:
-                idx: pl.Tensor[[4, 32], pl.INT32] = pl.tensor.ci(0, [4, 32], dtype=pl.INT32, descending=True)
-                y: pl.Tensor[[4, 32], pl.INT32] = pl.add(idx, x)
+            def main_incore_0(self, x: pl.Tensor[[1, 32], pl.INT32]) -> pl.Tensor[[1, 32], pl.INT32]:
+                idx: pl.Tensor[[1, 32], pl.INT32] = pl.tensor.ci(0, [1, 32], dtype=pl.INT32, descending=True)
+                y: pl.Tensor[[1, 32], pl.INT32] = pl.add(idx, x)
                 return y
 
             `@pl.function`
-            def main(self, x: pl.Tensor[[4, 32], pl.INT32]) -> pl.Tensor[[4, 32], pl.INT32]:
-                y: pl.Tensor[[4, 32], pl.INT32] = self.main_incore_0(x)
+            def main(self, x: pl.Tensor[[1, 32], pl.INT32]) -> pl.Tensor[[1, 32], pl.INT32]:
+                y: pl.Tensor[[1, 32], pl.INT32] = self.main_incore_0(x)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py` around lines 2513
- 2520, The test encodes a runtime-unsafe pattern by creating a ci tensor with
shape [4, 32] even though pto.tci only populates the first row; update the
fixture in the test function main_incore_0 to use a single-row ci shape (change
the pl.tensor.ci shape from [4, 32] to [1, 32]) so the test matches pto.tci
behavior and avoids encoding multiple-row expectations; ensure pl.tensor.ci(...)
in main_incore_0 and any corresponding type annotations or usages in main
reflect the single-row [1, 32] shape.
tests/st/runtime/test_ci.py (1)

35-168: Add one runtime case through pl.tensor.ci or pl.arange.

All ST cases call pl.tile.ci directly, so the new tensor.ci -> tile.ci conversion and public alias path are not exercised end-to-end on Ascend910B. A minimal tensor-level case would catch conversion/type-shape mismatches before backend codegen.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/st/runtime/test_ci.py` around lines 35 - 168, Existing tests only
exercise pl.tile.ci; add a new minimal runtime case that constructs the sequence
via the tensor-level API (use pl.tensor.ci or pl.arange) so the tensor->tile
conversion and public alias path are exercised end-to-end on Ascend910B. Create
a new program class (patterned after
CiAscendStart0Program/CiAscendStart10Program) whose kernel calls pl.tensor.ci
(or pl.arange) to produce seq and stores it to output, add a matching TestCase
(like CiAscendStart0TestCase) with compute_expected using torch.arange, and add
a corresponding test method in TestCi to run it (mirroring
test_ci_ascend_start0).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ir/op/tensor_ops/memory.cpp`:
- Around line 436-454: The tensor.ci shape handling currently allows runtime
tuple shapes and element-wise dynamic shapes but tile.ci lowering requires
static shape metadata and only fills the first row; update the validation around
args[1] (the shape) so that tensor.ci accepts only a MakeTuple of compile-time
integers: require that As<MakeTuple>(args[1]) is present and that every element
in make_tuple->elements_ is a ConstInt (reuse As<ConstInt>), otherwise
CHECK-fail with a clear message referencing tensor.ci -> tile.ci lowering; keep
the existing innermost-dim != 1 check and return
std::make_shared<TensorType>(shape, dtype) only after these static-shape checks.

In `@src/ir/op/tile_ops/memory.cpp`:
- Around line 424-425: The current call to GetKwarg<bool>(kwargs, "descending")
throws when "descending" is omitted; change it to a non-throwing lookup that
returns the documented default false when the key is missing (e.g., use the
GetKwarg/GetKwargOrDefault/GetOptionalKwarg API variant provided by the project
to read "descending" from kwargs as a bool with default=false), keeping the same
variable/parameter names (kwargs and "descending") and preserving type bool so
callers that omit descending get ascending behavior.
- Around line 418-429: Reject or clamp any claimed valid rows that exceed what
pto.tci actually writes: inspect the leading dimension of tile_shape (use
As<ConstInt>(tile_shape.front()) or similar), and if it is missing or its value
> 1 (or the backend-supported row count), either CHECK-fail with a clear message
referencing op_name and the invalid leading dim, or set tile_view.valid_shape[0]
to the backend-supported row count (e.g., 1) before constructing TileType;
ensure you update only the TileView.valid_shape (not the reported tile_shape) so
the returned TileType does not advertise more valid rows than pto.tci writes.

In `@tests/ut/ir/operators/test_tile_ops.py`:
- Line 2579: The test uses pytest.raises(ValueError, match="start.*dtype") with
a non-raw regex string which triggers Ruff RUF043; change the match argument to
a raw string literal (r"start.*dtype") in the test (the pytest.raises call in
tests/ut/ir/operators/test_tile_ops.py) so regex metacharacters are not
interpreted by the linter.

---

Outside diff comments:
In `@docs/en/dev/ir/05-operators.md`:
- Around line 222-261: Update the docs to clarify that tensor.arange is just a
Python alias (not a registered IR op) and that tensor.ci/arange lower to tile.ci
via ConvertTensorToTileOps; also amend the tile.ci entry to state the hardware
limitation that tile.ci only populates the first/row-0 of multi-row tiles (other
rows remain uninitialized) so users shouldn't expect full multi-row
initialization.

In `@docs/zh-cn/dev/ir/05-operators.md`:
- Around line 219-258: Update the docs to clarify that tensor.arange is just an
alias (not a separately registered IR op) for tensor.ci and explicitly state
that tile.ci writes only the first row when the TileType has multiple rows
(i.e., it initializes row 0 only, other rows remain unchanged); edit the
descriptions for tensor.arange / tensor.ci and the tile.ci entry to mention the
alias relationship, the single-row write behavior, the dtype constraints
(INT16/INT32) and the requirement that the innermost dimension != 1 so readers
won't assume full-matrix initialization.

---

Nitpick comments:
In `@tests/st/runtime/test_ci.py`:
- Around line 35-168: Existing tests only exercise pl.tile.ci; add a new minimal
runtime case that constructs the sequence via the tensor-level API (use
pl.tensor.ci or pl.arange) so the tensor->tile conversion and public alias path
are exercised end-to-end on Ascend910B. Create a new program class (patterned
after CiAscendStart0Program/CiAscendStart10Program) whose kernel calls
pl.tensor.ci (or pl.arange) to produce seq and stores it to output, add a
matching TestCase (like CiAscendStart0TestCase) with compute_expected using
torch.arange, and add a corresponding test method in TestCi to run it (mirroring
test_ci_ascend_start0).

In `@tests/ut/ir/operators/test_tensor_ops.py`:
- Around line 2124-2130: The test_tensor_ci_ascending test only checks tensor
rank; update it to assert the exact CI output shape equals [4,32]. Locate the
test function test_tensor_ci_ascending and the call created via tensor.ci(0, [4,
32], dtype=DataType.INT32), then replace or add an assertion that
call.type.shape (or t.shape) equals the expected tuple/list [4, 32] (or (4, 32)
depending on type) in addition to the existing dtype and rank checks.

In `@tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py`:
- Around line 2513-2520: The test encodes a runtime-unsafe pattern by creating a
ci tensor with shape [4, 32] even though pto.tci only populates the first row;
update the fixture in the test function main_incore_0 to use a single-row ci
shape (change the pl.tensor.ci shape from [4, 32] to [1, 32]) so the test
matches pto.tci behavior and avoids encoding multiple-row expectations; ensure
pl.tensor.ci(...) in main_incore_0 and any corresponding type annotations or
usages in main reflect the single-row [1, 32] shape.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 997a7bfa-72ae-4d91-b4e6-8e48492e699d

📥 Commits

Reviewing files that changed from the base of the PR and between d7a0377 and 2a81a18.

📒 Files selected for processing (14)
  • docs/en/dev/ir/05-operators.md
  • docs/zh-cn/dev/ir/05-operators.md
  • python/pypto/ir/op/tensor_ops.py
  • python/pypto/ir/op/tile_ops.py
  • python/pypto/language/op/tensor_ops.py
  • python/pypto/language/op/tile_ops.py
  • src/backend/common/pto_ops_common.cpp
  • src/ir/op/tensor_ops/memory.cpp
  • src/ir/op/tile_ops/memory.cpp
  • src/ir/transforms/op_conversion_registry.cpp
  • tests/st/runtime/test_ci.py
  • tests/ut/ir/operators/test_tensor_ops.py
  • tests/ut/ir/operators/test_tile_ops.py
  • tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py

Comment thread src/ir/op/tensor_ops/memory.cpp
Comment thread src/ir/op/tile_ops/memory.cpp
Comment thread src/ir/op/tile_ops/memory.cpp Outdated
Comment thread tests/ut/ir/operators/test_tile_ops.py Outdated
- Expose arange = ci in python/pypto/ir/op/{tile,tensor}_ops.py so
  pl.{tile,tensor}.arange is resolvable by the AST parser (DSL-only
  alias wasn't visible to _dispatch_op).
- Add ST cases for pl.tensor.ci (ascend/descend) and alias variants
  (pl.tile.arange ascend/descend, pl.tensor.arange). Tensor-level
  programs follow the Opaque + pl.at(CORE_GROUP) + pl.assemble pattern
  used in test_sort.py.
- Add UT alias-identity checks in test_tile_ops.py / test_tensor_ops.py.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/ut/ir/operators/test_tile_ops.py (1)

2579-2579: ⚠️ Potential issue | 🟡 Minor

Use a raw regex string for pytest.raises(..., match=...).

Line 2579 still triggers Ruff RUF043 because .* is a regex metacharacter in a non-raw string.

Proposed fix
-        with pytest.raises(ValueError, match="start.*dtype"):
+        with pytest.raises(ValueError, match=r"start.*dtype"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/operators/test_tile_ops.py` at line 2579, The test uses a non-raw
regex in the pytest.raises call which triggers Ruff RUF043; update the
pytest.raises(..., match="start.*dtype") invocation to use a raw string literal
for the pattern (e.g., change the match argument to r"start.*dtype") so
backslashes and regex metacharacters are interpreted correctly; locate the
pytest.raises usage in tests/ut/ir/operators/test_tile_ops.py and replace the
quoted pattern with a raw string.
🧹 Nitpick comments (1)
tests/st/runtime/test_ci.py (1)

44-47: Add one INT16 runtime case to cover the full supported dtype set.

All ST cases exercise INT32 only, but this PR explicitly supports both INT16 and INT32. A single INT16 ascending case would catch dtype-specific constant typing or PTO lowering regressions.

Also applies to: 207-214

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/st/runtime/test_ci.py` around lines 44 - 47, Add a new runtime test
case that mirrors the existing INT32 ascending case but uses dtype=pl.INT16 so
the full supported dtype set is covered; duplicate the existing sequence
creation and store logic (the pl.tile.ci(...) call that produces seq and the
pl.store(..., output_tensor=output) that writes out) and change the function
signature/output typing to pl.INT16 (and pl.Tensor[[ROWS, COLS], pl.INT16] for
output/seq) for that case; also add the same INT16 variant in the other location
referenced (the block around the existing case at the later 207-214 region) so
both runtime spots exercise INT16.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/pypto/ir/op/tile_ops.py`:
- Around line 491-498: The docstring for tile.ci is misleading about "valid
elements" implying all rows*cols; update the docstring near tile.ci to state
explicitly that tile.ci generates the sequence only across the first row (column
index varying), so the sequence length equals the number of valid columns (not
rows*cols) and the practical destination tile shape is [1, N] (one row by N
columns). Mention that start may be an int or scalar Expr and dtype constraints
remain the same, but clarify that the stride/ordering applies across columns of
the first row only.

---

Duplicate comments:
In `@tests/ut/ir/operators/test_tile_ops.py`:
- Line 2579: The test uses a non-raw regex in the pytest.raises call which
triggers Ruff RUF043; update the pytest.raises(..., match="start.*dtype")
invocation to use a raw string literal for the pattern (e.g., change the match
argument to r"start.*dtype") so backslashes and regex metacharacters are
interpreted correctly; locate the pytest.raises usage in
tests/ut/ir/operators/test_tile_ops.py and replace the quoted pattern with a raw
string.

---

Nitpick comments:
In `@tests/st/runtime/test_ci.py`:
- Around line 44-47: Add a new runtime test case that mirrors the existing INT32
ascending case but uses dtype=pl.INT16 so the full supported dtype set is
covered; duplicate the existing sequence creation and store logic (the
pl.tile.ci(...) call that produces seq and the pl.store(...,
output_tensor=output) that writes out) and change the function signature/output
typing to pl.INT16 (and pl.Tensor[[ROWS, COLS], pl.INT16] for output/seq) for
that case; also add the same INT16 variant in the other location referenced (the
block around the existing case at the later 207-214 region) so both runtime
spots exercise INT16.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ee8c6053-872c-45ed-a026-249b070edc26

📥 Commits

Reviewing files that changed from the base of the PR and between 2a81a18 and ae07106.

📒 Files selected for processing (5)
  • python/pypto/ir/op/tensor_ops.py
  • python/pypto/ir/op/tile_ops.py
  • tests/st/runtime/test_ci.py
  • tests/ut/ir/operators/test_tensor_ops.py
  • tests/ut/ir/operators/test_tile_ops.py
✅ Files skipped from review due to trivial changes (1)
  • tests/ut/ir/operators/test_tensor_ops.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • python/pypto/ir/op/tensor_ops.py

Comment thread python/pypto/ir/op/tile_ops.py Outdated
Youhezhen added 2 commits April 21, 2026 18:33
- Enforce leading dims == 1 in tile.ci / tensor.ci deduce (pto.tci
  only populates the first row; reject multi-row shapes).
- Give GetKwarg("descending") a false default so omitted kwarg works.
- Clarify tile.ci / tensor.ci docstrings about single-row semantics.
- Use raw regex for pytest.raises(match=...) (RUF043).
- Update UT/ST shapes to [1, N]; add multi-row rejection tests.
Widen dtype whitelist for tile.ci / tensor.ci to accept UINT16 and
UINT32 alongside INT16 / INT32. Add INT16 / UINT16 mappings in the
PTO MLIR dtype helper, and emit unsigned integer constants via a
signless arith.constant + unrealized_conversion_cast bridge so
arith.constant's signless-type requirement is respected while the
use-site type matches the declared dtype. Extend UT whitelist checks
and add UINT16 / UINT32 ascending ST cases.

Note: PTOAS currently lowers pto.tci's unsigned scalar operand to a
signed template parameter (tracked upstream in PTOAS#538); the pypto
emission is consistent but runtime correctness depends on that fix.


Development

Successfully merging this pull request may close these issues.

[New Op] Add TCI (tile.ci / tensor.ci) — contiguous integer sequence generation

1 participant