
refactor(ir): add backward memory-space demand inference#1130

Open
Hzfengsy wants to merge 2 commits into hw-native-sys:main from Hzfengsy:refactor/memory-space-demand-inference

Conversation

@Hzfengsy (Member)

Summary

Introduces a single metadata-driven mechanism for back-propagating consumer memory-space demand, so chains like slice → alias → matmul compile as pure-AIC without routing loads through Vec. Concretely fixes the qwen3 decode MLP-down pattern where incore_16 was being incorrectly expanded into an AIC + empty-AIV pair because tile.load defaulted to Vec and combined with matmul produced a MIXED core-affinity classification.

Key pieces:

  • OpRegistry — new OutputMemoryInheritsInput() and HasRetargetableMemoryKwarg() predicates backed by OpMemorySpaceSpec::output_inherits_input. set_output_memory_from_kwarg now accepts std::nullopt default, marking an op as retargetable with no fallback.
  • InferTileMemorySpace — Phase-0 DemandCollector records per-op input_constraints and back-propagates through OutputMemoryInheritsInput() ops + parser-elided SSA aliases. Phase 1 handles y = x alias statements and consults demand for retargetable producers when the kwarg is absent (never overrides an explicit kwarg). Phase 3 rewrites target_memory on any retargetable op whose resolved space differs (generalizes today's tile.create-only rewrite).
  • ConvertTensorToTileOps — ConsumerSpaceCollector::PropagateThroughInheritInputOps back-propagates at the tensor layer through inherit-input Calls and SSA aliases, so tensor.slice → tensor.matmul emits tile.load(target=Mat) directly.
  • tile.load / tile.create — become retargetable: target_memory kwarg is optional, DeduceTileLoadType tolerates its absence, and InferTileMemorySpace resolves it from consumer demand.
  • init_memref / memory_reuse_pass — the shared helpers delegate the view-op predicate to OutputMemoryInheritsInput(), replacing the brittle deduce_output_memory({}).has_value() probe.
  • Python binding / stub — get_op_memory_spec distinguishes "inherit_from_input" from "deferred" (retargetable-no-kwarg).

Propagation is a single reverse-order sweep over edges captured in program order — O(N).
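The sweep described above can be sketched as follows — a minimal, hypothetical rendering (names like `collect_demand` and `seed_demand` are illustrative; the real pass walks IR nodes and registry predicates, not strings):

```python
# Minimal sketch of the Phase-0 backward demand sweep (hypothetical names;
# the real pass walks IR nodes and registry predicates, not strings).
def collect_demand(edges, seed_demand):
    """edges: (dst, src) pairs recorded in program order, where dst
    inherits its memory space from src (inherit-input ops, SSA aliases).
    seed_demand: {var: space} collected from consumer input_constraints."""
    demand = dict(seed_demand)
    # One reverse-order sweep: consumers appear after producers in program
    # order, so walking the edge list backward settles all demand in O(N).
    for dst, src in reversed(edges):
        if dst in demand and src not in demand:  # first-wins; never override
            demand[src] = demand[dst]
    return demand

# slice -> alias -> matmul: matmul demands its input on Mat.
edges = [("sliced", "x"), ("aliased", "sliced")]
print(collect_demand(edges, {"aliased": "Mat"}))
# -> {'aliased': 'Mat', 'sliced': 'Mat', 'x': 'Mat'}
```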

Test plan

  • New DSL Before/Expected tests in test_convert_tensor_to_tile_ops.py for slice → alias → matmul and transitive alias chains — exact qwen3 MLP-down pattern.
  • New DSL Before/Expected tests in test_infer_tile_memory_space.py for Phase-1 SSA-alias memory-space inheritance (single and transitive).
  • Updated test_op_registry.py expectations: tile.load / tile.create report "deferred" when kwarg absent.
  • Full IR test suite green: 4009 passing, 17 skipped, no regressions.
  • clang-tidy clean on changed C++.
  • Verified on qwen3.py end-to-end: incore_16 is now FunctionType.AIC with both loads on Mem.Mat and tile.move(Mat→Left/Right) inserted before matmul (no AIV shell, no Group wrapper, no dual_aiv_dispatch attr, no injected get_subblock_idx).

Scope

Intentionally does not include the fillpad-related changes from an earlier WIP on the same direction (which had an unresolved TFILLPAD-on-Mat codegen issue). tile.fillpad / tile.fillpad_inplace / tensor.fillpad keep their current Vec-only registrations; enabling them as inherit-input is a separate follow-up.

End-to-end qwen3 compilation now surfaces a separate, unrelated codegen issue in the Spmd → AIC orchestration path (orchestration_codegen.cpp:661: "inner call arg does not map to any wrapper parameter") when the Spmd wrapper calls an AIC function with a locally-created tensor argument. Previously masked because the broken classification routed through a Group wrapper. Not fixed here.

Introduce a single metadata-driven mechanism that back-propagates consumer
memory-space demand so chains like slice→alias→matmul compile as pure-AIC
without routing loads through Vec.

OpRegistry gains an explicit OutputMemoryInheritsInput() predicate backed
by OpMemorySpaceSpec::output_inherits_input, and HasRetargetableMemoryKwarg()
for producers with a writable target_memory kwarg. set_output_memory_from_kwarg
now accepts std::nullopt as default, marking the op as retargetable with no
fallback.

InferTileMemorySpace gains a Phase-0 DemandCollector that records per-op
input_constraints and back-propagates demand through any op whose
OutputMemoryInheritsInput() is true plus SSA aliases elided by the parser.
Phase 1 handles plain SSA alias assignments and consults demand for
retargetable producers when their kwarg is absent; Phase 3 rewrites
target_memory on any retargetable op whose resolved space differs (not
just tile.create). ConsumerSpaceCollector in convert_tensor_to_tile_ops
does the same at the tensor layer. Propagation is a single reverse-order
sweep over edges captured in program order — O(N).

tile.load / tile.create become retargetable: target_memory kwarg is now
optional, DeduceTileLoadType tolerates its absence, and InferTileMemorySpace
resolves it from consumer demand when absent.

init_memref and memory_reuse_pass delegate their view-op predicate to the
shared OutputMemoryInheritsInput() helper, replacing the brittle
deduce_output_memory({}).has_value() probe.

Python binding and type stub distinguish "inherit_from_input" (view ops)
from "deferred" (retargetable-no-kwarg) in get_op_memory_spec. Test
expectations for tile.load / tile.create specs updated to "deferred".

Adds DSL Before/Expected tests in test_convert_tensor_to_tile_ops.py and
test_infer_tile_memory_space.py covering slice→alias→matmul and transitive
alias chains — the qwen3 MLP-down pattern that motivated the refactor.
Copilot AI review requested due to automatic review settings April 22, 2026 16:51

coderabbitai Bot commented Apr 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

📝 Walkthrough


Extends op-registry memory-space semantics with an output-inherits-input flag and true retargetable (deferred) kwarg behavior; propagates consumer demands backward through inherit-input and SSA-alias chains; updates inference and conversion passes to use deferred/resolved spaces and rewrite target_memory accordingly.

Changes

Cohort / File(s) Summary
OpRegistry Core
include/pypto/ir/op_registry.h
Add output_inherits_input to OpMemorySpaceSpec; make set_output_memory_from_kwarg accept optional default (std::nullopt); add OutputMemoryInheritsInput() and HasRetargetableMemoryKwarg() query methods.
Python Bindings & Type Stubs
python/bindings/modules/ir.cpp, python/pypto/pypto_core/ir.pyi
Report the output-memory state in ir.get_op_memory_spec as a MemorySpace, "inherit_from_input", "deferred", or None; update docstring and binding logic to reflect the deferred/inheritance cases.
Tile Ops Memory Logic
src/ir/op/tile_ops/memory.cpp
Treat target_memory as optional in DeduceTileLoadType; defer layout/transpose decisions when kwarg absent; remove default Vec fallback in tile.create/tile.load registrations.
Convert Tensor→Tile Pass
src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
Add PropagateThroughInheritInputOps() and record propagation edges for SSA aliases and ops whose outputs inherit input, enabling backward propagation of consumer requirements prior to conversion.
Infer Tile Memory Space Pass
src/ir/transforms/infer_tile_memory_space_pass.cpp
Add Phase 0 demand pre-collection and backward propagation through inherit-input chains; consult backward-demanded spaces when resolvers return nullopt; support SSA-alias inheritance and rewrite target_memory for all retargetable ops.
Init MemRef Pass
src/ir/transforms/init_memref.cpp
Refactor IsViewOperation to use OutputMemoryInheritsInput() registry predicate rather than ad-hoc memory-spec inspection.
Memory Reuse Pass
src/ir/transforms/memory_reuse_pass.cpp
Centralize predicate by delegating IsOutputMemoryInheritInput to OpRegistryEntry::OutputMemoryInheritsInput().
Unit Tests (registry)
tests/ut/ir/operators/test_op_registry.py
Expect tile.load/tile.create output_memory to be "deferred" when target_memory kwarg is absent; update assertions/docstrings.
Conversion & Inference Tests
tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py, tests/ut/ir/transforms/test_infer_tile_memory_space.py
Add tests for SSA-alias propagation and alias-chain memory-space inheritance (slice→alias→matmul patterns and alias chains), asserting Mat promotions and move insertion behavior.

Sequence Diagram

sequenceDiagram
    participant Convert as ConvertPass
    participant Registry as OpRegistry
    participant Infer as InferPass
    participant Mutator as Mutator

    Convert->>Registry: Query OutputMemoryInheritsInput()
    Registry-->>Convert: Inheritance flag
    Convert->>Convert: Pre-scan consumer requirements
    Convert->>Convert: Record propagation edges (SSA aliases, inherit-input ops)
    Convert->>Convert: Propagate requirements through edges

    Infer->>Registry: Query input_constraints for ops
    Registry-->>Infer: Input demand specs
    Infer->>Infer: Phase 0: collect backward demands
    Infer->>Registry: Query output memory resolver (deduce_output_memory)
    alt Resolver returns nullopt
        Infer->>Infer: Use backward-demanded space (if valid)
    else Resolver returns MemorySpace
        Infer->>Infer: Use resolved space
    end
    Infer->>Mutator: Resolved spaces per var
    Mutator->>Registry: Query HasRetargetableMemoryKwarg()
    Registry-->>Mutator: Retargetable flag
    Mutator->>Mutator: Rewrite `target_memory` kwarg when retargetable

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • lyfne123

"🐰 I hopped through registry rows,
I learned which outputs follow their source,
Deferred demands I carry light,
I nudge targets into place just right,
Hooray—memory dances in order tonight!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'refactor(ir): add backward memory-space demand inference' clearly and specifically describes the main change—introducing backward propagation of memory-space demand in the IR layer.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the motivation (qwen3 MLP-down pattern fix), key architectural pieces (OpRegistry, InferTileMemorySpace, ConvertTensorToTileOps), implementation details, test coverage, and verification results.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


Contributor

@gemini-code-assist Bot left a comment


Code Review

This pull request enhances the memory space inference system by introducing retargetable producers and a backward demand propagation phase. Key changes include updating the OpRegistry to support deferred memory space resolution for operations like tile.load and tile.create, and implementing a DemandCollector in the InferTileMemorySpace pass to propagate consumer requirements through view-like operations and SSA aliases. The PR also refactors view-op detection across several passes to use a centralized predicate in the registry and adds comprehensive unit tests for the new propagation logic. I have no feedback to provide as no review comments were submitted.

Contributor

Copilot AI left a comment


Pull request overview

Adds a metadata-driven, backward “consumer demand” mechanism to infer/retarget tile memory spaces so view/alias chains (e.g. slice → alias → matmul) can stay pure-AIC and avoid defaulting loads through Vec.

Changes:

  • Introduces OpRegistryEntry::OutputMemoryInheritsInput() and HasRetargetableMemoryKwarg() plus output_inherits_input spec flag to unify “view op” handling across passes.
  • Extends InferTileMemorySpace and ConvertTensorToTileOps to propagate memory-space requirements backward through inherit-input ops and SSA aliases, and to rewrite retargetable producers’ target_memory.
  • Updates tile.load / tile.create to allow missing target_memory (deferred resolution), and updates Python bindings/stubs + tests accordingly.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

File Description
include/pypto/ir/op_registry.h Adds memory-space spec flag + registry predicates for inherit-input and retargetable ops.
src/ir/transforms/infer_tile_memory_space_pass.cpp Adds backward demand collection, SSA-alias propagation, and generalized target_memory rewriting.
src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp Back-propagates consumer space requirements through inherit-input ops and SSA aliases at tensor-conversion time.
src/ir/op/tile_ops/memory.cpp Makes target_memory optional for tile.load/tile.create; updates type deduction accordingly.
src/ir/transforms/init_memref.cpp Switches view-op detection to the shared registry predicate.
src/ir/transforms/memory_reuse_pass.cpp Switches inherit-input detection to the shared registry predicate.
python/bindings/modules/ir.cpp Exposes "inherit_from_input" vs "deferred" in get_op_memory_spec.
python/pypto/pypto_core/ir.pyi Documents the new "deferred" output_memory value in the stub.
tests/ut/ir/operators/test_op_registry.py Updates expectations for "deferred" on tile.load/tile.create.
tests/ut/ir/transforms/test_infer_tile_memory_space.py Adds SSA-alias inheritance tests for InferTileMemorySpace.
tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py Adds slice→alias→matmul tests ensuring Mat-directed loads are produced.

Comment thread include/pypto/ir/op_registry.h Outdated
Comment thread src/ir/transforms/infer_tile_memory_space_pass.cpp
@coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ir/transforms/infer_tile_memory_space_pass.cpp (1)

458-503: ⚠️ Potential issue | 🟡 Minor

Phase 3 target_memory rewrite should use OpRegistry::Create to preserve type deduction semantics, or handle transpose-aware valid_shape swap.

The manual std::make_shared<Call>(...) construction bypasses DeduceTileLoadType, which applies a critical transpose-aware swap of valid_shape dimensions (lines 179–182 of src/ir/op/tile_ops/memory.cpp). GetImplicitTileView(shape, promoted) only populates valid_shape = shape without the swap.

Currently, this is unreachable: tile.load with transpose=true must commit to target_memory=Mat at construction time, and InferTileMemorySpace explicitly does not revisit transpose decisions, so Phase 3 retargeting should not be triggered for transposed loads. However, the code lacks a guard against this scenario. To be defensive, either route through OpRegistry::Create(call->op_->name_, call->args_, new_kwargs, call->span_) to re-run the registered deducer, or pre-compute the transpose swap on valid_shape before constructing the TileType manually if the transpose kwarg is present.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/transforms/infer_tile_memory_space_pass.cpp` around lines 458 - 503,
The manual construction of Call/TileType in InferTileMemorySpace (the Call
handling block that rewrites "target_memory") bypasses the registered deducer
(e.g., DeduceTileLoadType) and thus misses transpose-aware valid_shape swapping;
change the code to either (1) recreate the call via
OpRegistry::Create(call->op_->name_, call->args_, new_kwargs, call->span_) so
the op's deducer runs and produces the correct TileType, or (2) if you must
construct the TileType manually, detect a transpose=true kwarg on the original
call and apply the same valid_shape dimension swap that DeduceTileLoadType
performs before calling tile_view_semantics::GetImplicitTileView and building
the new TileType (update references: Call, OpRegistry::Create,
DeduceTileLoadType, GetImplicitTileView, TileType, and the "transpose" kwarg).
🧹 Nitpick comments (3)
src/ir/transforms/infer_tile_memory_space_pass.cpp (2)

190-199: Plain SSA-alias inheritance depends on src_var already being in var_memory_.

y = x is handled correctly when x was assigned earlier in the same analyzer walk (the common case covered by the new tests). However, if x is a ForStmt iter_arg whose inferred space is only back-filled in VisitStmt_(ForStmtPtr) after the loop body visit, then an alias y = x inside the body runs before the iter_arg's entry exists in var_memory_ and y gets no memory_space recorded at all — leaving the TileMemoryInferredVerifier to reject the function at pass exit.

Probably unreachable under current frontends (aliases of iter_args aren't emitted), but worth either a comment noting the assumption or a targeted fallback (e.g. consult the TileType::memory_space_ annotation on src_var as a last resort) for defensive robustness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/transforms/infer_tile_memory_space_pass.cpp` around lines 190 - 199,
The plain SSA alias handling in the branch that matches As<Var>(op->value_)
assumes src_var exists in var_memory_; update it to defensively fallback when
src_it == var_memory_.end() by inspecting src_var's declared type (use
TileType::memory_space_ on src_var->type if it is a TileType) and set
var_memory_[op->var_] from that memory_space_; also mention (or add) a short
comment referencing VisitStmt_(ForStmtPtr) to explain why back-filling can occur
and why the fallback is needed.

69-71: Conflicting non-Vec demands resolve by first-wins — consider documenting or detecting.

ShouldOverrideDemand only upgrades Vec → any specialized space. When two consumers impose different specialized spaces on the same producer var (e.g. one path demands Left, another demands Acc), try_emplace silently keeps whichever was collected first, making the final space dependent on IR visit order.

Today this likely never occurs in practice (matmul inputs don't feed accumulators, etc.), but a defensive CHECK/diagnostic — or at least a comment spelling out the invariant — would help future maintainers avoid silently wrong retargeting if the invariant breaks.

Also applies to: 124-143

include/pypto/ir/op_registry.h (1)

322-342: Consider recording the retargetable kwarg key in the spec rather than hardcoding "target_memory".

set_output_memory_from_kwarg accepts an arbitrary kwarg_key, but HasRetargetableMemoryKwarg() hardcodes a check for "target_memory". If any future op registers with a different retargetable kwarg (e.g. "output_memory"), the resolver would still work but InferTileMemorySpace would silently skip retargeting it. Storing the kwarg key into OpMemorySpaceSpec when the resolver is installed would make the two consistent.

Sketch
 struct OpMemorySpaceSpec {
   ...
   OutputResolver deduce_output_memory;
+  /// Set by set_output_memory_from_kwarg — name of the kwarg whose value the
+  /// resolver reads. Empty when no kwarg-driven resolver was installed.
+  std::string retargetable_kwarg_key;
   ...
 };
 [[nodiscard]] bool HasRetargetableMemoryKwarg() const {
   if (!memory_spec_.has_value() || !memory_spec_->deduce_output_memory) return false;
   if (memory_spec_->output_inherits_input) return false;
-  return op_ && op_->HasAttr("target_memory");
+  const auto& key = memory_spec_->retargetable_kwarg_key;
+  return !key.empty() && op_ && op_->HasAttr(key);
 }

Also applies to: 399-403

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pypto/ir/op_registry.h` around lines 322 - 342, The
deduce_output_memory lambda is installed by set_output_memory_from_kwarg but the
kwarg key isn't recorded in OpMemorySpaceSpec, while
HasRetargetableMemoryKwarg() still checks the hardcoded "target_memory"; add a
field to OpMemorySpaceSpec (e.g., retargetable_kwarg or
retargetable_output_kwarg), populate it inside set_output_memory_from_kwarg
(using memory_spec_), and change HasRetargetableMemoryKwarg() to check that spec
field instead of the fixed string; apply the same change for the analogous
input/other-kwarg setter at the other location (lines ~399-403) so the resolver
and the retargeting check stay consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 33761b61-7ec4-4e96-a771-e97fc5d48013

📥 Commits

Reviewing files that changed from the base of the PR and between c53dac0 and 5d727ac.

📒 Files selected for processing (11)
  • include/pypto/ir/op_registry.h
  • python/bindings/modules/ir.cpp
  • python/pypto/pypto_core/ir.pyi
  • src/ir/op/tile_ops/memory.cpp
  • src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
  • src/ir/transforms/infer_tile_memory_space_pass.cpp
  • src/ir/transforms/init_memref.cpp
  • src/ir/transforms/memory_reuse_pass.cpp
  • tests/ut/ir/operators/test_op_registry.py
  • tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py
  • tests/ut/ir/transforms/test_infer_tile_memory_space.py

- Tighten HasRetargetableMemoryKwarg() to require the resolver to
  actually defer (return nullopt for empty kwargs). Keeps tile.move
  (Vec default) out of the "retargetable" bucket so Phase 3 Mutator
  can't accidentally rewrite its target_memory kwarg.

- In Phase 1 InferFromOp, clamp backward-collected demand for
  retargetable producers to {Vec, Mat}. DDR-facing tile.load can only
  directly produce {Vec, Mat}; specialized consumer demands
  (Left/Right/Acc/Bias) must be reached via a Phase 2 tile.move. Raw
  demand would otherwise emit hardware-invalid loads.
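The clamp in the second bullet can be sketched as below (hypothetical helper; staging Left/Right through Mat follows the tile.move(Mat→Left/Right) behavior in the test plan, while the Acc/Bias mapping is an illustrative assumption):

```python
# Sketch of clamping backward-collected demand for a retargetable load
# (hypothetical names). A DDR-facing tile.load can only directly produce
# Vec or Mat; specialized spaces are reached via a Phase-2 tile.move.
LOADABLE = {"Vec", "Mat"}
# Assumed staging space per specialized demand (Left/Right per the test
# plan's tile.move(Mat -> Left/Right); Acc/Bias are illustrative).
SPECIALIZED_HOME = {"Left": "Mat", "Right": "Mat", "Acc": "Mat", "Bias": "Mat"}

def clamp_load_demand(demanded: str) -> str:
    if demanded in LOADABLE:
        return demanded
    # Emit the load into the space the specialized demand is staged
    # through, instead of a hardware-invalid direct load.
    return SPECIALIZED_HOME.get(demanded, "Vec")

print(clamp_load_demand("Left"))  # -> Mat
print(clamp_load_demand("Vec"))   # -> Vec
```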
@coderabbitai Bot left a comment


🧹 Nitpick comments (2)
include/pypto/ir/op_registry.h (2)

402-407: Nit: hardcoded "target_memory" couples this predicate to the kwarg name.

HasRetargetableMemoryKwarg() checks op_->HasAttr("target_memory") while set_output_memory_from_kwarg permits any kwarg_key. An op registered as set_output_memory_from_kwarg("other_key", std::nullopt) would defer resolution yet be reported as non-retargetable here. This appears intentional (Phase 3 in infer_tile_memory_space_pass.cpp specifically rewrites target_memory), but if that ever generalizes, the kwarg name will need to be tracked on the spec rather than hardcoded in two places. Consider persisting the kwarg_key on OpMemorySpaceSpec so both the resolver and this predicate share one source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pypto/ir/op_registry.h` around lines 402 - 407, The predicate
HasRetargetableMemoryKwarg currently checks op_->HasAttr("target_memory") which
hardcodes the kwarg name; persist the kwarg_key in OpMemorySpaceSpec when
set_output_memory_from_kwarg(...) is called and use that stored
memory_spec_->kwarg_key (instead of the literal "target_memory") in
HasRetargetableMemoryKwarg so both the resolver and predicate share the same
source of truth; update set_output_memory_from_kwarg to populate
OpMemorySpaceSpec::kwarg_key, adjust HasRetargetableMemoryKwarg to check
op_->HasAttr(memory_spec_->kwarg_key) and related calls that assume
"target_memory" to use the spec value, and keep behavior unchanged when
kwarg_key is empty/nullopt for backward compatibility.

384-407: LGTM on the new predicates.

OutputMemoryInheritsInput() and HasRetargetableMemoryKwarg() correctly centralize the view-op vs. retargetable-producer distinction. The tightened check at line 406 — invoking deduce_output_memory({}) and requiring nullopt — properly excludes ops like tile.move that carry a target_memory kwarg but resolve to a concrete default (e.g. Vec), matching the commit-message intent. The mutual exclusion with output_inherits_input at line 404 prevents view-like ops from being mistakenly rewritten by Phase 3.

Minor performance note: HasRetargetableMemoryKwarg() constructs an empty std::vector and invokes the resolver lambda on every call. If this is hit per-op in a tight loop during inference, caching the "resolver-defers-when-empty" result on the spec at registration time would eliminate the per-call invocation, but this is likely negligible in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pypto/ir/op_registry.h` around lines 384 - 407, The performance issue
is that HasRetargetableMemoryKwarg() calls
memory_spec_->deduce_output_memory({}) on every invocation; cache whether the
resolver defers for an empty input at registration time so the predicate can
cheaply consult a boolean. Add a cached flag (e.g.
MemorySpec::resolver_defers_on_empty or similar) to the MemorySpec populated
when the op is registered (compute resolver_defers_on_empty =
!deduce_output_memory({}).has_value() once), and change
HasRetargetableMemoryKwarg() to use memory_spec_->resolver_defers_on_empty
(keeping the existing checks for memory_spec_ presence, deduce_output_memory
existence, and output_inherits_input).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8b9da074-b753-4da3-bfc2-401347466a8a

📥 Commits

Reviewing files that changed from the base of the PR and between 5d727ac and ccf880b.

📒 Files selected for processing (2)
  • include/pypto/ir/op_registry.h
  • src/ir/transforms/infer_tile_memory_space_pass.cpp
✅ Files skipped from review due to trivial changes (1)
  • src/ir/transforms/infer_tile_memory_space_pass.cpp


3 participants