
refactor(ir): add backward memory-space demand inference#1130

Open
Hzfengsy wants to merge 2 commits into hw-native-sys:main from Hzfengsy:refactor/memory-space-demand-inference

Conversation

@Hzfengsy (Member)

Summary

Introduces a single metadata-driven mechanism for back-propagating consumer memory-space demand, so chains like slice → alias → matmul compile as pure-AIC without routing loads through Vec. Concretely fixes the qwen3 decode MLP-down pattern where incore_16 was being incorrectly expanded into an AIC + empty-AIV pair because tile.load defaulted to Vec and combined with matmul produced a MIXED core-affinity classification.

Key pieces:

  • OpRegistry — new OutputMemoryInheritsInput() and HasRetargetableMemoryKwarg() predicates backed by OpMemorySpaceSpec::output_inherits_input. set_output_memory_from_kwarg now accepts std::nullopt default, marking an op as retargetable with no fallback.
  • InferTileMemorySpace — Phase-0 DemandCollector records per-op input_constraints and back-propagates through OutputMemoryInheritsInput() ops + parser-elided SSA aliases. Phase 1 handles y = x alias statements and consults demand for retargetable producers when the kwarg is absent (never overrides an explicit kwarg). Phase 3 rewrites target_memory on any retargetable op whose resolved space differs (generalizes today's tile.create-only rewrite).
  • ConvertTensorToTileOps — ConsumerSpaceCollector::PropagateThroughInheritInputOps back-propagates at the tensor layer through inherit-input Calls and SSA aliases, so tensor.slice → tensor.matmul emits tile.load(target=Mat) directly.
  • tile.load / tile.create — become retargetable: target_memory kwarg is optional, DeduceTileLoadType tolerates its absence, and InferTileMemorySpace resolves it from consumer demand.
  • init_memref / memory_reuse_pass — the shared helpers delegate the view-op predicate to OutputMemoryInheritsInput(), replacing the brittle deduce_output_memory({}).has_value() probe.
  • Python binding / stub — get_op_memory_spec distinguishes "inherit_from_input" from "deferred" (retargetable-no-kwarg).

Propagation is a single reverse-order sweep over edges captured in program order — O(N).
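The sweep described above can be sketched as follows — a minimal, hypothetical rendering (names like `collect_demand` and `seed_demand` are illustrative; the real pass walks IR nodes and registry predicates, not strings):

```python
# Minimal sketch of the Phase-0 backward demand sweep (hypothetical names;
# the real pass walks IR nodes and registry predicates, not strings).
def collect_demand(edges, seed_demand):
    """edges: (dst, src) pairs recorded in program order, where dst
    inherits its memory space from src (inherit-input ops, SSA aliases).
    seed_demand: {var: space} collected from consumer input_constraints."""
    demand = dict(seed_demand)
    # One reverse-order sweep: consumers appear after producers in program
    # order, so walking the edge list backward settles all demand in O(N).
    for dst, src in reversed(edges):
        if dst in demand and src not in demand:  # first-wins; never override
            demand[src] = demand[dst]
    return demand

# slice -> alias -> matmul: matmul demands its input on Mat.
edges = [("sliced", "x"), ("aliased", "sliced")]
print(collect_demand(edges, {"aliased": "Mat"}))
# -> {'aliased': 'Mat', 'sliced': 'Mat', 'x': 'Mat'}
```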

Test plan

  • New DSL Before/Expected tests in test_convert_tensor_to_tile_ops.py for slice → alias → matmul and transitive alias chains — exact qwen3 MLP-down pattern.
  • New DSL Before/Expected tests in test_infer_tile_memory_space.py for Phase-1 SSA-alias memory-space inheritance (single and transitive).
  • Updated test_op_registry.py expectations: tile.load / tile.create report "deferred" when kwarg absent.
  • Full IR test suite green: 4009 passing, 17 skipped, no regressions.
  • clang-tidy clean on changed C++.
  • Verified on qwen3.py end-to-end: incore_16 is now FunctionType.AIC with both loads on Mem.Mat and tile.move(Mat→Left/Right) inserted before matmul (no AIV shell, no Group wrapper, no dual_aiv_dispatch attr, no injected get_subblock_idx).

Scope

Intentionally does not include the fillpad-related changes from an earlier WIP on the same direction (which had an unresolved TFILLPAD-on-Mat codegen issue). tile.fillpad / tile.fillpad_inplace / tensor.fillpad keep their current Vec-only registrations; enabling them as inherit-input is a separate follow-up.

End-to-end qwen3 compilation now surfaces a separate, unrelated codegen issue in the Spmd → AIC orchestration path (orchestration_codegen.cpp:661: "inner call arg does not map to any wrapper parameter") when the Spmd wrapper calls an AIC function with a locally-created tensor argument. Previously masked because the broken classification routed through a Group wrapper. Not fixed here.

Introduce a single metadata-driven mechanism that back-propagates consumer
memory-space demand so chains like slice→alias→matmul compile as pure-AIC
without routing loads through Vec.

OpRegistry gains an explicit OutputMemoryInheritsInput() predicate backed
by OpMemorySpaceSpec::output_inherits_input, and HasRetargetableMemoryKwarg()
for producers with a writable target_memory kwarg. set_output_memory_from_kwarg
now accepts std::nullopt as default, marking the op as retargetable with no
fallback.

InferTileMemorySpace gains a Phase-0 DemandCollector that records per-op
input_constraints and back-propagates demand through any op whose
OutputMemoryInheritsInput() is true plus SSA aliases elided by the parser.
Phase 1 handles plain SSA alias assignments and consults demand for
retargetable producers when their kwarg is absent; Phase 3 rewrites
target_memory on any retargetable op whose resolved space differs (not
just tile.create). ConsumerSpaceCollector in convert_tensor_to_tile_ops
does the same at the tensor layer. Propagation is a single reverse-order
sweep over edges captured in program order — O(N).

tile.load / tile.create become retargetable: target_memory kwarg is now
optional, DeduceTileLoadType tolerates its absence, and InferTileMemorySpace
resolves it from consumer demand when absent.

init_memref and memory_reuse_pass delegate their view-op predicate to the
shared OutputMemoryInheritsInput() helper, replacing the brittle
deduce_output_memory({}).has_value() probe.

Python binding and type stub distinguish "inherit_from_input" (view ops)
from "deferred" (retargetable-no-kwarg) in get_op_memory_spec. Test
expectations for tile.load / tile.create specs updated to "deferred".

Adds DSL Before/Expected tests in test_convert_tensor_to_tile_ops.py and
test_infer_tile_memory_space.py covering slice→alias→matmul and transitive
alias chains — the qwen3 MLP-down pattern that motivated the refactor.
Copilot AI review requested due to automatic review settings April 22, 2026 16:51

coderabbitai Bot commented Apr 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

📝 Walkthrough


Extends op-registry memory-space semantics with an output-inherits-input flag and true retargetable (deferred) kwarg behavior; propagates consumer demands backward through inherit-input and SSA-alias chains; updates inference and conversion passes to use deferred/resolved spaces and rewrite target_memory accordingly.

Changes

Cohort / File(s) Summary
OpRegistry Core
include/pypto/ir/op_registry.h
Add output_inherits_input to OpMemorySpaceSpec; make set_output_memory_from_kwarg accept optional default (std::nullopt); add OutputMemoryInheritsInput() and HasRetargetableMemoryKwarg() query methods.
Python Bindings & Type Stubs
python/bindings/modules/ir.cpp, python/pypto/pypto_core/ir.pyi
Report the output-memory state in ir.get_op_memory_spec as a MemorySpace, "inherit_from_input", "deferred", or None; update docstring and binding logic to reflect the deferred/inheritance cases.
Tile Ops Memory Logic
src/ir/op/tile_ops/memory.cpp
Treat target_memory as optional in DeduceTileLoadType; defer layout/transpose decisions when kwarg absent; remove default Vec fallback in tile.create/tile.load registrations.
Convert Tensor→Tile Pass
src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
Add PropagateThroughInheritInputOps() and record propagation edges for SSA aliases and ops whose outputs inherit input, enabling backward propagation of consumer requirements prior to conversion.
Infer Tile Memory Space Pass
src/ir/transforms/infer_tile_memory_space_pass.cpp
Add Phase 0 demand pre-collection and backward propagation through inherit-input chains; consult backward-demanded spaces when resolvers return nullopt; support SSA-alias inheritance and rewrite target_memory for all retargetable ops.
Init MemRef Pass
src/ir/transforms/init_memref.cpp
Refactor IsViewOperation to use OutputMemoryInheritsInput() registry predicate rather than ad-hoc memory-spec inspection.
Memory Reuse Pass
src/ir/transforms/memory_reuse_pass.cpp
Centralize predicate by delegating IsOutputMemoryInheritInput to OpRegistryEntry::OutputMemoryInheritsInput().
Unit Tests (registry)
tests/ut/ir/operators/test_op_registry.py
Expect tile.load/tile.create output_memory to be "deferred" when target_memory kwarg is absent; update assertions/docstrings.
Conversion & Inference Tests
tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py, tests/ut/ir/transforms/test_infer_tile_memory_space.py
Add tests for SSA-alias propagation and alias-chain memory-space inheritance (slice→alias→matmul patterns and alias chains), asserting Mat promotions and move insertion behavior.

Sequence Diagram

sequenceDiagram
    participant Convert as ConvertPass
    participant Registry as OpRegistry
    participant Infer as InferPass
    participant Mutator as Mutator

    Convert->>Registry: Query OutputMemoryInheritsInput()
    Registry-->>Convert: Inheritance flag
    Convert->>Convert: Pre-scan consumer requirements
    Convert->>Convert: Record propagation edges (SSA aliases, inherit-input ops)
    Convert->>Convert: Propagate requirements through edges

    Infer->>Registry: Query input_constraints for ops
    Registry-->>Infer: Input demand specs
    Infer->>Infer: Phase 0: collect backward demands
    Infer->>Registry: Query output memory resolver (deduce_output_memory)
    alt Resolver returns nullopt
        Infer->>Infer: Use backward-demanded space (if valid)
    else Resolver returns MemorySpace
        Infer->>Infer: Use resolved space
    end
    Infer->>Mutator: Resolved spaces per var
    Mutator->>Registry: Query HasRetargetableMemoryKwarg()
    Registry-->>Mutator: Retargetable flag
    Mutator->>Mutator: Rewrite `target_memory` kwarg when retargetable

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • lyfne123

"🐰 I hopped through registry rows,
I learned which outputs follow their source,
Deferred demands I carry light,
I nudge targets into place just right,
Hooray—memory dances in order tonight!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'refactor(ir): add backward memory-space demand inference' clearly and specifically describes the main change—introducing backward propagation of memory-space demand in the IR layer.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the motivation (qwen3 MLP-down pattern fix), key architectural pieces (OpRegistry, InferTileMemorySpace, ConvertTensorToTileOps), implementation details, test coverage, and verification results.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


Contributor

@gemini-code-assist Bot left a comment


Code Review

This pull request enhances the memory space inference system by introducing retargetable producers and a backward demand propagation phase. Key changes include updating the OpRegistry to support deferred memory space resolution for operations like tile.load and tile.create, and implementing a DemandCollector in the InferTileMemorySpace pass to propagate consumer requirements through view-like operations and SSA aliases. The PR also refactors view-op detection across several passes to use a centralized predicate in the registry and adds comprehensive unit tests for the new propagation logic. I have no feedback to provide as no review comments were submitted.

Contributor

Copilot AI left a comment


Pull request overview

Adds a metadata-driven, backward “consumer demand” mechanism to infer/retarget tile memory spaces so view/alias chains (e.g. slice → alias → matmul) can stay pure-AIC and avoid defaulting loads through Vec.

Changes:

  • Introduces OpRegistryEntry::OutputMemoryInheritsInput() and HasRetargetableMemoryKwarg() plus output_inherits_input spec flag to unify “view op” handling across passes.
  • Extends InferTileMemorySpace and ConvertTensorToTileOps to propagate memory-space requirements backward through inherit-input ops and SSA aliases, and to rewrite retargetable producers’ target_memory.
  • Updates tile.load / tile.create to allow missing target_memory (deferred resolution), and updates Python bindings/stubs + tests accordingly.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

File Description
include/pypto/ir/op_registry.h Adds memory-space spec flag + registry predicates for inherit-input and retargetable ops.
src/ir/transforms/infer_tile_memory_space_pass.cpp Adds backward demand collection, SSA-alias propagation, and generalized target_memory rewriting.
src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp Back-propagates consumer space requirements through inherit-input ops and SSA aliases at tensor-conversion time.
src/ir/op/tile_ops/memory.cpp Makes target_memory optional for tile.load/tile.create; updates type deduction accordingly.
src/ir/transforms/init_memref.cpp Switches view-op detection to the shared registry predicate.
src/ir/transforms/memory_reuse_pass.cpp Switches inherit-input detection to the shared registry predicate.
python/bindings/modules/ir.cpp Exposes "inherit_from_input" vs "deferred" in get_op_memory_spec.
python/pypto/pypto_core/ir.pyi Documents the new "deferred" output_memory value in the stub.
tests/ut/ir/operators/test_op_registry.py Updates expectations for "deferred" on tile.load/tile.create.
tests/ut/ir/transforms/test_infer_tile_memory_space.py Adds SSA-alias inheritance tests for InferTileMemorySpace.
tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py Adds slice→alias→matmul tests ensuring Mat-directed loads are produced.

Comment thread include/pypto/ir/op_registry.h Outdated
Comment thread src/ir/transforms/infer_tile_memory_space_pass.cpp
@coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ir/transforms/infer_tile_memory_space_pass.cpp (1)

458-503: ⚠️ Potential issue | 🟡 Minor

Phase 3 target_memory rewrite should use OpRegistry::Create to preserve type deduction semantics, or handle transpose-aware valid_shape swap.

The manual std::make_shared<Call>(...) construction bypasses DeduceTileLoadType, which applies a critical transpose-aware swap of valid_shape dimensions (lines 179–182 of src/ir/op/tile_ops/memory.cpp). GetImplicitTileView(shape, promoted) only populates valid_shape = shape without the swap.

Currently, this is unreachable: tile.load with transpose=true must commit to target_memory=Mat at construction time, and InferTileMemorySpace explicitly does not revisit transpose decisions, so Phase 3 retargeting should not be triggered for transposed loads. However, the code lacks a guard against this scenario. To be defensive, either route through OpRegistry::Create(call->op_->name_, call->args_, new_kwargs, call->span_) to re-run the registered deducer, or pre-compute the transpose swap on valid_shape before constructing the TileType manually if the transpose kwarg is present.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/transforms/infer_tile_memory_space_pass.cpp` around lines 458 - 503,
The manual construction of Call/TileType in InferTileMemorySpace (the Call
handling block that rewrites "target_memory") bypasses the registered deducer
(e.g., DeduceTileLoadType) and thus misses transpose-aware valid_shape swapping;
change the code to either (1) recreate the call via
OpRegistry::Create(call->op_->name_, call->args_, new_kwargs, call->span_) so
the op's deducer runs and produces the correct TileType, or (2) if you must
construct the TileType manually, detect a transpose=true kwarg on the original
call and apply the same valid_shape dimension swap that DeduceTileLoadType
performs before calling tile_view_semantics::GetImplicitTileView and building
the new TileType (update references: Call, OpRegistry::Create,
DeduceTileLoadType, GetImplicitTileView, TileType, and the "transpose" kwarg).
🧹 Nitpick comments (3)
src/ir/transforms/infer_tile_memory_space_pass.cpp (2)

190-199: Plain SSA-alias inheritance depends on src_var already being in var_memory_.

y = x is handled correctly when x was assigned earlier in the same analyzer walk (the common case covered by the new tests). However, if x is a ForStmt iter_arg whose inferred space is only back-filled in VisitStmt_(ForStmtPtr) after the loop body visit, then an alias y = x inside the body runs before the iter_arg's entry exists in var_memory_ and y gets no memory_space recorded at all — leaving the TileMemoryInferredVerifier to reject the function at pass exit.

Probably unreachable under current frontends (aliases of iter_args aren't emitted), but worth either a comment noting the assumption or a targeted fallback (e.g. consult the TileType::memory_space_ annotation on src_var as a last resort) for defensive robustness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/transforms/infer_tile_memory_space_pass.cpp` around lines 190 - 199,
The plain SSA alias handling in the branch that matches As<Var>(op->value_)
assumes src_var exists in var_memory_; update it to defensively fallback when
src_it == var_memory_.end() by inspecting src_var's declared type (use
TileType::memory_space_ on src_var->type if it is a TileType) and set
var_memory_[op->var_] from that memory_space_; also mention (or add) a short
comment referencing VisitStmt_(ForStmtPtr) to explain why back-filling can occur
and why the fallback is needed.

69-71: Conflicting non-Vec demands resolve by first-wins — consider documenting or detecting.

ShouldOverrideDemand only upgrades Vec → any specialized space. When two consumers impose different specialized spaces on the same producer var (e.g. one path demands Left, another demands Acc), try_emplace silently keeps whichever was collected first, making the final space dependent on IR visit order.

Today this likely never occurs in practice (matmul inputs don't feed accumulators, etc.), but a defensive CHECK/diagnostic — or at least a comment spelling out the invariant — would help future maintainers avoid silently wrong retargeting if the invariant breaks.

Also applies to: 124-143

include/pypto/ir/op_registry.h (1)

322-342: Consider recording the retargetable kwarg key in the spec rather than hardcoding "target_memory".

set_output_memory_from_kwarg accepts an arbitrary kwarg_key, but HasRetargetableMemoryKwarg() hardcodes a check for "target_memory". If any future op registers with a different retargetable kwarg (e.g. "output_memory"), the resolver would still work but InferTileMemorySpace would silently skip retargeting it. Storing the kwarg key into OpMemorySpaceSpec when the resolver is installed would make the two consistent.

Sketch
 struct OpMemorySpaceSpec {
   ...
   OutputResolver deduce_output_memory;
+  /// Set by set_output_memory_from_kwarg — name of the kwarg whose value the
+  /// resolver reads. Empty when no kwarg-driven resolver was installed.
+  std::string retargetable_kwarg_key;
   ...
 };
 [[nodiscard]] bool HasRetargetableMemoryKwarg() const {
   if (!memory_spec_.has_value() || !memory_spec_->deduce_output_memory) return false;
   if (memory_spec_->output_inherits_input) return false;
-  return op_ && op_->HasAttr("target_memory");
+  const auto& key = memory_spec_->retargetable_kwarg_key;
+  return !key.empty() && op_ && op_->HasAttr(key);
 }

Also applies to: 399-403

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pypto/ir/op_registry.h` around lines 322 - 342, The
deduce_output_memory lambda is installed by set_output_memory_from_kwarg but the
kwarg key isn't recorded in OpMemorySpaceSpec, while
HasRetargetableMemoryKwarg() still checks the hardcoded "target_memory"; add a
field to OpMemorySpaceSpec (e.g., retargetable_kwarg or
retargetable_output_kwarg), populate it inside set_output_memory_from_kwarg
(using memory_spec_), and change HasRetargetableMemoryKwarg() to check that spec
field instead of the fixed string; apply the same change for the analogous
input/other-kwarg setter at the other location (lines ~399-403) so the resolver
and the retargeting check stay consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 33761b61-7ec4-4e96-a771-e97fc5d48013

📥 Commits

Reviewing files that changed from the base of the PR and between c53dac0 and 5d727ac.

📒 Files selected for processing (11)
  • include/pypto/ir/op_registry.h
  • python/bindings/modules/ir.cpp
  • python/pypto/pypto_core/ir.pyi
  • src/ir/op/tile_ops/memory.cpp
  • src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp
  • src/ir/transforms/infer_tile_memory_space_pass.cpp
  • src/ir/transforms/init_memref.cpp
  • src/ir/transforms/memory_reuse_pass.cpp
  • tests/ut/ir/operators/test_op_registry.py
  • tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py
  • tests/ut/ir/transforms/test_infer_tile_memory_space.py

- Tighten HasRetargetableMemoryKwarg() to require the resolver to
  actually defer (return nullopt for empty kwargs). Keeps tile.move
  (Vec default) out of the "retargetable" bucket so Phase 3 Mutator
  can't accidentally rewrite its target_memory kwarg.

- In Phase 1 InferFromOp, clamp backward-collected demand for
  retargetable producers to {Vec, Mat}. DDR-facing tile.load can only
  directly produce {Vec, Mat}; specialized consumer demands
  (Left/Right/Acc/Bias) must be reached via a Phase 2 tile.move. Raw
  demand would otherwise emit hardware-invalid loads.
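The clamp in the second bullet can be sketched as below (hypothetical helper; staging Left/Right through Mat follows the tile.move(Mat→Left/Right) behavior in the test plan, while the Acc/Bias mapping is an illustrative assumption):

```python
# Sketch of clamping backward-collected demand for a retargetable load
# (hypothetical names). A DDR-facing tile.load can only directly produce
# Vec or Mat; specialized spaces are reached via a Phase-2 tile.move.
LOADABLE = {"Vec", "Mat"}
# Assumed staging space per specialized demand (Left/Right per the test
# plan's tile.move(Mat -> Left/Right); Acc/Bias are illustrative).
SPECIALIZED_HOME = {"Left": "Mat", "Right": "Mat", "Acc": "Mat", "Bias": "Mat"}

def clamp_load_demand(demanded: str) -> str:
    if demanded in LOADABLE:
        return demanded
    # Emit the load into the space the specialized demand is staged
    # through, instead of a hardware-invalid direct load.
    return SPECIALIZED_HOME.get(demanded, "Vec")

print(clamp_load_demand("Left"))  # -> Mat
print(clamp_load_demand("Vec"))   # -> Vec
```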
@coderabbitai Bot left a comment


🧹 Nitpick comments (2)
include/pypto/ir/op_registry.h (2)

402-407: Nit: hardcoded "target_memory" couples this predicate to the kwarg name.

HasRetargetableMemoryKwarg() checks op_->HasAttr("target_memory") while set_output_memory_from_kwarg permits any kwarg_key. An op registered as set_output_memory_from_kwarg("other_key", std::nullopt) would defer resolution yet be reported as non-retargetable here. This appears intentional (Phase 3 in infer_tile_memory_space_pass.cpp specifically rewrites target_memory), but if that ever generalizes, the kwarg name will need to be tracked on the spec rather than hardcoded in two places. Consider persisting the kwarg_key on OpMemorySpaceSpec so both the resolver and this predicate share one source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pypto/ir/op_registry.h` around lines 402 - 407, The predicate
HasRetargetableMemoryKwarg currently checks op_->HasAttr("target_memory") which
hardcodes the kwarg name; persist the kwarg_key in OpMemorySpaceSpec when
set_output_memory_from_kwarg(...) is called and use that stored
memory_spec_->kwarg_key (instead of the literal "target_memory") in
HasRetargetableMemoryKwarg so both the resolver and predicate share the same
source of truth; update set_output_memory_from_kwarg to populate
OpMemorySpaceSpec::kwarg_key, adjust HasRetargetableMemoryKwarg to check
op_->HasAttr(memory_spec_->kwarg_key) and related calls that assume
"target_memory" to use the spec value, and keep behavior unchanged when
kwarg_key is empty/nullopt for backward compatibility.

384-407: LGTM on the new predicates.

OutputMemoryInheritsInput() and HasRetargetableMemoryKwarg() correctly centralize the view-op vs. retargetable-producer distinction. The tightened check at line 406 — invoking deduce_output_memory({}) and requiring nullopt — properly excludes ops like tile.move that carry a target_memory kwarg but resolve to a concrete default (e.g. Vec), matching the commit-message intent. The mutual exclusion with output_inherits_input at line 404 prevents view-like ops from being mistakenly rewritten by Phase 3.

Minor performance note: HasRetargetableMemoryKwarg() constructs an empty std::vector and invokes the resolver lambda on every call. If this is hit per-op in a tight loop during inference, caching the "resolver-defers-when-empty" result on the spec at registration time would eliminate the per-call invocation, but this is likely negligible in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pypto/ir/op_registry.h` around lines 384 - 407, The performance issue
is that HasRetargetableMemoryKwarg() calls
memory_spec_->deduce_output_memory({}) on every invocation; cache whether the
resolver defers for an empty input at registration time so the predicate can
cheaply consult a boolean. Add a cached flag (e.g.
MemorySpec::resolver_defers_on_empty or similar) to the MemorySpec populated
when the op is registered (compute resolver_defers_on_empty =
!deduce_output_memory({}).has_value() once), and change
HasRetargetableMemoryKwarg() to use memory_spec_->resolver_defers_on_empty
(keeping the existing checks for memory_spec_ presence, deduce_output_memory
existence, and output_inherits_input).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8b9da074-b753-4da3-bfc2-401347466a8a

📥 Commits

Reviewing files that changed from the base of the PR and between 5d727ac and ccf880b.

📒 Files selected for processing (2)
  • include/pypto/ir/op_registry.h
  • src/ir/transforms/infer_tile_memory_space_pass.cpp
✅ Files skipped from review due to trivial changes (1)
  • src/ir/transforms/infer_tile_memory_space_pass.cpp


3 participants