refactor(ir): add backward memory-space demand inference #1130
Hzfengsy wants to merge 2 commits into hw-native-sys:main from
Conversation
Introduce a single metadata-driven mechanism that back-propagates consumer
memory-space demand so chains like slice→alias→matmul compile as pure-AIC
without routing loads through Vec.
OpRegistry gains an explicit OutputMemoryInheritsInput() predicate backed
by OpMemorySpaceSpec::output_inherits_input, and HasRetargetableMemoryKwarg()
for producers with a writable target_memory kwarg. set_output_memory_from_kwarg
now accepts std::nullopt as default, marking the op as retargetable with no
fallback.
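The registry semantics above can be pictured with a small illustrative Python model (the names `OpMemorySpaceSpec`, `set_output_memory_from_kwarg`, and the two predicates mirror the C++ spec described in this PR, but the shapes here are a sketch, not the real API):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class OpMemorySpaceSpec:
    # View ops (slice, reshape, ...) whose output lives where their input lives.
    output_inherits_input: bool = False
    # Resolves the output space from kwargs; None means "no spec installed".
    deduce_output_memory: Optional[Callable[[dict], Optional[str]]] = None

def set_output_memory_from_kwarg(key: str, default: Optional[str]) -> OpMemorySpaceSpec:
    # With default=None (std::nullopt in the C++ API), an absent kwarg
    # resolves to None: the op is "retargetable" and inference must
    # supply the space later from consumer demand.
    def resolver(kwargs: dict) -> Optional[str]:
        return kwargs.get(key, default)
    return OpMemorySpaceSpec(deduce_output_memory=resolver)

def output_memory_inherits_input(spec: OpMemorySpaceSpec) -> bool:
    return spec.output_inherits_input

def has_retargetable_memory_kwarg(spec: OpMemorySpaceSpec) -> bool:
    # Mirrors the tightened predicate: the resolver must actually defer
    # (return None) for empty kwargs; ops with a concrete default
    # (e.g. tile.move defaulting to Vec) are not retargetable.
    if spec.output_inherits_input or spec.deduce_output_memory is None:
        return False
    return spec.deduce_output_memory({}) is None

tile_load = set_output_memory_from_kwarg("target_memory", None)   # deferred
tile_move = set_output_memory_from_kwarg("target_memory", "Vec")  # concrete default
```

Under this model, `tile.load` reports retargetable while `tile.move` does not, matching the deferred-vs-default split the PR introduces.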
InferTileMemorySpace gains a Phase-0 DemandCollector that records per-op
input_constraints and back-propagates demand through any op whose
OutputMemoryInheritsInput() is true plus SSA aliases elided by the parser.
Phase 1 handles plain SSA alias assignments and consults demand for
retargetable producers when their kwarg is absent; Phase 3 rewrites
target_memory on any retargetable op whose resolved space differs (not
just tile.create). ConsumerSpaceCollector in convert_tensor_to_tile_ops
does the same at the tensor layer. Propagation is a single reverse-order
sweep over edges captured in program order — O(N).
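The single reverse-order sweep can be sketched as follows (illustrative Python; the edge representation and variable names are hypothetical, not the pass's actual data structures):

```python
def propagate_demand(edges, leaf_demand):
    """edges: (src_var, dst_var) pairs captured in program order, where dst
    is a view or SSA alias of src. leaf_demand: {var: space} imposed by
    consumers such as matmul. Returns the full demand map."""
    demand = dict(leaf_demand)
    # One reverse pass suffices: because edges were recorded in program
    # order, every edge whose dst carries demand is visited before the
    # earlier edges that produced dst's source -- O(N) over the edges.
    for src, dst in reversed(edges):
        if dst in demand and src not in demand:
            demand[src] = demand[dst]
    return demand

# slice -> alias -> matmul: matmul demands Mat on its operand t2.
edges = [
    ("loaded", "t1"),  # t1 = slice(loaded)  (inherit-input view op)
    ("t1", "t2"),      # t2 = t1             (SSA alias)
]
demand = propagate_demand(edges, {"t2": "Mat"})
```

The demand reaches `loaded`, which is why the retargetable `tile.load` producing it can be resolved to `Mat` instead of defaulting to `Vec`.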
tile.load / tile.create become retargetable: target_memory kwarg is now
optional, DeduceTileLoadType tolerates its absence, and InferTileMemorySpace
resolves it from consumer demand when absent.
init_memref and memory_reuse_pass delegate their view-op predicate to the
shared OutputMemoryInheritsInput() helper, replacing the brittle
deduce_output_memory({}).has_value() probe.
Python binding and type stub distinguish "inherit_from_input" (view ops)
from "deferred" (retargetable-no-kwarg) in get_op_memory_spec. Test
expectations for tile.load / tile.create specs updated to "deferred".
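A minimal sketch of how the reported string could follow from the spec flags (the helper name `classify_output_memory` is hypothetical; the real logic lives in the C++ binding):

```python
def classify_output_memory(output_inherits_input, empty_kwarg_resolution):
    """empty_kwarg_resolution is what deduce_output_memory({}) yields:
    None when the op defers to consumer demand."""
    if output_inherits_input:
        return "inherit_from_input"   # view ops
    if empty_kwarg_resolution is None:
        return "deferred"             # retargetable, no fallback
    return empty_kwarg_resolution     # concrete default, e.g. "Vec"
```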
Adds DSL Before/Expected tests in test_convert_tensor_to_tile_ops.py and
test_infer_tile_memory_space.py covering slice→alias→matmul and transitive
alias chains — the qwen3 MLP-down pattern that motivated the refactor.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Extends op-registry memory-space semantics with an output-inherits-input flag and true retargetable (deferred) kwarg behavior; propagates consumer demands backward through inherit-input and SSA-alias chains; updates inference and conversion passes to use deferred/resolved spaces and rewrite `target_memory`.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Convert as ConvertPass
    participant Registry as OpRegistry
    participant Infer as InferPass
    participant Mutator as Mutator
    Convert->>Registry: Query OutputMemoryInheritsInput()
    Registry-->>Convert: Inheritance flag
    Convert->>Convert: Pre-scan consumer requirements
    Convert->>Convert: Record propagation edges (SSA aliases, inherit-input ops)
    Convert->>Convert: Propagate requirements through edges
    Infer->>Registry: Query input_constraints for ops
    Registry-->>Infer: Input demand specs
    Infer->>Infer: Phase 0: collect backward demands
    Infer->>Registry: Query output memory resolver (deduce_output_memory)
    alt Resolver returns nullopt
        Infer->>Infer: Use backward-demanded space (if valid)
    else Resolver returns MemorySpace
        Infer->>Infer: Use resolved space
    end
    Infer->>Mutator: Resolved spaces per var
    Mutator->>Registry: Query HasRetargetableMemoryKwarg()
    Registry-->>Mutator: Retargetable flag
    Mutator->>Mutator: Rewrite target_memory kwarg when retargetable
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
Code Review
This pull request enhances the memory space inference system by introducing retargetable producers and a backward demand propagation phase. Key changes include updating the OpRegistry to support deferred memory space resolution for operations like tile.load and tile.create, and implementing a DemandCollector in the InferTileMemorySpace pass to propagate consumer requirements through view-like operations and SSA aliases. The PR also refactors view-op detection across several passes to use a centralized predicate in the registry and adds comprehensive unit tests for the new propagation logic. I have no feedback to provide as no review comments were submitted.
Pull request overview
Adds a metadata-driven, backward “consumer demand” mechanism to infer/retarget tile memory spaces so view/alias chains (e.g. slice → alias → matmul) can stay pure-AIC and avoid defaulting loads through Vec.
Changes:
- Introduces
OpRegistryEntry::OutputMemoryInheritsInput()andHasRetargetableMemoryKwarg()plusoutput_inherits_inputspec flag to unify “view op” handling across passes. - Extends
InferTileMemorySpaceandConvertTensorToTileOpsto propagate memory-space requirements backward through inherit-input ops and SSA aliases, and to rewrite retargetable producers’target_memory. - Updates
tile.load/tile.createto allow missingtarget_memory(deferred resolution), and updates Python bindings/stubs + tests accordingly.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| `include/pypto/ir/op_registry.h` | Adds memory-space spec flag and registry predicates for inherit-input and retargetable ops. |
| `src/ir/transforms/infer_tile_memory_space_pass.cpp` | Adds backward demand collection, SSA-alias propagation, and generalized `target_memory` rewriting. |
| `src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp` | Back-propagates consumer space requirements through inherit-input ops and SSA aliases at tensor-conversion time. |
| `src/ir/op/tile_ops/memory.cpp` | Makes `target_memory` optional for `tile.load`/`tile.create`; updates type deduction accordingly. |
| `src/ir/transforms/init_memref.cpp` | Switches view-op detection to the shared registry predicate. |
| `src/ir/transforms/memory_reuse_pass.cpp` | Switches inherit-input detection to the shared registry predicate. |
| `python/bindings/modules/ir.cpp` | Exposes `"inherit_from_input"` vs `"deferred"` in `get_op_memory_spec`. |
| `python/pypto/pypto_core/ir.pyi` | Documents the new `"deferred"` output_memory value in the stub. |
| `tests/ut/ir/operators/test_op_registry.py` | Updates expectations for `"deferred"` on `tile.load`/`tile.create`. |
| `tests/ut/ir/transforms/test_infer_tile_memory_space.py` | Adds SSA-alias inheritance tests for `InferTileMemorySpace`. |
| `tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py` | Adds slice→alias→matmul tests ensuring Mat-directed loads are produced. |
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ir/transforms/infer_tile_memory_space_pass.cpp (1)
458-503: ⚠️ Potential issue | 🟡 Minor

Phase 3 `target_memory` rewrite should use `OpRegistry::Create` to preserve type-deduction semantics, or handle the transpose-aware `valid_shape` swap.

The manual `std::make_shared<Call>(...)` construction bypasses `DeduceTileLoadType`, which applies a critical transpose-aware swap of `valid_shape` dimensions (lines 179–182 of `src/ir/op/tile_ops/memory.cpp`). `GetImplicitTileView(shape, promoted)` only populates `valid_shape = shape` without the swap.

Currently this is unreachable: `tile.load` with `transpose=true` must commit to `target_memory=Mat` at construction time, and `InferTileMemorySpace` explicitly does not revisit transpose decisions, so Phase 3 retargeting should not be triggered for transposed loads. However, the code lacks a guard against this scenario. To be defensive, either route through `OpRegistry::Create(call->op_->name_, call->args_, new_kwargs, call->span_)` to re-run the registered deducer, or pre-compute the transpose swap on `valid_shape` before constructing the `TileType` manually when the `transpose` kwarg is present.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ir/transforms/infer_tile_memory_space_pass.cpp` around lines 458 - 503, The manual construction of Call/TileType in InferTileMemorySpace (the Call handling block that rewrites "target_memory") bypasses the registered deducer (e.g., DeduceTileLoadType) and thus misses transpose-aware valid_shape swapping; change the code to either (1) recreate the call via OpRegistry::Create(call->op_->name_, call->args_, new_kwargs, call->span_) so the op's deducer runs and produces the correct TileType, or (2) if you must construct the TileType manually, detect a transpose=true kwarg on the original call and apply the same valid_shape dimension swap that DeduceTileLoadType performs before calling tile_view_semantics::GetImplicitTileView and building the new TileType (update references: Call, OpRegistry::Create, DeduceTileLoadType, GetImplicitTileView, TileType, and the "transpose" kwarg).
🧹 Nitpick comments (3)
src/ir/transforms/infer_tile_memory_space_pass.cpp (2)
190-199: Plain SSA-alias inheritance depends on `src_var` already being in `var_memory_`.

`y = x` is handled correctly when `x` was assigned earlier in the same analyzer walk (the common case covered by the new tests). However, if `x` is a `ForStmt` iter_arg whose inferred space is only back-filled in `VisitStmt_(ForStmtPtr)` after the loop-body visit, then an alias `y = x` inside the body runs before the iter_arg's entry exists in `var_memory_`, and `y` gets no memory_space recorded at all — leaving the `TileMemoryInferredVerifier` to reject the function at pass exit.

Probably unreachable under current frontends (aliases of iter_args aren't emitted), but worth either a comment noting the assumption or a targeted fallback (e.g. consult the `TileType::memory_space_` annotation on `src_var` as a last resort) for defensive robustness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ir/transforms/infer_tile_memory_space_pass.cpp` around lines 190 - 199, The plain SSA alias handling in the branch that matches As<Var>(op->value_) assumes src_var exists in var_memory_; update it to defensively fallback when src_it == var_memory_.end() by inspecting src_var's declared type (use TileType::memory_space_ on src_var->type if it is a TileType) and set var_memory_[op->var_] from that memory_space_; also mention (or add) a short comment referencing VisitStmt_(ForStmtPtr) to explain why back-filling can occur and why the fallback is needed.
69-71: Conflicting non-`Vec` demands resolve by first-wins — consider documenting or detecting.

`ShouldOverrideDemand` only upgrades `Vec` → any specialized space. When two consumers impose different specialized spaces on the same producer var (e.g. one path demands `Left`, another demands `Acc`), `try_emplace` silently keeps whichever was collected first, making the final space dependent on IR visit order.

Today this likely never occurs in practice (matmul inputs don't feed accumulators, etc.), but a defensive `CHECK`/diagnostic — or at least a comment spelling out the invariant — would help future maintainers avoid silently wrong retargeting if the invariant breaks.

Also applies to: 124-143
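The first-wins merge described in this comment can be modeled concretely (illustrative Python; `should_override` mirrors the described `ShouldOverrideDemand` semantics, with the space names taken from this PR):

```python
SPECIALIZED = {"Left", "Right", "Acc", "Bias", "Mat"}

def should_override(existing, incoming):
    # Only a Vec demand may be upgraded to a specialized space; one
    # specialized demand never replaces another.
    return existing == "Vec" and incoming in SPECIALIZED

def collect_demand(observations):
    """observations: (var, space) pairs in IR visit order. Mirrors
    try_emplace + ShouldOverrideDemand: the first demand wins unless it
    was Vec and a specialized demand arrives later."""
    demand = {}
    for var, space in observations:
        if var not in demand:
            demand[var] = space           # try_emplace: first wins
        elif should_override(demand[var], space):
            demand[var] = space           # Vec -> specialized upgrade
    return demand
```

The second assertion below is exactly the visit-order dependence the comment warns about: `Left` vs `Acc` resolves to whichever was seen first.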
include/pypto/ir/op_registry.h (1)
322-342: Consider recording the retargetable kwarg key in the spec rather than hardcoding `"target_memory"`.

`set_output_memory_from_kwarg` accepts an arbitrary `kwarg_key`, but `HasRetargetableMemoryKwarg()` hardcodes a check for `"target_memory"`. If any future op registers with a different retargetable kwarg (e.g. `"output_memory"`), the resolver would still work but `InferTileMemorySpace` would silently skip retargeting it. Storing the kwarg key into `OpMemorySpaceSpec` when the resolver is installed would make the two consistent.

Sketch:

```diff
 struct OpMemorySpaceSpec {
   ...
   OutputResolver deduce_output_memory;
+  /// Set by set_output_memory_from_kwarg — name of the kwarg whose value the
+  /// resolver reads. Empty when no kwarg-driven resolver was installed.
+  std::string retargetable_kwarg_key;
   ...
 };

 [[nodiscard]] bool HasRetargetableMemoryKwarg() const {
   if (!memory_spec_.has_value() || !memory_spec_->deduce_output_memory) return false;
   if (memory_spec_->output_inherits_input) return false;
-  return op_ && op_->HasAttr("target_memory");
+  const auto& key = memory_spec_->retargetable_kwarg_key;
+  return !key.empty() && op_ && op_->HasAttr(key);
 }
```

Also applies to: 399-403
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/pypto/ir/op_registry.h` around lines 322 - 342, The deduce_output_memory lambda is installed by set_output_memory_from_kwarg but the kwarg key isn't recorded in OpMemorySpaceSpec, while HasRetargetableMemoryKwarg() still checks the hardcoded "target_memory"; add a field to OpMemorySpaceSpec (e.g., retargetable_kwarg or retargetable_output_kwarg), populate it inside set_output_memory_from_kwarg (using memory_spec_), and change HasRetargetableMemoryKwarg() to check that spec field instead of the fixed string; apply the same change for the analogous input/other-kwarg setter at the other location (lines ~399-403) so the resolver and the retargeting check stay consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/ir/transforms/infer_tile_memory_space_pass.cpp`:
- Around line 458-503: The manual construction of Call/TileType in
InferTileMemorySpace (the Call handling block that rewrites "target_memory")
bypasses the registered deducer (e.g., DeduceTileLoadType) and thus misses
transpose-aware valid_shape swapping; change the code to either (1) recreate the
call via OpRegistry::Create(call->op_->name_, call->args_, new_kwargs,
call->span_) so the op's deducer runs and produces the correct TileType, or (2)
if you must construct the TileType manually, detect a transpose=true kwarg on
the original call and apply the same valid_shape dimension swap that
DeduceTileLoadType performs before calling
tile_view_semantics::GetImplicitTileView and building the new TileType (update
references: Call, OpRegistry::Create, DeduceTileLoadType, GetImplicitTileView,
TileType, and the "transpose" kwarg).
---
Nitpick comments:
In `@include/pypto/ir/op_registry.h`:
- Around line 322-342: The deduce_output_memory lambda is installed by
set_output_memory_from_kwarg but the kwarg key isn't recorded in
OpMemorySpaceSpec, while HasRetargetableMemoryKwarg() still checks the hardcoded
"target_memory"; add a field to OpMemorySpaceSpec (e.g., retargetable_kwarg or
retargetable_output_kwarg), populate it inside set_output_memory_from_kwarg
(using memory_spec_), and change HasRetargetableMemoryKwarg() to check that spec
field instead of the fixed string; apply the same change for the analogous
input/other-kwarg setter at the other location (lines ~399-403) so the resolver
and the retargeting check stay consistent.
In `@src/ir/transforms/infer_tile_memory_space_pass.cpp`:
- Around line 190-199: The plain SSA alias handling in the branch that matches
As<Var>(op->value_) assumes src_var exists in var_memory_; update it to
defensively fallback when src_it == var_memory_.end() by inspecting src_var's
declared type (use TileType::memory_space_ on src_var->type if it is a TileType)
and set var_memory_[op->var_] from that memory_space_; also mention (or add) a
short comment referencing VisitStmt_(ForStmtPtr) to explain why back-filling can
occur and why the fallback is needed.
📒 Files selected for processing (11)

- `include/pypto/ir/op_registry.h`
- `python/bindings/modules/ir.cpp`
- `python/pypto/pypto_core/ir.pyi`
- `src/ir/op/tile_ops/memory.cpp`
- `src/ir/transforms/convert_tensor_to_tile_ops_pass.cpp`
- `src/ir/transforms/infer_tile_memory_space_pass.cpp`
- `src/ir/transforms/init_memref.cpp`
- `src/ir/transforms/memory_reuse_pass.cpp`
- `tests/ut/ir/operators/test_op_registry.py`
- `tests/ut/ir/transforms/test_convert_tensor_to_tile_ops.py`
- `tests/ut/ir/transforms/test_infer_tile_memory_space.py`
- Tighten HasRetargetableMemoryKwarg() to require the resolver to
actually defer (return nullopt for empty kwargs). Keeps tile.move
(Vec default) out of the "retargetable" bucket so Phase 3 Mutator
can't accidentally rewrite its target_memory kwarg.
- In Phase 1 InferFromOp, clamp backward-collected demand for
retargetable producers to {Vec, Mat}. DDR-facing tile.load can only
directly produce {Vec, Mat}; specialized consumer demands
(Left/Right/Acc/Bias) must be reached via a Phase 2 tile.move. Raw
demand would otherwise emit hardware-invalid loads.
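A sketch of the clamping rule from the commit message above (illustrative Python; the Left/Right → Mat staging matches the `tile.move(Mat→Left/Right)` behavior noted in the test plan, while routing the remaining specialized demands through Vec is an assumption of this sketch):

```python
DIRECT_LOAD_SPACES = {"Vec", "Mat"}

def clamp_load_demand(demanded):
    """A DDR-facing tile.load can only directly produce Vec or Mat;
    specialized spaces must be reached via a Phase-2 tile.move."""
    if demanded in DIRECT_LOAD_SPACES:
        return demanded
    if demanded in {"Left", "Right"}:
        return "Mat"   # staged through tile.move(Mat -> Left/Right)
    return "Vec"       # assumption: Acc/Bias demands stage through Vec
```

Without this clamp, a backward-collected `Left` demand would retarget the load to a hardware-invalid space instead of `Mat`.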
🧹 Nitpick comments (2)
include/pypto/ir/op_registry.h (2)
402-407: Nit: hardcoded `"target_memory"` couples this predicate to the kwarg name.

`HasRetargetableMemoryKwarg()` checks `op_->HasAttr("target_memory")` while `set_output_memory_from_kwarg` permits any `kwarg_key`. An op registered as `set_output_memory_from_kwarg("other_key", std::nullopt)` would defer resolution yet be reported as non-retargetable here. This appears intentional (Phase 3 in `infer_tile_memory_space_pass.cpp` specifically rewrites `target_memory`), but if that ever generalizes, the kwarg name will need to be tracked on the spec rather than hardcoded in two places. Consider persisting the `kwarg_key` on `OpMemorySpaceSpec` so both the resolver and this predicate share one source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/pypto/ir/op_registry.h` around lines 402 - 407, The predicate HasRetargetableMemoryKwarg currently checks op_->HasAttr("target_memory") which hardcodes the kwarg name; persist the kwarg_key in OpMemorySpaceSpec when set_output_memory_from_kwarg(...) is called and use that stored memory_spec_->kwarg_key (instead of the literal "target_memory") in HasRetargetableMemoryKwarg so both the resolver and predicate share the same source of truth; update set_output_memory_from_kwarg to populate OpMemorySpaceSpec::kwarg_key, adjust HasRetargetableMemoryKwarg to check op_->HasAttr(memory_spec_->kwarg_key) and related calls that assume "target_memory" to use the spec value, and keep behavior unchanged when kwarg_key is empty/nullopt for backward compatibility.
384-407: LGTM on the new predicates.

`OutputMemoryInheritsInput()` and `HasRetargetableMemoryKwarg()` correctly centralize the view-op vs. retargetable-producer distinction. The tightened check at line 406 — invoking `deduce_output_memory({})` and requiring `nullopt` — properly excludes ops like `tile.move` that carry a `target_memory` kwarg but resolve to a concrete default (e.g. `Vec`), matching the commit-message intent. The mutual exclusion with `output_inherits_input` at line 404 prevents view-like ops from being mistakenly rewritten by Phase 3.

Minor performance note: `HasRetargetableMemoryKwarg()` constructs an empty `std::vector` and invokes the resolver lambda on every call. If this is hit per-op in a tight loop during inference, caching the "resolver-defers-when-empty" result on the spec at registration time would eliminate the per-call invocation, but this is likely negligible in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/pypto/ir/op_registry.h` around lines 384 - 407, The performance issue is that HasRetargetableMemoryKwarg() calls memory_spec_->deduce_output_memory({}) on every invocation; cache whether the resolver defers for an empty input at registration time so the predicate can cheaply consult a boolean. Add a cached flag (e.g. MemorySpec::resolver_defers_on_empty or similar) to the MemorySpec populated when the op is registered (compute resolver_defers_on_empty = !deduce_output_memory({}).has_value() once), and change HasRetargetableMemoryKwarg() to use memory_spec_->resolver_defers_on_empty (keeping the existing checks for memory_spec_ presence, deduce_output_memory existence, and output_inherits_input).
📒 Files selected for processing (2)

- `include/pypto/ir/op_registry.h`
- `src/ir/transforms/infer_tile_memory_space_pass.cpp`
✅ Files skipped from review due to trivial changes (1)
- src/ir/transforms/infer_tile_memory_space_pass.cpp
Summary
Introduces a single metadata-driven mechanism for back-propagating consumer memory-space demand, so chains like `slice → alias → matmul` compile as pure-AIC without routing loads through `Vec`. Concretely fixes the qwen3 decode MLP-down pattern where `incore_16` was being incorrectly expanded into an AIC + empty-AIV pair because `tile.load` defaulted to `Vec` and, combined with `matmul`, produced a `MIXED` core-affinity classification.

Key pieces:

- `OpRegistry` — new `OutputMemoryInheritsInput()` and `HasRetargetableMemoryKwarg()` predicates backed by `OpMemorySpaceSpec::output_inherits_input`. `set_output_memory_from_kwarg` now accepts a `std::nullopt` default, marking an op as retargetable with no fallback.
- `InferTileMemorySpace` — Phase-0 `DemandCollector` records per-op `input_constraints` and back-propagates through `OutputMemoryInheritsInput()` ops plus parser-elided SSA aliases. Phase 1 handles `y = x` alias statements and consults demand for retargetable producers when the kwarg is absent (never overrides an explicit kwarg). Phase 3 rewrites `target_memory` on any retargetable op whose resolved space differs (generalizes today's `tile.create`-only rewrite).
- `ConvertTensorToTileOps` — `ConsumerSpaceCollector::PropagateThroughInheritInputOps` back-propagates at the tensor layer through inherit-input Calls and SSA aliases, so `tensor.slice → tensor.matmul` emits `tile.load(target=Mat)` directly.
- `tile.load` / `tile.create` — become retargetable: the `target_memory` kwarg is optional, `DeduceTileLoadType` tolerates its absence, and `InferTileMemorySpace` resolves it from consumer demand.
- `init_memref` and `memory_reuse_pass` delegate the view-op predicate to `OutputMemoryInheritsInput()`, replacing the brittle `deduce_output_memory({}).has_value()` probe.
- `get_op_memory_spec` distinguishes `"inherit_from_input"` from `"deferred"` (retargetable-no-kwarg).

Propagation is a single reverse-order sweep over edges captured in program order — O(N).
Test plan
- `Before`/`Expected` tests in `test_convert_tensor_to_tile_ops.py` for `slice → alias → matmul` and transitive alias chains — the exact qwen3 MLP-down pattern.
- `Before`/`Expected` tests in `test_infer_tile_memory_space.py` for Phase-1 SSA-alias memory-space inheritance (single and transitive).
- Updated `test_op_registry.py` expectations: `tile.load` / `tile.create` report `"deferred"` when the kwarg is absent.
- `incore_16` is now `FunctionType.AIC` with both loads on `Mem.Mat` and `tile.move(Mat→Left/Right)` inserted before `matmul` (no AIV shell, no Group wrapper, no `dual_aiv_dispatch` attr, no injected `get_subblock_idx`).

Scope
Intentionally does not include the fillpad-related changes from an earlier WIP in the same direction (which had an unresolved `TFILLPAD`-on-`Mat` codegen issue). `tile.fillpad` / `tile.fillpad_inplace` / `tensor.fillpad` keep their current Vec-only registrations; enabling them as inherit-input is a separate follow-up.

End-to-end qwen3 compilation now surfaces a separate, unrelated codegen issue in the `Spmd → AIC` orchestration path (`orchestration_codegen.cpp:661`: "inner call arg does not map to any wrapper parameter") when the Spmd wrapper calls an AIC function with a locally created tensor argument. Previously masked because the broken classification routed through a `Group` wrapper. Not fixed here.