Add AMAX, AVG, NORM1, NORM2, MUL, MUL_NO_ZEROS reduction modes by rsuderman · Pull Request #325 · iree-org/fusilli

rsuderman · 2026-04-08T18:04:12Z

Enable the remaining cuDNN reduction modes in ReductionAttr and add the corresponding MLIR schemas to the asm emitter:

- NORM1 lowers to abs + sum.dim_IntList.
- AMAX lowers to abs + amax.
- AVG lowers to mean.dim (float dtypes only; torch.aten.mean.dim is not defined on integer tensors, so the sample skips int32 for AVG).
- NORM2 lowers to mul + sum.dim_IntList + sqrt.
- MUL lowers directly to torch.prims.prod.
- MUL_NO_ZEROS uses aten.ne.Scalar to build an i1 mask, then aten.where.ScalarOther to substitute 1 for zero entries before feeding the result to torch.prims.prod, so zero inputs are excluded from the product.
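As a rough illustration of what each lowering computes, here is a scalar reference sketch that reduces a flat vector to one value. It is not Fusilli code: the `Mode` enum and `reduceAll` function are hypothetical names made up for this example, it uses `double` throughout, and it ignores reduction dimensions and dtype handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical enum for this sketch. It mirrors the mode names in the PR
// title but is not Fusilli's ReductionAttr API.
enum class Mode { AMAX, AVG, NORM1, NORM2, MUL, MUL_NO_ZEROS };

// Reduce a flat vector to one scalar, following the op sequence that the
// PR description lists for each mode.
inline double reduceAll(Mode mode, const std::vector<double> &xs) {
  assert(!xs.empty() && "sketch assumes a non-empty input");
  switch (mode) {
  case Mode::AMAX: { // abs, then max
    double m = 0.0;
    for (double x : xs)
      m = std::max(m, std::fabs(x));
    return m;
  }
  case Mode::AVG: { // mean
    double s = 0.0;
    for (double x : xs)
      s += x;
    return s / static_cast<double>(xs.size());
  }
  case Mode::NORM1: { // abs, then sum
    double s = 0.0;
    for (double x : xs)
      s += std::fabs(x);
    return s;
  }
  case Mode::NORM2: { // x * x, then sum, then sqrt
    double s = 0.0;
    for (double x : xs)
      s += x * x;
    return std::sqrt(s);
  }
  case Mode::MUL: { // plain product
    double p = 1.0;
    for (double x : xs)
      p *= x;
    return p;
  }
  case Mode::MUL_NO_ZEROS: { // mask, where, then product
    double p = 1.0;
    for (double x : xs) {
      bool nonZero = (x != 0.0);      // plays the role of the ne.Scalar mask
      double y = nonZero ? x : 1.0;   // plays the role of where.ScalarOther
      p *= y;
    }
    return p;
  }
  }
  return 0.0; // unreachable for valid modes
}
```

For the input `{1, 2, 0, 3}`, MUL yields 0 while MUL_NO_ZEROS yields 6, because the zero is replaced by the multiplicative identity before the product.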

Extend samples/reduction/reduction_ops.cpp to exercise every new mode. Input data is built by a per-mode generateReductionInputData helper so MUL/MUL_NO_ZEROS get a non-trivial pattern (mostly 1s with a 2 and a 3, plus injected zeros for MUL_NO_ZEROS) that stays in range for fp16/int32, and the expected value is computed by the existing reference reduction loop rather than hardcoded.
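The input pattern described above could look roughly like the sketch below. This is not the PR's generateReductionInputData helper: `makeProductInput` and `referenceProduct` are hypothetical names, and the positions of the 2, the 3, and the zeros are assumptions chosen for illustration. The point is that the product stays small (6) regardless of tensor size, so it fits comfortably in fp16 and int32.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mostly 1s, with a single 2 and a single 3.
// When injectZeros is set, zeros are placed at every 4th index starting
// at 3 (an arbitrary choice for this sketch).
inline std::vector<float> makeProductInput(std::size_t n, bool injectZeros) {
  assert(n >= 4 && "sketch needs room for 2, 3, and at least one zero");
  std::vector<float> data(n, 1.0f);
  data[0] = 2.0f;
  data[1] = 3.0f;
  if (injectZeros) {
    for (std::size_t i = 3; i < n; i += 4)
      data[i] = 0.0f;
  }
  return data;
}

// Hypothetical reference loop: computes the expected product instead of
// hardcoding it. With skipZeros set, zero entries are left out.
inline float referenceProduct(const std::vector<float> &data, bool skipZeros) {
  float p = 1.0f;
  for (float v : data) {
    if (skipZeros && v == 0.0f)
      continue;
    p *= v;
  }
  return p;
}
```

Computing the expected value from the same data keeps the check valid if the pattern or tensor shape changes later.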

Add lit tests for each new mode under tests/lit/ and register them in tests/CMakeLists.txt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Rob Suderman <rob.suderman@gmail.com>


rsuderman requested a review from IanWood1 April 8, 2026 18:25

IanWood1 mentioned this pull request Apr 8, 2026

[NFC] Refactor reduction emitter to be macro-based #320

Merged

rsuderman added 2 commits April 9, 2026 11:08

Merge remote-tracking branch 'origin/main' into HEAD

f4c10ab

Signed-off-by: Rob Suderman <rob.suderman@gmail.com> # Conflicts: # include/fusilli/support/asm_emitter.h # samples/reduction/reduction_ops.cpp

Match the templating approach

2a5541c

Signed-off-by: Rob Suderman <rob.suderman@gmail.com>

rsuderman force-pushed the reduction_rest branch from 05be4e2 to 2a5541c Compare April 9, 2026 20:58


rsuderman wants to merge 3 commits into iree-org:main from rsuderman:reduction_rest
