Add blocked matmul (linalg.mmt4d) op support #265

Draft
MaheshRavishankar wants to merge 1 commit into iree-org:main from MaheshRavishankar:users/MaheshRavishankar/blockedMatmul

Conversation

@MaheshRavishankar
Contributor

Add BlockedMatmulNode for tiled matrix multiplication that lowers to linalg.mmt4d via torch_c casts:
LHS logical [M0, K0, M1, K1] x RHS logical [K0, N0, K1, N1]
-> OUT [M0, N0, M1, N1]
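
As a reading aid, the logical contraction described above can be sketched in plain Python. This is only a reference model of the index arithmetic, assuming nested lists indexed in the logical orders given in the description. The function name `blocked_matmul_reference` is hypothetical and is not part of this PR.

```python
def blocked_matmul_reference(lhs, rhs):
    """Reference model of the blocked matmul on logical layouts.

    lhs is indexed as lhs[m0][k0][m1][k1]  (logical [M0, K0, M1, K1]).
    rhs is indexed as rhs[k0][n0][k1][n1]  (logical [K0, N0, K1, N1]).
    The result is indexed as out[m0][n0][m1][n1].
    """
    M0, K0 = len(lhs), len(lhs[0])
    M1, K1 = len(lhs[0][0]), len(lhs[0][0][0])
    N0, N1 = len(rhs[0]), len(rhs[0][0][0])

    out = [[[[0 for _ in range(N1)] for _ in range(M1)]
            for _ in range(N0)] for _ in range(M0)]

    for m0 in range(M0):
        for n0 in range(N0):
            for m1 in range(M1):
                for n1 in range(N1):
                    acc = 0
                    # Reduce over both the outer (k0) and inner (k1) K tiles.
                    for k0 in range(K0):
                        for k1 in range(K1):
                            acc += lhs[m0][k0][m1][k1] * rhs[k0][n0][k1][n1]
                    out[m0][n0][m1][n1] = acc
    return out
```

With a single tile in every outer dimension (M0 = N0 = K0 = 1), this reduces to an ordinary M1 x K1 by K1 x N1 matrix product.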

RHS must be specified with transposed strides (physical [N0, K0, N1, K1]) matching linalg.mmt4d's expected layout. A non-transposed RHS is rejected with a NotImplemented error.
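
To make the stride requirement concrete, the sketch below computes the strides that a logical [K0, N0, K1, N1] tensor has when its physical storage is row-major [N0, K0, N1, K1]. It assumes dense row-major physical storage and element (not byte) strides. The helper names are hypothetical, and the validation in the PR may be written differently.

```python
def contiguous_strides(shape):
    """Row-major element strides for a dense tensor of the given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def transposed_rhs_strides(K0, N0, K1, N1):
    """Strides, listed in logical order [K0, N0, K1, N1], for an RHS whose
    physical layout is dense row-major [N0, K0, N1, K1]."""
    s_n0, s_k0, s_n1, s_k1 = contiguous_strides([N0, K0, N1, K1])
    return [s_k0, s_n0, s_k1, s_n1]


def is_transposed_rhs(logical_shape, strides):
    """True if `strides` match the transposed physical layout.

    Assumption: dimensions of size 1 are not special-cased here, although
    their strides are ambiguous in general.
    """
    K0, N0, K1, N1 = logical_shape
    return list(strides) == transposed_rhs_strides(K0, N0, K1, N1)
```

For example, with K0=2, N0=3, K1=4, N1=5 the physical shape is [3, 2, 5, 4], and the expected logical strides are [20, 40, 1, 4]. A plain contiguous RHS of logical shape [2, 3, 4, 5] has strides [60, 20, 5, 1] and would not pass this check.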

The emitter casts torch tensors to builtin tensors (torch_c), applies linalg.fill + linalg.mmt4d, and casts the result back. No permute ops are needed since the physical layout is used directly.
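
The claim that no permute is needed can be illustrated by working directly on flat buffers. The sketch below models the fill-then-accumulate pattern in Python, not the emitted MLIR. It assumes the accumulator is filled with zero (the description does not state the fill value) and that all buffers are dense row-major. `mmt4d_flat` is a hypothetical name.

```python
def mmt4d_flat(lhs, rhs_phys, M0, N0, K0, M1, N1, K1):
    """Model of fill + mmt4d on flat row-major buffers.

    lhs      : flat buffer with physical layout [M0, K0, M1, K1]
    rhs_phys : flat buffer with physical layout [N0, K0, N1, K1]
    returns  : flat buffer with layout [M0, N0, M1, N1]
    """
    # Stands in for the fill step (zero initial value is an assumption).
    out = [0] * (M0 * N0 * M1 * N1)

    for m0 in range(M0):
        for n0 in range(N0):
            for m1 in range(M1):
                for n1 in range(N1):
                    o = ((m0 * N0 + n0) * M1 + m1) * N1 + n1
                    for k0 in range(K0):
                        for k1 in range(K1):
                            l = ((m0 * K0 + k0) * M1 + m1) * K1 + k1
                            # RHS is read in its physical [N0, K0, N1, K1]
                            # order, so no transposed copy is made.
                            r = ((n0 * K0 + k0) * N1 + n1) * K1 + k1
                            out[o] += lhs[l] * rhs_phys[r]
    return out
```

Because the innermost RHS index is k1, the reduction walks both operands with unit stride, which is the access pattern the transposed layout is meant to provide.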

New files:

  • BlockedMatmulAttr (attributes)
  • BlockedMatmulNode (node with validation and shape inference)
  • ASM emitter with getBuiltinTensorTypeAsm helper
  • Lit test verifying MLIR structure, flow compilation, single dispatch
  • E2E sample verifying numerical correctness on CPU

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
MaheshRavishankar force-pushed the users/MaheshRavishankar/blockedMatmul branch from 7966694 to 5cb5565 on March 24, 2026 17:59