[Pallas] Attention perf: further reduce spillage from pre-loading Q, by loading Q in-loop and not pipelining it #2397
Draft
AmesingFlank wants to merge 1 commit into
Conversation
…by loading Q in-loop and not pipelining it
stack-info: PR: #2397, branch: AmesingFlank/stack/51
Force-pushed from caff8b8 to 18e5642
AmesingFlank (Contributor, Author): CI failure, investigating
Optimization found by Claude, by comparing the current Helion kernel with this reference impl.
Similar to #2373, but takes it one step further.
Previously, the Helion attention kernel pre-loaded `q` from `q_view[tile_b, tile_m, :]` before the device loop. The pre-loaded value occupies registers that stay live across the entire loop, which increases register spillage and hurts performance.

This PR instead loads `q_view[tile_b, tile_m, :]` within the loop body, even though neither `tile_b` nor `tile_m` is part of the loop iteration; see the first sketch below. A small compiler change is needed here: when deciding whether to pipeline a tensor access within a device loop, if the accessed block ids do not match any of the loop-iterated block ids, the access is loop-invariant and there is no need to pipeline it (see the second sketch below).

On TPU, this change improves throughput from 653 TFLOPs to 660 TFLOPs.
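For illustration, a minimal sketch of the kernel-side change in Helion-style code. This is a simplified stand-in, not the actual attention kernel: the online-softmax bookkeeping is omitted, and the shapes and names are assumptions.

```python
import torch
import helion
import helion.language as hl


@helion.kernel()
def attention_sketch(
    q_view: torch.Tensor,  # assumed [batch*heads, m, head_dim]
    k_view: torch.Tensor,  # assumed [batch*heads, head_dim, n]
    v_view: torch.Tensor,  # assumed [batch*heads, n, head_dim]
) -> torch.Tensor:
    head_dim = q_view.size(-1)
    out = torch.empty_like(q_view)
    for tile_b, tile_m in hl.tile([q_view.size(0), q_view.size(1)]):
        acc = hl.zeros([tile_b, tile_m, head_dim], dtype=torch.float32)
        # Before this PR: `q = q_view[tile_b, tile_m, :]` lived here, before
        # the inner device loop, keeping its registers live across every
        # iteration of that loop.
        for tile_n in hl.tile(v_view.size(1)):
            # After this PR: q is loaded inside the loop body. Neither
            # tile_b nor tile_m is iterated by this loop, so the access is
            # loop-invariant and does not need to be pipelined.
            q = q_view[tile_b, tile_m, :]
            k = k_view[tile_b, :, tile_n]
            qk = torch.bmm(q, k)  # [tile_b, tile_m, tile_n]
            v = v_view[tile_b, tile_n, :]
            # Softmax omitted for brevity; real kernel rescales here.
            acc = acc + torch.bmm(qk.to(v.dtype), v)
        out[tile_b, tile_m, :] = acc.to(out.dtype)
    return out
```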
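And a hedged sketch of the compiler-side rule. The function and parameter names (`should_pipeline`, `accessed_block_ids`, `loop_block_ids`) are hypothetical, not Helion's actual internals; only the decision logic mirrors the description above.

```python
def should_pipeline(accessed_block_ids: set[int], loop_block_ids: set[int]) -> bool:
    """Decide whether a tensor access inside a device loop should be pipelined.

    A tensor access is only worth pipelining if it actually varies with the
    loop, i.e. at least one of its block ids is iterated by this device loop.
    A loop-invariant access (like q_view[tile_b, tile_m, :] inside the tile_n
    loop) can simply be loaded in the loop body without pipelining.
    """
    return not accessed_block_ids.isdisjoint(loop_block_ids)
```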