[Pallas] Attention perf: further reduce spillage from pre-loading Q, by loading Q in-loop and not pipelining it#2397

Draft
AmesingFlank wants to merge 1 commit into main from AmesingFlank/stack/51

@AmesingFlank
Contributor

@AmesingFlank AmesingFlank commented May 11, 2026

Optimization found by Claude by comparing the current Helion kernel with this reference implementation.

Similar to #2373, but takes it one step further

Previously, the Helion attention kernel pre-loaded q from q_view[tile_b, tile_m, :] before the device loop. The loaded tile occupies registers that stay live across the entire loop, which causes more spillage and hurts performance.

This PR instead loads q_view[tile_b, tile_m, :] inside the loop body, even though neither tile_b nor tile_m is one of the loop's iteration variables. This needs a small compiler change: when deciding whether to pipeline a tensor accessed within a device loop, if none of the accessed block ids match a block id that the loop iterates over, the tensor is not pipelined.
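A minimal sketch of the pipelining rule described above, in plain Python. It is not the actual Helion compiler code: TensorAccess, should_pipeline, the integer block ids, and the k_view / tile_n access are all hypothetical names chosen for illustration. Only q_view, tile_b, and tile_m come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorAccess:
    """Hypothetical record of one tensor load inside a device loop."""
    name: str
    block_ids: frozenset  # block ids used to index the tensor


def should_pipeline(access: TensorAccess, loop_block_ids: frozenset) -> bool:
    """Pipeline a load only if it is indexed by a block id the loop iterates over.

    A load whose block ids are all loop-invariant reads the same tile on
    every iteration, so there is nothing to prefetch for the next iteration.
    """
    return not access.block_ids.isdisjoint(loop_block_ids)


# Illustrative block ids (assumed): 0 = tile_b, 1 = tile_m, 2 = tile_n.
TILE_B, TILE_M, TILE_N = 0, 1, 2

# Assume the inner device loop iterates over tile_n only.
inner_loop = frozenset({TILE_N})

# The Q load from this PR: indexed only by outer-loop block ids.
q_load = TensorAccess("q_view[tile_b, tile_m, :]", frozenset({TILE_B, TILE_M}))
# A hypothetical K load that does depend on the inner loop's block id.
k_load = TensorAccess("k_view[tile_b, :, tile_n]", frozenset({TILE_B, TILE_N}))

decisions = {a.name: should_pipeline(a, inner_loop) for a in (q_load, k_load)}
```

Under these assumptions the Q load is not pipelined, because its block ids are disjoint from the loop's, while the hypothetical K load still is.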

On TPU, this change improves throughput from 653 TFLOPs to 660 TFLOPs.

…by loading Q in-loop and not pipelining it

stack-info: PR: #2397, branch: AmesingFlank/stack/51
@AmesingFlank AmesingFlank force-pushed the AmesingFlank/stack/51 branch from caff8b8 to 18e5642 Compare May 11, 2026 21:30
@meta-cla meta-cla Bot added the CLA Signed label May 11, 2026
@AmesingFlank AmesingFlank requested a review from norx1991 May 11, 2026 21:36
@AmesingFlank AmesingFlank marked this pull request as draft May 11, 2026 21:54
@AmesingFlank AmesingFlank removed the request for review from norx1991 May 11, 2026 21:54
@AmesingFlank
Contributor Author

CI failure, investigating

Labels

CLA Signed This label is managed by the Meta Open Source bot.
