WIP Pallas atomic all launchers v2#2346
Draft
thcmbs wants to merge 2 commits into
Draft
Conversation
Two bugs made hl.atomic_* (and split-K matmul) produce garbage on Pallas:
1. Pallas pipelining does not populate output VMEM from HBM even when
input_output_aliases is set, so the atomic_add codegen reads garbage
from the output VMEM ref before adding the new value.
2. The reordered_kernel copies in_ref -> out_ref for inplace outputs on
every grid cell, overwriting accumulated state on multi-cell writes.
Fix: route atomic outputs through a VMEM scratch buffer.
- Detect atomic targets at the FX-graph level: backend.py walks every
DeviceIR graph for call_function nodes whose target is in
helion.language.atomic_ops, and maps the first-arg fake tensor back
to the launch arg index. Emitted as _atomic_indices=[...] kwarg.
- For each atomic output, build a pltpu.VMEM scratch sized to the
BlockSpec tile. bf16/f16 targets get an f32 scratch so per-cell sums
don't round.
- On the first cell of any "arbitrary" dim (unmapped grid dims with
size > 1), preload the scratch from HBM via convert_element_type.
On the last cell, commit it back. dimension_semantics is set to
"arbitrary" on those dims so cells serialise.
- The atomic-add codegen casts the value to the ref's dtype rather
than the target tensor's dtype, so f32 accumulation into a bf16
scratch stays in f32 until the final write-back.
- Wired into all three launchers (default, emit_pipeline, fori_loop)
via shared _wrap_atomic_accumulator / _apply_atomic_accumulator
helpers. Atomic outputs are added to skip_inplace_copy so the
reordered_kernel's HBM copy doesn't clobber the scratch.
Tests:
- test_atomic_ops.py: split_k_add / split_k_max / multi-output kernels,
parametrised across all three launchers, plus bf16 and various
split_k counts. Two SymInt-related TODOs left for the default
launcher on the bf16 + various-splits variants.
- test_examples.py: test_matmul_split_k unxfailed; previously failed
with InductorLoweringError, now passes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.