
[TPU][Pallas] relax tolerances and fix Pallas autotuning OOM in layer_norm #2272

Draft

yarongmu-google wants to merge 1 commit into pytorch:main from yarongmu-google:fix_layer_norm_tol

[TPU][Pallas] relax tolerances and fix Pallas autotuning OOM in layer_norm#2272
yarongmu-google wants to merge 1 commit into
pytorch:mainfrom
yarongmu-google:fix_layer_norm_tol

Conversation

@yarongmu-google
Collaborator

  1. Fixes bfloat16 accuracy validation: Relaxes rtol and atol from 1e-3 to 1e-2 in examples/layer_norm.py to prevent spurious validation failures caused by bfloat16 precision variations on TPUs.
  2. Fixes VMEM out-of-memory during autotuning: The autotuner's default baseline config generation (block_sizes=[32, 128]) attempted to allocate overly large chunks of the 10,240-element feature dimension into VMEM, crashing the process before tuning could even begin. This is bypassed by providing autotune_baseline_fn for both the forward and backward kernels.
  3. Adds a safe fallback config: Adds config=helion.Config(block_sizes=[32, 1024]) to the backward-pass kernel so that a memory-safe configuration is available even if the autotuning search fails to converge. A rough sketch of these changes is shown after this list.
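For illustration only, here is a minimal sketch of what the three changes could look like. The kernel signature, the eager reference function used as the baseline, and passing autotune_baseline_fn as a decorator argument are assumptions for this sketch, not the actual diff in examples/layer_norm.py.

```python
import torch
import helion

# (1) Accuracy check with the relaxed bfloat16 tolerances (1e-3 -> 1e-2).
def check_accuracy(result: torch.Tensor, expected: torch.Tensor) -> None:
    torch.testing.assert_close(result, expected, rtol=1e-2, atol=1e-2)

# (2) Hypothetical eager reference used to bypass the autotuner's default
# baseline config (block_sizes=[32, 128]), which ran out of VMEM on the
# 10,240-element feature dimension.
def layer_norm_reference(x, weight, bias, eps=1e-5):
    return torch.nn.functional.layer_norm(x, x.shape[-1:], weight, bias, eps)

# (3) Backward kernel carrying the memory-safe fallback config. Whether
# autotune_baseline_fn is accepted as a decorator argument like this is an
# assumption made for illustration.
@helion.kernel(
    config=helion.Config(block_sizes=[32, 1024]),
    autotune_baseline_fn=layer_norm_reference,
)
def layer_norm_bwd(grad_out, x, weight, bias):
    ...  # the actual kernel body lives in examples/layer_norm.py
```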

meta-cla bot added the CLA Signed label May 5, 2026
yarongmu-google marked this pull request as draft May 5, 2026 01:42