
ThreadStore: guard AMDGCN inline asm with __HIP_PLATFORM_SPIRV__ #1

Merged
pvelesko merged 1 commit into main from fix-threadstore-spirv-guard
Apr 15, 2026


@pvelesko

Summary

AMDGCN inline-asm specializations in HIPCUB_ASM_THREAD_STORE (flat_store_*, s_waitcnt) are meaningless on non-AMD targets. On SPIR-V targets (chipStar) they survive into the IR and crash SPIRV-LLVM-Translator at SPVWriter.cpp:5598 with a null-deref on InlineAsm call sites when SPV_INTEL_inline_assembly is not in the translator's extension allow-list.

One-line fix: gate the existing #if HIPCUB_THREAD_STORE_USE_CACHE_MODIFIERS == 1 block on !defined(__HIP_PLATFORM_SPIRV__) so SPIR-V falls back to the already-present __builtin_memcpy-based AsmThreadStore.
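The guard described above can be sketched in isolation as follows. This is a minimal host-compilable illustration, not the hipCUB source: the `sketch` namespace, `kAsmPathSelected`, `AsmThreadStoreFallback`, and `round_trip_ok` are hypothetical names introduced here, the real AMDGCN asm is elided, and `__builtin_memcpy` assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in default for the hipCUB configuration macro (assumption: the real
// default is set inside hipCUB; it is defined here only so the sketch builds).
#ifndef HIPCUB_THREAD_STORE_USE_CACHE_MODIFIERS
#define HIPCUB_THREAD_STORE_USE_CACHE_MODIFIERS 1
#endif

namespace sketch {  // hypothetical namespace, not part of hipCUB

#if HIPCUB_THREAD_STORE_USE_CACHE_MODIFIERS == 1 && !defined(__HIP_PLATFORM_SPIRV__)
// Cache-modifier path. In hipCUB this is where the HIPCUB_ASM_THREAD_STORE
// specializations (flat_store_*, s_waitcnt) would be emitted. The asm is
// elided here so the sketch compiles on any host.
constexpr bool kAsmPathSelected = true;
#else
// SPIR-V target (or cache modifiers disabled): no inline asm reaches the IR.
constexpr bool kAsmPathSelected = false;
#endif

// Generic fallback: a plain byte copy with no target-specific instructions.
template <typename T>
inline void AsmThreadStoreFallback(T* ptr, const T& val) {
    __builtin_memcpy(ptr, &val, sizeof(T));
}

// Small self-check: a value stored through the fallback reads back unchanged.
inline bool round_trip_ok() {
    std::uint32_t slot = 0;
    AsmThreadStoreFallback(&slot, std::uint32_t{0xDEADBEEFu});
    return slot == 0xDEADBEEFu;
}

}  // namespace sketch
```

Compiling the sketch with `-D__HIP_PLATFORM_SPIRV__` flips `kAsmPathSelected` to `false`, which mirrors how a SPIR-V build would skip the asm block and use only the memcpy-based store.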

Behavior

  • AMD: unchanged
  • NVIDIA: unchanged (already uses a separate backend)
  • SPIR-V (chipStar): AsmThreadStore resolves to the generic __builtin_memcpy fallback instead of emitting flat_store_dword ... glc

Context

This is the library-side fix that complements chipStar PR CHIP-SPV/chipStar#1236, which strips AMDGCN inline asm in a chipStar LLVM pass as a defense-in-depth backstop. This PR is the right long-term fix for hipCUB specifically.

Test plan

  • test_hipcub_thread builds and all 100 gtest cases PASS on Arc A770 / Level Zero (chipStar)
  • AMD backend unaffected — regression-test on a ROCm GPU before merge

The AMDGCN inline-asm specializations (flat_store_*, s_waitcnt) in
HIPCUB_ASM_THREAD_STORE are meaningless on non-AMD targets. On SPIR-V
targets (chipStar) they survive into the IR and crash SPIRV-LLVM-Translator
at SPVWriter.cpp:5598 with a null-deref on InlineAsm call sites when
SPV_INTEL_inline_assembly is not in the translator's extension allow-list.

Gate the asm block on !defined(__HIP_PLATFORM_SPIRV__) so SPIR-V targets
fall back to the existing __builtin_memcpy implementation of AsmThreadStore.
AMD behavior is unchanged.

Unblocks test_hipcub_thread under chipStar (all 100 gtest cases now pass
on Arc A770 / Level Zero).
@pvelesko pvelesko marked this pull request as ready for review April 15, 2026 06:20
@pvelesko pvelesko merged commit 4068c07 into main Apr 15, 2026