Skip to content

Strip AMDGCN inline assembly before SPIR-V emission#1236

Draft
pvelesko wants to merge 2 commits intomainfrom
chipstar-hipcub-threadstore-strip-amdgcn-asm
Draft

Strip AMDGCN inline assembly before SPIR-V emission#1236
pvelesko wants to merge 2 commits intomainfrom
chipstar-hipcub-threadstore-strip-amdgcn-asm

Conversation

@pvelesko
Copy link
Copy Markdown
Collaborator

Summary

  • hipCUB's ThreadStore<STORE_CS/CG/...> emits AMDGCN inline assembly (flat_store_dword ... glc, s_waitcnt) that survives into IR compiled for spirv64.
  • SPIRV-LLVM-Translator then crashes with a null-deref at SPVWriter.cpp:5598 (CI->getCalledFunction() is null for InlineAsm; the following F->getName() segfaults) when SPV_INTEL_inline_assembly is not in chipStar's extension allow-list.
  • Fix: new HipStripAMDGCNAsm LLVM pass that removes AMDGCN inline-asm call sites (meaningless on Intel GPU) before SPIR-V emission. Wired into HipPasses.cpp before HipSanityChecksPass.

Impact

  • Unblocks test_hipcub_thread build (all 100 gtest cases PASS on Arc A770 / Level Zero after fix).
  • Independent of any upstream SPIRV-LLVM-Translator fix (chipStar-side mitigation).

Reproducer

tests/regression/test_strip_amdgcn_asm.cpp — distills the failure to a kernel that contains flat_store_dword inline asm and verifies it compiles + runs to completion.

Pure LLVM IR reproducer (4 lines) for the upstream Translator crash is documented in the worktree TODO; optional upstream null-check PR is a separate track.

Test plan

  • test_strip_amdgcn_asm regression test passes
  • test_hipcub_thread builds and all 100 gtest cases PASS
  • Basic-kernel smoke test unaffected
  • Full check.py run pre-merge

Adds a minimal standalone reproducer for the hipspv-link stage crash
that prevents test_hipcub_thread from building. hipCUB's
ThreadStore<STORE_CS/CG/WB> templates in thread_store.hpp emit
AMDGCN inline assembly of the form

  asm volatile("flat_store_dword %0, %1 glc" : : "v"(ptr), "v"(val));
  asm volatile("s_waitcnt vmcnt(%0)" : : "I"(0x00));

which chipStar's LLVM pass pipeline leaves intact in the lowered
bitcode. When SPIRV-LLVM-Translator walks that IR it hits the
InlineAsm callee in SPIRV::LLVMToSPIRVBase::transDirectCallInst
(lib/SPIRV/SPIRVWriter.cpp:5598), calls getCalledFunction() which
returns nullptr for InlineAsm callees, and segfaults dereferencing
it on the next line.

This test reproduces the exact pattern with a single __device__
helper mimicking HIPCUB_ASM_THREAD_STORE. On unpatched chipStar
the compile aborts with 'hipspv-link command failed due to signal'
and the identical stack trace (transDirectCallInst). After the
HipStripAMDGCNAsm fix the compile succeeds and the test prints
PASS.
hipCUB's ThreadStore<STORE_CS/CG/...> templates emit AMDGCN
inline assembly (flat_store_dword glc, s_waitcnt, etc.) that
SPIRV-LLVM-Translator does not support. When the IR reaches
llvm-spirv, it null-derefs in transDirectCallInst (SPIRVWriter.cpp
line 5598, CallInst::getCalledFunction() returns nullptr for
InlineAsm).

Add HipStripAMDGCNAsm LLVM pass that walks call instructions
and replaces AMDGCN-mnemonic inline-asm with equivalent plain
LLVM load/store/fence/no-op. These cache-hint modifiers have
no SPIR-V equivalent and are performance hints only - the
semantic store/load is preserved.

Resolves: hipCUB test_hipcub_thread (ThreadStore<STORE_CS>) build crash.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant