Strip AMDGCN inline assembly before SPIR-V emission #1236
Draft
Adds a minimal standalone reproducer for the hipspv-link stage crash
that prevents test_hipcub_thread from building. hipCUB's
ThreadStore<STORE_CS/CG/WB> templates in thread_store.hpp emit
AMDGCN inline assembly of the form
asm volatile("flat_store_dword %0, %1 glc" : : "v"(ptr), "v"(val));
asm volatile("s_waitcnt vmcnt(%0)" : : "I"(0x00));
which chipStar's LLVM pass pipeline leaves intact in the lowered
bitcode. When SPIRV-LLVM-Translator walks that IR it hits the
InlineAsm callee in SPIRV::LLVMToSPIRVBase::transDirectCallInst
(lib/SPIRV/SPIRVWriter.cpp:5598), calls getCalledFunction() which
returns nullptr for InlineAsm callees, and segfaults dereferencing
it on the next line.
This test reproduces the exact pattern with a single __device__
helper mimicking HIPCUB_ASM_THREAD_STORE. On unpatched chipStar
the compile aborts with 'hipspv-link command failed due to signal'
and the identical stack trace (transDirectCallInst). After the
HipStripAMDGCNAsm fix the compile succeeds and the test prints
PASS.
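The crash described above comes down to an unchecked null pointer. The sketch below models that pattern with stand-in types; `Function`, `CallInst`, and `calleeNameOrEmpty` are hypothetical and only mimic the shape of the LLVM classes, so this is not the Translator's code and not the fix in this PR.

```cpp
#include <string>

// Stand-in types for illustration only; these are NOT the LLVM classes.
struct Function {
    std::string name;
};

struct CallInst {
    // nullptr models a call whose callee is inline asm rather than a Function.
    const Function *callee;
    const Function *getCalledFunction() const { return callee; }
};

// Unguarded pattern implied by the crash report:
//     return CI.getCalledFunction()->name;   // null dereference for inline asm
//
// Guarded pattern: test the pointer before using it.
std::string calleeNameOrEmpty(const CallInst &CI) {
    const Function *F = CI.getCalledFunction();
    if (F == nullptr)
        return {};  // inline-asm callee: there is no Function to name
    return F->name;
}
```

A guard like this would only stop the segfault. The inline asm would still have no SPIR-V lowering, which is presumably why the PR removes it in a pass beforehand.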
hipCUB's ThreadStore<STORE_CS/CG/...> templates emit AMDGCN inline assembly (flat_store_dword glc, s_waitcnt, etc.) that SPIRV-LLVM-Translator does not support. When the IR reaches llvm-spirv, it dereferences a null pointer in transDirectCallInst (SPIRVWriter.cpp line 5598), because CallInst::getCalledFunction() returns nullptr for InlineAsm callees.

Add a HipStripAMDGCNAsm LLVM pass that walks call instructions and replaces inline asm containing AMDGCN mnemonics with an equivalent plain LLVM load, store, fence, or no-op. The cache-hint modifiers have no SPIR-V equivalent and are performance hints only, so the semantic store/load is preserved.

Resolves: hipCUB test_hipcub_thread (ThreadStore<STORE_CS>) build crash.
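One way such a pass could recognise AMDGCN inline asm is by matching the leading mnemonic of the asm string. The following is a minimal standalone sketch under that assumption; the helper name `looksLikeAmdgcnAsm` and the prefix list are hypothetical and are not taken from the PR, whose matching logic may differ.

```cpp
#include <array>
#include <string_view>

// Hypothetical helper, not from the PR: returns true when the inline-asm text
// begins (after leading whitespace) with a mnemonic that looks AMDGCN-specific.
inline bool looksLikeAmdgcnAsm(std::string_view asmText) {
    // Illustrative prefix list only; a real pass would need a fuller set.
    static constexpr std::array<std::string_view, 5> kPrefixes = {
        "flat_store_", "flat_load_", "global_store_", "global_load_",
        "s_waitcnt"};

    // Skip leading whitespace so indented asm strings still match.
    const auto first = asmText.find_first_not_of(" \t\n");
    if (first == std::string_view::npos)
        return false;  // empty or whitespace-only string
    asmText.remove_prefix(first);

    for (std::string_view prefix : kPrefixes) {
        if (asmText.substr(0, prefix.size()) == prefix)
            return true;
    }
    return false;
}
```

Applied to the two strings quoted in this PR, `looksLikeAmdgcnAsm("flat_store_dword %0, %1 glc")` and `looksLikeAmdgcnAsm("s_waitcnt vmcnt(%0)")` both return true. Matching on prefixes keeps the check cheap, at the cost of having to extend the list whenever a new mnemonic shows up.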
Summary
- hipCUB's ThreadStore<STORE_CS/CG/...> emits AMDGCN inline assembly (flat_store_dword ... glc, s_waitcnt) that survives into IR compiled for spirv64.
- SPIRV-LLVM-Translator crashes at SPIRVWriter.cpp:5598 (CI->getCalledFunction() is null for InlineAsm; the following F->getName() segfaults) when SPV_INTEL_inline_assembly is not in chipStar's extension allow-list.
- Adds a HipStripAMDGCNAsm LLVM pass that removes AMDGCN inline-asm call sites (meaningless on Intel GPU) before SPIR-V emission. Wired into HipPasses.cpp before HipSanityChecksPass.

Impact
- Unblocks the test_hipcub_thread build (all 100 gtest cases PASS on Arc A770 / Level Zero after the fix).

Reproducer
- tests/regression/test_strip_amdgcn_asm.cpp distills the failure to a kernel that contains flat_store_dword inline asm and verifies that it compiles and runs to completion.
- A pure LLVM IR reproducer (4 lines) for the upstream Translator crash is documented in the worktree TODO; an optional upstream null-check PR is a separate track.

Test plan
- test_strip_amdgcn_asm regression test passes
- test_hipcub_thread builds and all 100 gtest cases PASS
- check.py run pre-merge