
fix: isCuMemMapAllocated crashes on non-NVLS systems even with MSCCLPP_FORCE_DISABLE_NVLS=true #790

Merged
Binyang2014 merged 3 commits into main from copilot/fix-runtimeerror-on-perlmutter-gpus on Apr 22, 2026

Conversation

Contributor

Copilot AI commented Apr 20, 2026

  • Fix isCuMemMapAllocated() to just return true/false without throwing when NVLS is not supported
  • Fix isNvlsSupported() caching bug where result/isChecked were never updated
  • Restore [[maybe_unused]] on result and isChecked statics — needed in HIP/ROCm env where CUDA_NVLS_API_AVAILABLE is not defined and the variables would otherwise be unused
  • Run linter (./tools/lint.sh)
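The first three items can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `probeNvlsSupport`, `g_probeCalls`, `g_fakeDeviceHasNvls`, `SKETCH_NVLS_API_AVAILABLE`, and both `...Sketch` functions are hypothetical stand-ins, and the real device query in `src/core/gpu_utils.cc` is assumed to be more involved.

```cpp
// Hypothetical stand-in for CUDA_NVLS_API_AVAILABLE; leave it undefined to
// mimic a HIP/ROCm build, where the statics below would otherwise be unused.
#define SKETCH_NVLS_API_AVAILABLE 1

// Hypothetical probe replacing the real driver/device query. The counter
// exists only so the caching behaviour can be observed.
static int g_probeCalls = 0;
static bool g_fakeDeviceHasNvls = false;

static bool probeNvlsSupport() {
  ++g_probeCalls;
  return g_fakeDeviceHasNvls;
}

// Cached support check: the probe runs once, and both statics are updated.
// The caching bug described above corresponds to never assigning them.
bool isNvlsSupportedSketch() {
  [[maybe_unused]] static bool result = false;
  [[maybe_unused]] static bool isChecked = false;
#if defined(SKETCH_NVLS_API_AVAILABLE)
  if (!isChecked) {
    result = probeNvlsSupport();
    isChecked = true;
  }
  return result;
#else
  return false;
#endif
}

// Pure predicate: reports whether the buffer was allocated via cuMemMap and
// never throws, regardless of NVLS support. The argument is an assumed
// placeholder for whatever lookup the real function performs.
bool isCuMemMapAllocatedSketch(bool driverReportsCuMemMap) {
  return driverReportsCuMemMap;
}
```

Keeping the predicate free of side effects leaves the decision about whether a cuMemMap allocation is an error to the caller, which is what lets non-NVLS systems proceed.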

…sSupported caching

Agent-Logs-Url: https://github.com/microsoft/mscclpp/sessions/f8417c19-e697-4a32-a298-9a958b9a16a1

Co-authored-by: Binyang2014 <9415966+Binyang2014@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix RuntimeError for MSCCLPP on Perlmutter A100 GPUs" to "fix: isCuMemMapAllocated crashes on non-NVLS systems even with MSCCLPP_FORCE_DISABLE_NVLS=true" on Apr 20, 2026
Copilot AI requested a review from Binyang2014 April 20, 2026 00:39
Comment thread src/core/gpu_utils.cc Outdated
Comment thread src/core/gpu_utils.cc
@Binyang2014
Contributor

/azp run mscclpp-ut

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Binyang2014 requested a review from a team April 22, 2026 16:08
Binyang2014 enabled auto-merge (squash) April 22, 2026 17:09
Contributor

caiomcbr left a comment


LGTM

Binyang2014 merged commit e874bf1 into main Apr 22, 2026
30 of 34 checks passed
Binyang2014 deleted the copilot/fix-runtimeerror-on-perlmutter-gpus branch April 22, 2026 17:12


Development

Successfully merging this pull request may close these issues:

"RuntimeError: cuMemMap is used in env without NVLS support (mscclpp failure: InvalidUsage)" Error

4 participants