compute-sanitizer reports invalid shared memory read in warpspeed DeviceScanKernel on Blackwell (sm_120a) #8288
Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Something else
Component
CUB
Describe the bug
Running compute-sanitizer --tool memcheck against a program that calls cub::DeviceScan::ExclusiveSum (or thrust::exclusive_scan) on a Blackwell GPU (sm_120a) produces the following error:
========= Invalid __shared__ read of size 16 bytes
========= at cp_async_bulk_cp_mask in cp_async_bulk.h:236
========= by thread (128,0,0) in block (0,0,0)
========= Address 0x400 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= ...
========= in function decode_page_headers ...
=========
========= ERROR SUMMARY: 1 error
A fuller stack trace is available in this PR: rapidsai/cudf#21999
The reported address is 0x400 (byte offset 1024). The warpspeed kernel's dynamic shared memory allocation is 8064 bytes (smem_for_stages with policy {use_warpspeed=true, num_total_threads=352, items_per_thread=63, tile_size=8064}), so a 16-byte read at that address lies within the allocation. The program produces correct results.
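The in-bounds argument above is plain arithmetic, and the sketch below spells it out using only the numbers quoted in this report. It assumes the address printed by compute-sanitizer is a byte offset from the start of the kernel's dynamic shared memory, with no static shared memory placed in front of it. That assumption has not been verified against the kernel. The names in_bounds and check_access are hypothetical helpers, not part of CUB or compute-sanitizer.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the byte range [offset, offset + size) fits inside an
// allocation of alloc_bytes. Written so the unsigned arithmetic cannot wrap.
constexpr bool in_bounds(std::uint64_t offset, std::uint64_t size, std::uint64_t alloc_bytes)
{
  return size <= alloc_bytes && offset <= alloc_bytes - size;
}

// Values copied from this report.
constexpr std::uint64_t reported_offset = 0x400; // "Address 0x400"
constexpr std::uint64_t access_size     = 16;    // "read of size 16 bytes"
constexpr std::uint64_t dynamic_smem    = 8064;  // allocation size stated above

// ASSUMPTION: reported_offset is relative to the start of dynamic shared
// memory. If static shared memory precedes it, the offsets would shift.
static_assert(in_bounds(reported_offset, access_size, dynamic_smem),
              "a 16-byte read at 0x400 fits inside 8064 bytes");
static_assert(reported_offset % access_size == 0,
              "0x400 is 16-byte aligned");

// Runtime variant of the same check, usable with other reported addresses.
inline void check_access(std::uint64_t offset, std::uint64_t size, std::uint64_t alloc_bytes)
{
  assert(in_bounds(offset, size, alloc_bytes));
}
```

Under that assumption the read ends at byte 1040, well short of 8064, and the address is 16-byte aligned, so neither a bounds violation nor a misaligned access would explain the report on these numbers alone.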
The error originates inside cub::detail::warpspeed::squadStoreBulkSync<int> → cub::detail::scan::kernelBody → device_scan_lookahead_body → squadDispatch → DeviceScanKernel.
How to Reproduce
The error occurs in a libcudf example run with compute-sanitizer and can be seen here: https://github.com/rapidsai/cudf/actions/runs/23918157779/job/69767002847?pr=21999#step:12:835
The PR rapidsai/cudf#21999 also contains additional information about reproducing the issue.
Unfortunately, we have not been able to create a standalone reproducer.
Expected behavior
compute-sanitizer should pass with no errors.
Reproduction link
No response
Operating System
No response
nvidia-smi output
Fri Apr 3 17:11:46 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX PRO 6000 Blac... Off | 00000000:3D:00.0 Off | 0 |
| N/A 32C P8 31W / 600W | 0MiB / 97887MiB | 0% Default |
| | | Disabled |
NVCC version
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_Dec_16_07:23:41_PM_PST_2025
Cuda compilation tools, release 13.1, V13.1.115
Build cuda_13.1.r13.1/compiler.37061995_0
$ compute-sanitizer --version
NVIDIA (R) Compute Sanitizer
Copyright (c) 2020-2025 NVIDIA Corporation
Version 2025.4.1.0 (build 37093031) (public-release)
Also fails with:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Mon_Mar_02_09:52:23_PM_PST_2026
Cuda compilation tools, release 13.2, V13.2.51
Build cuda_13.2.r13.2/compiler.37434383_0
$ compute-sanitizer --version
NVIDIA (R) Compute Sanitizer
Copyright (c) 2020-2026 NVIDIA Corporation
Version 2026.1.0.0 (build 37182542) (public-release)