Benchmarks: Micro benchmark - add nvbench based kernel-launch, sleep-kernel & auto-throughput#750
WenqingLan1 wants to merge 49 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   85.69%   86.03%   +0.34%
==========================================
  Files         103      107       +4
  Lines        7890     8113     +223
==========================================
+ Hits         6761     6980     +219
- Misses       1129     1133       +4
```
Flags with carried forward coverage won't be shown.
Pull request overview
Copilot reviewed 25 out of 29 changed files in this pull request and generated 5 comments.

Pull request overview
Copilot reviewed 25 out of 29 changed files in this pull request and generated 4 comments.

Pull request overview
Copilot reviewed 25 out of 29 changed files in this pull request and generated 7 comments.

Pull request overview
Copilot reviewed 25 out of 29 changed files in this pull request and generated 2 comments.
```cmake
cmake_minimum_required(VERSION 3.18)
project(nvbench_benchmarks LANGUAGES CUDA)

# Check if we have a recent enough CMake for nvbench (which requires 3.30.4)
if(CMAKE_VERSION VERSION_LESS "3.30.4")
    message(STATUS "CMake version ${CMAKE_VERSION} is less than 3.30.4 (required by nvbench), skipping nvbench benchmarks")
    return()
endif()
```
This CMakeLists declares project(... LANGUAGES CUDA) before checking the CMake version / CUDA availability. If this directory is configured on a machine without a CUDA toolchain (or when CMake < 3.30.4), configuration can fail before reaching the intended “skip” logic. Consider moving the CMake version guard above project() and using project(... LANGUAGES CXX) + include(cuda_common.cmake)/enable_language(CUDA) only inside the CUDAToolkit_FOUND branch.
```makefile
	cd ./nvbandwidth && git apply ../nvbandwidth.patch && cp ../nvbandwidth_testcases_patched.h ./testcases_patched.h && cmake . && make && cd ..
	cp -v ./nvbandwidth/nvbandwidth $(SB_MICRO_PATH)/bin

# Build nvbench
```
New cuda_nvbench target isn’t listed in the Makefile’s .PHONY targets. If a file/directory named cuda_nvbench exists, make cuda_nvbench may become a no-op. Add cuda_nvbench to the .PHONY list to ensure the recipe always runs.
Suggested change:

```diff
-# Build nvbench
+# Build nvbench
+.PHONY: cuda_nvbench
```
```python
def parse_time_to_us(raw: str) -> float:
    """Parse a time string like '123.45 us' or '1.5 s' to float microseconds."""
    raw = raw.strip()
    m = re.match(r'^([\d.]+)\s*([mun]?s)?$', raw)
    if not m:
        raise ValueError(f'Invalid time string: {raw!r}')
    val, unit = float(m.group(1)), (m.group(2) or 'us')
    if unit == 's':
        return val * 1e6
```
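The review excerpt of `parse_time_to_us` stops after the seconds branch. Below is a minimal, self-contained sketch of the complete conversion. It assumes that the other suffixes the regex accepts (`ms`, `us`, `ns`) are scaled to microseconds in the obvious way and that a bare number is already in microseconds, as the `or 'us'` default implies. The function in the PR may differ in details.

```python
import re

# Same pattern as the review excerpt: a number, optional whitespace, and an
# optional unit suffix of 's', 'ms', 'us', or 'ns'.
_TIME_RE = re.compile(r'^([\d.]+)\s*([mun]?s)?$')


def parse_time_to_us(raw: str) -> float:
    """Parse a time string like '123.45 us' or '1.5 s' to float microseconds.

    Sketch only: the 'ms' and 'ns' branches are assumptions inferred from the
    regex and the docstring, not copied from the PR.
    """
    raw = raw.strip()
    m = _TIME_RE.match(raw)
    if not m:
        raise ValueError(f'Invalid time string: {raw!r}')
    # A value without a suffix is treated as microseconds.
    # Note: float() still raises ValueError for inputs like '1.2.3'.
    val, unit = float(m.group(1)), (m.group(2) or 'us')
    if unit == 's':
        return val * 1e6
    if unit == 'ms':
        return val * 1e3
    if unit == 'ns':
        return val / 1e3
    return val  # unit == 'us'
```

For example, `parse_time_to_us('2 ms')` yields 2000 microseconds and `parse_time_to_us('500ns')` yields 0.5. Dividing by `1e3` for nanoseconds, rather than multiplying by `1e-3`, keeps simple values such as 500 ns exact in floating point.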
```python
    'BlasLtBaseBenchmark', 'ComputationCommunicationOverlap', 'CpuMemBwLatencyBenchmark', 'CpuHplBenchmark',
    'CpuStreamBenchmark', 'CublasBenchmark', 'CublasLtBenchmark', 'CudaGemmFlopsBenchmark', 'CudaMemBwBenchmark',
    'CudaNcclBwBenchmark', 'CudnnBenchmark', 'DiskBenchmark', 'DistInference', 'HipBlasLtBenchmark', 'GPCNetBenchmark',
    'GemmFlopsBenchmark', 'GpuBurnBenchmark', 'GpuCopyBwBenchmark', 'GpuStreamBenchmark', 'IBBenchmark',
    'IBLoopbackBenchmark', 'KernelLaunch', 'MemBwBenchmark', 'MicroBenchmark', 'MicroBenchmarkWithInvoke',
    'ORTInferenceBenchmark', 'RocmGemmFlopsBenchmark', 'RocmMemBwBenchmark', 'ShardingMatmul',
    'TCPConnectivityBenchmark', 'TensorRTInferenceBenchmark', 'DirectXGPUEncodingLatency', 'DirectXGPUCopyBw',
    'DirectXGPUMemBw', 'DirectXGPUCoreFlops', 'NvBandwidthBenchmark', 'NvbenchKernelLaunch', 'NvbenchSleepKernel',
    'NvbenchAutoThroughput'
```
```cmake
find_package(CUDAToolkit QUIET)
if (CUDAToolkit_FOUND)
    include(../cuda_common.cmake)

    # Try to find nvbench, but don't require it
    find_package(nvbench CONFIG QUIET)

    if (nvbench_FOUND)
        message(STATUS "Found nvbench, building nvbench benchmarks")
```
```yaml
uses: lukka/get-cmake@latest
with:
  cmakeVersion: '3.20.0'
```
This pull request adds support for NVBench-based GPU micro-benchmarks to SuperBench.
- nvbench-sleep-kernel
- nvbench-kernel-launch
- nvbench-auto-throughput

Example config: