Benchmarks: Micro benchmark - add nvbench based kernel-launch, sleep-kernel & auto-throughput by WenqingLan1 · Pull Request #750 · microsoft/superbenchmark

WenqingLan1 · 2025-10-09T23:12:33Z

This pull request adds support for NVBench-based GPU micro-benchmarks to SuperBench.

- Integrated the NVBench submodule.
- Implemented three benchmarks:
  - nvbench-sleep-kernel
  - nvbench-kernel-launch
  - nvbench-auto-throughput
- Updated documentation and added example scripts.

Example config:

version: v0.12
superbench:
  enable:
  # nvbench benchmarks
  - nvbench-sleep-kernel:single
  - nvbench-sleep-kernel:list
  - nvbench-sleep-kernel:range
  - nvbench-sleep-kernel:range-step
  - nvbench-kernel-launch
  - nvbench-auto-throughput
  - nvbench-auto-throughput:stride-list
  - nvbench-auto-throughput:stride-range
  var:
    default_local_mode: &default_local_mode
      modes:
      - name: local
        proc_num: 4
        prefix: CUDA_VISIBLE_DEVICES={proc_rank}
        parallel: yes
  benchmarks:
    nvbench-sleep-kernel:single:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "50"                   # Single value format
        timeout: 30
    nvbench-sleep-kernel:list:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "[25,50,75]"         # List format - no spaces after commas
        timeout: 30
    nvbench-sleep-kernel:range:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "[0:5]"           # Range format
        timeout: 30
    nvbench-sleep-kernel:range-step:
      <<: *default_local_mode
      timeout: 300
      parameters:
        duration_us: "[0:50:10]"         # Range with step format
        timeout: 30
    nvbench-kernel-launch:
      <<: *default_local_mode
      timeout: 300
    nvbench-auto-throughput:
      <<: *default_local_mode
      timeout: 600
      parameters:
        stride: "[1,2,4,8]"              # List format for stride
        block_size: "[128,256,512,1024]"  # List format for block size
    nvbench-auto-throughput:stride-list:
      <<: *default_local_mode
      timeout: 600
      parameters:
        stride: "[1,2,4,8]"              # List format
        block_size: "[256,512]"
    nvbench-auto-throughput:stride-range:
      <<: *default_local_mode
      timeout: 600
      parameters:
        stride: "[1:8:2]"                # Range with step format
        block_size: "256"                 # Single value format
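
The example config uses four string shapes for axis parameters such as duration_us and stride: a single value, a comma-separated list, a range, and a range with a step. The sketch below shows one way those shapes could be told apart. It is not the PR's parser: the name classify_axis_value and the returned tuples are hypothetical. It deliberately does not expand ranges, because the config does not say whether the end point is inclusive.

```python
import re

# Hypothetical helper, not taken from the PR. It only recognizes the four
# string shapes used in the example config and never expands a range.
_INT = r'\d+'
_SINGLE = re.compile(rf'^({_INT})$')
_LIST = re.compile(rf'^\[({_INT}(?:,{_INT})*)\]$')
_RANGE = re.compile(rf'^\[({_INT}):({_INT})(?::({_INT}))?\]$')


def classify_axis_value(raw: str):
    """Return (kind, values) for a string such as '50' or '[0:50:10]'."""
    m = _SINGLE.match(raw)
    if m:
        return 'single', (int(m.group(1)),)
    m = _LIST.match(raw)
    if m:
        # Lists are written without spaces after commas, as in the config.
        return 'list', tuple(int(v) for v in m.group(1).split(','))
    m = _RANGE.match(raw)
    if m:
        start, stop = int(m.group(1)), int(m.group(2))
        step = int(m.group(3)) if m.group(3) else None
        kind = 'range-step' if step is not None else 'range'
        return kind, (start, stop, step)
    raise ValueError(f'Unrecognized axis value: {raw!r}')


if __name__ == '__main__':
    for text in ('50', '[25,50,75]', '[0:5]', '[0:50:10]'):
        print(text, '->', classify_axis_value(text))
```

Rejecting anything that matches none of the patterns (for example a list with spaces after the commas) keeps a malformed config from being silently passed through to the benchmark binary.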

codecov · 2025-10-10T20:44:21Z

Codecov Report

❌ Patch coverage is 98.20628% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.03%. Comparing base (932d9f6) to head (4c6113d).

Files with missing lines	Patch %	Lines
...rbench/benchmarks/micro_benchmarks/nvbench_base.py	97.91%	2 Missing ⚠️
...hmarks/micro_benchmarks/nvbench_auto_throughput.py	98.07%	1 Missing ⚠️
...enchmarks/micro_benchmarks/nvbench_sleep_kernel.py	97.67%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   85.69%   86.03%   +0.34%     
==========================================
  Files         103      107       +4     
  Lines        7890     8113     +223     
==========================================
+ Hits         6761     6980     +219     
- Misses       1129     1133       +4

Flag	Coverage Δ
cpu-python3.10-unit-test	`71.18% <98.17%> (+0.75%)`	⬆️
cpu-python3.12-unit-test	`71.18% <98.17%> (+0.75%)`	⬆️
cpu-python3.7-unit-test	`70.64% <98.20%> (+0.78%)`	⬆️
cuda-unit-test	`83.99% <98.17%> (+0.39%)`	⬆️


Copilot

Pull request overview

Copilot reviewed 25 out of 29 changed files in this pull request and generated 5 comments.

Copilot

Pull request overview

Copilot reviewed 25 out of 29 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 25 out of 29 changed files in this pull request and generated 7 comments.

Copilot

Pull request overview

Copilot reviewed 25 out of 29 changed files in this pull request and generated 2 comments.

Copilot · 2026-04-22T19:59:10Z

+cmake_minimum_required(VERSION 3.18)
+project(nvbench_benchmarks LANGUAGES CUDA)
+
+# Check if we have a recent enough CMake for nvbench (which requires 3.30.4)
+if(CMAKE_VERSION VERSION_LESS "3.30.4")
+  message(STATUS "CMake version ${CMAKE_VERSION} is less than 3.30.4 (required by nvbench), skipping nvbench benchmarks")
+  return()
+endif()


This CMakeLists declares project(... LANGUAGES CUDA) before checking the CMake version / CUDA availability. If this directory is configured on a machine without a CUDA toolchain (or when CMake < 3.30.4), configuration can fail before reaching the intended “skip” logic. Consider moving the CMake version guard above project() and using project(... LANGUAGES CXX) + include(cuda_common.cmake)/enable_language(CUDA) only inside the CUDAToolkit_FOUND branch.

Copilot · 2026-04-22T19:59:11Z

 	cd ./nvbandwidth && git apply ../nvbandwidth.patch && cp ../nvbandwidth_testcases_patched.h ./testcases_patched.h && cmake . && make && cd ..
 	cp -v ./nvbandwidth/nvbandwidth $(SB_MICRO_PATH)/bin
+
+# Build nvbench


New cuda_nvbench target isn’t listed in the Makefile’s .PHONY targets. If a file/directory named cuda_nvbench exists, make cuda_nvbench may become a no-op. Add cuda_nvbench to the .PHONY list to ensure the recipe always runs.

Suggested change

-# Build nvbench
+# Build nvbench
+.PHONY: cuda_nvbench

Copilot

Pull request overview

Copilot reviewed 25 out of 29 changed files in this pull request and generated 4 comments.

+def parse_time_to_us(raw: str) -> float:
+    """Parse a time string like '123.45 us' or '1.5 s' to float microseconds."""
+    raw = raw.strip()
+    m = re.match(r'^([\d.]+)\s*([mun]?s)?$', raw)
+    if not m:
+        raise ValueError(f'Invalid time string: {raw!r}')
+    val, unit = float(m.group(1)), (m.group(2) or 'us')
+    if unit == 's':
+        return val * 1e6
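
The excerpt above ends after the seconds branch, so the handling of the other units the regex accepts (ms, us, ns) is not visible here. A minimal sketch of one way to cover them, assuming a lookup table of multipliers; the name parse_time_to_us_sketch and the table _TO_US are hypothetical and are not the PR's code. The default unit of microseconds follows the excerpt.

```python
import re

# Hypothetical table: multiplier from each accepted unit to microseconds.
_TO_US = {'s': 1e6, 'ms': 1e3, 'us': 1.0, 'ns': 1e-3}


def parse_time_to_us_sketch(raw: str) -> float:
    """Parse strings like '123.45 us', '2ms' or '1.5 s' into microseconds."""
    raw = raw.strip()
    # A stricter number pattern than in the excerpt: it rejects inputs such
    # as '1.2.3' at match time instead of relying on float() to fail.
    m = re.match(r'^(\d+(?:\.\d+)?)\s*([mun]?s)?$', raw)
    if not m:
        raise ValueError(f'Invalid time string: {raw!r}')
    value = float(m.group(1))
    unit = m.group(2) or 'us'  # as in the excerpt, a bare number means us
    return value * _TO_US[unit]


if __name__ == '__main__':
    for text in ('123.45 us', '1.5 s', '2ms', '500 ns', '42'):
        print(text, '->', parse_time_to_us_sketch(text))
```

A table keeps the unit handling in one place, so adding or removing a unit only means changing the regex character class and one dictionary entry.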


+    'BlasLtBaseBenchmark', 'ComputationCommunicationOverlap', 'CpuMemBwLatencyBenchmark', 'CpuHplBenchmark',
+    'CpuStreamBenchmark', 'CublasBenchmark', 'CublasLtBenchmark', 'CudaGemmFlopsBenchmark', 'CudaMemBwBenchmark',
+    'CudaNcclBwBenchmark', 'CudnnBenchmark', 'DiskBenchmark', 'DistInference', 'HipBlasLtBenchmark', 'GPCNetBenchmark',
+    'GemmFlopsBenchmark', 'GpuBurnBenchmark', 'GpuCopyBwBenchmark', 'GpuStreamBenchmark', 'IBBenchmark',
+    'IBLoopbackBenchmark', 'KernelLaunch', 'MemBwBenchmark', 'MicroBenchmark', 'MicroBenchmarkWithInvoke',
+    'ORTInferenceBenchmark', 'RocmGemmFlopsBenchmark', 'RocmMemBwBenchmark', 'ShardingMatmul',
+    'TCPConnectivityBenchmark', 'TensorRTInferenceBenchmark', 'DirectXGPUEncodingLatency', 'DirectXGPUCopyBw',
+    'DirectXGPUMemBw', 'DirectXGPUCoreFlops', 'NvBandwidthBenchmark', 'NvbenchKernelLaunch', 'NvbenchSleepKernel',
+    'NvbenchAutoThroughput'


+find_package(CUDAToolkit QUIET)
+if (CUDAToolkit_FOUND)
+  include(../cuda_common.cmake)
+
+  # Try to find nvbench, but don't require it
+  find_package(nvbench CONFIG QUIET)
+
+  if (nvbench_FOUND)
+    message(STATUS "Found nvbench, building nvbench benchmarks")


+        uses: lukka/get-cmake@latest
+        with:
+          cmakeVersion: '3.20.0'


WenqingLan1 and others added 15 commits July 22, 2025 16:03

add nvbench kernel launch

741ee98

submodule update

0ae7864

init sleep kernel

35bfb61

Merge branch 'microsoft:main' into feat/third_party/nvbench

66b4786

Merge branch 'microsoft:main' into feat/third_party/nvbench

82aed0c

Merge branch 'microsoft:main' into feat/third_party/nvbench

24ee0a5

test sleep kernel

bd87f50

add sm 103

a663db6

add arg parsing logic

32fe197

Merge branch 'microsoft:main' into feat/third_party/nvbench

76562dc

add arg parsing tests

3eb5525

refactor

4785fe6

refine logic - remove gpu_id

1fb7c05

add doc

83c442c

refine regex & update nvbench submodule

4b274c4

WenqingLan1 requested a review from a team as a code owner October 9, 2025 23:12

WenqingLan1 added the benchmarks (SuperBench Benchmarks) and micro-benchmarks (Micro Benchmark Test for SuperBench Benchmarks) labels Oct 9, 2025

WenqingLan1 added 8 commits October 10, 2025 16:48

update cmake

0cf48bb

fix lint

5905647

fix lint

baa57c9

fix import

ecce2d9

fix

3a58ead

fix

d0d8773

fix

fbb5969

fix

f007745

WenqingLan1 added 3 commits October 10, 2025 21:23

fix

b6b6082

fix

0f2c838

fix

5bd20f6

polarG reviewed Mar 5, 2026

Comment thread dockerfile/rocm5.0.x.dockerfile

microsoft deleted a comment from Copilot AI Mar 10, 2026

WenqingLan1 added 2 commits March 10, 2026 10:54

resolve comments

0bde332

fix lint

7c456cf

Copilot AI review requested due to automatic review settings March 10, 2026 20:55

Copilot AI reviewed Mar 10, 2026

WenqingLan1 added 2 commits March 10, 2026 14:18

fix pipeline & resolve comments

9643150

fix lint

f1a3b6d

Copilot AI review requested due to automatic review settings March 10, 2026 21:31

Copilot AI reviewed Mar 10, 2026

Comment thread tests/benchmarks/micro_benchmarks/test_nvbench_sleep_kernel.py Outdated

microsoft deleted a comment from Copilot AI Mar 10, 2026

fix test

fe48e35

WenqingLan1 changed the title from "Benchmarks: Micro benchmark - add nvbench based kernel-launch & sleep-kernel" to "Benchmarks: Micro benchmark - add nvbench based kernel-launch, sleep-kernel & auto-throughput" Mar 25, 2026

Merge branch 'microsoft:main' into feat/third_party/nvbench

e1e12d2

Copilot AI review requested due to automatic review settings April 8, 2026 20:27

Copilot started reviewing on behalf of WenqingLan1 April 8, 2026 20:29

Copilot AI reviewed Apr 8, 2026

WenqingLan1 and others added 2 commits April 22, 2026 11:04

Merge branch 'microsoft:main' into feat/third_party/nvbench

6fc5afb

resolve comments

e253b85

Copilot AI review requested due to automatic review settings April 22, 2026 19:50

Copilot started reviewing on behalf of WenqingLan1 April 22, 2026 19:51

Copilot AI reviewed Apr 22, 2026

WenqingLan1 added 2 commits May 13, 2026 11:44

Merge branch 'microsoft:main' into feat/third_party/nvbench

7dd26d3

Merge branch 'microsoft:main' into feat/third_party/nvbench

4c6113d

Copilot AI review requested due to automatic review settings May 20, 2026 18:48

Copilot started reviewing on behalf of WenqingLan1 May 20, 2026 18:48

Copilot AI reviewed May 20, 2026
