Dockerfile - Add cuda12.9 docker image#716

Merged
polarG merged 21 commits into main from guzhao/add_cuda12.9 on Jun 25, 2025
Conversation

@guoshzhao
Contributor

Description
Add a CUDA 12.9 Dockerfile and build it in the pipeline.

@guoshzhao guoshzhao requested a review from a team as a code owner May 27, 2025 16:51
@codecov

codecov Bot commented May 27, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.19%. Comparing base (b795477) to head (8763cb5).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rbench/benchmarks/model_benchmarks/pytorch_base.py | 66.66% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (66.66%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #716      +/-   ##
==========================================
- Coverage   86.20%   86.19%   -0.01%     
==========================================
  Files         100      100              
  Lines        7261     7263       +2     
==========================================
+ Hits         6259     6260       +1     
- Misses       1002     1003       +1     
| Flag | Coverage Δ |
|---|---|
| cpu-python3.10-unit-test | 71.65% <66.66%> (-0.01%) ⬇️ |
| cpu-python3.12-unit-test | 71.65% <66.66%> (-0.01%) ⬇️ |
| cpu-python3.7-unit-test | 71.24% <66.66%> (-0.01%) ⬇️ |
| cuda-unit-test | 83.65% <66.66%> (-0.01%) ⬇️ |
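
For readers checking the figures in this report, the percentages can be reproduced from the hit and line counts in the coverage diff. The snippet below is only a sketch: `coverage_pct` is a hypothetical helper, not part of Codecov or SuperBench, and the patch size of 3 lines is an assumption inferred from "66.66667% with 1 line missing", not a number the report states.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return line coverage as a percentage, rounded to two decimals."""
    return round(100.0 * hits / lines, 2)


# Project coverage, using the Hits and Lines rows of the coverage diff.
base = coverage_pct(6259, 7261)   # main
head = coverage_pct(6260, 7263)   # this pull request
delta = round(head - base, 2)

# Patch coverage. ASSUMPTION: the patch has 3 coverable lines, 1 of them
# missing, which is the smallest split consistent with 66.66667%.
patch = coverage_pct(2, 3)

# Target coverage quoted in the failure message.
TARGET = 80.0
patch_meets_target = patch >= TARGET

print(f"base={base}% head={head}% delta={delta}%")
print(f"patch={patch}% meets target: {patch_meets_target}")
```

Under these assumptions the project coverage moves by only 0.01 percentage points, so the failed status comes from the patch coverage check alone.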


@abuccts abuccts requested a review from dong0321 May 27, 2025 23:01
@guoshzhao guoshzhao requested a review from cp5555 as a code owner May 28, 2025 04:11
Contributor

@polarG polarG left a comment


LGTM

@dpower4 dpower4 mentioned this pull request Jun 6, 2025
@polarG polarG enabled auto-merge (squash) June 25, 2025 04:33
@polarG polarG merged commit a56356d into main Jun 25, 2025
23 of 24 checks passed
@polarG polarG deleted the guzhao/add_cuda12.9 branch June 25, 2025 17:47
@guoshzhao guoshzhao mentioned this pull request Jul 2, 2025
40 tasks
polarG added a commit that referenced this pull request Aug 11, 2025
Description

Add release note for v0.12.0

# Main Features
## SuperBench Improvement
1. - [x] Update Image Build Pipeline (#659)
2. - [x] Add support for arm64 build (#660)
3. - [x] Upgrade dependency versions in pipeline (#671)
4. - [x] Fix installation and lint issues (#684)
5. - [x] Update Flake8 repo (#683)
6. - [x] Init latest python support. (#687)
7. - [x] Add image build on arm64 arch (#690)
8. - [x] Enhancement of ignoring errors for import pkg_resources (#692)
9. - [x] Update label in the ROCm image build (#693)
10. - [x] Support cuda12.8 for Blackwell arch (#682)
11. - [x] Merge multi-arch image (#696)
12. - [x] Update OS of runner to the latest. (#702)
13. - [x] cuda arch flag for cublaslt (#701)


## Micro-benchmark Improvement
1. - [x] Bug Fix - Fix numa error on grace cpu in gpu-copy (#658)
2. - [x] Dependency - Bump onnxruntime-gpu version from 1.10.0 to 1.12.0 (#663)
3. - [x] Benchmarks: micro benchmarks - add general CPU bandwidth and latency benchmark (#662)
4. - [x] Benchmarks: micro benchmarks - add nvbandwidth build and benchmark (#665 and #669)
5. - [x] Fix stderr message in gpu-copy benchmark (#673)
6. - [x] Add arch support for 10.0 in gemm-flops (#680)
7. - [x] Fix tensorrt-inference parsing (#674)
8. - [x] nvbandwidth benchmark need to handle N/A value (#675)
9. - [x] Avoid Unintended nvbandwidth Function Calls in All Benchmarks (#685)
10. - [x] Add GPU Stream Micro Benchmark (#697)
11. - [x] Cuda arch flag for cublaslt (#701)
12. - [x] Support autotuning in cublaslt gemm (#706)
13. - [x] Add FP4 GEMM FLOPS support for cublaslt_gemm benchmark (#711)
14. - [x] CPU Stream Benchmark Revise (#712)
15. - [x] Add cuda12.9 docker image (#716)
16. - [x] Add Grace CPU support for CPU Stream (#719)


## Model Benchmark Improvement
1. - [x] Add LLaMA-2 Models (#668)
2. - [x] Fix typos in documentation and code files (#686)
3. - [x] Add Mixture of Experts Model (#679) 
4. - [ ] Add DeepSeek Training Benchmark
5. - [x] Add DeepSeek Inference Benchmark (AMD GPU) (#713)


## Documentation
1. - [x] Update CODEOWNERS (#670)
2. - [x] Update CODEOWNERS (#718)

## Result Analysis
1. - [x] Enhance logging information for diagnosis rule op baseline errors. (#689)
polarG added a commit that referenced this pull request Aug 12, 2025