ONNXRuntime: update to version 1.25.1 by smuzaffar · Pull Request #10516 · cms-sw/cmsdist

smuzaffar · 2026-04-29T07:28:08Z

This PR updates ONNXRuntime to the latest version, v1.25.1.

Sources are taken directly from the upstream GitHub repo.
CUDA architecture 60 is dropped because compilation of https://github.com/microsoft/onnxruntime/blob/v1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu fails with error [a] when CUDA arch 60 is used.
CMS-related changes are applied via a local patch.
The spec and patch are moved to the onnxruntime sub-directory.

[a]

FAILED: [code=139] CMakeFiles/onnxruntime_providers_cuda.dir<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu.o
<path>/external/cuda/12.9.1-2f902b8cd69fc02665180a65ec16b3a4/bin/nvcc -forward-unknown-to-host-compiler \
-DCOMPILE_HOPPER_TMA_GEMMS -DCPUINFO_SUPPORTED -DCPUINFO_SUPPORTED_PLATFORM=1 \
-DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_BF16 -DENABLE_CUDA_NHWC_OPS -DENABLE_DLPACK \
-DENABLE_FP4 -DENABLE_FP8 -DEXCLUDE_SM_100 -DEXCLUDE_SM_110 -DEXCLUDE_SM_120 -DEXCLUDE_SM_86 \
-DHAS_SM80_OR_LATER -DONLY_C_LOCALE=0 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DONNX_USE_LITE_PROTO=1 \
-DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_DX_INTEROP=0 \
-DUSE_FLASH_ATTENTION=1 -DUSE_FP8_KV_CACHE=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -D_GNU_SOURCE \
-D__ONNX_NO_DOC_STRINGS -Donnxruntime_providers_cuda_EXPORTS \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/include/onnxruntime \
-I/data/muz/onnx/w1/BUILD/el8_amd64_gcc13/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/include/onnxruntime/core/session \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/pytorch_cpuinfo-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/gsl-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/abseil_cpp-src \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/date-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/onnx-src \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/onnx-build \
-I<path>/external/protobuf/3.21.9-b07b8f47dd1983d3cd0c0f051c28c6a1/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/flatbuffers-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/cutlass-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/cutlass-src/examples \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/cutlass-src/tools/util/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/cudnn_frontend-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/eigen3-src \
-isystem <path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/safeint-src \
-isystem <path>/external/cuda/12.9.1-2f902b8cd69fc02665180a65ec16b3a4/include \
-isystem /data/muz/onnx/w1/el8_amd64_gcc13/external/cudnn/9.9.0.52-4fbb88df393a89af358e6a46743e3763/include \
-isystem <path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/mp11-src/include \
-DTHRUST_IGNORE_DEPRECATED_API -DCUB_IGNORE_DEPRECATED_API \
-Wno-deprecated-gpu-targets --static-global-template-stub=false -cudart shared -Xfatbin=-compress-all \
--expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" \
-Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe "--diag_suppress=expr_has_no_effect" \
-O3 -DNDEBUG -std=c++20 \
"--generate-code=arch=compute_60,code=[sm_60]" \
"--generate-code=arch=compute_70,code=[sm_70]" \
"--generate-code=arch=compute_75,code=[sm_75]" \
"--generate-code=arch=compute_80,code=[sm_80]" \
"--generate-code=arch=compute_89,code=[sm_89]" \
"--generate-code=arch=compute_90a,code=[sm_90a]" \
-Xcompiler=-fPIC -Xcudafe --diag_suppress=conversion_function_not_usable --compiler-options \
-Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare --compiler-options \
-Wno-interference-size -Xcompiler -Wno-nonnull-compare -Xcompiler -Wno-interference-size \
--threads 1 --diag-suppress=177 --static-global-template-stub=false --diag-suppress=221 -Xcompiler -Wno-reorder \
-Xcompiler -Wno-error=sign-compare -Xptxas=-w -Werror all-warnings \
-MD -MT CMakeFiles/onnxruntime_providers_cuda.dir<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu.o \
-MF CMakeFiles/onnxruntime_providers_cuda.dir<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu.o.d \
-x cu -c <path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu \
-o CMakeFiles/onnxruntime_providers_cuda.dir<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu.o
nvcc error   : 'ptxas' died due to signal 11 (Invalid memory reference)
nvcc error   : 'ptxas' core dumped
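
For reference, the failing command above requests six `--generate-code` targets, and this PR drops only the `compute_60`/`sm_60` one. The spec-file logic itself is not shown in this thread, so the following is only a rough sketch of how the remaining flag list could be derived. The helper names `drop_arch`, `gencode_flags` and `cmake_arch_list` are hypothetical, and it is an assumption that the build passes the list through CMake's `CMAKE_CUDA_ARCHITECTURES` variable (which takes a semicolon-separated list).

```python
# Illustrative sketch only: helper names are hypothetical, not taken from the cmsdist spec.

# Architectures that appear in the nvcc command line of the failing build.
ARCHS_IN_LOG = ["60", "70", "75", "80", "89", "90a"]


def drop_arch(archs, unwanted):
    """Return the architecture list without the unwanted entry, preserving order."""
    return [arch for arch in archs if arch != unwanted]


def gencode_flags(archs):
    """Build nvcc --generate-code flags in the same form as seen in the log."""
    return [f"--generate-code=arch=compute_{arch},code=[sm_{arch}]" for arch in archs]


def cmake_arch_list(archs):
    """Join architectures into a semicolon-separated list (assumed CMake input format)."""
    return ";".join(archs)


kept = drop_arch(ARCHS_IN_LOG, "60")

if __name__ == "__main__":
    print(cmake_arch_list(kept))
    for flag in gencode_flags(kept):
        print(flag)
```

Filtering the list in one place keeps the nvcc flags and the CMake setting consistent, whichever of the two the build actually consumes.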

cmsbuild · 2026-04-29T07:28:36Z

A new Pull Request was created by @smuzaffar for branch IB/CMSSW_17_0_X/master.

@akritkbehera, @cmsbuild, @iarspider, @raoatifshad, @smuzaffar can you please review it and eventually sign? Thanks.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

cmsbuild · 2026-04-29T07:28:37Z

cms-bot internal usage

smuzaffar · 2026-04-29T07:32:54Z

please test

smuzaffar · 2026-04-29T10:48:22Z

@fwyzard, it looks like the newer version of ONNXRuntime (ORT) fails to build for CUDA arch 60 (Pascal). It only fails for one ORT source file. For now, in this PR I propose to build ORT without CUDA 6.x support.

smuzaffar · 2026-04-29T10:49:00Z

please test for el9_amd64_gcc14

cmsbuild · 2026-04-29T10:58:26Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52935/summary.html
COMMIT: 5efd317
CMSSW: CMSSW_17_0_X_2026-04-28-2300/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/52935/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially removed 2 lines from the logs
Reco comparison results: 24 differences found in the comparisons
DQMHistoTests: Total files compared: 53
DQMHistoTests: Total histograms compared: 4186963
DQMHistoTests: Total failures: 35
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 4186908
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
Checked 227 log files, 197 edm output root files, 53 DQM output files
TriggerResults: no differences found

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 42 workflow step(s) with memory usage exceeding the error threshold:

Error: Workflow 4.53_RunPhoton2012B step3 max memory diff -200.3 exceeds +/- 90.0 MiB
Error: Workflow 9.0_Higgs200ChargedTaus step3 max memory diff -126.2 exceeds +/- 90.0 MiB
Error: Workflow 25.0_TTbar step3 max memory diff -118.0 exceeds +/- 90.0 MiB
Error: Workflow 135.4_ZEEFS_13 step3 max memory diff -157.2 exceeds +/- 90.0 MiB
Error: Workflow 136.731_RunSinglePh2016B step3 max memory diff -136.5 exceeds +/- 90.0 MiB
Error: Workflow 136.793_RunDoubleEG2017C step3 max memory diff -128.3 exceeds +/- 90.0 MiB
Error: Workflow 136.874_RunEGamma2018C step3 max memory diff -134.5 exceeds +/- 90.0 MiB
Error: Workflow 139.001_RunMinimumBias2021 step3 max memory diff -251.2 exceeds +/- 90.0 MiB
Error: Workflow 1306.0_SingleMuPt1_UP15 step3 max memory diff -177.8 exceeds +/- 90.0 MiB
Error: Workflow 1330.0_ZMM_13 step3 max memory diff -223.2 exceeds +/- 90.0 MiB
Error: Workflow 2022.0010001_RunTau2022D_10k step3 max memory diff -237.6 exceeds +/- 90.0 MiB
Error: Workflow 2023.0020001_RunJetMET02023D_10k step3 max memory diff -212.9 exceeds +/- 90.0 MiB
Error: Workflow 2024.0000001_RunZeroBias2024B_10k step3 max memory diff -258.3 exceeds +/- 90.0 MiB
Error: Workflow 2024.0010001_RunJetMET02024C_10k step3 max memory diff -237.7 exceeds +/- 90.0 MiB
Error: Workflow 2024.0020001_RunEGamma02024D_10k step3 max memory diff -187.1 exceeds +/- 90.0 MiB
Error: Workflow 2024.0030001_RunDisplacedJet2024E_10k step3 max memory diff -220.1 exceeds +/- 90.0 MiB
Error: Workflow 2024.0040001_RunPark2MuonLowMass02024F_10k step3 max memory diff -247.2 exceeds +/- 90.0 MiB
Error: Workflow 2024.0050001_RunBTagMu2024G_10k step3 max memory diff -229.4 exceeds +/- 90.0 MiB
Error: Workflow 2024.0060001_RunMuon02024H_10k step3 max memory diff -155.2 exceeds +/- 90.0 MiB
Error: Workflow 2024.0070001_RunTau2024I_10k step3 max memory diff -164.4 exceeds +/- 90.0 MiB
Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff -99.4 exceeds +/- 90.0 MiB
Error: Workflow 2025.0010001_RunJetMET02025C_10k step3 max memory diff -176.2 exceeds +/- 90.0 MiB
Error: Workflow 10224.0_TTbar_13+2017PU step3 max memory diff -139.5 exceeds +/- 90.0 MiB
Error: Workflow 11634.0_TTbar_14TeV+2022 step3 max memory diff -166.5 exceeds +/- 90.0 MiB
Error: Workflow 12434.0_TTbar_14TeV+2023 step3 max memory diff -178.9 exceeds +/- 90.0 MiB
Error: Workflow 12834.0_TTbar_14TeV+2024 step3 max memory diff -178.9 exceeds +/- 90.0 MiB
Error: Workflow 12846.0_ZEE_14+2024 step3 max memory diff -194.3 exceeds +/- 90.0 MiB
Error: Workflow 13034.0_TTbar_14TeV+2024PU step3 max memory diff -246.9 exceeds +/- 90.0 MiB
Error: Workflow 13234.0_TTbar_14TeV+2022FS step2 max memory diff -178.9 exceeds +/- 90.0 MiB
Error: Workflow 14034.0_TTbar_14TeV+2023FS step2 max memory diff -171.4 exceeds +/- 90.0 MiB
Error: Workflow 14234.0_TTbar_14TeV+2023FSPU step2 max memory diff -182.0 exceeds +/- 90.0 MiB
Error: Workflow 16834.0_TTbar_14TeV+2025 step3 max memory diff -242.8 exceeds +/- 90.0 MiB
Error: Workflow 17034.0_TTbar_14TeV+2025PU step3 max memory diff -246.3 exceeds +/- 90.0 MiB
Error: Workflow 18434.0_TTbar_14TeV+2026 step3 max memory diff -182.4 exceeds +/- 90.0 MiB
Error: Workflow 18634.0_TTbar_14TeV+2026PU step3 max memory diff -183.2 exceeds +/- 90.0 MiB
Error: Workflow 25202.0_TTbar_13 step3 max memory diff -135.3 exceeds +/- 90.0 MiB
Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step3 max memory diff -216.0 exceeds +/- 90.0 MiB
Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step3 max memory diff -125.3 exceeds +/- 90.0 MiB
Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step3 max memory diff -168.6 exceeds +/- 90.0 MiB
Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step3 max memory diff -167.5 exceeds +/- 90.0 MiB
Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step4 max memory diff -158.3 exceeds +/- 90.0 MiB
Error: Workflow 250202.181_TTbar13TeVPUppmx2018 step4 max memory diff -144.8 exceeds +/- 90.0 MiB
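
Every diff in the list above is negative, and the check is symmetric (`+/- 90.0 MiB`), so these entries are flagged even though the number went down. A minimal sketch for pulling the values out of such a report is given below. The regular expression and the function name `parse_memory_errors` are illustrative and not part of cms-bot, and reading a negative diff as "lower memory with this PR" assumes the diff is computed as PR minus baseline, which the report does not state.

```python
import re

# Matches lines such as:
# "Error: Workflow 4.53_RunPhoton2012B step3 max memory diff -200.3 exceeds +/- 90.0 MiB"
LINE_RE = re.compile(
    r"Error: Workflow (?P<workflow>\S+) (?P<step>step\d+) "
    r"max memory diff (?P<diff>-?\d+(?:\.\d+)?) "
    r"exceeds \+/- (?P<limit>\d+(?:\.\d+)?) MiB"
)


def parse_memory_errors(text):
    """Return (workflow, step, diff_mib, limit_mib) tuples for every matching line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            rows.append(
                (
                    match["workflow"],
                    match["step"],
                    float(match["diff"]),
                    float(match["limit"]),
                )
            )
    return rows


# Two lines copied from the report above.
SAMPLE = """\
Error: Workflow 4.53_RunPhoton2012B step3 max memory diff -200.3 exceeds +/- 90.0 MiB
Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff -99.4 exceeds +/- 90.0 MiB
"""

rows = parse_memory_errors(SAMPLE)
# True when every flagged step moved in the negative direction.
all_negative = all(diff < 0 for _, _, diff, _ in rows)
# True when every flagged step is outside the symmetric threshold.
all_outside = all(abs(diff) > limit for _, _, diff, limit in rows)

if __name__ == "__main__":
    for workflow, step, diff, limit in rows:
        print(f"{workflow} {step}: {diff:+.1f} MiB (limit +/- {limit:.1f})")
    print("all negative:", all_negative)
```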

cmsbuild · 2026-04-29T11:01:31Z

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52940/summary.html
COMMIT: 5efd317
CMSSW: CMSSW_17_0_X_2026-04-28-1100/el9_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/52940/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed External Build

I found compilation error when building:

++ mv /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el9_amd64_gcc14/external/onnxruntime/1.25.1-67747e4c1218bf29cba2862949bd66ba/cuda_gcc_supported.txt /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/cache/cuda_gcc_supported.txt
++ cat /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/cache/cuda_gcc_supported.txt
+ '[' true = true ']'
+ USE_CUDA=ON
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.9OHiTp: line 69: syntax error near unexpected token `<<<'
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.9OHiTp (%build)

RPM build warnings:
Macro expanded in comment on line 488: %{pkginstroot}/${PYTHON3_LIB_SITE_PACKAGES}

fwyzard · 2026-04-29T12:50:15Z

Drop CUDA architecture 60 as compilation of https://github.com/microsoft/onnxruntime/blob/v1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu failed with error [a] when CUDA arch 60 is used

We have not removed CUDA arch 6.0 from CMSSW (yet).

Should we do that, then?

makortel · 2026-04-29T13:06:19Z

Drop CUDA architecture 60 as compilation of https://github.com/microsoft/onnxruntime/blob/v1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu failed with error [a] when CUDA arch 60 is used

We have not removed CUDA arch 6.0 from CMSSW (yet).

Should we do that, then?

Dropping 6.0 in general now makes sense to me.

fwyzard · 2026-04-29T13:57:18Z

OK, let's re-remove Pascal globally.

smuzaffar · 2026-04-30T09:32:53Z

please test with #10493

cmsbuild · 2026-04-30T09:33:51Z

Pull request #10516 was updated.

smuzaffar · 2026-04-30T09:33:59Z

please test with #10493

cmsbuild · 2026-04-30T18:05:07Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52980/summary.html
COMMIT: 171135a
CMSSW: CMSSW_17_0_X_2026-04-30-1100/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/52980/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52980/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52980/git-merge-result

Comparison Summary

Summary:

You potentially removed 2 lines from the logs
ROOTFileChecks: Some differences in event products or their sizes found
Reco comparison results: 17 differences found in the comparisons
DQMHistoTests: Total files compared: 53
DQMHistoTests: Total histograms compared: 4187168
DQMHistoTests: Total failures: 22
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 4187126
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
Checked 227 log files, 197 edm output root files, 53 DQM output files
TriggerResults: found differences in 1 / 51 workflows

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 42 workflow step(s) with memory usage exceeding the error threshold:

Error: Workflow 4.53_RunPhoton2012B step3 max memory diff -134.2 exceeds +/- 90.0 MiB
Error: Workflow 9.0_Higgs200ChargedTaus step3 max memory diff -118.0 exceeds +/- 90.0 MiB
Error: Workflow 25.0_TTbar step3 max memory diff -184.0 exceeds +/- 90.0 MiB
Error: Workflow 135.4_ZEEFS_13 step3 max memory diff -157.2 exceeds +/- 90.0 MiB
Error: Workflow 136.731_RunSinglePh2016B step3 max memory diff -134.5 exceeds +/- 90.0 MiB
Error: Workflow 136.793_RunDoubleEG2017C step3 max memory diff -123.1 exceeds +/- 90.0 MiB
Error: Workflow 136.874_RunEGamma2018C step3 max memory diff -118.0 exceeds +/- 90.0 MiB
Error: Workflow 139.001_RunMinimumBias2021 step3 max memory diff -251.2 exceeds +/- 90.0 MiB
Error: Workflow 1306.0_SingleMuPt1_UP15 step3 max memory diff -244.9 exceeds +/- 90.0 MiB
Error: Workflow 1330.0_ZMM_13 step3 max memory diff -161.3 exceeds +/- 90.0 MiB
Error: Workflow 2022.0010001_RunTau2022D_10k step3 max memory diff -237.7 exceeds +/- 90.0 MiB
Error: Workflow 2023.0020001_RunJetMET02023D_10k step3 max memory diff -155.2 exceeds +/- 90.0 MiB
Error: Workflow 2024.0000001_RunZeroBias2024B_10k step3 max memory diff -195.5 exceeds +/- 90.0 MiB
Error: Workflow 2024.0010001_RunJetMET02024C_10k step3 max memory diff -171.7 exceeds +/- 90.0 MiB
Error: Workflow 2024.0020001_RunEGamma02024D_10k step3 max memory diff -242.8 exceeds +/- 90.0 MiB
Error: Workflow 2024.0030001_RunDisplacedJet2024E_10k step3 max memory diff -220.1 exceeds +/- 90.0 MiB
Error: Workflow 2024.0040001_RunPark2MuonLowMass02024F_10k step3 max memory diff -177.1 exceeds +/- 90.0 MiB
Error: Workflow 2024.0050001_RunBTagMu2024G_10k step3 max memory diff -229.4 exceeds +/- 90.0 MiB
Error: Workflow 2024.0060001_RunMuon02024H_10k step3 max memory diff -148.6 exceeds +/- 90.0 MiB
Error: Workflow 2024.0070001_RunTau2024I_10k step3 max memory diff -229.4 exceeds +/- 90.0 MiB
Error: Workflow 2025.0010001_RunJetMET02025C_10k step3 max memory diff -242.2 exceeds +/- 90.0 MiB
Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff -99.4 exceeds +/- 90.0 MiB
Error: Workflow 10224.0_TTbar_13+2017PU step3 max memory diff -199.8 exceeds +/- 90.0 MiB
Error: Workflow 11634.0_TTbar_14TeV+2022 step3 max memory diff -166.5 exceeds +/- 90.0 MiB
Error: Workflow 12434.0_TTbar_14TeV+2023 step3 max memory diff -244.9 exceeds +/- 90.0 MiB
Error: Workflow 12834.0_TTbar_14TeV+2024 step3 max memory diff -181.0 exceeds +/- 90.0 MiB
Error: Workflow 12846.0_ZEE_14+2024 step3 max memory diff -256.2 exceeds +/- 90.0 MiB
Error: Workflow 13034.0_TTbar_14TeV+2024PU step3 max memory diff -239.7 exceeds +/- 90.0 MiB
Error: Workflow 13234.0_TTbar_14TeV+2022FS step2 max memory diff -185.0 exceeds +/- 90.0 MiB
Error: Workflow 14034.0_TTbar_14TeV+2023FS step2 max memory diff -179.6 exceeds +/- 90.0 MiB
Error: Workflow 14234.0_TTbar_14TeV+2023FSPU step2 max memory diff -247.9 exceeds +/- 90.0 MiB
Error: Workflow 16834.0_TTbar_14TeV+2025 step3 max memory diff -172.7 exceeds +/- 90.0 MiB
Error: Workflow 17034.0_TTbar_14TeV+2025PU step3 max memory diff -245.8 exceeds +/- 90.0 MiB
Error: Workflow 18434.0_TTbar_14TeV+2026 step3 max memory diff -248.4 exceeds +/- 90.0 MiB
Error: Workflow 18634.0_TTbar_14TeV+2026PU step3 max memory diff -187.7 exceeds +/- 90.0 MiB
Error: Workflow 25202.0_TTbar_13 step3 max memory diff -194.4 exceeds +/- 90.0 MiB
Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step3 max memory diff -150.0 exceeds +/- 90.0 MiB
Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step3 max memory diff -133.5 exceeds +/- 90.0 MiB
Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step3 max memory diff -233.5 exceeds +/- 90.0 MiB
Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step3 max memory diff -241.8 exceeds +/- 90.0 MiB
Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step4 max memory diff -215.0 exceeds +/- 90.0 MiB
Error: Workflow 250202.181_TTbar13TeVPUppmx2018 step4 max memory diff -144.8 exceeds +/- 90.0 MiB

smuzaffar · 2026-05-01T15:46:01Z

enable gpu

smuzaffar · 2026-05-01T15:46:09Z

please test with #10493

cmsbuild · 2026-05-01T21:52:45Z

-1

Failed Tests: RelVals, nvidia_l40sUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52996/summary.html
COMMIT: 171135a
CMSSW: CMSSW_17_0_X_2026-05-01-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/52996/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

@cms-sw Update tag for RecoTauTag-TrainingFiles to V00-11-00 #10524

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52996/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52996/git-merge-result

Failed RelVals

135.4: 135.4_ZEEFS_13/step1_ZEEFS_13.log
2025.0000002: 2025.0000002_RunZeroBias2025B_10k/step2_RunZeroBias2025B_10k.log
9.0: 9.0_Higgs200ChargedTaus/step1_Higgs200ChargedTaus.log

Expand to see more relval errors ...

AMD_MI300X Comparison Summary

Summary:

You potentially removed 6 lines from the logs
Reco comparison results: 354 differences found in the comparisons
DQMHistoTests: Total files compared: 13
DQMHistoTests: Total histograms compared: 216259
DQMHistoTests: Total failures: 37244
DQMHistoTests: Total nulls: 33
DQMHistoTests: Total successes: 178982
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 12 files compared)
Checked 49 log files, 50 edm output root files, 13 DQM output files
TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

You potentially added 9 lines to the logs
Reco comparison results: 341 differences found in the comparisons
DQMHistoTests: Total files compared: 13
DQMHistoTests: Total histograms compared: 216259
DQMHistoTests: Total failures: 34996
DQMHistoTests: Total nulls: 31
DQMHistoTests: Total successes: 181232
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 12 files compared)
Checked 49 log files, 50 edm output root files, 13 DQM output files
TriggerResults: found differences in 6 / 12 workflows

smuzaffar · 2026-05-02T07:10:14Z

please test with #10493

cmsbuild · 2026-05-03T01:19:25Z

-1

Failed Tests: RelVals-AMD_MI300X
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/53015/summary.html
COMMIT: 171135a
CMSSW: CMSSW_17_0_X_2026-05-01-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/53015/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-AMD_MI300X

The relvals timed out after 4 hours.

Comparison Summary

Summary:

You potentially added 19 lines to the logs
ROOTFileChecks: Some differences in event products or their sizes found
Reco comparison results: 13 differences found in the comparisons
DQMHistoTests: Total files compared: 53
DQMHistoTests: Total histograms compared: 4187168
DQMHistoTests: Total failures: 3
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 4187145
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
Checked 227 log files, 197 edm output root files, 53 DQM output files
TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

You potentially removed 1 lines from the logs
Reco comparison results: 336 differences found in the comparisons
DQMHistoTests: Total files compared: 13
DQMHistoTests: Total histograms compared: 216259
DQMHistoTests: Total failures: 41155
DQMHistoTests: Total nulls: 32
DQMHistoTests: Total successes: 175072
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 12 files compared)
Checked 49 log files, 50 edm output root files, 13 DQM output files
TriggerResults: found differences in 6 / 12 workflows

NVIDIA_H100 Comparison Summary

Summary:

You potentially added 74 lines to the logs
Reco comparison results: 343 differences found in the comparisons
DQMHistoTests: Total files compared: 13
DQMHistoTests: Total histograms compared: 216259
DQMHistoTests: Total failures: 32482
DQMHistoTests: Total nulls: 31
DQMHistoTests: Total successes: 183746
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 12 files compared)
Checked 49 log files, 50 edm output root files, 13 DQM output files
TriggerResults: found differences in 2 / 12 workflows

NVIDIA_L40S Comparison Summary

Summary:

You potentially removed 20 lines from the logs
Reco comparison results: 342 differences found in the comparisons
DQMHistoTests: Total files compared: 13
DQMHistoTests: Total histograms compared: 216259
DQMHistoTests: Total failures: 34378
DQMHistoTests: Total nulls: 30
DQMHistoTests: Total successes: 181851
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 12 files compared)
Checked 49 log files, 50 edm output root files, 13 DQM output files
TriggerResults: no differences found
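
To put the four DQM summaries of this test run side by side, the sketch below checks that each set of counts adds up to its "Total histograms compared" figure and computes the failure fraction. The numbers are copied from the summaries above. The `DqmSummary` class is purely illustrative, and labelling the first, untagged summary as `default` is an assumption. With these inputs the GPU flavours come out at roughly 15% to 19% failing histograms, while the untagged comparison is well below one in a hundred thousand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DqmSummary:
    """Counts from one 'DQMHistoTests' block (illustrative container)."""

    compared: int
    failures: int
    nulls: int
    successes: int
    skipped: int

    def is_consistent(self):
        """Check that the individual counts add up to the total compared."""
        return self.failures + self.nulls + self.successes + self.skipped == self.compared

    def failure_fraction(self):
        """Fraction of compared histograms reported as failures."""
        return self.failures / self.compared


# Values copied from the comparison summaries in this bot report.
SUMMARIES = {
    "default": DqmSummary(4187168, 3, 0, 4187145, 20),  # untagged summary (label assumed)
    "AMD_W7900": DqmSummary(216259, 41155, 32, 175072, 0),
    "NVIDIA_H100": DqmSummary(216259, 32482, 31, 183746, 0),
    "NVIDIA_L40S": DqmSummary(216259, 34378, 30, 181851, 0),
}

if __name__ == "__main__":
    for name, summary in SUMMARIES.items():
        print(
            f"{name}: consistent={summary.is_consistent()} "
            f"failure fraction={summary.failure_fraction():.6f}"
        )
```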

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 42 workflow step(s) with memory usage exceeding the error threshold:

Error: Workflow 4.53_RunPhoton2012B step3 max memory diff -201.5 exceeds +/- 90.0 MiB
Error: Workflow 9.0_Higgs200ChargedTaus step3 max memory diff -185.0 exceeds +/- 90.0 MiB
Error: Workflow 25.0_TTbar step3 max memory diff -127.4 exceeds +/- 90.0 MiB
Error: Workflow 135.4_ZEEFS_13 step3 max memory diff -158.1 exceeds +/- 90.0 MiB
Error: Workflow 136.731_RunSinglePh2016B step3 max memory diff -193.3 exceeds +/- 90.0 MiB
Error: Workflow 136.793_RunDoubleEG2017C step3 max memory diff -187.1 exceeds +/- 90.0 MiB
Error: Workflow 136.874_RunEGamma2018C step3 max memory diff -135.5 exceeds +/- 90.0 MiB
Error: Workflow 139.001_RunMinimumBias2021 step3 max memory diff -182.1 exceeds +/- 90.0 MiB
Error: Workflow 1306.0_SingleMuPt1_UP15 step3 max memory diff -178.9 exceeds +/- 90.0 MiB
Error: Workflow 1330.0_ZMM_13 step3 max memory diff -225.2 exceeds +/- 90.0 MiB
Error: Workflow 2022.0010001_RunTau2022D_10k step3 max memory diff -224.7 exceeds +/- 90.0 MiB
Error: Workflow 2023.0020001_RunJetMET02023D_10k step3 max memory diff -156.2 exceeds +/- 90.0 MiB
Error: Workflow 2024.0000001_RunZeroBias2024B_10k step3 max memory diff -192.3 exceeds +/- 90.0 MiB
Error: Workflow 2024.0010001_RunJetMET02024C_10k step3 max memory diff -172.7 exceeds +/- 90.0 MiB
Error: Workflow 2024.0020001_RunEGamma02024D_10k step3 max memory diff -235.6 exceeds +/- 90.0 MiB
Error: Workflow 2024.0030001_RunDisplacedJet2024E_10k step3 max memory diff -221.2 exceeds +/- 90.0 MiB
Error: Workflow 2024.0040001_RunPark2MuonLowMass02024F_10k step3 max memory diff -245.2 exceeds +/- 90.0 MiB
Error: Workflow 2024.0050001_RunBTagMu2024G_10k step3 max memory diff -164.4 exceeds +/- 90.0 MiB
Error: Workflow 2024.0060001_RunMuon02024H_10k step3 max memory diff -156.3 exceeds +/- 90.0 MiB
Error: Workflow 2024.0070001_RunTau2024I_10k step3 max memory diff -163.2 exceeds +/- 90.0 MiB
Error: Workflow 2025.0010001_RunJetMET02025C_10k step3 max memory diff -243.3 exceeds +/- 90.0 MiB
Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff -100.5 exceeds +/- 90.0 MiB
Error: Workflow 10224.0_TTbar_13+2017PU step3 max memory diff -200.7 exceeds +/- 90.0 MiB
Error: Workflow 11634.0_TTbar_14TeV+2022 step3 max memory diff -233.5 exceeds +/- 90.0 MiB
Error: Workflow 12434.0_TTbar_14TeV+2023 step3 max memory diff -179.9 exceeds +/- 90.0 MiB
Error: Workflow 12834.0_TTbar_14TeV+2024 step3 max memory diff -182.0 exceeds +/- 90.0 MiB
Error: Workflow 12846.0_ZEE_14+2024 step3 max memory diff -197.4 exceeds +/- 90.0 MiB
Error: Workflow 13034.0_TTbar_14TeV+2024PU step3 max memory diff -247.8 exceeds +/- 90.0 MiB
Error: Workflow 13234.0_TTbar_14TeV+2022FS step2 max memory diff -179.9 exceeds +/- 90.0 MiB
Error: Workflow 14034.0_TTbar_14TeV+2023FS step2 max memory diff -176.5 exceeds +/- 90.0 MiB
Error: Workflow 14234.0_TTbar_14TeV+2023FSPU step2 max memory diff -182.9 exceeds +/- 90.0 MiB
Error: Workflow 16834.0_TTbar_14TeV+2025 step3 max memory diff -239.7 exceeds +/- 90.0 MiB
Error: Workflow 17034.0_TTbar_14TeV+2025PU step3 max memory diff -180.9 exceeds +/- 90.0 MiB
Error: Workflow 18434.0_TTbar_14TeV+2026 step3 max memory diff -245.3 exceeds +/- 90.0 MiB
Error: Workflow 18634.0_TTbar_14TeV+2026PU step3 max memory diff -242.1 exceeds +/- 90.0 MiB
Error: Workflow 25202.0_TTbar_13 step3 max memory diff -203.6 exceeds +/- 90.0 MiB
Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step3 max memory diff -151.1 exceeds +/- 90.0 MiB
Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step3 max memory diff -127.4 exceeds +/- 90.0 MiB
Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step3 max memory diff -168.6 exceeds +/- 90.0 MiB
Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step3 max memory diff -168.6 exceeds +/- 90.0 MiB
Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step4 max memory diff -158.3 exceeds +/- 90.0 MiB
Error: Workflow 250202.181_TTbar13TeVPUppmx2018 step4 max memory diff -211.8 exceeds +/- 90.0 MiB

ONNXRuntime: update to version 1.25.1

5efd317

cmsbuild added tests-pending externals-pending pending-signatures orp-pending labels Apr 29, 2026

cmsbuild added tests-started and removed tests-pending labels Apr 29, 2026

cmsbuild added tests-approved and removed tests-started labels Apr 29, 2026

fwyzard mentioned this pull request Apr 29, 2026

Update GPU architectures #10493

Merged

cmsbuild added requires-external tests-started and removed tests-approved labels Apr 30, 2026

Fix CUDA architectures handling in spec file

171135a

cmsbuild added tests-pending and removed tests-started requires-external labels Apr 30, 2026

cmsbuild added tests-started requires-external and removed tests-pending labels Apr 30, 2026

cmsbuild added tests-approved and removed tests-started labels Apr 30, 2026

cmsbuild added tests-started and removed tests-approved labels May 1, 2026

cmsbuild added tests-rejected and removed tests-started labels May 1, 2026

cmsbuild added tests-started and removed tests-rejected labels May 2, 2026

cmsbuild added tests-rejected and removed tests-started labels May 3, 2026
