ONNXRuntime: update to version 1.25.1 #10516

Open
smuzaffar wants to merge 2 commits into IB/CMSSW_17_0_X/master from onnxruntime-1.25.1

@smuzaffar
Contributor

smuzaffar commented Apr 29, 2026

This PR updates ONNXRuntime to the latest version, v1.25.1.

[a]

FAILED: [code=139] CMakeFiles/onnxruntime_providers_cuda.dir<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu.o
<path>/external/cuda/12.9.1-2f902b8cd69fc02665180a65ec16b3a4/bin/nvcc -forward-unknown-to-host-compiler \
-DCOMPILE_HOPPER_TMA_GEMMS -DCPUINFO_SUPPORTED -DCPUINFO_SUPPORTED_PLATFORM=1 \
-DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_BF16 -DENABLE_CUDA_NHWC_OPS -DENABLE_DLPACK \
-DENABLE_FP4 -DENABLE_FP8 -DEXCLUDE_SM_100 -DEXCLUDE_SM_110 -DEXCLUDE_SM_120 -DEXCLUDE_SM_86 \
-DHAS_SM80_OR_LATER -DONLY_C_LOCALE=0 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DONNX_USE_LITE_PROTO=1 \
-DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_DX_INTEROP=0 \
-DUSE_FLASH_ATTENTION=1 -DUSE_FP8_KV_CACHE=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -D_GNU_SOURCE \
-D__ONNX_NO_DOC_STRINGS -Donnxruntime_providers_cuda_EXPORTS \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/include/onnxruntime \
-I/data/muz/onnx/w1/BUILD/el8_amd64_gcc13/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/include/onnxruntime/core/session \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/pytorch_cpuinfo-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/gsl-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/abseil_cpp-src \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/date-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/onnx-src \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/onnx-build \
-I<path>/external/protobuf/3.21.9-b07b8f47dd1983d3cd0c0f051c28c6a1/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/flatbuffers-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/cutlass-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/cutlass-src/examples \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/cutlass-src/tools/util/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/cudnn_frontend-src/include \
-I<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/eigen3-src \
-isystem <path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/safeint-src \
-isystem <path>/external/cuda/12.9.1-2f902b8cd69fc02665180a65ec16b3a4/include \
-isystem /data/muz/onnx/w1/el8_amd64_gcc13/external/cudnn/9.9.0.52-4fbb88df393a89af358e6a46743e3763/include \
-isystem <path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/build/Linux/Release/_deps/mp11-src/include \
-DTHRUST_IGNORE_DEPRECATED_API -DCUB_IGNORE_DEPRECATED_API \
-Wno-deprecated-gpu-targets --static-global-template-stub=false -cudart shared -Xfatbin=-compress-all \
--expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" \
-Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe "--diag_suppress=expr_has_no_effect" \
-O3 -DNDEBUG -std=c++20 \
"--generate-code=arch=compute_60,code=[sm_60]" \
"--generate-code=arch=compute_70,code=[sm_70]" \
"--generate-code=arch=compute_75,code=[sm_75]" \
"--generate-code=arch=compute_80,code=[sm_80]" \
"--generate-code=arch=compute_89,code=[sm_89]" \
"--generate-code=arch=compute_90a,code=[sm_90a]" \
-Xcompiler=-fPIC -Xcudafe --diag_suppress=conversion_function_not_usable --compiler-options \
-Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare --compiler-options \
-Wno-interference-size -Xcompiler -Wno-nonnull-compare -Xcompiler -Wno-interference-size \
--threads 1 --diag-suppress=177 --static-global-template-stub=false --diag-suppress=221 -Xcompiler -Wno-reorder \
-Xcompiler -Wno-error=sign-compare -Xptxas=-w -Werror all-warnings \
-MD -MT CMakeFiles/onnxruntime_providers_cuda.dir<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu.o \
-MF CMakeFiles/onnxruntime_providers_cuda.dir<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu.o.d \
-x cu -c <path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu \
-o CMakeFiles/onnxruntime_providers_cuda.dir<path>/external/onnxruntime/1.25.1-846cc440c50afc678935ce37826f1bd0/onnxruntime-1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu.o
nvcc error   : 'ptxas' died due to signal 11 (Invalid memory reference)
nvcc error   : 'ptxas' core dumped

@cmsbuild
Contributor

A new Pull Request was created by @smuzaffar for branch IB/CMSSW_17_0_X/master.

@akritkbehera, @cmsbuild, @iarspider, @raoatifshad, @smuzaffar can you please review it and eventually sign? Thanks.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Apr 29, 2026

cms-bot internal usage

@smuzaffar
Contributor Author

please test

@smuzaffar
Contributor Author

@fwyzard, it looks like the newer version of ONNXRuntime (ORT) fails to build for CUDA arch 60 (Pascal). Only one ORT source file fails. For now, in this PR I propose to build ORT without CUDA 6.x support.
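
A minimal sketch of the proposal above, assuming the architecture list that appears on the nvcc command line in log [a] (60, 70, 75, 80, 89, 90a). The helper names `drop_major` and `gencode_flags` are hypothetical and this is not the actual cmsdist change; it only illustrates filtering out the Pascal (6.x) entries and regenerating `--generate-code` flags in the same form as the log.

```python
# Architectures copied from the nvcc command line in log [a].
CUDA_ARCHS = ["60", "70", "75", "80", "89", "90a"]


def drop_major(archs, major):
    """Keep only architectures whose major compute capability differs from `major`."""
    kept = []
    for arch in archs:
        # Strip a trailing feature suffix, e.g. "90a" -> "90".
        digits = arch.rstrip("abcdefghijklmnopqrstuvwxyz")
        if int(digits) // 10 != major:
            kept.append(arch)
    return kept


def gencode_flags(archs):
    """Build nvcc flags in the same shape as those shown in log [a]."""
    return [f"--generate-code=arch=compute_{a},code=[sm_{a}]" for a in archs]


archs = drop_major(CUDA_ARCHS, 6)
print(";".join(archs))
for flag in gencode_flags(archs):
    print(flag)
```

With the list above, the first printed line is `70;75;80;89;90a`, followed by one `--generate-code` flag per remaining architecture.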

@smuzaffar
Contributor Author

please test for el9_amd64_gcc14

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52935/summary.html
COMMIT: 5efd317
CMSSW: CMSSW_17_0_X_2026-04-28-2300/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/52935/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 42 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 4.53_RunPhoton2012B step3 max memory diff -200.3 exceeds +/- 90.0 MiB
  • Error: Workflow 9.0_Higgs200ChargedTaus step3 max memory diff -126.2 exceeds +/- 90.0 MiB
  • Error: Workflow 25.0_TTbar step3 max memory diff -118.0 exceeds +/- 90.0 MiB
  • Error: Workflow 135.4_ZEEFS_13 step3 max memory diff -157.2 exceeds +/- 90.0 MiB
  • Error: Workflow 136.731_RunSinglePh2016B step3 max memory diff -136.5 exceeds +/- 90.0 MiB
  • Error: Workflow 136.793_RunDoubleEG2017C step3 max memory diff -128.3 exceeds +/- 90.0 MiB
  • Error: Workflow 136.874_RunEGamma2018C step3 max memory diff -134.5 exceeds +/- 90.0 MiB
  • Error: Workflow 139.001_RunMinimumBias2021 step3 max memory diff -251.2 exceeds +/- 90.0 MiB
  • Error: Workflow 1306.0_SingleMuPt1_UP15 step3 max memory diff -177.8 exceeds +/- 90.0 MiB
  • Error: Workflow 1330.0_ZMM_13 step3 max memory diff -223.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2022.0010001_RunTau2022D_10k step3 max memory diff -237.6 exceeds +/- 90.0 MiB
  • Error: Workflow 2023.0020001_RunJetMET02023D_10k step3 max memory diff -212.9 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0000001_RunZeroBias2024B_10k step3 max memory diff -258.3 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0010001_RunJetMET02024C_10k step3 max memory diff -237.7 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0020001_RunEGamma02024D_10k step3 max memory diff -187.1 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0030001_RunDisplacedJet2024E_10k step3 max memory diff -220.1 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0040001_RunPark2MuonLowMass02024F_10k step3 max memory diff -247.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0050001_RunBTagMu2024G_10k step3 max memory diff -229.4 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0060001_RunMuon02024H_10k step3 max memory diff -155.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0070001_RunTau2024I_10k step3 max memory diff -164.4 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff -99.4 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step3 max memory diff -176.2 exceeds +/- 90.0 MiB
  • Error: Workflow 10224.0_TTbar_13+2017PU step3 max memory diff -139.5 exceeds +/- 90.0 MiB
  • Error: Workflow 11634.0_TTbar_14TeV+2022 step3 max memory diff -166.5 exceeds +/- 90.0 MiB
  • Error: Workflow 12434.0_TTbar_14TeV+2023 step3 max memory diff -178.9 exceeds +/- 90.0 MiB
  • Error: Workflow 12834.0_TTbar_14TeV+2024 step3 max memory diff -178.9 exceeds +/- 90.0 MiB
  • Error: Workflow 12846.0_ZEE_14+2024 step3 max memory diff -194.3 exceeds +/- 90.0 MiB
  • Error: Workflow 13034.0_TTbar_14TeV+2024PU step3 max memory diff -246.9 exceeds +/- 90.0 MiB
  • Error: Workflow 13234.0_TTbar_14TeV+2022FS step2 max memory diff -178.9 exceeds +/- 90.0 MiB
  • Error: Workflow 14034.0_TTbar_14TeV+2023FS step2 max memory diff -171.4 exceeds +/- 90.0 MiB
  • Error: Workflow 14234.0_TTbar_14TeV+2023FSPU step2 max memory diff -182.0 exceeds +/- 90.0 MiB
  • Error: Workflow 16834.0_TTbar_14TeV+2025 step3 max memory diff -242.8 exceeds +/- 90.0 MiB
  • Error: Workflow 17034.0_TTbar_14TeV+2025PU step3 max memory diff -246.3 exceeds +/- 90.0 MiB
  • Error: Workflow 18434.0_TTbar_14TeV+2026 step3 max memory diff -182.4 exceeds +/- 90.0 MiB
  • Error: Workflow 18634.0_TTbar_14TeV+2026PU step3 max memory diff -183.2 exceeds +/- 90.0 MiB
  • Error: Workflow 25202.0_TTbar_13 step3 max memory diff -135.3 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step3 max memory diff -216.0 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step3 max memory diff -125.3 exceeds +/- 90.0 MiB
  • Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step3 max memory diff -168.6 exceeds +/- 90.0 MiB
  • Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step3 max memory diff -167.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step4 max memory diff -158.3 exceeds +/- 90.0 MiB
  • Error: Workflow 250202.181_TTbar13TeVPUppmx2018 step4 max memory diff -144.8 exceeds +/- 90.0 MiB
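
For readers who want to summarize a report like the one above, here is a small sketch that parses the bot's "max memory diff" lines. The regular expression and the function name `parse_memory_errors` are assumptions based only on the line format shown here, not on any cms-bot code. The sign convention (a negative diff meaning lower memory use than the baseline) is also an assumption, since the bot output does not state it.

```python
import re

# Pattern inferred from the report lines above; not taken from cms-bot itself.
LINE_RE = re.compile(
    r"Workflow (?P<wf>\S+) (?P<step>step\d+) max memory diff "
    r"(?P<diff>[+-]?\d+(?:\.\d+)?) exceeds \+/- (?P<thr>\d+(?:\.\d+)?) MiB"
)


def parse_memory_errors(text):
    """Return (workflow, step, diff_mib, threshold_mib) tuples found in `text`."""
    rows = []
    for m in LINE_RE.finditer(text):
        rows.append((m["wf"], m["step"], float(m["diff"]), float(m["thr"])))
    return rows


# Three lines copied from the report above.
sample = """
  • Error: Workflow 4.53_RunPhoton2012B step3 max memory diff -200.3 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff -99.4 exceeds +/- 90.0 MiB
  • Error: Workflow 139.001_RunMinimumBias2021 step3 max memory diff -251.2 exceeds +/- 90.0 MiB
"""

rows = parse_memory_errors(sample)
# Largest negative differences first.
for wf, step, diff, thr in sorted(rows, key=lambda r: r[2]):
    # Assumed convention: negative means less memory than the baseline.
    direction = "lower" if diff < 0 else "higher"
    print(f"{wf} {step}: {abs(diff):.1f} MiB {direction} (threshold {thr:.1f} MiB)")
```

Every entry in the report above has a negative diff, so under the assumed convention the threshold is exceeded by decreases rather than increases.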

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52940/summary.html
COMMIT: 5efd317
CMSSW: CMSSW_17_0_X_2026-04-28-1100/el9_amd64_gcc14
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/52940/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed External Build

I found a compilation error when building:

++ mv /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el9_amd64_gcc14/external/onnxruntime/1.25.1-67747e4c1218bf29cba2862949bd66ba/cuda_gcc_supported.txt /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/cache/cuda_gcc_supported.txt
++ cat /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/cache/cuda_gcc_supported.txt
+ '[' true = true ']'
+ USE_CUDA=ON
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.9OHiTp: line 69: syntax error near unexpected token `<<<'
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.9OHiTp (%build)

RPM build warnings:
Macro expanded in comment on line 488: %{pkginstroot}/${PYTHON3_LIB_SITE_PACKAGES}




@fwyzard
Contributor

fwyzard commented Apr 29, 2026

Drop CUDA architecture 60 as compilation of https://github.com/microsoft/onnxruntime/blob/v1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu failed with error [a] when CUDA arch 60 is used

We have not removed CUDA arch 6.0 from CMSSW (yet).

Should we do that, then?

@makortel
Contributor

Drop CUDA architecture 60 as compilation of https://github.com/microsoft/onnxruntime/blob/v1.25.1/onnxruntime/contrib_ops/cuda/quantization/matmul_8bits.cu failed with error [a] when CUDA arch 60 is used

We have not removed CUDA arch 6.0 from CMSSW (yet).

Should we do that, then?

Dropping 6.0 in general now makes sense to me.

@fwyzard
Contributor

fwyzard commented Apr 29, 2026

OK, let's re-remove Pascal globally.

@smuzaffar
Contributor Author

please test with #10493

@cmsbuild
Contributor

Pull request #10516 was updated.

@smuzaffar
Contributor Author

please test with #10493

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52980/summary.html
COMMIT: 171135a
CMSSW: CMSSW_17_0_X_2026-04-30-1100/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/52980/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52980/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52980/git-merge-result

Comparison Summary

Summary:

  • You potentially removed 2 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 17 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4187168
  • DQMHistoTests: Total failures: 22
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4187126
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (52 files compared)
  • Checked 227 log files, 197 edm output root files, 53 DQM output files
  • TriggerResults: found differences in 1 / 51 workflows
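
As a quick sanity check on the DQMHistoTests counts above, the sketch below verifies that the reported categories add up to the total. It assumes that failures, nulls, successes and skipped are mutually exclusive categories, which the bot output does not state explicitly; the function name `tally_is_consistent` is hypothetical.

```python
def tally_is_consistent(total, failures, nulls, successes, skipped):
    """Return True if the four categories sum to the reported total."""
    return failures + nulls + successes + skipped == total


# Numbers copied from the DQMHistoTests lines in the summary above.
print(
    tally_is_consistent(
        total=4187168, failures=22, nulls=0, successes=4187126, skipped=20
    )
)
```

For these numbers the check prints `True`, which is consistent with the four categories partitioning the compared histograms.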

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 42 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 4.53_RunPhoton2012B step3 max memory diff -134.2 exceeds +/- 90.0 MiB
  • Error: Workflow 9.0_Higgs200ChargedTaus step3 max memory diff -118.0 exceeds +/- 90.0 MiB
  • Error: Workflow 25.0_TTbar step3 max memory diff -184.0 exceeds +/- 90.0 MiB
  • Error: Workflow 135.4_ZEEFS_13 step3 max memory diff -157.2 exceeds +/- 90.0 MiB
  • Error: Workflow 136.731_RunSinglePh2016B step3 max memory diff -134.5 exceeds +/- 90.0 MiB
  • Error: Workflow 136.793_RunDoubleEG2017C step3 max memory diff -123.1 exceeds +/- 90.0 MiB
  • Error: Workflow 136.874_RunEGamma2018C step3 max memory diff -118.0 exceeds +/- 90.0 MiB
  • Error: Workflow 139.001_RunMinimumBias2021 step3 max memory diff -251.2 exceeds +/- 90.0 MiB
  • Error: Workflow 1306.0_SingleMuPt1_UP15 step3 max memory diff -244.9 exceeds +/- 90.0 MiB
  • Error: Workflow 1330.0_ZMM_13 step3 max memory diff -161.3 exceeds +/- 90.0 MiB
  • Error: Workflow 2022.0010001_RunTau2022D_10k step3 max memory diff -237.7 exceeds +/- 90.0 MiB
  • Error: Workflow 2023.0020001_RunJetMET02023D_10k step3 max memory diff -155.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0000001_RunZeroBias2024B_10k step3 max memory diff -195.5 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0010001_RunJetMET02024C_10k step3 max memory diff -171.7 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0020001_RunEGamma02024D_10k step3 max memory diff -242.8 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0030001_RunDisplacedJet2024E_10k step3 max memory diff -220.1 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0040001_RunPark2MuonLowMass02024F_10k step3 max memory diff -177.1 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0050001_RunBTagMu2024G_10k step3 max memory diff -229.4 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0060001_RunMuon02024H_10k step3 max memory diff -148.6 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0070001_RunTau2024I_10k step3 max memory diff -229.4 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step3 max memory diff -242.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff -99.4 exceeds +/- 90.0 MiB
  • Error: Workflow 10224.0_TTbar_13+2017PU step3 max memory diff -199.8 exceeds +/- 90.0 MiB
  • Error: Workflow 11634.0_TTbar_14TeV+2022 step3 max memory diff -166.5 exceeds +/- 90.0 MiB
  • Error: Workflow 12434.0_TTbar_14TeV+2023 step3 max memory diff -244.9 exceeds +/- 90.0 MiB
  • Error: Workflow 12834.0_TTbar_14TeV+2024 step3 max memory diff -181.0 exceeds +/- 90.0 MiB
  • Error: Workflow 12846.0_ZEE_14+2024 step3 max memory diff -256.2 exceeds +/- 90.0 MiB
  • Error: Workflow 13034.0_TTbar_14TeV+2024PU step3 max memory diff -239.7 exceeds +/- 90.0 MiB
  • Error: Workflow 13234.0_TTbar_14TeV+2022FS step2 max memory diff -185.0 exceeds +/- 90.0 MiB
  • Error: Workflow 14034.0_TTbar_14TeV+2023FS step2 max memory diff -179.6 exceeds +/- 90.0 MiB
  • Error: Workflow 14234.0_TTbar_14TeV+2023FSPU step2 max memory diff -247.9 exceeds +/- 90.0 MiB
  • Error: Workflow 16834.0_TTbar_14TeV+2025 step3 max memory diff -172.7 exceeds +/- 90.0 MiB
  • Error: Workflow 17034.0_TTbar_14TeV+2025PU step3 max memory diff -245.8 exceeds +/- 90.0 MiB
  • Error: Workflow 18434.0_TTbar_14TeV+2026 step3 max memory diff -248.4 exceeds +/- 90.0 MiB
  • Error: Workflow 18634.0_TTbar_14TeV+2026PU step3 max memory diff -187.7 exceeds +/- 90.0 MiB
  • Error: Workflow 25202.0_TTbar_13 step3 max memory diff -194.4 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step3 max memory diff -150.0 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step3 max memory diff -133.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step3 max memory diff -233.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step3 max memory diff -241.8 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step4 max memory diff -215.0 exceeds +/- 90.0 MiB
  • Error: Workflow 250202.181_TTbar13TeVPUppmx2018 step4 max memory diff -144.8 exceeds +/- 90.0 MiB

@smuzaffar
Contributor Author

enable gpu

@smuzaffar
Contributor Author

please test with #10493

@cmsbuild
Contributor

cmsbuild commented May 1, 2026

-1

Failed Tests: RelVals nvidia_l40sUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52996/summary.html
COMMIT: 171135a
CMSSW: CMSSW_17_0_X_2026-05-01-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/52996/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52996/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/52996/git-merge-result

Failed RelVals

  • 135.4: 135.4_ZEEFS_13/step1_ZEEFS_13.log
  • 2025.0000002: 2025.0000002_RunZeroBias2025B_10k/step2_RunZeroBias2025B_10k.log
  • 9.0: 9.0_Higgs200ChargedTaus/step1_Higgs200ChargedTaus.log
Expand to see more relval errors ...

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

  • You potentially added 9 lines to the logs
  • Reco comparison results: 341 differences found in the comparisons
  • DQMHistoTests: Total files compared: 13
  • DQMHistoTests: Total histograms compared: 216259
  • DQMHistoTests: Total failures: 34996
  • DQMHistoTests: Total nulls: 31
  • DQMHistoTests: Total successes: 181232
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (12 files compared)
  • Checked 49 log files, 50 edm output root files, 13 DQM output files
  • TriggerResults: found differences in 6 / 12 workflows

@smuzaffar
Contributor Author

please test with #10493

@cmsbuild
Contributor

cmsbuild commented May 3, 2026

-1

Failed Tests: RelVals-AMD_MI300X
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0a93d4/53015/summary.html
COMMIT: 171135a
CMSSW: CMSSW_17_0_X_2026-05-01-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10516/53015/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-AMD_MI300X

The relvals timed out after 4 hours.

Comparison Summary

Summary:

  • You potentially added 19 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 13 differences found in the comparisons
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4187168
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4187145
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (52 files compared)
  • Checked 227 log files, 197 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 42 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 4.53_RunPhoton2012B step3 max memory diff -201.5 exceeds +/- 90.0 MiB
  • Error: Workflow 9.0_Higgs200ChargedTaus step3 max memory diff -185.0 exceeds +/- 90.0 MiB
  • Error: Workflow 25.0_TTbar step3 max memory diff -127.4 exceeds +/- 90.0 MiB
  • Error: Workflow 135.4_ZEEFS_13 step3 max memory diff -158.1 exceeds +/- 90.0 MiB
  • Error: Workflow 136.731_RunSinglePh2016B step3 max memory diff -193.3 exceeds +/- 90.0 MiB
  • Error: Workflow 136.793_RunDoubleEG2017C step3 max memory diff -187.1 exceeds +/- 90.0 MiB
  • Error: Workflow 136.874_RunEGamma2018C step3 max memory diff -135.5 exceeds +/- 90.0 MiB
  • Error: Workflow 139.001_RunMinimumBias2021 step3 max memory diff -182.1 exceeds +/- 90.0 MiB
  • Error: Workflow 1306.0_SingleMuPt1_UP15 step3 max memory diff -178.9 exceeds +/- 90.0 MiB
  • Error: Workflow 1330.0_ZMM_13 step3 max memory diff -225.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2022.0010001_RunTau2022D_10k step3 max memory diff -224.7 exceeds +/- 90.0 MiB
  • Error: Workflow 2023.0020001_RunJetMET02023D_10k step3 max memory diff -156.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0000001_RunZeroBias2024B_10k step3 max memory diff -192.3 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0010001_RunJetMET02024C_10k step3 max memory diff -172.7 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0020001_RunEGamma02024D_10k step3 max memory diff -235.6 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0030001_RunDisplacedJet2024E_10k step3 max memory diff -221.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0040001_RunPark2MuonLowMass02024F_10k step3 max memory diff -245.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0050001_RunBTagMu2024G_10k step3 max memory diff -164.4 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0060001_RunMuon02024H_10k step3 max memory diff -156.3 exceeds +/- 90.0 MiB
  • Error: Workflow 2024.0070001_RunTau2024I_10k step3 max memory diff -163.2 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step3 max memory diff -243.3 exceeds +/- 90.0 MiB
  • Error: Workflow 2025.0010001_RunJetMET02025C_10k step2 max memory diff -100.5 exceeds +/- 90.0 MiB
  • Error: Workflow 10224.0_TTbar_13+2017PU step3 max memory diff -200.7 exceeds +/- 90.0 MiB
  • Error: Workflow 11634.0_TTbar_14TeV+2022 step3 max memory diff -233.5 exceeds +/- 90.0 MiB
  • Error: Workflow 12434.0_TTbar_14TeV+2023 step3 max memory diff -179.9 exceeds +/- 90.0 MiB
  • Error: Workflow 12834.0_TTbar_14TeV+2024 step3 max memory diff -182.0 exceeds +/- 90.0 MiB
  • Error: Workflow 12846.0_ZEE_14+2024 step3 max memory diff -197.4 exceeds +/- 90.0 MiB
  • Error: Workflow 13034.0_TTbar_14TeV+2024PU step3 max memory diff -247.8 exceeds +/- 90.0 MiB
  • Error: Workflow 13234.0_TTbar_14TeV+2022FS step2 max memory diff -179.9 exceeds +/- 90.0 MiB
  • Error: Workflow 14034.0_TTbar_14TeV+2023FS step2 max memory diff -176.5 exceeds +/- 90.0 MiB
  • Error: Workflow 14234.0_TTbar_14TeV+2023FSPU step2 max memory diff -182.9 exceeds +/- 90.0 MiB
  • Error: Workflow 16834.0_TTbar_14TeV+2025 step3 max memory diff -239.7 exceeds +/- 90.0 MiB
  • Error: Workflow 17034.0_TTbar_14TeV+2025PU step3 max memory diff -180.9 exceeds +/- 90.0 MiB
  • Error: Workflow 18434.0_TTbar_14TeV+2026 step3 max memory diff -245.3 exceeds +/- 90.0 MiB
  • Error: Workflow 18634.0_TTbar_14TeV+2026PU step3 max memory diff -242.1 exceeds +/- 90.0 MiB
  • Error: Workflow 25202.0_TTbar_13 step3 max memory diff -203.6 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step3 max memory diff -151.1 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step3 max memory diff -127.4 exceeds +/- 90.0 MiB
  • Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step3 max memory diff -168.6 exceeds +/- 90.0 MiB
  • Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step3 max memory diff -168.6 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step4 max memory diff -158.3 exceeds +/- 90.0 MiB
  • Error: Workflow 250202.181_TTbar13TeVPUppmx2018 step4 max memory diff -211.8 exceeds +/- 90.0 MiB
