
Update GPU wfs for 1601 after renumbering in cmssw#2663

Merged
smuzaffar merged 1 commit into cms-sw:master from VourMa:updateGPUWfsIn1601
Feb 10, 2026

Conversation

@VourMa
Contributor

@VourMa VourMa commented Jan 22, 2026

cms-sw/cmssw#49832 updated the numbering of a few workflows to adhere to the conventions of Alpaka workflows. This PR updates the corresponding hardcoded list for the cms-bot tests to avoid failures.

Edit: The aforementioned PR has been superseded by cms-sw/cmssw#49984, which still needs this update.

@cmsbuild
Contributor

A new Pull Request was created by @VourMa for branch master.

@akritkbehera, @cmsbuild, @iarspider, @raoatifshad, @smuzaffar can you please review it and eventually sign? Thanks.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Jan 22, 2026

cms-bot internal usage

@smuzaffar
Contributor

enable gpu

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

-1

Failed Tests: RelVals-AMD_W7900
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b29148/50846/summary.html
COMMIT: e22052e
CMSSW: CMSSW_16_1_X_2026-01-23-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cms-bot/2663/50846/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-AMD_W7900

ValueError: Undefined workflows: 34634.712, 34634.713
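
An error of this shape is what one would expect when a hardcoded workflow list names IDs that the release under test does not define. The following is a minimal sketch of such a check, not the actual cms-bot code: the function names `find_undefined` and `check_workflows` and both example lists are hypothetical, chosen only to reproduce the message format seen above.

```python
def find_undefined(requested, defined):
    """Return the requested workflow IDs that are not defined, keeping order."""
    defined_set = set(defined)
    return [wf for wf in requested if wf not in defined_set]


def check_workflows(requested, defined):
    """Raise ValueError, in the style of the log above, if any ID is unknown."""
    missing = find_undefined(requested, defined)
    if missing:
        raise ValueError("Undefined workflows: " + ", ".join(missing))


# Hypothetical data: the bot's list already uses the renumbered IDs,
# while the release under test does not define them yet.
# IDs are kept as strings so that "34634.712" is never rounded as a float.
defined_in_release = ["34634.402", "34634.403", "34634.404"]
bot_gpu_list = ["34634.403", "34634.712", "34634.713"]

try:
    check_workflows(bot_gpu_list, defined_in_release)
except ValueError as err:
    message = str(err)
    print(message)
```

Under these assumptions, the check only passes once the list and the release agree on the numbering, which is consistent with the later note that this PR has to be tested together with the matching cmssw PR.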

Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 4025536
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4025516
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 51 files compared)
  • Checked 222 log files, 193 edm output root files, 52 DQM output files
  • TriggerResults: no differences found

@VourMa
Contributor Author

VourMa commented Jan 23, 2026

I think this needs to be tested together with cms-sw/cmssw#49984 (which superseded cms-sw/cmssw#49832) to succeed. I am not sure whether this is possible in this repo, as it is for the cmssw one: cms-sw/cmssw#49832 (comment).

@mmusich
Contributor

mmusich commented Jan 30, 2026

@VourMa there is a conflict to resolve now.

@cmsbuild
Contributor

Pull request #2663 was updated.

@VourMa
Contributor Author

VourMa commented Jan 30, 2026

> @VourMa there is a conflict to resolve now.

Thanks for the heads-up, the conflict is now resolved. If the PR needs to be tested, note that this has to be done in combination with cms-sw/cmssw#49984.

@mmusich
Contributor

mmusich commented Jan 30, 2026

test parameters:

@mmusich
Contributor

mmusich commented Jan 30, 2026

enable gpu

@mmusich
Contributor

mmusich commented Jan 30, 2026

@cmsbuild, please test

@cmsbuild
Contributor

-1

Failed Tests: RelVals-AMD_MI300X
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b29148/51020/summary.html
COMMIT: 686c996
CMSSW: CMSSW_16_1_X_2026-01-30-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cms-bot/2663/51020/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-AMD_MI300X

  • 34634.713: 34634.713_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnlyAlpakaValidationLST/step2_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnlyAlpakaValidationLST.log
  • 34634.712: 34634.712_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnly/step2_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnly.log
  • 34634.403: 34634.403_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation/step2_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation.log

Comparison Summary

Summary:

  • You potentially added 3 lines to the logs
  • Reco comparison results: 11 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 4028550
  • DQMHistoTests: Total failures: 9
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4028521
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 51 files compared)
  • Checked 222 log files, 193 edm output root files, 52 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Feb 5, 2026

REMINDER @sextonkennedy, @mandrenguyen, @ftenchini: This PR was tested with cms-sw/cmssw#49984, please check if they should be merged together

@cmsbuild
Contributor

cmsbuild commented Feb 5, 2026

-1

Failed Tests: RelVals-NVIDIA_H100
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b29148/51123/summary.html
COMMIT: 686c996
CMSSW: CMSSW_16_1_X_2026-02-04-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cms-bot/2663/51123/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-NVIDIA_H100

  • 34634.713: 34634.713_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnlyAlpakaValidationLST/step2_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnlyAlpakaValidationLST.log
  • 34634.712: 34634.712_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnly/step2_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnly.log
  • 34634.403: 34634.403_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation/step2_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation.log

Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 4029600
  • DQMHistoTests: Total failures: 11
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4029569
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 51 files compared)
  • Checked 222 log files, 193 edm output root files, 52 DQM output files
  • TriggerResults: no differences found

@VourMa
Contributor Author

VourMa commented Feb 5, 2026

This also needs cms-sw/cmssw#50039 to pass the tests.

@mandrenguyen
Contributor

test parameters:

pull_request = cms-sw/cmssw#49984, cms-sw/cmssw#50039
workflows_gpu = 34434.712, 34434.713
relvals_opt = -w upgrade,standard
relvals_opt_gpu = -w upgrade,standard
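
The `key = value` layout of such a comment is easy to read programmatically. The sketch below shows one way to do it, assuming only the format visible above; `parse_test_parameters` and `split_list` are hypothetical helper names, and this is not how cms-bot itself necessarily parses the comment.

```python
def parse_test_parameters(comment):
    """Collect 'key = value' lines that follow a 'test parameters:' header."""
    params = {}
    in_block = False
    for raw in comment.splitlines():
        line = raw.strip()
        if not in_block:
            # Skip everything up to and including the header line.
            in_block = line.lower().startswith("test parameters:")
            continue
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def split_list(value):
    """Split a comma-separated value into stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


comment = """test parameters:

pull_request = cms-sw/cmssw#49984, cms-sw/cmssw#50039
workflows_gpu = 34434.712, 34434.713
relvals_opt = -w upgrade,standard
relvals_opt_gpu = -w upgrade,standard
"""

params = parse_test_parameters(comment)
print(split_list(params["pull_request"]))
print(split_list(params["workflows_gpu"]))
```

Splitting on the first `=` only keeps option strings such as `-w upgrade,standard` intact, so they can be passed on unchanged.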

@mandrenguyen
Contributor

please test

@smuzaffar
Contributor

please test

Let's re-run the tests. There was a bug in the bot that kept bot PR changes from being merged properly during GPU tests; #2676 should fix that issue.

@cmsbuild
Contributor

cmsbuild commented Feb 9, 2026

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b29148/51189/summary.html
COMMIT: 686c996
CMSSW: CMSSW_16_1_X_2026-02-09-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cms-bot/2663/51189/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 7155 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 4031484
  • DQMHistoTests: Total failures: 19382
  • DQMHistoTests: Total nulls: 21
  • DQMHistoTests: Total successes: 4012061
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 51 files compared)
  • Checked 222 log files, 193 edm output root files, 52 DQM output files
  • TriggerResults: found differences in 3 / 50 workflows
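
In each of the four comparison summaries in this thread, the DQMHistoTests totals add up: histograms compared = successes + failures + nulls + skipped. The short check below uses the figures reported above, keyed by the test number from each summary URL. The bookkeeping rule is an observation about these numbers, not a documented guarantee, and the dictionary layout is purely illustrative.

```python
# DQMHistoTests totals copied from the four summaries in this thread,
# keyed by the test number that appears in each summary URL.
summaries = {
    "50846": dict(compared=4025536, failures=0, nulls=0, successes=4025516, skipped=20),
    "51020": dict(compared=4028550, failures=9, nulls=0, successes=4028521, skipped=20),
    "51123": dict(compared=4029600, failures=11, nulls=0, successes=4029569, skipped=20),
    "51189": dict(compared=4031484, failures=19382, nulls=21, successes=4012061, skipped=20),
}


def is_consistent(s):
    """True if the four outcome counts sum to the number of histograms compared."""
    return s["compared"] == s["successes"] + s["failures"] + s["nulls"] + s["skipped"]


def failure_fraction(s):
    """Fraction of compared histograms that were reported as failures."""
    return s["failures"] / s["compared"]


for test_id, s in summaries.items():
    print(test_id, is_consistent(s), f"{failure_fraction(s):.4%}")
```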

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

Max Memory Comparisons exceeding threshold NVIDIA_H100

@cms-sw/core-l2 , I found 6 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 34634.402_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka step3 max memory diff -168.4 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.402_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka step2 max memory diff -234.9 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.403_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation step2 max memory diff -235.4 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.403_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation step3 max memory diff -172.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.404_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Profiling step2 max memory diff -235.2 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.751_TTbar_14TeV+Run4D121PU_HLT75e33TimingAlpaka step2 max memory diff 265.4 exceeds +/- 90.0 MiB
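
The threshold test behind these messages can be sketched as follows. This is an illustration built only from the wording of the report lines above: the regular expression, the helper names, and the strict "greater than" comparison are assumptions, not the bot's actual implementation.

```python
import re

THRESHOLD_MIB = 90.0  # the "+/- 90.0 MiB" error threshold quoted in the report

# Matches e.g. "Workflow <name> step3 max memory diff -168.4 exceeds ..."
LINE_RE = re.compile(
    r"Workflow (?P<workflow>\S+) (?P<step>step\d+) max memory diff "
    r"(?P<diff>-?\d+(?:\.\d+)?)"
)


def exceeds_threshold(diff_mib, threshold=THRESHOLD_MIB):
    """True if the memory change is outside +/- threshold, in either direction."""
    return abs(diff_mib) > threshold


def parse_report_line(line):
    """Return (workflow, step, diff in MiB), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match["workflow"], match["step"], float(match["diff"])


report = [
    "Error: Workflow 34634.402_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka "
    "step3 max memory diff -168.4 exceeds +/- 90.0 MiB",
    "Error: Workflow 34634.751_TTbar_14TeV+Run4D121PU_HLT75e33TimingAlpaka "
    "step2 max memory diff 265.4 exceeds +/- 90.0 MiB",
]

for line in report:
    workflow, step, diff = parse_report_line(line)
    direction = "decrease" if diff < 0 else "increase"
    print(workflow, step, diff, direction, exceeds_threshold(diff))
```

Because the comparison is on the absolute value, a large drop in memory is flagged in the same way as a large increase, which matches the mix of negative and positive differences listed above.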

@mmusich
Contributor

mmusich commented Feb 10, 2026

@smuzaffar @mandrenguyen @ftenchini now that cms-sw/cmssw#49984 is merged, I think we should also merge this one.

@smuzaffar
Contributor

+externals

@smuzaffar smuzaffar merged commit 3233f40 into cms-sw:master Feb 10, 2026
23 checks passed
@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @ftenchini, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)
