
TRK CPU vs. GPU path and infrastructure for Phase 2 HLT #50336

Merged
cmsbuild merged 2 commits into cms-sw:master from SegmentLinking:GPUValidationForPhase2HLTTRK on Mar 23, 2026


@VourMa (Contributor) commented Mar 6, 2026

This PR introduces CPU vs. GPU validation for the Phase 2 HLT, implementing the comparison for the tracking sequence. The relevant workflow variant is .7503 (tested below as 34434.7503).

The approach was presented at a recent HLT Upgrade meeting and has since been updated according to the feedback received there:

  • No new menu: the path is added to the "regular" menu;
  • No new validation sequence: a DQM sequence is run instead;
  • Instead of hltInitialStepTracks, tracks produced directly from the seeds given to the building algorithm are used;
  • Monitoring of the pixel-track SoAs and the harvesting step for the track-to-track comparisons have been added;
  • The procModifier has been generalized to apply to all HLT heterogeneous comparisons, modifying all reconstruction sequences to add the serial-sync modules.

Here is the new DQM output: [3 images]
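The comparison scheme described above runs a CPU ("SerialSync") copy of each heterogeneous module next to the device one and compares the two outputs. A minimal plain-Python sketch of the pairing step follows; it is not the procModifier from this PR. The helpers `serial_sync_label` and `comparison_pairs` are hypothetical, and the only grounded detail is the `SerialSync` label suffix visible in this PR's product names (e.g. `hltInitialStepTrajectorySeedsLSTTracksSerialSync`).

```python
def serial_sync_label(label: str) -> str:
    """Return the label of the CPU (serial, synchronous) twin of a module.

    Assumption: the twin is named by appending the 'SerialSync' suffix,
    as for hltInitialStepTrajectorySeedsLSTTracksSerialSync in this PR.
    Labels that already carry the suffix are returned unchanged.
    """
    suffix = "SerialSync"
    return label if label.endswith(suffix) else label + suffix


def comparison_pairs(labels):
    """Pair each device module label with its serial-sync counterpart.

    Labels that are already serial-sync copies are skipped, so each
    (device, cpu) pair appears exactly once.
    """
    return [(lab, serial_sync_label(lab)) for lab in labels
            if not lab.endswith("SerialSync")]


if __name__ == "__main__":
    # Example labels; the existence of every counterpart is an assumption.
    products = ["hltPhase2PixelTracks", "hltInitialStepTrajectorySeedsLSTTracks"]
    for device, cpu in comparison_pairs(products):
        print(f"compare {device} (device) vs {cpu} (CPU)")
```

Keeping the naming rule in one function means a comparison sequence could be generated for every heterogeneous product instead of being written by hand, which is in the spirit of generalizing the procModifier to all HLT heterogeneous comparisons.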

@cmsbuild (Contributor) commented Mar 6, 2026

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50336/48426

@cmsbuild (Contributor) commented Mar 6, 2026

A new Pull Request was created by @VourMa for master.

It involves the following packages:

  • Configuration/EventContent (operations)
  • Configuration/ProcessModifiers (operations)
  • Configuration/PyReleaseValidation (pdmv)
  • DQM/TrackingMonitorSource (dqm)
  • HLTrigger/Configuration (hlt)
  • HLTriggerOffline/Common (dqm)
  • Validation/Configuration (dqm, simulation)
  • Validation/RecoTrack (dqm)

@AdrianoDee, @DickyChant, @Martin-Grunewald, @antoniovagnerini, @civanch, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @gabrielmscampos, @kpedro88, @mandrenguyen, @mdhildreth, @miquork, @mmusich, @nothingface0, @rseidita can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @Martin-Grunewald, @SohamBhattacharya, @VinInn, @VourMa, @apsallid, @arossi83, @denizsun, @dgulhan, @elusian, @fabiocos, @felicepantaleo, @fioriNTU, @idebruyn, @jandrea, @makortel, @missirol, @mmasciov, @mmusich, @mtosi, @richa2710, @rovere, @salimcerci, @slomeo, @sroychow, @threus, @wmtford this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mmusich (Contributor) commented Mar 9, 2026

test parameters:

  • enable_tests = gpu, hlt_p2_integration, hlt_p2_timing
  • workflows_gpu = 34434.7503
  • workflows = ph2_hlt
  • relvals_opt = -w upgrade,standard
  • relvals_opt_gpu = -w upgrade,standard
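The `key = value` test parameters above are simple enough to read mechanically. The following is a hypothetical sketch only: cms-bot's own parser is not shown in this PR, and the rules assumed here (bullets ignored, `enable_tests` split on commas, every other value kept verbatim) are ours.

```python
def parse_test_parameters(text: str) -> dict:
    """Parse bullet lines of the form 'key = value' into a dict.

    Assumptions (not cms-bot's actual rules): leading bullets are ignored,
    'enable_tests' is split on commas, every other value is kept verbatim.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("•").strip()
        if "=" not in line:
            continue  # skip headers such as 'test parameters:'
        # Split on the first '=' only, so values may contain '=' or ','.
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "enable_tests":
            params[key] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            params[key] = value
    return params
```

With the parameters above, `enable_tests` becomes `["gpu", "hlt_p2_integration", "hlt_p2_timing"]` while `relvals_opt` stays the raw option string `-w upgrade,standard`.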

from DQM.TrackingMonitorSource.TrackToTrackComparisonHists_cfi import TrackToTrackComparisonHists as _TrackToTrackComparisonHists
hltPixelTrackToTrackSerialSync = _TrackToTrackComparisonHists.clone(
    # ... (parameters not shown in the review excerpt)
)
Contributor:

As this uses only "data" and not simulation truth, I would naively assume it should go into "DQM" rather than "Validation".

Contributor Author:

Sure, I can move it to a more appropriate file. Do you have a proposal for where to move it?

Contributor:

I would create a new file in DQM/TrackingMonitorSource/python and reference it in the Phase-2 HLT DQM sequence.

@mmusich (Contributor) commented Mar 15, 2026

@VourMa

While we wait for a resolution of #48976, which would allow running the DQM step in the same job as the HLT, I think an acceptable compromise is to run the DQM modules themselves within the HLT menu, as we did for most of Run 3 for the pixel heterogeneous tracking.
Here is a commit to that effect: mmusich@07693a7.
I took the opportunity to:

  • add the monitoring of the pixel-track SoAs;
  • add the harvesting step for the track-to-track comparisons.

Tested with runTheMatrix.py --what upgrade -l 34434.7503 -t 4 -j 8 on lxplus-gpu.
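The track-to-track comparison mentioned above amounts to matching each track in one collection (e.g. the device tracks) to a track in the other (e.g. the serial-sync tracks) and then histogramming the differences of matched pairs. The sketch below illustrates that idea with a greedy nearest-neighbour match in ΔR = sqrt(Δη² + Δφ²). It is an assumption-laden illustration, not the algorithm implemented in `TrackToTrackComparisonHists`; the `Track` type, `match_tracks` and the `dr_max` value are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    pt: float
    eta: float
    phi: float


def delta_phi(a: float, b: float) -> float:
    """Difference a - b wrapped into [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(t1: Track, t2: Track) -> float:
    return math.hypot(t1.eta - t2.eta, delta_phi(t1.phi, t2.phi))


def match_tracks(reference, monitored, dr_max=0.002):
    """Greedily match each reference track to the closest unused monitored track.

    Returns (reference_track, monitored_track_or_None) pairs, one per
    reference track. dr_max is a hypothetical threshold for illustration.
    """
    used = set()
    pairs = []
    for ref in reference:
        best, best_dr = None, dr_max
        for i, mon in enumerate(monitored):
            if i in used:
                continue
            dr = delta_r(ref, mon)
            if dr < best_dr:
                best, best_dr = i, dr
        if best is None:
            pairs.append((ref, None))
        else:
            used.add(best)
            pairs.append((ref, monitored[best]))
    return pairs
```

From the returned pairs one can fill, for instance, a matching-efficiency histogram (fraction of pairs with a non-`None` partner) and pT residuals of matched tracks; for an ideal CPU vs. GPU agreement every track matches and every residual is zero. The φ wrap-around in `delta_phi` matters because tracks near φ = ±π would otherwise never match.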

Contributor Author:

Thanks for this, @mmusich! I will take a look and include it today or tomorrow so that we can move forward.

Contributor Author:

I applied the commit verbatim

outputCommands = FEVTDEBUGHLTEventContent.outputCommands+[
'keep *_hltInitialStepTrajectorySeedsLSTTracks_*_*',
'keep *_hltInitialStepTrajectorySeedsLSTTracksSerialSync_*_*',
'keep *_hltPhase2PixelTracks_*_*',
Contributor:

Some of these are already saved by default; please clean up the redundant keep statements.

phase2_tracker.toModify(FEVTDEBUGHLTEventContent,
outputCommands = FEVTDEBUGHLTEventContent.outputCommands+[
'keep *_hltSiPixelClusters_*_*',
'keep *_hltSiPhase2Clusters_*_*',
'keep *_hltPhase2PixelTracks_*_*',
'keep *_hltPhase2PixelVertices_*_*',
'keep *_hltGeneralTracks_*_*',
'keep *_hltInitialStepTrackSelectionHighPurity_*_*',
'keep *_hltHighPtTripletStepTrackSelectionHighPurity_*_*',
'keep *_hltInitialStepTracksT4T5TCLST_*_*',
'keep *_hltOfflinePrimaryVertices_*_*',
])
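The requested clean-up, dropping `keep` statements that the default event content already covers (e.g. `keep *_hltPhase2PixelTracks_*_*` appears both in the PR excerpt and in the default list quoted above), can be checked mechanically. A minimal plain-Python sketch, assuming statements are compared as exact strings after whitespace normalisation; wildcard subsumption (a broader pattern covering a narrower one) is deliberately not handled, and `drop_redundant_keeps` is a hypothetical helper, not a CMSSW function.

```python
def drop_redundant_keeps(defaults, extra):
    """Return the statements in `extra` that are not already in `defaults`.

    Comparison is on exact strings after collapsing whitespace (an
    assumption; wildcards are not expanded). Duplicates within `extra`
    are also removed, preserving the original order.
    """
    def norm(cmd):
        return " ".join(cmd.split())

    seen = {norm(c) for c in defaults}
    result = []
    for cmd in extra:
        key = norm(cmd)
        if key not in seen:
            seen.add(key)
            result.append(cmd)
    return result
```

Applied to the statements above, only the two `hltInitialStepTrajectorySeedsLSTTracks*` entries survive, since `hltPhase2PixelTracks` is already kept by default.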

Contributor Author:

I updated the file accordingly.

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-50336/48552

@mmusich (Contributor) commented Mar 16, 2026

@cmsbuild, please test

@cmsbuild (Contributor)

+1

Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-dfc9e5/52071/summary.html
COMMIT: 2cd1f9b
CMSSW: CMSSW_16_1_X_2026-03-17-2300/el8_amd64_gcc13
Additional Tests: GPU,HLT_P2_INTEGRATION,HLT_P2_TIMING,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/50336/52071/install.sh to create a dev area with all the needed externals and cmssw changes.

HLT P2 Timing: chart

Comparison Summary
AMD_MI300X Comparison Summary
AMD_W7900 Comparison Summary
NVIDIA_H100 Comparison Summary
NVIDIA_L40S Comparison Summary

@mmusich (Contributor) commented Mar 19, 2026

> workflows_gpu = 34434.7503

It looks like this workflow did run in the GPU matrix (logs), but I can't see it in the bin-by-bin comparisons.

To maximize usefulness, I would suggest moving the workflow from the upgrade matrix to the GPU matrix and then adding it here, in the cms-bot configuration. This can be done at a later time.

@mmusich (Contributor) commented Mar 19, 2026

+hlt

@VourMa (Contributor Author) commented Mar 19, 2026

> To maximize usefulness, I would suggest moving the workflow from the upgrade matrix to the GPU matrix and then adding it here, in the cms-bot configuration. This can be done at a later time.

Yes, since this requires an external, let's do it in a separate (set of) PR(s). I will prepare it and submit it once this PR is merged.

> PR according to description (@VourMa you might want to edit it to reflect the current status of the PR) and follow-up review

Right, I will do so later today.

@civanch (Contributor) commented Mar 19, 2026

+1

@gabrielmscampos (Member)

+dqm

@mmusich (Contributor) commented Mar 20, 2026

@cms-sw/pdmv-l2 kind ping

@AdrianoDee (Contributor)

+pdmv

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @ftenchini, @mandrenguyen, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@ftenchini

+1

cmsbuild merged commit 5b39379 into cms-sw:master on Mar 23, 2026 (25 checks passed).