
CPU vs. GPU for LST in HLT and updates to the offline #49832

Closed

VourMa wants to merge 1 commit into cms-sw:master from SegmentLinking:CMSSW_16_0_0_pre3_serialSync

Conversation

@VourMa (Contributor) commented Jan 14, 2026

The goal of this PR is to introduce two HLT workflows to monitor the agreement between LST on CPU and LST on GPU:

  1. Workflow 0.7541 monitors the LST output tracks when LST is used for track building (most direct comparison of LST), i.e. for alpakaValidationLST,singleIterPatatrack,trackingLST.
  2. Workflow 0.7573 monitors the built tracks in the upcoming new tracking baseline, where LST is used as an extended seeding algorithm (comparison of LST output in a "production" configuration), i.e. for singleIterPatatrack,phase2CAExtension,trackingLST,seedingLST,trackingMkFitCommon,hltTrackingMkFitInitialStep.

The additional CPU reconstruction (SerialSync) and the comparison plots are implemented via a new procModifier, alpakaValidationLST. This procModifier takes effect only in the procModifier combinations mentioned above; otherwise it produces neither the additional products nor the comparison plots. It is also included in the alpakaValidation modifier chain.
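For illustration, a procModifier combination that is only active in conjunction with other modifiers is typically expressed in a _cff fragment with the `&` operator on `cms.Modifier` objects. This is a minimal sketch of that mechanism, assuming hypothetical analyzer and InputTag names (they are not the actual modules of this PR):

```python
import FWCore.ParameterSet.Config as cms
from Configuration.ProcessModifiers.alpakaValidationLST_cff import alpakaValidationLST
from Configuration.ProcessModifiers.trackingLST_cff import trackingLST

# Hypothetical DQM analyzer comparing GPU and SerialSync LST tracks;
# module label, plugin choice, and tags are illustrative placeholders.
lstTrackComparison = cms.EDAnalyzer("TrackToTrackComparisonHists",
    monitoredTrack = cms.InputTag("hltLSTTracks"),            # hypothetical
    referenceTrack = cms.InputTag("hltLSTTracks"),
)

# Gate on the AND of the two modifiers: alpakaValidationLST alone does
# nothing; only the combination activates the SerialSync comparison.
(alpakaValidationLST & trackingLST).toModify(
    lstTrackComparison, referenceTrack = "hltLSTTracksSerialSync"
)
```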

The analyzer that produces the comparison plots has been extended with a new parameter that allows skipping the luminosity and PU plots.
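Such a switch is usually a simple `cms.bool` in the analyzer's PSet that gates the booking of the luminosity/PU histograms. A sketch under that assumption (the parameter name is a hypothetical placeholder; the actual name in the PR may differ):

```python
import FWCore.ParameterSet.Config as cms

# Hypothetical fragment: the comparison analyzer with a flag that lets
# derived configurations skip the luminosity and pileup plots.
trackComparisonMonitor = cms.EDAnalyzer("TrackToTrackComparisonHists",
    doPlotsVsLumiAndPU = cms.bool(True),  # hypothetical parameter name
)

# A cloned instance for the HLT CPU-vs-GPU workflows, with those plots off.
hltTrackComparisonMonitor = trackComparisonMonitor.clone(
    doPlotsVsLumiAndPU = False
)
```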

With the introduction of the alpakaValidationLST modifier, the offline workflow testing LST on CPU vs. LST on GPU can be made explicit. The code is changed so that the heterogeneous workflow 0.712 (previously 0.704) runs the offline reconstruction without any additional CPU reconstruction, while a new workflow, 0.713, runs the comparison. Workflow 0.703 has also been renamed to 0.711. The workflow numbering changes are made so that the offline LST workflows follow the numbering conventions for Alpaka workflows.

Some screenshots of the content of the DQM file:
[Screenshot from 2026-01-07 19-38-56 and two further DQM images omitted.]

@cmsbuild (Contributor) commented Jan 14, 2026

cms-bot internal usage

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49832/47487

@cmsbuild (Contributor)

A new Pull Request was created by @VourMa for master.

It involves the following packages:

  • Configuration/EventContent (operations)
  • Configuration/ProcessModifiers (operations)
  • Configuration/PyReleaseValidation (pdmv)
  • DQM/TrackingMonitorClient (dqm)
  • DQM/TrackingMonitorSource (dqm)
  • HLTrigger/Configuration (hlt)
  • RecoTracker/IterativeTracking (reconstruction)
  • Validation/RecoTrack (dqm)

@AdrianoDee, @DickyChant, @Martin-Grunewald, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @gabrielmscampos, @jfernan2, @mandrenguyen, @miquork, @mmusich, @nothingface0, @rseidita, @srimanob can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @Martin-Grunewald, @SohamBhattacharya, @VinInn, @VourMa, @arossi83, @dgulhan, @elusian, @fabiocos, @felicepantaleo, @fioriNTU, @gpetruc, @idebruyn, @jandrea, @makortel, @missirol, @mmasciov, @mmusich, @mtosi, @richa2710, @rovere, @slomeo, @sroychow, @threus, @wmtford this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mmusich (Contributor) commented Jan 14, 2026

assign heterogeneous

@cmsbuild (Contributor)

New categories assigned: heterogeneous

@fwyzard, @makortel you have been requested to review this pull request/issue and eventually sign. Thanks.

numWFIB.extend([prefixDet+34.7521]) # HLTTiming75e33, ticl_v5, ticlv5TrackLinkingGNN
numWFIB.extend([prefixDet+34.753]) # HLTTiming75e33, alpaka,singleIterPatatrack
numWFIB.extend([prefixDet+34.754]) # HLTTiming75e33, alpaka,singleIterPatatrack,trackingLST
numWFIB.extend([prefixDet+34.7541]) # HLTTiming75e33, alpakaValidationLST,singleIterPatatrack,trackingLST
Contributor

Shouldn't this go rather in the gpu matrix? How do I test this from the bot with a GPU backend available?

Contributor Author

Oops, my bad. Should be fixed in the last push.

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49832/47496


@mmusich (Contributor) commented Jan 15, 2026

enable gpu

@mmusich (Contributor) commented Jan 15, 2026

test parameters:

  • enable = hlt_p2_integration, hlt_p2_timing
  • workflows = ph2_hlt
  • enable_tests = gpu
  • workflows_gpu = 34434.7041, 34434.7541, 34434.7573
  • relvals_opt = -w upgrade,standard
  • relvals_opt_gpu = -w upgrade,standard

@mmusich (Contributor) commented Jan 15, 2026

@cmsbuild, please test

@VourMa (Contributor, Author) commented Jan 21, 2026

I do not see where I might have missed a 0.704 workflow. If anyone has any suggestions, please let me know...

I think here

Oh, OK, thanks!
For my understanding, is this supposed to be hard-coded and not controlled by some subset of workflows from this repository?
In any case, I can make a PR to the bot repo as well, if that's the recommended way.

@mmusich (Contributor) commented Jan 21, 2026

For my understanding, is this supposed to be hard-coded and not controlled by some subset of workflows from this repository?

🤷‍♂️

@VourMa (Contributor, Author) commented Jan 22, 2026

In any case, I can make a PR to the bot repo as well, if that's the recommended way.

The relevant PR has been made: cms-sw/cms-bot#2663

@mmusich (Contributor) commented Jan 22, 2026

test parameters:

@mmusich (Contributor) commented Jan 22, 2026

@cmsbuild, please test

@cmsbuild (Contributor)

-1

Failed Tests: UnitTests RelVals-NVIDIA_L40S
Size: This PR adds an extra 16KB to the repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-371ebe/50817/summary.html
COMMIT: b807f98
CMSSW: CMSSW_16_1_X_2026-01-22-1100/el8_amd64_gcc13
Additional Tests: GPU,HLT_P2_INTEGRATION,HLT_P2_TIMING,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49832/50817/install.sh to create a dev area with all the needed externals and cmssw changes.

HLT P2 Timing: chart

Failed Unit Tests

I found 1 errors in the following unit tests:

---> test RecoTrackerLSTCore-standalone-compilation had ERRORS

Failed RelVals-NVIDIA_L40S

  • 34634.713_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnlyAlpakaValidationLST/step2_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnlyAlpakaValidationLST.log
  • 34634.712_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnly/step2_TTbar_14TeV+Run4D121PU_lstOnGPUIters01TrackingOnly.log
  • 34634.403_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation/step2_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation.log
Expand to see more relval errors ...

Comparison Summary

Summary:

  • You potentially removed 3 lines from the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 73
  • DQMHistoTests: Total histograms compared: 4814076
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4814053
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 72 files compared)
  • Checked 293 log files, 250 edm output root files, 73 DQM output files
  • TriggerResults: no differences found

@VourMa (Contributor, Author) commented Jan 22, 2026

The failed RelVals are due to the recent, known error:

----- Begin Fatal Exception 22-Jan-2026 19:34:42 CET-----------------------
An exception of category 'OutOfBound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'HLTriggerFinalPath'
   [2] Prefetching for module TriggerSummaryProducerAOD/'hltTriggerSummaryAOD'
   [3] Prefetching for module L1HPSPFTauProducer/'l1tHPSPFTauProducer'
   [4] Prefetching for module L1TPFCandMultiMerger/'l1tLayer1'
   [5] Prefetching for module L1TCorrelatorLayer1Producer/'l1tLayer1HGCal'
   [6] Calling method for module HGCalBackendLayer2Producer/'l1tHGCalBackEndLayer2Producer'
Exception Message:
TC X1 = 0.0683642 out of the seeding histogram bounds 0.076 - 0.58
----- End Fatal Exception -------------------------------------------------

while the failed unit test is unrelated and fixed in #49895.

@nothingface0 (Contributor)

+dqm

@VourMa (Contributor, Author) commented Jan 28, 2026

A kind ping to the reviewers of this PR: are there any follow-ups? This is needed for upcoming developments, so I would appreciate any feedback so that it can be finalized.

@mmusich (Contributor) commented Jan 28, 2026

A kind ping to the reviewers of this PR: are there any follow-ups? This is needed for upcoming developments, so I would appreciate any feedback so that it can be finalized.

I am not entirely convinced of the proposed changes in the menu. I think a Run-3-like approach, in which we have a dedicated path (e.g. DQM_TrackerHeterogeneousReco) where we put both flavours of the modules, would be desirable.

@VourMa (Contributor, Author) commented Jan 28, 2026

I am not entirely convinced of the proposed changes in the menu. I think a Run-3-like approach, in which we have a dedicated path (e.g. DQM_TrackerHeterogeneousReco) where we put both flavours of the modules, would be desirable.

Thanks for the feedback. Nothing changes in the menu currently (in terms of paths; or maybe I misinterpreted your comment). The changes are made so that a workflow comparing LST on CPU vs. GPU with the actual HLT configuration can be added to the matrix and run for RelVals (and that target should be satisfied by the proposed changes).

If the alternative solution has extra advantages on top of that, please let me know.

@mmusich (Contributor) commented Jan 28, 2026

If the alternative solution has extra advantages on top of that, please let me know.

Yes, it should be much more self-contained.

Nothing changes in the menu currently (in terms of paths, or maybe I misinterpreted what your comment).

This is the point I don't like. I would not like to change the behaviour of the whole menu, but just selected, targeted modules in a given path.

@VourMa (Contributor, Author) commented Jan 28, 2026

If the alternative solution has extra advantages on top of that, please let me know.

Yes, it should be much more self-contained.

Nothing changes in the menu currently (in terms of paths, or maybe I misinterpreted what your comment).

This is the point I don't like. I would not like to change the behaviour of the whole menu, but just selected, targeted modules in a given path.

I see. Would it be satisfactory if a copy of MC_TRK_cfi.py were made, which would then be modified to run the two reconstructions (CPU and GPU) and compare them in the same way as proposed in this PR?

@mmusich (Contributor) commented Jan 28, 2026

Would it be satisfactory if a copy of MC_TRK_cfi.py was made, which would then be modified to run the two reconstructions (CPU & GPU) and then compare them in the same way as proposed in this PR?

I think so. Two points I would advise:

  • If the goal is entirely to make a GPU vs. CPU comparison, I would gate the path with process.hltBackend + process.hltStatusOnGPUFilter, where:

    process.hltBackend = cms.EDProducer( "AlpakaBackendProducer@alpaka" )
    process.hltStatusOnGPUFilter = cms.EDFilter( "AlpakaBackendFilter",
        producer = cms.InputTag( 'hltBackend', 'backend' ),
        backends = cms.vstring( 'CudaAsync', 'ROCmAsync' )
    )

    to avoid running the path at all if there isn't a GPU.

  • I would name the path DQM_something (to keep the same nomenclature as in Run3 and potentially inspire other groups to use the same mechanism).

@VourMa (Contributor, Author) commented Jan 28, 2026

Got it, thanks for the advice. Then my proposal is the following:

  • I close this PR;
  • I open another PR with the changes in the offline part only - these are well factorized and well motivated;
  • I open another PR for the HLT part after the tracking configuration has been simplified - we are reasonably close to an update of the tracking baseline, and that would greatly facilitate the work.

@mmusich (Contributor) commented Jan 28, 2026

Then my proposal is the following:

No objections to this plan. BTW, I think the mechanism for implementing CPU vs GPU comparisons in the phase-2 menu would be an excellent topic to discuss at an upcoming TSG/Upgrade meeting.
Thanks!

@VourMa (Contributor, Author) commented Jan 29, 2026

Superseded by #49984 for the offline part. The HLT part will follow after more urgent updates have been pushed.
