CPU vs. GPU for LST in HLT and updates to the offline#49832
VourMa wants to merge 1 commit into cms-sw:master
Conversation
cms-bot internal usage
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49832/47487
A new Pull Request was created by @VourMa for master. It involves the following packages:
@AdrianoDee, @DickyChant, @Martin-Grunewald, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @gabrielmscampos, @jfernan2, @mandrenguyen, @miquork, @mmusich, @nothingface0, @rseidita, @srimanob can you please review it and eventually sign? Thanks. cms-bot commands are listed here
assign heterogeneous
```python
numWFIB.extend([prefixDet+34.7521]) # HLTTiming75e33, ticl_v5, ticlv5TrackLinkingGNN
numWFIB.extend([prefixDet+34.753])  # HLTTiming75e33, alpaka,singleIterPatatrack
numWFIB.extend([prefixDet+34.754])  # HLTTiming75e33, alpaka,singleIterPatatrack,trackingLST
numWFIB.extend([prefixDet+34.7541]) # HLTTiming75e33, alpakaValidationLST,singleIterPatatrack,trackingLST
```
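For orientation, a minimal sketch of how such matrix entries are composed: a detector-dependent base `prefixDet` (the value below is an assumption for illustration only; in the real matrix it depends on the geometry being exercised) plus a fractional offset identifying the customization chain.

```python
# Sketch of how the relval workflow numbers above are built: a detector-dependent
# base offset plus a fractional suffix identifying the customization chain.
# NOTE: prefixDet's value here is an assumption for illustration, not the
# actual value used in the upgrade matrix.
prefixDet = 29600.0

numWFIB = []
numWFIB.extend([prefixDet + 34.753])   # alpaka,singleIterPatatrack
numWFIB.extend([prefixDet + 34.754])   # alpaka,singleIterPatatrack,trackingLST
numWFIB.extend([prefixDet + 34.7541])  # alpakaValidationLST,singleIterPatatrack,trackingLST

print(numWFIB)
```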
Shouldn't this go rather in the gpu matrix? How do I test this from the bot with a GPU backend available?
Oops, my bad. Should be fixed in the last push.
Force-pushed from 6fd4c49 to 2cf3f6b
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49832/47496
Pull request #49832 was updated. @AdrianoDee, @DickyChant, @Martin-Grunewald, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @fwyzard, @gabrielmscampos, @jfernan2, @makortel, @mandrenguyen, @miquork, @mmusich, @nothingface0, @rseidita, @srimanob can you please check and sign again.
enable gpu
test parameters:

@cmsbuild, please test

Oh, OK, thanks!

🤷‍♂️

The relevant PR has been made: cms-sw/cms-bot#2663

test parameters:

@cmsbuild, please test
-1

Failed Tests: UnitTests, RelVals-NVIDIA_L40S
HLT P2 Timing: chart
Failed Unit Tests: I found 1 errors in the following unit tests:
---> test RecoTrackerLSTCore-standalone-compilation had ERRORS
Failed RelVals-NVIDIA_L40S

Comparison Summary:
The failed RelVals are due to the recent, usual error, while the failed unit test is unrelated and fixed in #49895.

+dqm
Kindly pinging the reviewers of this PR: are there any follow-ups? This is needed for upcoming developments, so I would appreciate any feedback so that it can be finalized.
I am not entirely convinced by the proposed changes in the menu. I think a Run-3-like approach, in which we have a dedicated path (e.g.
Thanks for the feedback. Nothing changes in the menu currently (in terms of paths), or maybe I misinterpreted your comment. The changes are made so that a workflow comparing CPU vs. GPU with the actual HLT configuration can be added to the matrix and run for RelVals (and that target should be satisfied by the proposed changes). If the alternative solution has extra advantages on top of that, please let me know.
Yes, it should be much more self-contained.

This is the point I don't like. I would not like to change the whole behaviour of the menu, but only selected, targeted modules in a given path.
I see. Would it be satisfactory if a copy of MC_TRK_cfi.py were made, which would then be modified to run the two reconstructions (CPU and GPU) and compare them in the same way as proposed in this PR?
I think so. Two points I would advise:
```python
process.hltBackend = cms.EDProducer( "AlpakaBackendProducer@alpaka" )

process.hltStatusOnGPUFilter = cms.EDFilter( "AlpakaBackendFilter",
    producer = cms.InputTag( 'hltBackend', 'backend' ),
    backends = cms.vstring( 'CudaAsync', 'ROCmAsync' )
)
```

to avoid running the path at all if there isn't a GPU.
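As a hedged illustration of how such a guard might be wired in (the path name and `hltLSTSequence` below are hypothetical placeholders, not names from this PR): placing the producer and filter at the head of a Path makes the rest of the path run only when a GPU backend was actually selected.

```python
import FWCore.ParameterSet.Config as cms

# Hypothetical guarded path: if the job falls back to the CPU (SerialSync)
# backend, hltStatusOnGPUFilter rejects the event and the downstream modules
# (hltLSTSequence is a placeholder name) never run.
process.HLT_LSTOnGPU_v1 = cms.Path(
    process.hltBackend
    + process.hltStatusOnGPUFilter
    + process.hltLSTSequence
)
```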
Got it, thanks for the advice. Then my proposal is the following:
No objections to this plan. BTW, I think the mechanism for implementing CPU vs. GPU comparisons in the Phase-2 menu would be an excellent topic to discuss at some upcoming TSG/Upgrade meeting.
Superseded by #49984 for the offline part. The HLT part will follow after more urgent updates have been pushed.
The goal of this PR is to introduce two HLT workflows to monitor the agreement between LST on CPU and LST on GPU:

- `alpakaValidationLST,singleIterPatatrack,trackingLST`
- `singleIterPatatrack,phase2CAExtension,trackingLST,seedingLST,trackingMkFitCommon,hltTrackingMkFitInitialStep`

The additional CPU reconstruction (`SerialSync`) and the comparison plots are implemented with a new procModifier, `alpakaValidationLST`. This procModifier takes effect only in the procModifier combinations mentioned above; otherwise it produces neither the additional products nor the comparison plots. It is also included in the `alpakaValidation` modifier chain.

The analyzer that produces the comparison plots has been improved with a new parameter option to skip luminosity and PU plots.

With the introduction of the `alpakaValidationLST` modifier, the offline workflow testing LST on CPU vs. LST on GPU can be made explicit. The code is changed so that the heterogeneous workflow `0.712` (previously `0.704`) runs the offline reconstruction without any additional CPU reconstruction, while a new workflow, `0.713`, runs the comparison. Workflow `0.703` has also been renamed to `0.711`. The workflow numbering changes are made so that the offline LST workflows follow the numbering conventions for Alpaka workflows.

Some screenshots of the content of the DQM file:


