Implement GPU vs CPU comparison for HLT heterogeneous products in patatrack workflows #49105
|
enable gpu
|
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49105/46346
|
A new Pull Request was created by @mmusich for master. It involves the following packages:
@AdrianoDee, @DickyChant, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @gabrielmscampos, @miquork, @nothingface0, @rseidita, @srimanob, @subirsarkar can you please review it and eventually sign? Thanks. cms-bot commands are listed here
|
@cmsbuild please test |
|
+1
Size: This PR adds an extra 56KB to repository
Comparison Summary
AMD_MI300X Comparison Summary
AMD_W7900 Comparison Summary
NVIDIA_H100 Comparison Summary
NVIDIA_L40S Comparison Summary
NVIDIA_T4 Comparison Summary
|
Quick question: I would expect histograms to be filled in the HLT workspace of the ROOT files produced for the GPU bin-by-bin comparisons, but I don't see anything, e.g. here: https://cern.ch/xsnvd. Is it simply because there are no comparison failures?
No, it's because the workflow you chose is a Phase-2 one, and we don't yet produce any of those products in the Phase-2 menu.
Ah, got it, thanks!
|
+dqm |
|
+code-checks
Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49105/46404
|
Pull request #49105 was updated. @AdrianoDee, @DickyChant, @Moanwar, @antoniovagnerini, @cmsbuild, @ctarricone, @gabrielmscampos, @miquork, @nothingface0, @rseidita, @srimanob, @subirsarkar can you please check and sign again.
|
@cmsbuild, please test |
|
+1
Size: This PR adds an extra 32KB to repository
Comparison Summary
AMD_MI300X Comparison Summary
AMD_W7900 Comparison Summary
NVIDIA_H100 Comparison Summary
NVIDIA_L40S Comparison Summary
NVIDIA_T4 Comparison Summary
|
The changes to the Phase-2 workflows have been removed.
|
+dqm |
|
@cms-sw/upgrade-l2 @cms-sw/pdmv-l2 just a kind ping.
|
+Upgrade |
|
+pdmv |
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @ftenchini (and backports should be raised in the release meeting by the corresponding L2)
|
+1 |
PR description:
It has been discussed that it would be desirable to have a means to assess the discrepancies between CPU and GPU backends in the heterogeneous reconstruction chains at the release-validation level, by submitting some of the existing patatrack workflows to dedicated resources.
In all those workflows the HLT menu is run as part of step 2, and in PR #49079 we have percolated the
`DQMGPUvsCPU` stream event content (as it is run online) to the `HLTDebugRAW` and `HLTDebugFEVT` event contents for release validation purposes. This means we can finally profit from the existing DQM infrastructure (developed for online DQM) to generate such comparisons in RelVals.
The goal of this PR is to provide the infrastructural changes to do so, while making sure not to crash the process in case some of the input collections are not available.
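To illustrate the "don't crash on missing inputs" requirement, here is a minimal, framework-free Python sketch. It is not the CMSSW/DQM code from this PR: an event is modelled as a plain dict, and the collection labels `tracksCPU`/`tracksGPU`, the `ComparisonSummary` container and the `compare_event` function are all hypothetical. A missing product is counted and skipped instead of raising, and only pairs available from both backends are compared.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ComparisonSummary:
    """Hypothetical stand-in for a DQM folder holding CPU-vs-GPU plots."""
    n_compared: int = 0
    n_skipped: int = 0
    multiplicity_diff: List[int] = field(default_factory=list)
    value_diff: List[float] = field(default_factory=list)


def compare_event(event: Dict[str, List[float]], cpu_label: str,
                  gpu_label: str, summary: ComparisonSummary) -> bool:
    """Compare one CPU/GPU product pair; skip (not fail) if either is missing."""
    cpu: Optional[List[float]] = event.get(cpu_label)
    gpu: Optional[List[float]] = event.get(gpu_label)
    if cpu is None or gpu is None:
        # A missing collection is recorded, not treated as a fatal error
        summary.n_skipped += 1
        return False
    summary.n_compared += 1
    # Difference in collection size (GPU minus CPU)
    summary.multiplicity_diff.append(len(gpu) - len(cpu))
    # Element-wise differences over the entries present in both collections
    summary.value_diff.extend(g - c for c, g in zip(cpu, gpu))
    return True


if __name__ == "__main__":
    events = [
        {"tracksCPU": [1.0, 2.0], "tracksGPU": [1.0, 2.5]},
        {"tracksCPU": [3.0]},  # GPU product absent, e.g. job ran without a GPU
    ]
    summary = ComparisonSummary()
    for ev in events:
        compare_event(ev, "tracksCPU", "tracksGPU", summary)
    print(summary)
```

The design choice being sketched is that absence of a product is a normal condition (for example, a job scheduled on a node without a GPU), so it is tallied for later inspection rather than aborting the whole process.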
PR validation:
I have run the following workflow
both on a machine equipped with an NVIDIA T4 GPU and on one without a GPU attached, and I was able to inspect the output comparison plots in the former case.
I have also run the subset of RelVal tests in the
`gpu` matrix that is run in PR tests via:
and did not observe any issues.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
Not a backport; I think it would be useful to backport it at least to CMSSW_15_1_X.
Cc: @AdrianoDee @fwyzard @bainbrid @mtosi