Removing CUDA/gpu from Pixel code configs and dropping all CUDA wfs #46853

Merged
cmsbuild merged 3 commits into cms-sw:master from AdrianoDee:remove_cuda_wfs_and_pixel_cfgs on Jan 9, 2025

@AdrianoDee
Contributor

AdrianoDee commented Dec 3, 2024

PR description:

This PR proposes:

  • the removal of all the CUDA modules from pixel-related configs and of all the CUDA Patatrack wfs;
  • the removal of the pixelNtupletFit_cff modifier.

A subsequent step would be to remove the gpu modifier, but since that involves code from many parties, I prefer to keep it separate.

@cmsbuild
Contributor

cmsbuild commented Dec 3, 2024

cms-bot internal usage

@AdrianoDee
Contributor Author

AdrianoDee commented Dec 3, 2024

Well, apparently these changes already involve many parties, so maybe I'll push the removal of the gpu modifier here as well.

@AdrianoDee
Contributor Author

enable gpu

@AdrianoDee
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Dec 3, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46853/42877

@cmsbuild
Contributor

cmsbuild commented Dec 3, 2024

A new Pull Request was created by @AdrianoDee for master.

It involves the following packages:

  • Configuration/ProcessModifiers (operations)
  • Configuration/PyReleaseValidation (pdmv, upgrade)
  • EventFilter/SiPixelRawToDigi (reconstruction)
  • RecoHI/HiTracking (reconstruction)
  • RecoLocalTracker/SiPixelClusterizer (reconstruction)
  • RecoLocalTracker/SiPixelRecHits (reconstruction)
  • RecoTracker/PixelTrackFitting (reconstruction)
  • RecoVertex/BeamSpotProducer (reconstruction, alca)
  • RecoVertex/Configuration (reconstruction)

@AdrianoDee, @Moanwar, @antoniovilela, @atpathak, @consuegs, @davidlange6, @DickyChant, @fabiocos, @jfernan2, @mandrenguyen, @miquork, @perrotta, @rappoccio, @srimanob, @subirsarkar can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @Martin-Grunewald, @VinInn, @VourMa, @dgulhan, @dkotlins, @fabiocos, @felicepantaleo, @ferencek, @francescobrivio, @gpetruc, @jazzitup, @kurtejung, @makortel, @mandrenguyen, @martinamalberti, @missirol, @mmusich, @mroguljic, @mtosi, @rovere, @rsreds, @slomeo, @threus, @tocheng, @tsusa, @tvami, @yenjie, @yetkinyilmaz, @yuanchao this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@jfernan2
Contributor

jfernan2 commented Dec 3, 2024

assign heterogeneous

@cmsbuild
Contributor

cmsbuild commented Dec 3, 2024

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

cmsbuild commented Dec 3, 2024

-1

Failed Tests: UnitTests RelVals RelVals-GPU RelVals-INPUT AddOn
Size: This PR adds an extra 56KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-693680/43213/summary.html
COMMIT: 5d626de
CMSSW: CMSSW_15_0_X_2024-12-03-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46853/43213/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 34 errors in the following unit tests:

---> test TestDQMOnlineClient-es_dqm_sourceclient had ERRORS
---> test TestDQMOnlineClient-beamhlt_dqm_sourceclient had ERRORS
---> test TestDQMOnlineClient-csc_dqm_sourceclient had ERRORS
and more ...

RelVals

  • 135.4 135.4_ZEEFS_13/step1_ZEEFS_13.log
  • 1306.0 1306.0_SingleMuPt1_UP15/step1_SingleMuPt1_UP15.log
  • 7.3 7.3_CosmicsSPLoose2018/step1_CosmicsSPLoose2018.log
Expand to see more relval errors ...

RelVals-GPU

  • 12834.423 12834.423_TTbar_14TeV+2024_Patatrack_HCALOnlyGPUandAlpaka_Validation/step1_TTbar_14TeV+2024_Patatrack_HCALOnlyGPUandAlpaka_Validation.log
  • 12834.422 12834.422_TTbar_14TeV+2024_Patatrack_HCALOnlyAlpaka_Validation/step1_TTbar_14TeV+2024_Patatrack_HCALOnlyAlpaka_Validation.log
  • 12834.406 12834.406_TTbar_14TeV+2024_Patatrack_PixelOnlyTripletsAlpaka/step1_TTbar_14TeV+2024_Patatrack_PixelOnlyTripletsAlpaka.log
Expand to see more relval errors ...

RelVals-INPUT

  • 159.01 159.01_HydjetQ_reminiaodPbPb2022_INPUT/step2_HydjetQ_reminiaodPbPb2022_INPUT.log
  • 136.875 136.875_RunDoubleMuon2018C/step2_RunDoubleMuon2018C.log
  • 2500.225 2500.225_jmeNANOrePuppimc140X/step2_jmeNANOrePuppimc140X.log
Expand to see more relval errors ...

AddOn Tests

UNKNOWN
UNKNOWN
UNKNOWN
Expand to see more addon errors ...

@AdrianoDee
Contributor Author

Seems not this time, unfortunately. Let me fix the commit history.

@cmsbuild
Contributor

cmsbuild commented Jan 7, 2025

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46853/43192

@cmsbuild
Contributor

cmsbuild commented Jan 7, 2025

@AdrianoDee
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Jan 7, 2025

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46853/43193

@cmsbuild
Contributor

cmsbuild commented Jan 7, 2025

Pull request #46853 was updated. @Moanwar, @antoniovilela, @atpathak, @consuegs, @davidlange6, @fabiocos, @fwyzard, @jfernan2, @makortel, @mandrenguyen, @perrotta, @rappoccio, @srimanob, @subirsarkar can you please check and sign again.

@fwyzard
Contributor

fwyzard commented Jan 7, 2025

please test

@cmsbuild
Contributor

cmsbuild commented Jan 7, 2025

-1

Failed Tests: UnitTests
Size: This PR adds an extra 24KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-693680/43653/summary.html
COMMIT: cebdb4d
CMSSW: CMSSW_15_0_X_2025-01-07-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46853/43653/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-693680/43653/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-693680/43653/git-merge-result

Unit Tests

I found 1 errors in the following unit tests:

---> test runtestPhysicsToolsPatAlgos had ERRORS

Comparison Summary

Summary:

  • You potentially removed 135 lines from the logs
  • Reco comparison results: 271 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3931808
  • DQMHistoTests: Total failures: 6176
  • DQMHistoTests: Total nulls: 12
  • DQMHistoTests: Total successes: 3925600
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.8790000000000004 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 1000.0,... ): -0.012 KiB MessageLogger/Errors
  • DQMHistoSizes: changed ( 1000.0,... ): -0.012 KiB MessageLogger/Warnings
  • DQMHistoSizes: changed ( 24834.911,... ): -0.008 KiB MessageLogger/Errors
  • DQMHistoSizes: changed ( 24834.911,... ): -0.008 KiB MessageLogger/Warnings
  • DQMHistoSizes: changed ( 4.22,... ): -0.004 KiB MessageLogger/Errors
  • DQMHistoSizes: changed ( 4.22,... ): -0.004 KiB MessageLogger/Warnings
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

@mmusich
Contributor

mmusich commented Jan 7, 2025

---> test runtestPhysicsToolsPatAlgos had ERRORS

my bad, will fix.

@mmusich
Contributor

mmusich commented Jan 9, 2025

@cmsbuild, please test

  • in CMSSW_15_0_X_2025-01-08-2300 hopefully all issues are resolved

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2025

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-693680/43689/summary.html
COMMIT: cebdb4d
CMSSW: CMSSW_15_0_X_2025-01-08-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46853/43689/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 131 lines from the logs
  • Reco comparison results: 11 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3931808
  • DQMHistoTests: Total failures: 438
  • DQMHistoTests: Total nulls: 12
  • DQMHistoTests: Total successes: 3931338
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.8790000000000004 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 12434.7,... ): -0.012 KiB MessageLogger/Errors
  • DQMHistoSizes: changed ( 12434.7,... ): -0.012 KiB MessageLogger/Warnings
  • DQMHistoSizes: changed ( 24834.911,... ): -0.008 KiB MessageLogger/Errors
  • DQMHistoSizes: changed ( 24834.911,... ): -0.008 KiB MessageLogger/Warnings
  • DQMHistoSizes: changed ( 4.22,... ): -0.004 KiB MessageLogger/Errors
  • DQMHistoSizes: changed ( 4.22,... ): -0.004 KiB MessageLogger/Warnings
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found
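As a quick cross-check of the two comparison summaries in this thread, the DQMHistoTests counts can be tallied as below. This is only a sketch: that failures, nulls, successes and skipped add up to the total is an observation from the numbers quoted here, not a documented property of the comparison tool, and the dictionary layout is made up for illustration.

```python
# Counts copied from the two DQMHistoTests summaries posted by cmsbuild
# in this thread, keyed by the IB each test ran against.
TOTAL_HISTOGRAMS = 3931808

runs = {
    "CMSSW_15_0_X_2025-01-07-1100": {
        "failures": 6176, "nulls": 12, "successes": 3925600, "skipped": 20,
    },
    "CMSSW_15_0_X_2025-01-08-2300": {
        "failures": 438, "nulls": 12, "successes": 3931338, "skipped": 20,
    },
}

fractions = {}
for ib_name, counts in runs.items():
    # Assumption: the four categories partition the compared histograms.
    assert sum(counts.values()) == TOTAL_HISTOGRAMS
    fractions[ib_name] = counts["failures"] / TOTAL_HISTOGRAMS
    print(f"{ib_name}: {counts['failures']} failures "
          f"({fractions[ib_name]:.4%} of compared histograms)")
```

In both runs the four counts sum to the reported total, and the failure count drops from 6176 to 438 between the two tests.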

GPU Comparison Summary

Summary:

@AdrianoDee
Contributor Author

Thanks @mmusich

@cms-sw/alca-l2 @cms-sw/reconstruction-l2 @cms-sw/heterogeneous-l2 @cms-sw/upgrade-l2 this should be ready to go (the last couple of commits were only rebases).

@fwyzard
Contributor

fwyzard commented Jan 9, 2025

+1

@perrotta
Contributor

perrotta commented Jan 9, 2025

+alca

@Moanwar
Contributor

Moanwar commented Jan 9, 2025

+Upgrade

@jfernan2
Contributor

jfernan2 commented Jan 9, 2025

+1

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2025

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @sextonkennedy, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Contributor

+1


- (alpaka & pp_on_AA & ~phase2_tracker).toModify(siPixelRecHitsPreSplitting,
- cpu = _siPixelRecHitFromSoAAlpakaHIonPhase1.clone(
+ (alpaka & pp_on_AA & ~phase2_tracker).toModify(siPixelRecHitsPreSplitting, _siPixelRecHitFromSoAAlpakaHIonPhase1.clone(
Contributor

This line should have been

(alpaka & pp_on_AA & ~phase2_tracker).toReplaceWith(siPixelRecHitsPreSplitting, _siPixelRecHitFromSoAAlpakaHIonPhase1.clone(

Workflow 160.401 fails in the IB with

Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/bin/el8_amd64_gcc12/cmsDriver.py", line 40, in <module>
    run()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/bin/el8_amd64_gcc12/cmsDriver.py", line 16, in run
    configBuilder.prepare()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 2284, in prepare
    self.addStandardSequences()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 826, in addStandardSequences
    getattr(self,"prepare_"+stepName)(stepSpec = getattr(self,stepName+"DefaultSeq"))
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 1743, in prepare_RECO
    _,_recoSeq,_ = self.loadDefaultOrSpecifiedCFF(stepSpec,self.RECODefaultCFF)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 1275, in loadDefaultOrSpecifiedCFF
    l=self.loadAndRemember(_cff)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 354, in loadAndRemember
    self.process.load(includeFile)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/src/FWCore/ParameterSet/python/Config.py", line 760, in load
    module = __import__(moduleName)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_0_X_2025-01-09-2300/src/Configuration/StandardSequences/python/Reconstruction_cff.py", line 8, in <module>
    from RecoTracker.Configuration.RecoTracker_cff import *
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_0_X_2025-01-09-2300/src/RecoTracker/Configuration/python/RecoTracker_cff.py", line 5, in <module>
    from RecoTracker.IterativeTracking.iterativeTk_cff import *
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_0_X_2025-01-09-2300/src/RecoTracker/IterativeTracking/python/iterativeTk_cff.py", line 4, in <module>
    from RecoTracker.IterativeTracking.InitialStepPreSplitting_cff import *
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_0_X_2025-01-09-2300/src/RecoTracker/IterativeTracking/python/InitialStepPreSplitting_cff.py", line 245, in <module>
    from RecoLocalTracker.SiPixelRecHits.SiPixelRecHits_cfi import siPixelRecHits
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_15_0_X_2025-01-09-2300/src/RecoLocalTracker/SiPixelRecHits/python/SiPixelRecHits_cfi.py", line 62, in <module>
    (alpaka & pp_on_AA & ~phase2_tracker).toModify(siPixelRecHitsPreSplitting, _siPixelRecHitFromSoAAlpakaHIonPhase1.clone(
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/src/FWCore/ParameterSet/python/Config.py", line 1783, in toModify
    Modifier._toModify(obj,func,**kw)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02871/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-08-2300/src/FWCore/ParameterSet/python/Config.py", line 1866, in _toModify
    func(obj)
TypeError: 'EDProducer' object is not callable

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_15_0_X_2025-01-09-2300/pyRelValMatrixLogs/run/160.401_HydjetQ_MinBias_5362GeV_2023_ppReco/step4_HydjetQ_MinBias_5362GeV_2023_ppReco.log#/
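
For readers less familiar with the modifier calls, the failure mode in the traceback can be reproduced with a toy stand-in. This is a minimal sketch, not the FWCore implementation: `ToyModule`, `ToyModifier`, the producer type names and the `src` values are all hypothetical. The only behaviour borrowed from the traceback is that `toModify` treats a positional second argument as a function and calls `func(obj)`. The earlier form of the line passed the clone as the keyword `cpu`, which `toModify` handles as a parameter update; once the clone is passed positionally it is called like a function, which is why `toReplaceWith` is the right call.

```python
class ToyModule:
    """Hypothetical stand-in for a configured module: a type name plus parameters."""

    def __init__(self, type_name, **params):
        self.type_name = type_name
        self.params = dict(params)

    def clone(self, **overrides):
        # Return a copy with some parameters overridden.
        merged = dict(self.params)
        merged.update(overrides)
        return ToyModule(self.type_name, **merged)


class ToyModifier:
    """Hypothetical stand-in for a process modifier that is always active."""

    def toModify(self, obj, func=None, **kw):
        # A positional second argument is called as a function on the object.
        if func is not None:
            func(obj)
        # Keyword arguments update parameters of the existing object in place.
        obj.params.update(kw)
        return self

    def toReplaceWith(self, obj, replacement):
        # Swap the whole object for the replacement module.
        obj.type_name = replacement.type_name
        obj.params = dict(replacement.params)
        return self


rec_hits = ToyModule("ToyLegacyRecHitProducer", src="toyClusters")
alpaka_hits = ToyModule("ToyAlpakaRecHitProducer", src="toyClusters")
modifier = ToyModifier()

# Passing a module where a function is expected fails, as in the traceback.
try:
    modifier.toModify(rec_hits, alpaka_hits.clone(src="toyHitsSoA"))
except TypeError as err:
    error_message = str(err)
    print("toModify failed:", error_message)

# Replacing the module as a whole works.
modifier.toReplaceWith(rec_hits, alpaka_hits.clone(src="toyHitsSoA"))
print(rec_hits.type_name, rec_hits.params)
```

In the toy version the failed `toModify` call leaves `rec_hits` untouched, and `toReplaceWith` then swaps both the type and the parameters.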

Contributor Author

Thanks Matti, I opened #47078 to fix it.
