TICL: Consolidate v5 as Default Configuration and Cleanup Legacy Code #49932

Open
felicepantaleo wants to merge 29 commits into cms-sw:master from felicepantaleo:make_ticlv5_default_16_1_0_pre1

@felicepantaleo
Contributor

This PR establishes TICL v5 as the default reconstruction configuration, removing the need for explicit process modifiers and cleaning up significant amounts of legacy code associated with previous versions.

Key Changes:

  • Core Configuration:
      • Updated RecoHGCal/TICL/python/iterativeTICL_cff.py to use the TICL v5 chain (CLUE3DHigh -> TracksterLinks -> TICLCandidate) by default.
      • Removed fallback logic for v4.
      • Switched CLUE3DHighStep to use PFN inference by default.
      • Introduced ticl_dev process modifier for future development.

  • C++ Plugins:
      • PFTICLProducer: Simplified logic by removing the isTICLv5_ switch. The default behavior now assumes TICL v5 timing from TICLCandidate.
      • PatternRecognition: Enabled computeLocalTime by default in PatternRecognitionbyCLUE3D and PatternRecognitionbyCA. Enabled usePCACleaning by default in PatternRecognitionbyCLUE3D.

  • Validation & DQM:
      • Updated HGCalValidator to validate ticlCandidate and ticlTracksterLinks collections by default.
      • Updated makeHGCalValidationPlots.py to plot v5 collections.
      • Updated RecoHGCal_EventContent_cff.py to consolidate keep statements for v5 collections.
      • Fixed handling of empty collections in validator plugins.

  • Cleanup:
      • Removed obsolete process modifiers: ticl_v4, ticl_v5, clue3D, fastJetTICL, enableCPfromPU.
      • Removed deprecated Python configurations: ticl_iterations.py, customiseForTICLv5_cff.py, customiseTICLFromReco.py.
      • Removed legacy TracksterInferenceByCNNv4 implementation.
      • Removed deprecated harvestHGCalValidationPlots.py script.

  • HLT & Workflows:
      • Updated HLT 75e33 and Scouting menus to use TICL v5 components.
      • Removed obsolete HLT modules (hltParticleFlowSuperClusterHGCalFromTICLL1Seeded, hltParticleFlowSuperClusterHGCalFromTICLUnseeded).
      • Updated PyReleaseValidation workflows to reflect the removal of the ticl_v5 modifier.

Testing:

  • Standard matrix workflows.
  • HGCal validation suites.

Notes:

This PR represents a major consolidation of the TICL configuration, simplifying the codebase and establishing a clean baseline for future developments.

@cmsbuild
Contributor

cmsbuild commented Jan 24, 2026

cms-bot internal usage

@felicepantaleo
Contributor Author

@cmsbuild please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49932/47681

@cmsbuild
Contributor

A new Pull Request was created by @felicepantaleo for master.

It involves the following packages:

  • Configuration/EventContent (operations)
  • Configuration/ProcessModifiers (operations)
  • Configuration/PyReleaseValidation (pdmv)
  • HLTrigger/Configuration (hlt)
  • RecoEcal/EgammaClusterProducers (reconstruction)
  • RecoHGCal/Configuration (reconstruction)
  • RecoHGCal/TICL (reconstruction)
  • RecoLocalCalo/HGCalRecProducers (reconstruction)
  • RecoParticleFlow/PFClusterProducer (reconstruction)
  • SimCalorimetry/HGCalAssociatorProducers (simulation)
  • SimCalorimetry/HGCalSimProducers (simulation)
  • Validation/Configuration (dqm, simulation)
  • Validation/HGCalValidation (dqm)

@AdrianoDee, @DickyChant, @Martin-Grunewald, @Moanwar, @antoniovagnerini, @civanch, @ctarricone, @davidlange6, @fabiocos, @ftenchini, @gabrielmscampos, @jfernan2, @kpedro88, @mandrenguyen, @mdhildreth, @miquork, @mmusich, @nothingface0, @rseidita, @srimanob can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @Prasant1993, @ReyerBand, @Sam-Harper, @SohamBhattacharya, @VourMa, @a-kapoor, @afiqaize, @apsallid, @argiro, @bsunanda, @cseez, @denizsun, @edjtscott, @fabiocos, @forthommel, @hatakeyamak, @jainshilpi, @lecriste, @lgray, @makortel, @missirol, @mmarionncern, @mmusich, @pfs, @ram1123, @rchatter, @rovere, @salimcerci, @sameasy, @seemasharmafnal, @sethzenz, @slomeo, @sobhatta, @thomreis, @valsdav, @vandreev11, @varuns23, @wang0jin this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Size: This PR adds an extra 452KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/50874/summary.html
COMMIT: 9311241
CMSSW: CMSSW_16_1_X_2026-01-24-1100/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49932/50874/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Unit Tests

I found 1 errors in the following unit tests:

---> test testProduceNanoHLT had ERRORS

Comparison Summary

Summary:

  • You potentially removed 127 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 4274 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 4021886
  • DQMHistoTests: Total failures: 1526
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4020340
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 5501.189 KiB( 51 files compared)
  • DQMHistoSizes: changed ( 34434.0,... ): 844.816 KiB HGCAL/HGCalValidator
  • DQMHistoSizes: changed ( 34434.0,... ): 424.385 KiB HLT/HGCAL
  • Checked 222 log files, 193 edm output root files, 52 DQM output files
  • TriggerResults: found differences in 5 / 50 workflows

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 9 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step3 max memory diff 270.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step2 max memory diff 508.8 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.75_TTbar_14TeV+Run4D121_HLT75e33Timing step2 max memory diff 330.1 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step3 max memory diff 323.8 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step2 max memory diff 575.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step3 max memory diff 205.6 exceeds +/- 90.0 MiB
  • Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step2 max memory diff 382.0 exceeds +/- 90.0 MiB
  • Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step2 max memory diff 402.7 exceeds +/- 90.0 MiB
  • Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step3 max memory diff 138.8 exceeds +/- 90.0 MiB

@felicepantaleo
Contributor Author

@cmsbuild please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49932/47683

@felicepantaleo
Contributor Author

@cmsbuild please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49932/49092

@cmsbuild
Contributor

-1

Failed Tests: RelVals RelVals-INPUT AddOn
Size: This PR adds an extra 1020KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52799/summary.html
COMMIT: 9bac3b7
CMSSW: CMSSW_17_0_X_2026-04-21-1100/el8_amd64_gcc13
Additional Tests: HLT_P2_INTEGRATION,HLT_P2_TIMING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49932/52799/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52799/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52799/git-merge-result

HLT P2 Timing: chart

Failed RelVals

  • 135.4: 135.4_ZEEFS_13/step1_ZEEFS_13.log
  • 9.0: 9.0_Higgs200ChargedTaus/step3_Higgs200ChargedTaus.log
  • 34434.75: 34434.75_TTbar_14TeV+Run4D121_HLT75e33Timing/step2_TTbar_14TeV+Run4D121_HLT75e33Timing.log
Expand to see more relval errors ...

Failed RelVals-INPUT

  • 159.01: 159.01_HydjetQ_reminiaodPbPb2022_INPUT/step2_HydjetQ_reminiaodPbPb2022_INPUT.log
  • 2500.0204: 2500.0204_NANOmcUL18reMINI/step2_NANOmcUL18reMINI.log
  • 2500.0201: 2500.0201_NANOmcUL16APVreMINI/step2_NANOmcUL16APVreMINI.log
Expand to see more relval errors ...

Failed AddOn Tests

UNKNOWN
UNKNOWN
UNKNOWN

@felicepantaleo
Contributor Author

@cmsbuild please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49932/49094

@cmsbuild
Contributor

-1

Failed Tests: RelVals RelVals-INPUT AddOn
Size: This PR adds an extra 1040KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52814/summary.html
COMMIT: b595e52
CMSSW: CMSSW_17_0_X_2026-04-21-2300/el8_amd64_gcc13
Additional Tests: HLT_P2_INTEGRATION,HLT_P2_TIMING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49932/52814/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52814/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52814/git-merge-result

HLT P2 Timing: chart

Failed RelVals

  • 135.4: 135.4_ZEEFS_13/step1_ZEEFS_13.log
  • 9.0: 9.0_Higgs200ChargedTaus/step3_Higgs200ChargedTaus.log
  • 34434.75: 34434.75_TTbar_14TeV+Run4D121_HLT75e33Timing/step2_TTbar_14TeV+Run4D121_HLT75e33Timing.log
Expand to see more relval errors ...

Failed RelVals-INPUT

  • 159.01: 159.01_HydjetQ_reminiaodPbPb2022_INPUT/step2_HydjetQ_reminiaodPbPb2022_INPUT.log
  • 2500.0204: 2500.0204_NANOmcUL18reMINI/step2_NANOmcUL18reMINI.log
  • 2500.0201: 2500.0201_NANOmcUL16APVreMINI/step2_NANOmcUL16APVreMINI.log
Expand to see more relval errors ...

Failed AddOn Tests

UNKNOWN
UNKNOWN
UNKNOWN

Contributor

@fwyzard left a comment

A suggestion to clean up some of the HLT configuration.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49932/49107

@felicepantaleo
Contributor Author

@cmsbuild please test

@felicepantaleo
Contributor Author

@cmsbuild please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49932/49110

@cmsbuild
Contributor

-1

Failed Tests: RelVals-INPUT
Size: This PR adds an extra 44KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52843/summary.html
COMMIT: b975a7e
CMSSW: CMSSW_17_0_X_2026-04-23-1100/el8_amd64_gcc13
Additional Tests: HLT_P2_INTEGRATION,HLT_P2_TIMING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49932/52843/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52843/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52843/git-merge-result

HLT P2 Timing: chart

Failed RelVals-INPUT

  • 141.034: DAS Error

Comparison Summary

Summary:

  • You potentially removed 417 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 6461 differences found in the comparisons
  • DQMHistoTests: Total files compared: 65
  • DQMHistoTests: Total histograms compared: 4528507
  • DQMHistoTests: Total failures: 8212
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 4520274
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 26594.260000000002 KiB( 64 files compared)
  • DQMHistoSizes: changed ( 34434.751,... ): 1295.576 KiB HLT/HGCAL
  • DQMHistoSizes: changed ( 34434.7521 ): 884.773 KiB HLT/HGCAL
  • DQMHistoSizes: changed ( 34434.771 ): 1303.584 KiB HLT/HGCAL
  • DQMHistoSizes: changed ( 34434.775,... ): 1727.969 KiB HLT/HGCAL
  • DQMHistoSizes: changed ( 34434.0,... ): 844.816 KiB HGCAL/HGCalValidator
  • DQMHistoSizes: changed ( 34434.0,... ): -22.112 KiB HGCAL/PFCandidates
  • DQMHistoSizes: changed ( 34434.0,... ): -1.207 KiB HGCAL/0
  • DQMHistoSizes: changed ( 34496.0 ): -0.008 KiB MessageLogger/Warnings
  • Checked 273 log files, 234 edm output root files, 65 DQM output files
  • TriggerResults: found differences in 16 / 63 workflows

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 19 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step2 max memory diff 171.2 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.75_TTbar_14TeV+Run4D121_HLT75e33Timing step2 max memory diff 106.6 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.7502_TTbar_14TeV+Run4D121_HLT75e33TrackingNtuple step2 max memory diff 201.3 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.751_TTbar_14TeV+Run4D121_HLT75e33TimingAlpaka step2 max memory diff 109.0 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.7521_TTbar_14TeV+Run4D121_HLT75e33TimingTiclV5TrackLinkGNN step2 max memory diff -255.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.753_TTbar_14TeV+Run4D121_HLT75e33TimingLegacyTracking step2 max memory diff 117.1 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.755_TTbar_14TeV+Run4D121_HLT75e33TimingLST step2 max memory diff 106.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.757_TTbar_14TeV+Run4D121_HLT75e33TimingMkFitFit step2 max memory diff 105.9 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.758_TTbar_14TeV+Run4D121_HLT75e33TimingTiclBarrel step2 max memory diff 106.9 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.759_TTbar_14TeV+Run4D121_HLTPhase2WithNano step2 max memory diff 147.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.7591_TTbar_14TeV+Run4D121_HLTPhase2WithNanoValid step2 max memory diff 211.6 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.771_TTbar_14TeV+Run4D121_NGTScoutingAll step2 max memory diff -343.9 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.772_TTbar_14TeV+Run4D121_NGTScoutingWithNano step2 max memory diff 94.3 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.773_TTbar_14TeV+Run4D121_NGTScoutingWithNanoVal step2 max memory diff 122.7 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.775_TTbar_14TeV+Run4D121_NGTScoutingCAExtensionMergeT5 step2 max memory diff 107.7 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step2 max memory diff 208.4 exceeds +/- 90.0 MiB
  • Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step2 max memory diff 155.0 exceeds +/- 90.0 MiB
  • Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step2 max memory diff 172.7 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step3 max memory diff 186.4 exceeds +/- 90.0 MiB

@felicepantaleo
Contributor Author

@cmsbuild ignore tests-rejected with external-failure

@felicepantaleo
Contributor Author

@cmsbuild please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-667298/52884/summary.html
COMMIT: b975a7e
CMSSW: CMSSW_17_0_X_2026-04-26-2300/el8_amd64_gcc13
Additional Tests: HLT_P2_INTEGRATION,HLT_P2_TIMING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49932/52884/install.sh to create a dev area with all the needed externals and cmssw changes.

HLT P2 Timing: chart

Comparison Summary

Summary:

  • You potentially removed 476 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 6248 differences found in the comparisons
  • DQMHistoTests: Total files compared: 65
  • DQMHistoTests: Total histograms compared: 4528617
  • DQMHistoTests: Total failures: 7990
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 4520606
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 26594.260000000002 KiB( 64 files compared)
  • DQMHistoSizes: changed ( 34434.751,... ): 1295.576 KiB HLT/HGCAL
  • DQMHistoSizes: changed ( 34434.7521 ): 884.773 KiB HLT/HGCAL
  • DQMHistoSizes: changed ( 34434.771 ): 1303.584 KiB HLT/HGCAL
  • DQMHistoSizes: changed ( 34434.775,... ): 1727.969 KiB HLT/HGCAL
  • DQMHistoSizes: changed ( 34434.0,... ): 844.816 KiB HGCAL/HGCalValidator
  • DQMHistoSizes: changed ( 34434.0,... ): -22.112 KiB HGCAL/PFCandidates
  • DQMHistoSizes: changed ( 34434.0,... ): -1.207 KiB HGCAL/0
  • DQMHistoSizes: changed ( 34496.0 ): -0.008 KiB MessageLogger/Warnings
  • Checked 273 log files, 234 edm output root files, 65 DQM output files
  • TriggerResults: found differences in 17 / 63 workflows
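
The DQMHistoTests counters above can be cross-checked with a few lines of Python. This is only a sketch: it assumes that failures, nulls, successes, and skipped histograms together account for every compared histogram, which holds for the numbers in this report but is not documented cms-bot behaviour. The dictionary keys and function names are hypothetical.

```python
# Counters copied from the comparison summary above.
summary = {
    "compared": 4528617,
    "failures": 7990,
    "nulls": 1,
    "successes": 4520606,
    "skipped": 20,
}


def tally_is_consistent(s):
    """Return True if the four outcome counters add up to the total compared."""
    return s["failures"] + s["nulls"] + s["successes"] + s["skipped"] == s["compared"]


def failure_rate(s):
    """Fraction of compared histograms that were reported as failures."""
    return s["failures"] / s["compared"]


print("consistent:", tally_is_consistent(summary))
print(f"failure rate: {100 * failure_rate(summary):.2f}%")
```

The failure fraction puts the absolute number of failing histograms in context against the several million histograms compared.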

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2 , I found 19 workflow step(s) with memory usage exceeding the error threshold:

  • Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step2 max memory diff 175.8 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.7502_TTbar_14TeV+Run4D121_HLT75e33TrackingNtuple step2 max memory diff 198.7 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.751_TTbar_14TeV+Run4D121_HLT75e33TimingAlpaka step2 max memory diff 111.3 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.7521_TTbar_14TeV+Run4D121_HLT75e33TimingTiclV5TrackLinkGNN step2 max memory diff -258.5 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.753_TTbar_14TeV+Run4D121_HLT75e33TimingLegacyTracking step2 max memory diff 110.8 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.754_TTbar_14TeV+Run4D121_HLT75e33TimingLegacyTrackingPatatrackQuads step2 max memory diff 112.8 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.755_TTbar_14TeV+Run4D121_HLT75e33TimingLST step2 max memory diff 109.3 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.756_TTbar_14TeV+Run4D121_HLT75e33TimingTrimmedTracking step2 max memory diff 111.4 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.757_TTbar_14TeV+Run4D121_HLT75e33TimingMkFitFit step2 max memory diff 106.9 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.758_TTbar_14TeV+Run4D121_HLT75e33TimingTiclBarrel step2 max memory diff 101.0 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.759_TTbar_14TeV+Run4D121_HLTPhase2WithNano step2 max memory diff 208.2 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.7591_TTbar_14TeV+Run4D121_HLTPhase2WithNanoValid step2 max memory diff 191.7 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.77_TTbar_14TeV+Run4D121_NGTScouting step2 max memory diff 104.3 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.771_TTbar_14TeV+Run4D121_NGTScoutingAll step2 max memory diff -250.9 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.773_TTbar_14TeV+Run4D121_NGTScoutingWithNanoVal step2 max memory diff 121.6 exceeds +/- 90.0 MiB
  • Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step2 max memory diff 204.7 exceeds +/- 90.0 MiB
  • Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step2 max memory diff 200.2 exceeds +/- 90.0 MiB
  • Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step2 max memory diff 176.9 exceeds +/- 90.0 MiB
  • Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step3 max memory diff 213.3 exceeds +/- 90.0 MiB
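
To triage a list like the one above, the error lines can be parsed and split into increases and decreases. The following is a minimal sketch that assumes the line format is exactly as it appears in this report; the function names are hypothetical and not part of cms-bot.

```python
import re

# Line format assumed from the bot report above.
LINE_RE = re.compile(
    r"Workflow (?P<workflow>\S+) (?P<step>step\d+) max memory diff "
    r"(?P<diff>-?\d+(?:\.\d+)?) exceeds \+/- (?P<limit>\d+(?:\.\d+)?) MiB"
)


def parse_memory_errors(text):
    """Return (workflow, step, diff_MiB, limit_MiB) for each matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((m["workflow"], m["step"], float(m["diff"]), float(m["limit"])))
    return rows


def split_by_sign(rows):
    """Separate increases (largest first) from decreases (most negative first)."""
    increases = sorted((r for r in rows if r[2] > 0), key=lambda r: r[2], reverse=True)
    decreases = sorted((r for r in rows if r[2] < 0), key=lambda r: r[2])
    return increases, decreases


sample = """\
Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step2 max memory diff 175.8 exceeds +/- 90.0 MiB
Error: Workflow 34434.7521_TTbar_14TeV+Run4D121_HLT75e33TimingTiclV5TrackLinkGNN step2 max memory diff -258.5 exceeds +/- 90.0 MiB
Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step3 max memory diff 213.3 exceeds +/- 90.0 MiB
"""

rows = parse_memory_errors(sample)
increases, decreases = split_by_sign(rows)
for workflow, step, diff, limit in increases + decreases:
    print(f"{workflow} {step}: {diff:+.1f} MiB (limit +/- {limit} MiB)")
```

Separating the signs matters here because the threshold is symmetric: a few of the flagged workflows are reductions, not regressions.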

@makortel
Contributor

Is the ~100-200 MB increase in peak allocated memory along expectations?

@felicepantaleo
Contributor Author

Memory profile of the v5 PR in ttbar + 200PU

The memory profile was studied in ttbar + 200PU events by comparing the maximum allocated bytes per module in step2 and step3. The comparison uses both v4 and the current v5 release as baselines. The most relevant effect of the v5 PR is a large reduction of the peak memory associated with the TICL modules that dominate the v5 memory footprint.

Step2

Using v4 as baseline, the current v5 release shows a very large increase in peak memory for several TICL modules. In particular, hltTiclCandidate is absent in v4 but reaches 2.34 GiB in v5. Similarly, hltTiclTrackstersCLUE3DHigh increases from 189.8 MiB in v4 to 1.15 GiB in v5, corresponding to an increase of about 988 MiB, or +520%. Other sizeable v5-only contributions are observed in hltTiclTrackstersRecovery, at 338.1 MiB, and in hltTiclTracksterLinksSuperclusteringDNNUnseeded, at 272.9 MiB.

The v5 PR reduces these memory peaks very substantially. The largest improvement is observed for hltTiclCandidate, which goes from 2.34 GiB in v5 to 11.05 MiB in the PR, corresponding to a reduction of 2.33 GiB, or -99.5%. The hltTiclTrackstersCLUE3DHigh module is reduced from 1.15 GiB to 16.32 MiB, a reduction of 1.13 GiB, or -98.6%. The hltTiclTrackstersRecovery peak decreases from 338.1 MiB to 17.74 MiB, a reduction of 320.4 MiB, or -94.8%. The DNN superclustering links for the unseeded collection decrease from 272.9 MiB to 1.33 MiB, corresponding to -99.5%.

With respect to the original v4 baseline, the PR also brings hltTiclTrackstersCLUE3DHigh well below the v4 memory level: 16.32 MiB in the PR compared to 189.8 MiB in v4, a reduction of 173.5 MiB, or -91.4%. The L1-seeded CLUE3D collection remains above v4, increasing from 1.06 MiB to 4.60 MiB, but this is much smaller than the v5 release value of 68.33 MiB.

Overall, in step2 the PR removes the multi-GiB TICL memory regression introduced in v5 and reduces the dominant TICL allocations by approximately two orders of magnitude relative to the v5 release.
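
The percentages quoted above mix GiB and MiB, so it can help to recompute them with explicit unit conversion. The snippet below is an illustrative sketch, not the tool used to produce the profile; the helper names are hypothetical, and because the inputs are the rounded values from the text, the results agree with the quoted figures only to about one decimal place.

```python
# Conversion factors to MiB for the units that appear in the profile.
UNITS = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def to_mib(value: str) -> float:
    """Convert a string such as '2.34 GiB' to a number of MiB."""
    number, unit = value.split()
    return float(number) * UNITS[unit]


def percent_change(before: str, after: str) -> float:
    """Relative change from `before` to `after`, in percent."""
    b, a = to_mib(before), to_mib(after)
    return 100.0 * (a - b) / b


# Values copied from the step2 discussion above (v5 release -> v5 PR).
cases = {
    "hltTiclCandidate": ("2.34 GiB", "11.05 MiB"),
    "hltTiclTrackstersCLUE3DHigh": ("1.15 GiB", "16.32 MiB"),
    "hltTiclTrackstersRecovery": ("338.1 MiB", "17.74 MiB"),
}

for module, (before, after) in cases.items():
    print(f"{module}: {percent_change(before, after):+.1f}%")
```

With these inputs the three modules come out at about -99.5%, -98.6%, and -94.8%, matching the numbers in the text.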

Step3

The same pattern is visible in step3. With v4 as baseline, the v5 release introduces large TICL memory peaks: ticlCandidate is absent in v4 and reaches 2.33 GiB in v5, while ticlTrackstersCLUE3DHigh increases from 189.8 MiB to 1.16 GiB, corresponding to an increase of 998.0 MiB, or +525.8%. The ticlTrackstersRecovery module is also absent in v4 and reaches 338.1 MiB in v5.

The v5 PR strongly reduces these peaks. Relative to the v5 release, ticlCandidate decreases from 2.33 GiB to 12.54 MiB, a reduction of 2.32 GiB, or -99.5%. The ticlTrackstersCLUE3DHigh module decreases from 1.16 GiB to 16.32 MiB, a reduction of 1.14 GiB, or -98.6%. The ticlTrackstersRecovery module decreases from 338.1 MiB to 17.74 MiB, a reduction of 320.4 MiB, or -94.8%. The DNN trackster links decrease from 15.29 MiB to 1.33 MiB, corresponding to -91.3%.

Compared to v4, the PR again brings the main CLUE3D TICL peak significantly below the v4 value: ticlTrackstersCLUE3DHigh is reduced from 189.8 MiB in v4 to 16.32 MiB in the PR, a reduction of 173.5 MiB, or -91.4%. Several modules that run in both the v4 and the v5 release configurations, such as particleFlowClusterHGCalFromSimCl, hgcalMultiClusters, simPFProducer, and hgcalTrackCollection, are absent in the PR, so their allocations disappear from the step3 comparison.

There are some non-TICL memory increases in step3 relative to the v5 release. For example, combinatoricRecoTaus increases from 54.65 MiB to 76.99 MiB (+22.34 MiB), packedPFCandidates from 11.68 MiB to 25.70 MiB (+14.02 MiB), and deepMETsResolutionTune from 557.1 KiB to 17.05 MiB. Relative to v4, packedPFCandidates is essentially unchanged (26.28 MiB in v4, -2.2% in the PR). These increases are much smaller than the TICL reductions, but they should be kept in mind when evaluating the full reconstruction memory profile.

The v5 PR provides a very large reduction of the dominant TICL memory peaks in both step2 and step3. The most important improvements are:

  • hltTiclCandidate / ticlCandidate: about 2.3 GiB reduced to about 11-13 MiB, corresponding to -99.5%.
  • hltTiclTrackstersCLUE3DHigh / ticlTrackstersCLUE3DHigh: about 1.15-1.16 GiB reduced to 16.32 MiB, corresponding to about -98.6%.
  • hltTiclTrackstersRecovery / ticlTrackstersRecovery: 338.1 MiB reduced to 17.74 MiB, corresponding to -94.8%.
  • DNN superclustering trackster links: reduced by more than 90%, and in step2 by about 99.5-99.7%.

The PR therefore fixes the main TICL memory regression seen in the v5 release. In the dominant TICL modules, the PR memory usage is not only far below the v5 release, but in several cases also below the original v4 baseline. The remaining increases are comparatively smaller and mostly outside the dominant TICL memory peaks.

step2: v4 as baseline

| Module | v4 max | v5 max | Δ v5 vs v4 | Δ% v5 vs v4 | v5 PR max | Δ PR vs v4 | Δ% PR vs v4 | Presence |
|---|---|---|---|---|---|---|---|---|
| hltTiclCandidate (TICLCandidateProducer) | absent | 2.34 GiB | 2.34 GiB | n/a | 11.05 MiB | 11.05 MiB | n/a | 2/3 files |
| hltTiclTrackstersCLUE3DHigh (TrackstersProducer) | 189.8 MiB | 1.15 GiB | 987.8 MiB | +520.4% | 16.32 MiB | -173.5 MiB | -91.4% | 3/3 files |
| hltTiclTrackstersRecovery (TrackstersProducer) | absent | 338.1 MiB | 338.1 MiB | n/a | 17.74 MiB | 17.74 MiB | n/a | 2/3 files |
| hltTiclTracksterLinksSuperclusteringDNNUnseeded (TracksterLinksProduc? | absent | 272.9 MiB | 272.9 MiB | n/a | 1.33 MiB | 1.33 MiB | n/a | 2/3 files |
| hltTiclTrackstersMerge (TrackstersMergeProducer) | 139.3 MiB | absent | -139.3 MiB | n/a | absent | -139.3 MiB | n/a | 1/3 files |
| hltTiclTrackstersCLUE3DHighL1Seeded (TrackstersProducer) | 1.06 MiB | 68.33 MiB | 67.27 MiB | +6346.2% | 4.60 MiB | 3.54 MiB | +334.0% | 3/3 files |
| hltParticleFlowClusterHGCalFromTICLUnseeded (PFClusterProducer) | 19.86 MiB | absent | -19.86 MiB | n/a | absent | -19.86 MiB | n/a | 1/3 files |
| hltParticleFlowSuperClusterHGCalFromTICLUnseeded (PFECALSuperClusterP? | 19.72 MiB | absent | -19.72 MiB | n/a | absent | -19.72 MiB | n/a | 1/3 files |
| hltTiclTracksterLinksSuperclusteringDNNL1Seeded (TracksterLinksProduc? | absent | 17.25 MiB | 17.25 MiB | n/a | 55.84 KiB | 55.84 KiB | n/a | 2/3 files |
| hltTiclTracksterLinks (TracksterLinksProducer) | absent | 14.21 MiB | 14.21 MiB | n/a | 14.22 MiB | 14.22 MiB | n/a | 2/3 files |

step2: v5 (release) as baseline

| Module | v5 max | v5 PR max | Δ PR vs v5 | Δ% PR vs v5 | Presence |
|---|---|---|---|---|---|
| hltTiclCandidate (TICLCandidateProducer) | 2.34 GiB | 11.05 MiB | -2.33 GiB | -99.5% | 2/2 files |
| hltTiclTrackstersCLUE3DHigh (TrackstersProducer) | 1.15 GiB | 16.32 MiB | -1.13 GiB | -98.6% | 2/2 files |
| hltTiclTrackstersRecovery (TrackstersProducer) | 338.1 MiB | 17.74 MiB | -320.4 MiB | -94.8% | 2/2 files |
| hltTiclTracksterLinksSuperclusteringDNNUnseeded (TracksterLinksProduc? | 272.9 MiB | 1.33 MiB | -271.6 MiB | -99.5% | 2/2 files |
| hltTiclTrackstersCLUE3DHighL1Seeded (TrackstersProducer) | 68.33 MiB | 4.60 MiB | -63.73 MiB | -93.3% | 2/2 files |
| mix (MixingModule) | 1.90 GiB | 1.86 GiB | -40.96 MiB | -2.1% | 2/2 files |
| hltTiclTracksterLinksSuperclusteringDNNL1Seeded (TracksterLinksProduc? | 17.25 MiB | 55.84 KiB | -17.20 MiB | -99.7% | 2/2 files |

step3: v4 as baseline

| Module | v4 max | v5 max | Δ v5 vs v4 | Δ% v5 vs v4 | v5 PR max | Δ PR vs v4 | Δ% PR vs v4 | Presence |
|---|---|---|---|---|---|---|---|---|
| ticlCandidate (TICLCandidateProducer) | absent | 2.33 GiB | 2.33 GiB | n/a | 12.54 MiB | 12.54 MiB | n/a | 2/3 files |
| ticlTrackstersCLUE3DHigh (TrackstersProducer) | 189.8 MiB | 1.16 GiB | 998.0 MiB | +525.8% | 16.32 MiB | -173.5 MiB | -91.4% | 3/3 files |
| ticlTrackstersRecovery (TrackstersProducer) | absent | 338.1 MiB | 338.1 MiB | n/a | 17.74 MiB | 17.74 MiB | n/a | 2/3 files |
| particleFlowClusterHGCalFromSimCl (PFClusterProducer) | 124.6 MiB | 119.2 MiB | -5.40 MiB | -4.3% | absent | -124.6 MiB | n/a | 2/3 files |
| ticlTrackstersMerge (TrackstersMergeProducer) | 70.37 MiB | absent | -70.37 MiB | n/a | absent | -70.37 MiB | n/a | 1/3 files |
| combinatoricRecoTausBoosted (RecoTauProducer) | 86.72 MiB | 29.40 MiB | -57.32 MiB | -66.1% | 73.14 MiB | -13.58 MiB | -15.7% | 3/3 files |
| allHitToTracksterAssociations (AllHitToTracksterAssociatorsProducer) | 69.13 MiB | 102.5 MiB | 33.37 MiB | +48.3% | 101.7 MiB | 32.57 MiB | +47.1% | 3/3 files |
| hltAllHitToTracksterAssociations (AllHitToTracksterAssociatorsProduce? | 83.29 MiB | 102.4 MiB | 19.11 MiB | +22.9% | 115.6 MiB | 32.31 MiB | +38.8% | 3/3 files |
| hgcalMultiClusters (HGCalMultiClusterProducer) | 24.26 MiB | 24.61 MiB | 358.4 KiB | +1.4% | absent | -24.26 MiB | n/a | 2/3 files |
| simPFProducer (SimPFProducer) | 16.12 MiB | 16.02 MiB | -102.4 KiB | -0.6% | absent | -16.12 MiB | n/a | 2/3 files |
| ticlTracksterLinksSuperclusteringDNN (TracksterLinksProducer) | absent | 15.29 MiB | 15.29 MiB | n/a | 1.33 MiB | 1.33 MiB | n/a | 2/3 files |
| packedPFCandidates (PATPackedCandidateProducer) | 26.28 MiB | 11.68 MiB | -14.60 MiB | -55.6% | 25.70 MiB | -593.9 KiB | -2.2% | 3/3 files |
| ticlTracksterLinks (TracksterLinksProducer) | absent | 14.21 MiB | 14.21 MiB | n/a | 14.22 MiB | 14.22 MiB | n/a | 2/3 files |
| electronCkfTrackCandidates (CkfTrackCandidateMaker) | 14.04 MiB | 13.72 MiB | -327.7 KiB | -2.3% | 1.08 MiB | -12.96 MiB | -92.3% | 3/3 files |
| hgcalTrackCollection (HGCalTrackCollectionProducer) | 12.91 MiB | 12.78 MiB | -133.1 KiB | -1.0% | absent | -12.91 MiB | n/a | 2/3 files |

step3: v5 (release) as baseline

| Module | v5 max | v5 PR max | Δ PR vs v5 | Δ% PR vs v5 | Presence |
|---|---|---|---|---|---|
| ticlCandidate (TICLCandidateProducer) | 2.33 GiB | 12.54 MiB | -2.32 GiB | -99.5% | 2/2 files |
| ticlTrackstersCLUE3DHigh (TrackstersProducer) | 1.16 GiB | 16.32 MiB | -1.14 GiB | -98.6% | 2/2 files |
| ticlTrackstersRecovery (TrackstersProducer) | 338.1 MiB | 17.74 MiB | -320.4 MiB | -94.8% | 2/2 files |
| particleFlowClusterHGCalFromSimCl (PFClusterProducer) | 119.2 MiB | absent | -119.2 MiB | n/a | 1/2 files |
| mix (MixingModule) | 1.76 GiB | 1.68 GiB | -81.92 MiB | -4.5% | 2/2 files |
| hgcalMultiClusters (HGCalMultiClusterProducer) | 24.61 MiB | absent | -24.61 MiB | n/a | 1/2 files |
| combinatoricRecoTaus (RecoTauProducer) | 54.65 MiB | 76.99 MiB | 22.34 MiB | +40.9% | 2/2 files |
| particleFlowSuperClusterHGCal (PFECALSuperClusterProducer) | absent | 19.71 MiB | 19.71 MiB | n/a | 1/2 files |
| electronGsfTracks (GsfTrackProducer) | 19.08 MiB | absent | -19.08 MiB | n/a | 1/2 files |
| deepMETsResolutionTune (DeepMETProducer) | 557.1 KiB | 17.05 MiB | 16.51 MiB | +3033.9% | 2/2 files |
| simPFProducer (SimPFProducer) | 16.02 MiB | absent | -16.02 MiB | n/a | 1/2 files |
| packedPFCandidates (PATPackedCandidateProducer) | 11.68 MiB | 25.70 MiB | 14.02 MiB | +120.0% | 2/2 files |
| ticlTracksterLinksSuperclusteringDNN (TracksterLinksProducer) | 15.29 MiB | 1.33 MiB | -13.96 MiB | -91.3% | 2/2 files |
| hltAllHitToTracksterAssociations (AllHitToTracksterAssociatorsProduce? | 102.4 MiB | 115.6 MiB | 13.20 MiB | +12.9% | 2/2 files |
| hgcalTrackCollection (HGCalTrackCollectionProducer) | 12.78 MiB | absent | -12.78 MiB | n/a | 1/2 files |
| electronCkfTrackCandidates (CkfTrackCandidateMaker) | 13.72 MiB | 1.08 MiB | -12.64 MiB | -92.1% | 2/2 files |

@felicepantaleo
Contributor Author

> Is the ~100-200 MB increase in peak allocated memory along expectations?

At the job level I have no idea, but we are doing more physics and more reconstruction, so I guess so.

@felicepantaleo
Contributor Author

@mandrenguyen it would be nice to have this in pre1
