speedup SiStripClusterizer(FromRaw) using ThreeThresholdAlgorithm by slava77 · Pull Request #47061 · cms-sw/cmssw

slava77 · 2025-01-08T19:22:55Z

The primary goal was to speed up the full-unpacking configuration of SiStripClusterizerFromRaw.

The following relatively straightforward updates are made:

the concrete ThreeThresholdAlgorithm is passed using templates, and most methods of ThreeThresholdAlgorithm are inlined
per-strip noise and quality bit values are precomputed when SiStripClusterizerConditions is created (the downside is around 32 MB of memory in conditions data, plus a precomputation cost roughly equal to the strip unpacking cost of 10 events)
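
To illustrate the first item, here is a minimal sketch of how passing a concrete algorithm type through a template lets the compiler resolve and inline the per-strip call. The names AlgorithmBase, ToyThresholdAlgorithm, countDynamic and countStatic are hypothetical and are not taken from CMSSW; the actual ThreeThresholdAlgorithm interface is different.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical abstract interface (an assumption, not the CMSSW class).
struct AlgorithmBase {
  virtual ~AlgorithmBase() = default;
  virtual bool aboveThreshold(uint16_t adc, float noise) const = 0;
};

// Hypothetical concrete algorithm. It is 'final' and defined in the header,
// so a caller that knows the concrete type can inline the call.
struct ToyThresholdAlgorithm final : AlgorithmBase {
  explicit ToyThresholdAlgorithm(float nSigma) : nSigma_(nSigma) {}
  bool aboveThreshold(uint16_t adc, float noise) const override { return adc >= nSigma_ * noise; }
  float nSigma_;
};

// Dynamic dispatch: the call goes through the base-class interface.
inline int countDynamic(const AlgorithmBase& algo, const std::vector<uint16_t>& adcs, float noise) {
  int n = 0;
  for (auto adc : adcs)
    n += algo.aboveThreshold(adc, noise) ? 1 : 0;
  return n;
}

// Static dispatch: the algorithm type is a template parameter, so the call
// target is known at compile time. Any type with the same method still works.
template <typename Algo>
int countStatic(const Algo& algo, const std::vector<uint16_t>& adcs, float noise) {
  int n = 0;
  for (auto adc : adcs)
    n += algo.aboveThreshold(adc, noise) ? 1 : 0;
  return n;
}
```

Both functions return the same counts; only the dispatch mechanism differs. Whether the templated form is actually faster depends on the compiler and on how hot the call is.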

Overall, SiStripClusterizerFromRaw is 27% faster on ttbar relval Run3 MC, running with full unpacking (measured with callgrind on 300 events on the HLT config in CMSSW_14_1_4_patch5):

inlines and templates give 14%
conditions precomputation gives another 13%
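
The conditions precomputation can be pictured roughly as follows. This is only a sketch under stated assumptions: PackedConditions, BadRange and PrecomputedConditions are hypothetical types invented for illustration, and only the kMaxStrips value and the bool/uint16_t element types come from this PR's discussion.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint16_t kMaxStrips = 768;  // value taken from the PR diff

// Hypothetical compact per-module conditions: bad strips are stored as
// ranges, so every lookup has to scan the ranges.
struct PackedConditions {
  struct BadRange {
    uint16_t first;
    uint16_t count;
  };
  std::vector<BadRange> badRanges;
  std::vector<uint16_t> noise;

  bool isBad(std::size_t strip) const {
    for (const auto& r : badRanges) {
      if (strip >= r.first && strip < static_cast<std::size_t>(r.first) + r.count)
        return true;
    }
    return false;
  }
};

// Hypothetical precomputed form: one entry per strip, filled once at
// construction, so the per-event lookup becomes a plain array access.
struct PrecomputedConditions {
  std::array<uint16_t, kMaxStrips> noise{};
  std::array<bool, kMaxStrips> bad{};

  explicit PrecomputedConditions(const PackedConditions& packed) {
    const std::size_t n = std::min<std::size_t>(packed.noise.size(), kMaxStrips);
    for (std::size_t s = 0; s < n; ++s) {
      noise[s] = packed.noise[s];
      bad[s] = packed.isBad(s);
    }
  }
};
```

The trade-off is the one stated above: lookups become cheaper in instruction count, while every module carries full fixed-size tables in memory.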

No differences in physics results are expected.

If backports are needed, the commits are structured so that backporting down to 14_1_X is trivial.

@mmasciov

cmsbuild · 2025-01-08T19:23:18Z

cms-bot internal usage

cmsbuild · 2025-01-08T19:24:39Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47061/43220

cmsbuild · 2025-01-08T19:25:05Z

A new Pull Request was created by @slava77 for master.

It involves the following packages:

CalibFormats/SiStripObjects (alca)
RecoLocalTracker/SiStripClusterizer (reconstruction)

@atpathak, @cmsbuild, @consuegs, @jfernan2, @mandrenguyen, @perrotta can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @alesaggio, @echabert, @felicepantaleo, @gbenelli, @gpetruc, @jlidrych, @missirol, @mmusich, @mtosi, @robervalwalsh, @rovere, @rsreds, @threus, @tocheng, @yduhm, @yuanchao this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

slava77 · 2025-01-08T19:25:22Z

@cmsbuild please test

mmusich · 2025-01-08T20:35:56Z

type performance-improvements

cmsbuild · 2025-01-08T21:31:33Z

+1

Size: This PR adds an extra 32KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-06aba9/43681/summary.html
COMMIT: 49e294b
CMSSW: CMSSW_15_0_X_2025-01-08-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47061/43681/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially removed 3 lines from the logs
Reco comparison results: 351 differences found in the comparisons
DQMHistoTests: Total files compared: 49
DQMHistoTests: Total histograms compared: 3818730
DQMHistoTests: Total failures: 6113
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3812597
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
Checked 214 log files, 184 edm output root files, 49 DQM output files
TriggerResults: no differences found

slava77 · 2025-01-08T23:09:56Z

Reco comparison results: 351 differences found in the comparisons

DQMHistoTests: Total failures: 6113

These apparently come from non-reproducible workflows: .7 (all-mkfit) and phase-2.

jfernan2 · 2025-01-09T09:32:10Z

+1

perrotta · 2025-01-09T09:38:25Z

+alca

Memory increase in SiStripClusterizerConditions is 16x768 bool's and 16x768 uint16_t's, which is manageable I guess

cmsbuild · 2025-01-09T09:38:48Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

mmusich · 2025-01-09T09:45:19Z

      };
      return L == R;
    }
+    static constexpr uint16_t kMaxStrips = 768;


Could one spare some empty entries by using vectors, resized at construction time to the correct number of strips (4 APVs vs 6 APVs)? The size is known via the detid.

slava77 · Jan 9, 2025

@mmusich
do you happen to know if the number of APVs is strictly connected to the already available variables in the SiStripClusterizerConditions::emplace_back method: invGains or connections?

mmusich · Jan 9, 2025

I think invGains.size() should know about it:

cmssw/RecoLocalTracker/SiStripClusterizer/plugins/SiStripClusterizerConditionsESProducer.cc

Lines 68 to 71 in f0dc95b

	std::vector<float> invGains;
	invGains.reserve(6);
	std::transform(
	    gainRange.first, gainRange.second, std::back_inserter(invGains), [](auto gain) { return 1.f / gain; });

slava77 · Jan 9, 2025

using vectors

I expect that access to vector<bool> is slower than to array<bool>. I guess I can use vector<uint8_t> or char to get matching performance.

mmusich · Feb 28, 2025

See also #47470 (comment)
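
The suggestion discussed in this thread could be sketched as below. This is a hedged illustration, not code from the PR: SizedConditions and payloadBytes are hypothetical names, kStripsPerAPV = 128 follows from 768 strips for 6 APVs, and taking the APV count from invGains.size() is the assumption raised in the thread rather than a verified property.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kStripsPerAPV = 128;  // 768 strips / 6 APVs

// Hypothetical per-module tables sized to the module's actual APV count.
struct SizedConditions {
  std::vector<uint16_t> noise;
  // uint8_t rather than bool: vector<bool> is bit-packed, so each access
  // needs extra masking; one byte per strip avoids that.
  std::vector<uint8_t> bad;

  explicit SizedConditions(std::size_t nAPVs)
      : noise(nAPVs * kStripsPerAPV), bad(nAPVs * kStripsPerAPV) {}

  // Bytes of per-strip payload, ignoring the vectors' own bookkeeping.
  std::size_t payloadBytes() const {
    return noise.size() * sizeof(uint16_t) + bad.size() * sizeof(uint8_t);
  }
};
```

A 4-APV module would then hold 512 entries per table instead of 768. Each vector adds its own small header and a heap allocation, so the net saving depends on the mix of 4-APV and 6-APV modules.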

mandrenguyen · 2025-01-09T10:16:43Z

+1
Perhaps we take note of the following comment from Marco as a potential follow-up improvement:
#47061 (comment)

slava77 · 2025-01-09T13:55:13Z

Memory increase in SiStripClusterizerConditions is 16x768 bool's and 16x768 uint16_t's, which is manageable I guess

I'm not sure why 16x; there are 14.5K strip modules.
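
For reference, the module count quoted here is consistent with the "around 32MB" figure in the PR description. The arithmetic below is a sketch that assumes exactly 14,500 modules (the comment only says 14.5K) and one byte per bool; all constant names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(bool) == 1, "assumption: bool occupies one byte");

constexpr std::size_t kMaxStrips = 768;        // from the PR diff
constexpr std::size_t kApproxModules = 14500;  // assumed from "14.5K strip modules"

// One uint16_t and one bool per strip.
constexpr std::size_t kBytesPerStrip = sizeof(uint16_t) + sizeof(bool);

// 14500 * 768 * 3 bytes
constexpr std::size_t kTotalBytes = kApproxModules * kMaxStrips * kBytesPerStrip;
constexpr double kTotalMiB = kTotalBytes / (1024.0 * 1024.0);
```

This gives 33,408,000 bytes, or about 31.9 MiB, for the per-strip tables alone.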

mandrenguyen · 2025-01-10T08:29:19Z

Could the following failure be related to this PR?
https://cmssdt.cern.ch/SDT/html/cmssdt-ib/#/relVal/CMSSW_15_0/2025-01-09-2300?selectedArchs=el8_amd64_gcc12&selectedFlavors=X&selectedStatus=failed

mmusich · 2025-01-10T08:31:08Z

Could the following failure be related to this PR?

nope, see #46853 (comment)

mandrenguyen · 2025-01-10T08:34:38Z

Could the following failure be related to this PR?

nope, see #46853 (comment)

Ah, thanks. Sorry for the noise.

slava77 · 2025-04-07T22:57:24Z

Following the TSG timing checks of the full unpacking/clustering, which showed a possible slowdown, I rechecked the impact and (unfortunately) confirm it on data inputs (full menu run in 386593):

valgrind callgrind still shows a speedup of around 25%
FastTimerService shows a slowdown of around 10%

I can speculate that callgrind is so slow that the underlying hardware memory access costs are misrepresented.

slava77devel added 4 commits January 8, 2025 11:06

use concrete type ThreeThresholdAlgorithm in ClusterFromRaw; keep benefiting from .cc inlines

2950fe0

use concrete type ThreeThresholdAlgorithm in ClusterFromRaw; move inlines to the .h

0cbff7f

preserve ability to use non-ThreeThresholdAlgorithm with templates

0223015

precompute strip noise and quality in SiStripClusterizerConditions

49e294b

cmsbuild added this to the CMSSW_15_0_X milestone Jan 8, 2025

cmsbuild added reconstruction-pending alca-pending pending-signatures tests-pending orp-pending code-checks-pending trk labels Jan 8, 2025

cmsbuild added code-checks-approved and removed code-checks-pending labels Jan 8, 2025

cmsbuild added tests-started and removed tests-pending labels Jan 8, 2025

cmsbuild added the performance-improvements label Jan 8, 2025

cmsbuild added tests-approved and removed tests-started labels Jan 8, 2025

cmsbuild added reconstruction-approved and removed reconstruction-pending labels Jan 9, 2025

cmsbuild added alca-approved fully-signed and removed alca-pending pending-signatures labels Jan 9, 2025

mmusich reviewed Jan 9, 2025

cmsbuild added orp-approved and removed orp-pending labels Jan 9, 2025

cmsbuild merged commit d288c81 into cms-sw:master Jan 9, 2025

Dr15Jones mentioned this pull request Feb 27, 2025

Increase memory consumption in CMSSW_15_0_X #47470

Closed

slava77 mentioned this pull request Apr 7, 2025

Revert "speedup SiStripClusterizer(FromRaw) using ThreeThresholdAlgorithm" #47803

Merged

slava77 pushed a commit to slava77/cmssw that referenced this pull request Apr 8, 2025

Revert "Merge pull request cms-sw#47061 from slava77/CMSSW_14_1_0_pre7/stripsTiming"

5e79aee

This reverts commit d288c81, reversing changes made to 6eea5fa.

slava77 mentioned this pull request Apr 8, 2025

revert "speedup SiStripClusterizer(FromRaw) using ThreeThresholdAlgorithm" #47810

Merged
