speedup SiStripClusterizer(FromRaw) using ThreeThresholdAlgorithm#47061
speedup SiStripClusterizer(FromRaw) using ThreeThresholdAlgorithm#47061cmsbuild merged 4 commits intocms-sw:masterfrom
Conversation
…efiting from .cc inlines
|
cms-bot internal usage |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47061/43220 |
|
A new Pull Request was created by @slava77 for master. It involves the following packages:
@atpathak, @cmsbuild, @consuegs, @jfernan2, @mandrenguyen, @perrotta can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
@cmsbuild please test |
|
type performance-improvements |
|
+1 Size: This PR adds an extra 32KB to repository Comparison SummarySummary:
|
these are apparently from non-reproducible workflows: .7 (all-mkfit), and phase-2 |
|
+1 |
|
+alca
|
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2) |
| }; | ||
| return L == R; | ||
| } | ||
| static constexpr uint16_t kMaxStrips = 768; |
There was a problem hiding this comment.
could one think about sparing some empty entries by using vectors and resizing at construction time with the correct number of strips (4 APVs vs 6 APVs)? The size is known via the detid.
There was a problem hiding this comment.
@mmusich
do you happen to know if the number of APVs is strictly connected to the already available variables in the SiStripClusterizerConditions::emplace_back method: invGains or connections ?
There was a problem hiding this comment.
I think invGains.size() should know about it:
There was a problem hiding this comment.
using vectors
I expect that access to vector<bool> is slower than array<bool. I guess I can use vector<uint8_t> or char to have a matching performance
|
+1 |
I'm not sure why 16x; there are 14.5K strip modules. |
|
Could the following failure be related to this PR? |
nope, see #46853 (comment) |
Ah, thanks. Sorry for the noise. |
|
Following the TSG timing checks of the full unpacking/clustering showing possible slow down, I rechecked the impact and (unfortunately) confirm on data inputs (full menu run in 386593):
I can speculate that callgrind is so slow that the underlying hardware memory access costs are mis-represented |
The primary goal was to speedup the full-unpacking configuration of
SiStripClusterizerFromRaw.The following relatively straightforward updates are made:
ThreeThresholdAlgorithmis passed using templates and most methods ofThreeThresholdAlgorithmare inlinedSiStripClusterizerConditions(the downside here is around 32MB of memory in conditions data and the cost of precomputation equal to around 10 events of strip unpacking cost)Overall
SiStripClusterizerFromRawis faster by 27% on ttbar relval Run3 MC, running with full unpacking (measured with callgrind on 300 events on the HLT config in CMSSW_14_1_4_patch5):No differences in physics results are expected.
In case backports are needed, the commits are made so that the backport is trivial down to 14_1_X.
@mmasciov