Skip to content

Improve the Phase-1 pixel raw to cluster [15.0.x]#48581

Merged
cmsbuild merged 1 commit intocms-sw:CMSSW_15_0_Xfrom
fwyzard:fix_alpaka_PixelClustering_150x
Aug 4, 2025
Merged

Improve the Phase-1 pixel raw to cluster [15.0.x]#48581
cmsbuild merged 1 commit intocms-sw:CMSSW_15_0_Xfrom
fwyzard:fix_alpaka_PixelClustering_150x

Conversation

@fwyzard
Copy link
Copy Markdown
Contributor

@fwyzard fwyzard commented Jul 18, 2025

PR description:

Fix a bug in the heterogeneous pixel raw-to-cluster module:

  • include in the output the last pixel of each module. The current implementation ignores the last pixel of each module, and nobody seems to know why.

Implement a few small optimisations:

  • use only 64 groups, instead of one per module;
  • adjust the number of nearest neighbours after duplicate removal;
  • restrict the domain of atomic operations to per-block;
  • remove an unnecessary synchronisation.

The cumulative effect is to speed up the pixel local reconstruction by 7%.

machine type CMSSW_15_0_10_patch2 with #48581 relative difference
2022 HLT node (2× Milan 7763, 2× T4) 2495 ± 3 ev/s 2670 ± 6 ev/s +7%
2024 HLT node (2× Bergamo 9754, 3× L4) 11598 ± 23 ev/s 12447 ± 25 ev/s +7%

The performance of the full HLT menu is not affected, as it is limited by the CPU part.

PR validation:

Technical validation performed on top of CMSSW 15.0.10 running over 10k events from Run2025C.
Physics validation is required, though no changes are expected.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Backport of #48580, #48609 and #48644 to 15.0.x for the 2025 data taking.

@fwyzard
Copy link
Copy Markdown
Contributor Author

fwyzard commented Jul 18, 2025

backport #48580

@fwyzard
Copy link
Copy Markdown
Contributor Author

fwyzard commented Jul 18, 2025

enable gpu

@fwyzard
Copy link
Copy Markdown
Contributor Author

fwyzard commented Jul 18, 2025

type bugfix

@cmsbuild
Copy link
Copy Markdown
Contributor

cmsbuild commented Jul 18, 2025

A new Pull Request was created by @fwyzard for CMSSW_15_0_X.

It involves the following packages:

  • RecoLocalTracker/SiPixelClusterizer (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dkotlins, @felicepantaleo, @ferencek, @gpetruc, @missirol, @mmusich, @mroguljic, @mtosi, @rovere, @threus, @tsusa, @tvami this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Copy link
Copy Markdown
Contributor

cmsbuild commented Jul 18, 2025

cms-bot internal usage

@fwyzard
Copy link
Copy Markdown
Contributor Author

fwyzard commented Jul 18, 2025

please test

@cmsbuild
Copy link
Copy Markdown
Contributor

-1

Failed Tests: rocmUnitTests
Size: This PR adds an extra 24KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2b153c/47281/summary.html
COMMIT: cb2ebfd
CMSSW: CMSSW_15_0_X_2025-07-18-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48581/47281/install.sh to create a dev area with all the needed externals and cmssw changes.

ROCm Unit Tests

I found 0 errors in the following unit tests:


Comparison Summary

Summary:

  • You potentially added 3 lines to the logs
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4020943
  • DQMHistoTests: Total failures: 9433
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4011490
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: found differences in 2 / 48 workflows

CUDA Comparison Summary

Summary:

ROCM Comparison Summary

Summary:

@cmsbuild
Copy link
Copy Markdown
Contributor

Pull request #48581 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

@fwyzard
Copy link
Copy Markdown
Contributor Author

fwyzard commented Jul 19, 2025

please test

@cmsbuild
Copy link
Copy Markdown
Contributor

-1

Failed Tests: rocmUnitTests
Size: This PR adds an extra 24KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2b153c/47284/summary.html
COMMIT: 32a82bc
CMSSW: CMSSW_15_0_X_2025-07-18-2300/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48581/47284/install.sh to create a dev area with all the needed externals and cmssw changes.

ROCm Unit Tests

I found 0 errors in the following unit tests:


Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4020943
  • DQMHistoTests: Total failures: 9481
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4011442
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: found differences in 2 / 48 workflows

CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53445
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 53437
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

ROCM Comparison Summary

Summary:

@fwyzard fwyzard force-pushed the fix_alpaka_PixelClustering_150x branch from 32a82bc to 3fc2a80 Compare July 19, 2025 12:48
Fix a bug in the heterogeneous pixel raw-to-cluster module:
  - include in the output the last pixel of each module.

Implement a few small optimisations:
  - use only 64 groups, instead of one per module;
  - adjust the number of nearest neighbours after duplicate removal;
  - restrict the domain of atomic operations to per-block;
  - remove an unnecessary synchronisation.
@fwyzard fwyzard force-pushed the fix_alpaka_PixelClustering_150x branch from 2ea174d to b0b7322 Compare July 29, 2025 00:10
@cmsbuild
Copy link
Copy Markdown
Contributor

Pull request #48581 was updated. @cmsbuild, @jfernan2, @mandrenguyen can you please check and sign again.

@fwyzard
Copy link
Copy Markdown
Contributor Author

fwyzard commented Jul 29, 2025

please test

@cmsbuild
Copy link
Copy Markdown
Contributor

+1

Size: This PR adds an extra 32KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2b153c/47433/summary.html
COMMIT: b0b7322
CMSSW: CMSSW_15_0_X_2025-07-28-2300/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48581/47433/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4020943
  • DQMHistoTests: Total failures: 487
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4020436
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 218 log files, 189 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

CUDA Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 70 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53445
  • DQMHistoTests: Total failures: 4244
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 49201
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: found differences in 3 / 6 workflows

ROCM Comparison Summary

Summary:

@jfernan2
Copy link
Copy Markdown
Contributor

+1

@mmusich
Copy link
Copy Markdown
Contributor

mmusich commented Jul 30, 2025

@mandrenguyen @cms-sw/orp-l2 now that #48644 is merged can you please unhold this one?

@pietroGru
Copy link
Copy Markdown
Contributor

Hi. This PR has been validated in here. Please feel free to take a look at the plots.
Tracking performance are unaffected by the pull 48581. As a note, the largest relative difference is observed in the number of tracks per event, with no effects on the tracks quality though.

@fwyzard
Copy link
Copy Markdown
Contributor Author

fwyzard commented Jul 31, 2025

hi @pietroGru , what commit is used for the validation of this PR ?

I am surprised to hear that there is a difference in the number of tracks per event, as the latest commit should not introduce any changes at all on the reconstruction.

@antoniovilela
Copy link
Copy Markdown
Contributor

unhold

@cmsbuild
Copy link
Copy Markdown
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_15_0_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_15_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @mandrenguyen, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@pietroGru
Copy link
Copy Markdown
Contributor

pietroGru commented Aug 1, 2025

hi @fwyzard , I had the repo fetch/pull after your last commit event) - so b0b7322 .

Just to be 100% sure, following up your question I run

diff PixelClustering.h PixelClustering_b0b7322.h 
diff SiPixelRawToClusterKernel.dev.cc SiPixelRawToClusterKernel_b0b7322.dev.cc

resulting in no differences.

Maybe ?

Please let me known if I can help in any way (running serial?)

@fwyzard
Copy link
Copy Markdown
Contributor Author

fwyzard commented Aug 1, 2025

Hi @pietroGru, thanks for the extra checks :-)

The maxNeighbours change should not have any impact, it only makes the code use less memory because we don't actually need to handle duplicates, since we remove them.

The lastPixel - 1 change should indeed have a slight impact - thanks for remind me about that.

I would be curious to see if after these changes the online and offline version agree more or less than before. And if changing back lastPixel to lastPixel - 1 removes all differences.
But maybe these are checks the DPG could do ?

From your report the physics is OK, so I think we can just go ahead, in preparation for the morphing update.

@pietroGru
Copy link
Copy Markdown
Contributor

Hi @fwyzard I agree with you: one can go ahead in preparation for the morphing.

Meanwhile, I have appended the "rollback" quick change for lastPixel in my to do list , just as a curiosity to check for any correlation.

@mandrenguyen
Copy link
Copy Markdown
Contributor

+1

@cmsbuild cmsbuild merged commit 299e53c into cms-sw:CMSSW_15_0_X Aug 4, 2025
14 checks passed
@fwyzard fwyzard deleted the fix_alpaka_PixelClustering_150x branch August 13, 2025 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants