Fixes for Digi Morphing: Limiting Histogram Size and Decoupling for `TrackerTraits` by AdrianoDee · Pull Request #49021 · cms-sw/cmssw

AdrianoDee · 2025-09-29T10:55:46Z

PR description:

This PR proposes a couple of fixes so that digi morphing works properly under different conditions (whether it is active or not):

- defining a maxPixInModuleForMorphing constant in each TrackerTraits, used to set the number of threads for the FindClus kernel. It also sizes the histogram that holds the pixels in a module:

using Hist = cms::alpakatools::HistoContainer<uint16_t,
                                                      TrackerTraits::clusterBinning,
                                                      TrackerTraits::maxPixInModuleForMorphing,
                                                      TrackerTraits::clusterBits,
                                                      uint16_t>;

- having a different maxIterGPU per topology (the number of pixels differs between topologies, and that affects the number of iterations used to cover the full module);
- limiting the maxFakesInModule configuration parameter according to maxPixInModuleForMorphing, so that the histogram cannot overflow.
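The last point can be illustrated with a minimal C++ sketch. It assumes the limit is a plain clamp of the configured value to the histogram capacity; the exact rule used in the PR is not shown in this description. `ExampleTraits` and `clampMaxFakesInModule` are hypothetical names, and 11000 is the HIonPhase1 value quoted later in this thread.

```cpp
#include <cstdint>

// Hypothetical stand-in for one TrackerTraits topology. Only the constant
// named in the PR description is modelled here.
struct ExampleTraits {
  static constexpr std::uint32_t maxPixInModuleForMorphing = 11000;
};

// Clamp a configured maxFakesInModule so that it never exceeds the capacity
// the per-module histogram was sized for (assumed rule, for illustration).
template <typename TrackerTraits>
constexpr std::uint32_t clampMaxFakesInModule(std::uint32_t configured) {
  constexpr std::uint32_t cap = TrackerTraits::maxPixInModuleForMorphing;
  return configured < cap ? configured : cap;
}
```

With this rule a configuration value larger than the capacity is silently reduced; rejecting it with an error at configuration time would be an equally plausible choice.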

PR validation:

160.03502 run.

Successfully ran the test from @henriettepetersen:

hltConfigFromDB --configName /online/collisions/2025/2e34/v1.2/HLT/V2 > hlt.py
cp /gpu_data/store/data/Run2025C/EphemeralHLTPhysics/FED/run393240_cff.py .
cat >> hlt.py << @EOF

process.load('run393240_cff')

from Configuration.AlCa.GlobalTag import GlobalTag as customiseGlobalTag
process.GlobalTag = customiseGlobalTag(process.GlobalTag, globaltag = '150X_dataRun3_HLT_v1')

from HLTrigger.Configuration.customizeHLTforCMSSW import customizeHLTforCMSSW
process = customizeHLTforCMSSW(process)

process.PrescaleService.lvl1DefaultLabel = '2p0E34'
process.PrescaleService.forceDefault = True

process.options.wantSummary = False
process.MessageLogger.cerr.enableStatistics = cms.untracked.bool(False)

process.FastTimerService.writeJSONSummary = True

process.ThroughputService = cms.Service('ThroughputService',
    enableDQM = cms.untracked.bool(False),
    printEventSummary = cms.untracked.bool(True),
    eventResolution = cms.untracked.uint32(100),
    eventRange = cms.untracked.uint32(10300),
)
process.MessageLogger.cerr.ThroughputService = cms.untracked.PSet(
    limit = cms.untracked.int32(10000000),
    reportEvery = cms.untracked.int32(1)
)

import os
os.makedirs('%s/run%d' % (process.EvFDaqDirector.baseDir.value(), process.EvFDaqDirector.runNumber.value()), exist_ok=True)

process.options.numberOfThreads = 32
process.options.numberOfStreams = 24
process.options.numberOfConcurrentLuminosityBlocks = 2
process.maxEvents.input = 10300

process.hltSiPixelClustersSoA.DoDigiMorphing = cms.bool( True )
process.hltSiPixelClustersSoASerialSync.DoDigiMorphing = cms.bool( True )

@EOF

# run the configuration
cmsRun hlt.py

Backport is needed to 15_1_X for HI data taking.

cmsbuild · 2025-09-29T10:56:10Z

cms-bot internal usage

AdrianoDee · 2025-09-29T10:56:12Z

enable gpu

AdrianoDee · 2025-09-29T10:57:35Z

solves #48885

cmsbuild · 2025-09-29T10:58:12Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49021/46221

There are other open Pull requests which might conflict with changes you have proposed:
- File Geometry/CommonTopologies/interface/SimplePixelTopology.h modified in PR(s): CA Extension to strips #47090, [NGT] Extension of CA Pixel Tracking to Phase 2 Outer Tracker barrel #48921
- File RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/PixelClustering.h modified in PR(s): Pixelimage morphing #48343
- File RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToCluster.cc modified in PR(s): Pixelimage morphing #48343, Updated SoA View accessors from raw pointers to span #48377
- File RecoLocalTracker/SiPixelClusterizer/plugins/alpaka/SiPixelRawToClusterKernel.dev.cc modified in PR(s): Pixelimage morphing #48343, Updated SoA View accessors from raw pointers to span #48377

cmsbuild · 2025-09-29T10:58:37Z

A new Pull Request was created by @AdrianoDee for master.

It involves the following packages:

Geometry/CommonTopologies (geometry)
RecoLocalTracker/SiPixelClusterizer (reconstruction)

@Dr15Jones, @bsunanda, @civanch, @cmsbuild, @jfernan2, @kpedro88, @makortel, @mandrenguyen, @mdhildreth can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @bsunanda, @dkotlins, @elusian, @fabiocos, @felicepantaleo, @ferencek, @gpetruc, @martinamalberti, @mmasciov, @mmusich, @mroguljic, @mtosi, @rovere, @threus, @tsusa, @tvami this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

AdrianoDee · 2025-09-29T11:00:06Z

test parameters:

relvals_gpu = 160.03502
relvals_opts_gpu = -w gpu

AdrianoDee · 2025-09-29T11:00:31Z

please test

AdrianoDee · 2025-09-29T11:10:45Z

type bug-fix

fwyzard · 2025-09-29T16:50:40Z

assign heterogeneous

cmsbuild · 2025-09-29T16:51:03Z

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

fwyzard · 2025-09-29T16:54:23Z

 #include "HeterogeneousCore/AlpakaInterface/interface/warpsize.h"

-//#define GPU_DEBUG
+// #define GPU_DEBUG


Could you undo the extra whitespace change ?

fwyzard · 2025-09-29T17:01:31Z

        ALPAKA_ASSERT_ACC((alpaka::getWorkDiv<alpaka::Thread, alpaka::Elems>(acc)[0u] <= maxElements));

-        constexpr unsigned int maxIter = maxIterGPU * maxElements;
+        const unsigned int maxIter = TrackerTraits::maxIterClustering * maxElements;


Shouldn't the declarations of the arrays nn[maxIter][maxNeighbours] and nnn[maxIter] be disallowed if maxIter is not constexpr, or otherwise known at compile time?

In the host code we have tolerated variable-length arrays as a non-standard extension (for reasons that can be debated elsewhere). I don't know to what extent VLAs work in nvcc or hipcc.

CUDA does not seem to like it, compiling this

__global__
void kernel(bool more) {
  const int size = more ? 42 : 21;
  float data[size];
  if (threadIdx.x < size) {
    data[size] = 0;
  }
}

int main(void) {
  kernel<<<1,1>>>(true);
  return 0;
}

fails with

$ /usr/local/cuda-12.9/bin/nvcc -c test.cu -o test.o -arch sm_75
test.cu(4): error: expression must have a constant value
    float data[size];
               ^

test.cu(4): note #2689-D: the value of variable "size" (declared at line 3) cannot be used as a constant
    float data[size];
               ^

1 error detected in the compilation of "test.cu".

Although the compilation of the tests seems to be progressing fine ?
And CUDA does support alloca() 🤔, in fact this compiles:

__global__
void kernel(bool more) {
  const int size = more ? 42 : 21;
  //float data[size];
  float* data = static_cast<float *>(alloca(size * sizeof(float)));
  if (threadIdx.x < size) {
    data[size] = 0;
  }
}

The alternative I see is just sizing it with the maximum possible (so TrackerTraits::maxElementsPerBlockMorph), basically wasting some of it.
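A minimal C++ sketch of that alternative, assuming the bound is the product of the iteration count and the morphing block size. `kMaxIterUpperBound` and `countUsedSlots` are hypothetical names, and 32 × 384 borrows the HIonPhase1 numbers quoted later in this thread. The array size is a compile-time constant, while the number of slots actually used is a runtime value.

```cpp
#include <cstdint>

// Hypothetical compile-time upper bound, standing in for something like
// maxIterClustering * TrackerTraits::maxElementsPerBlockMorph.
constexpr std::uint32_t kMaxIterUpperBound = 32 * 384;

// Mark the first `maxIter` slots of a fixed-size buffer and return how many
// slots were marked. Requests beyond the capacity are truncated instead of
// writing out of bounds.
constexpr std::uint32_t countUsedSlots(std::uint32_t maxIter) {
  std::uint8_t nnn[kMaxIterUpperBound] = {};  // sized by the maximum, not by maxIter
  const std::uint32_t used = maxIter < kMaxIterUpperBound ? maxIter : kMaxIterUpperBound;
  for (std::uint32_t k = 0; k < used; ++k) {
    nnn[k] = 1;
  }
  std::uint32_t marked = 0;
  for (std::uint32_t k = 0; k < kMaxIterUpperBound; ++k) {
    marked += nnn[k];
  }
  return marked;
}
```

The cost is the one named above: whenever the runtime value is smaller than the bound, the tail of the array is allocated but never used.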

And to be honest, I wasn't expecting this to compile either.

After looking a bit better into it I think I understand why it works:

- on CPU the value of maxIter depends on whether morphing is enabled or not, but it works because, as Matti pointed out, we allow variable-length arrays;
- on GPU the value of maxIter is actually independent of whether morphing is enabled or not, so the compiler can determine it at compile time.

One smart compiler!

fwyzard · 2025-09-29T17:02:17Z


    static constexpr uint32_t maxPixInModule = 6000;
+    static constexpr uint32_t maxPixInModuleForMorphing = maxPixInModule;
+    static constexpr uint32_t maxIterClustering = 16;


Can we derive this from maxPixInModule or maxPixInModuleForMorphing ?

Yes, we could. But it depends on how we want to handle the number of blocks and threads for FindClus. As is, we fix maxPixInModule, maxIterClustering, blocks, and extrapolate maxElementsPerBlock so that maxElementsPerBlock = maxPixInModule/(maxIterClustering * blocks).

If I follow the code correctly, now we have

- maxPixInModule, maxPixInModuleForMorphing and maxIterClustering fixed here
- maxElementsPerBlock = maxPixInModule / maxIterClustering, round up to the next multiple of 64
- maxElementsPerBlockMorph = maxPixInModuleForMorphing / maxIterClustering, round up to the next multiple of 64
- maxElements
  - on CPU it is either maxElementsPerBlock or maxElementsPerBlockMorph
  - on GPU it is always 1.
- maxIter = maxIterClustering × maxElements
  - on CPU it is maxPixInModule or maxPixInModuleForMorphing rounded up to the next multiple of (maxIterClustering × 64)
  - on GPU it is maxIterClustering

Which results in

| | Phase2 | Phase1 | HIonPhase1 |
|---|---|---|---|
| maxPixInModule | 6000 | 6000 | 10000 |
| maxPixInModuleForMorphing | 6000 | 8400 | 11000 |
| maxIterClustering | 16 | 24 | 32 |
| maxElementsPerBlock | 384 | 256 | 320 |
| maxElementsPerBlockMorph | 384 | 384 | 384 |
| maxElements (CPU, enableDigiMorphing = false) | 384 | 256 | 320 |
| maxElements (CPU, enableDigiMorphing = true) | 384 | 384 | 384 |
| maxElements (GPU) | 1 | 1 | 1 |
| maxIter (CPU, enableDigiMorphing = false) | 6144 | 4096 | 5120 |
| maxIter (CPU, enableDigiMorphing = true) | 6144 | 9216 | 12288 |
| maxIter (GPU) | 16 | 24 | 32 |

Then maxIter is used to allocate the arrays of nearest neighbours.
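The block-size and morphing rows can be reproduced with a short C++ sketch of the formulas listed above. The struct and function names are hypothetical, and a ceiling division before the rounding to 64 is an assumption (integer truncation gives the same numbers for these inputs). The non-morphing CPU maxIter row is not asserted: with these formulas the Phase1 and HIonPhase1 entries would come out as 6144 and 10240 rather than the listed 4096 and 5120, and the thread does not say which is intended.

```cpp
#include <cstdint>

// Per-topology inputs, copied from the first three rows of the table.
struct Topology {
  std::uint32_t maxPixInModule;
  std::uint32_t maxPixInModuleForMorphing;
  std::uint32_t maxIterClustering;
};

constexpr Topology kPhase2{6000, 6000, 16};
constexpr Topology kPhase1{6000, 8400, 24};
constexpr Topology kHIonPhase1{10000, 11000, 32};

// Round `value` up to the next multiple of `multiple`.
constexpr std::uint32_t roundUp(std::uint32_t value, std::uint32_t multiple) {
  return ((value + multiple - 1) / multiple) * multiple;
}

// Pixels handled per iteration (ceiling division, an assumption),
// rounded up to the next multiple of 64.
constexpr std::uint32_t elementsPerBlock(std::uint32_t maxPix, std::uint32_t maxIterClustering) {
  return roundUp((maxPix + maxIterClustering - 1) / maxIterClustering, 64);
}

constexpr std::uint32_t maxElementsPerBlock(const Topology& t) {
  return elementsPerBlock(t.maxPixInModule, t.maxIterClustering);
}

constexpr std::uint32_t maxElementsPerBlockMorph(const Topology& t) {
  return elementsPerBlock(t.maxPixInModuleForMorphing, t.maxIterClustering);
}

// maxIter = maxIterClustering x maxElements, with morphing enabled on CPU.
constexpr std::uint32_t maxIterCpuMorph(const Topology& t) {
  return t.maxIterClustering * maxElementsPerBlockMorph(t);
}

// On GPU maxElements is always 1, so maxIter reduces to maxIterClustering.
constexpr std::uint32_t maxIterGpu(const Topology& t) { return t.maxIterClustering * 1u; }

static_assert(maxElementsPerBlock(kPhase1) == 256, "matches the table");
static_assert(maxIterCpuMorph(kHIonPhase1) == 12288, "matches the table");
```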

My suggestion would be to

- fix maxPixInModule and maxPixInModuleForMorphing like in this PR
- determine maxElementsPerBlock based on what works and gives good performance on the T4 and/or L4 GPUs, and keep it fixed (hopefully using the same value with and without morphing)
- derive maxIterClustering from maxPixInModuleForMorphing or maxPixInModule, depending on whether morphing is enabled or not.
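A minimal C++ sketch of this suggestion, assuming a ceiling division so that the iterations always cover the full module. `deriveMaxIterClustering` and `maxIterClusteringFor` are hypothetical names, and 384 is only a placeholder for whatever block size tuning on the target GPUs would give.

```cpp
#include <cstdint>

// Placeholder for a block size fixed after tuning (hypothetical value).
constexpr std::uint32_t kMaxElementsPerBlock = 384;

// Smallest number of iterations such that
// iterations * elementsPerBlock >= maxPix.
constexpr std::uint32_t deriveMaxIterClustering(std::uint32_t maxPix, std::uint32_t elementsPerBlock) {
  return (maxPix + elementsPerBlock - 1) / elementsPerBlock;
}

// Pick the pixel budget according to whether morphing is enabled.
constexpr std::uint32_t maxIterClusteringFor(bool morphing,
                                             std::uint32_t maxPixInModule,
                                             std::uint32_t maxPixInModuleForMorphing) {
  return deriveMaxIterClustering(morphing ? maxPixInModuleForMorphing : maxPixInModule,
                                 kMaxElementsPerBlock);
}
```

With the Phase1 numbers from the table this gives 16 iterations without morphing and 22 with it. When the morphing flag is only known at run time, the result is a runtime value as well.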

What do you think ?

I was implementing this (I agree it's a better set of fixed variables), but: does this imply that maxIter is not fixed at compile time on GPU and the issue above would manifest?

Yes... 🤦🏻‍♂️

Shall we stay with the current schema for the moment? Just to get it in for the next 15_1_X release.

OK, I don't have any better suggestions, so let's keep it as it is for the moment 🤷🏻‍♂️

fwyzard · 2025-09-29T17:03:10Z

    static constexpr uint16_t last_barrel_detIndex = 864;

    static constexpr uint32_t maxPixInModule = 6000;
+    static constexpr uint32_t maxPixInModuleForMorphing = maxPixInModule;


Given the name maxPixInModuleForMorphing, it would make sense for this constant to indicate how many pixels at most one can expect to be recovered by the morphing step, rather than the total of original plus recovered pixels ?

Right, makes sense.

cmsbuild · 2025-09-29T20:58:04Z

+1

Size: This PR adds an extra 48KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-05a049/48337/summary.html
COMMIT: 9265f41
CMSSW: CMSSW_16_0_X_2025-09-28-2300/el8_amd64_gcc12
Additional Tests: GPU,AMD_MI300X,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49021/48337/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

There are some workflows for which there are errors in the baseline:
2024.0050001 step 1
The results for the comparisons for these workflows could be incomplete
This means most likely that the IB is having errors in the relvals. The error does NOT come from this pull request

Summary:

You potentially removed 3 lines from the logs
Reco comparison results: 8 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 3861349
DQMHistoTests: Total failures: 23
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3861306
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
Checked 214 log files, 184 edm output root files, 50 DQM output files
TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

You potentially added 1 lines to the logs
Reco comparison results: 190 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 146188
DQMHistoTests: Total failures: 33294
DQMHistoTests: Total nulls: 13
DQMHistoTests: Total successes: 112881
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 47 log files, 51 edm output root files, 11 DQM output files
TriggerResults: no differences found

NVIDIA_H100 Comparison Summary

Summary:

You potentially added 3 lines to the logs
Reco comparison results: 256 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 146188
DQMHistoTests: Total failures: 27888
DQMHistoTests: Total nulls: 9
DQMHistoTests: Total successes: 118291
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 47 log files, 51 edm output root files, 11 DQM output files
TriggerResults: no differences found

NVIDIA_L40S Comparison Summary

There are some workflows for which there are errors in the baseline:
160.03502 step 4
The results for the comparisons for these workflows could be incomplete
This means most likely that the IB is having errors in the relvals. The error does NOT come from this pull request

Summary:

You potentially removed 22699 lines from the logs
ROOTFileChecks: Some differences in event products or their sizes found
Reco comparison results: 237 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 146188
DQMHistoTests: Total failures: 27427
DQMHistoTests: Total nulls: 10
DQMHistoTests: Total successes: 118751
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 46 log files, 50 edm output root files, 11 DQM output files
TriggerResults: no differences found

NVIDIA_T4 Comparison Summary

There are some workflows for which there are errors in the baseline:
160.03502 step 4
The results for the comparisons for these workflows could be incomplete
This means most likely that the IB is having errors in the relvals. The error does NOT come from this pull request

Summary:

You potentially removed 18689 lines from the logs
ROOTFileChecks: Some differences in event products or their sizes found
Reco comparison results: 196 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 146188
DQMHistoTests: Total failures: 33692
DQMHistoTests: Total nulls: 13
DQMHistoTests: Total successes: 112483
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 46 log files, 50 edm output root files, 11 DQM output files
TriggerResults: no differences found

cmsbuild · 2025-10-02T15:36:44Z

Pull request #49021 was updated. @Dr15Jones, @bsunanda, @civanch, @cmsbuild, @fwyzard, @jfernan2, @kpedro88, @makortel, @mandrenguyen, @mdhildreth can you please check and sign again.

AdrianoDee · 2025-10-02T15:40:02Z

please test

cmsbuild · 2025-10-03T05:52:25Z

+1

Size: This PR adds an extra 40KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-05a049/48429/summary.html
COMMIT: a487e5e
CMSSW: CMSSW_16_0_X_2025-10-02-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49021/48429/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially removed 1 lines from the logs
Reco comparison results: 10 differences found in the comparisons
DQMHistoTests: Total files compared: 51
DQMHistoTests: Total histograms compared: 3924341
DQMHistoTests: Total failures: 24
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3924297
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
Checked 218 log files, 188 edm output root files, 51 DQM output files
TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

You potentially added 2 lines to the logs
Reco comparison results: 237 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 146284
DQMHistoTests: Total failures: 28118
DQMHistoTests: Total nulls: 8
DQMHistoTests: Total successes: 118158
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 47 log files, 51 edm output root files, 11 DQM output files
TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

You potentially removed 2 lines from the logs
Reco comparison results: 235 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 146284
DQMHistoTests: Total failures: 28993
DQMHistoTests: Total nulls: 5
DQMHistoTests: Total successes: 117286
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 47 log files, 51 edm output root files, 11 DQM output files
TriggerResults: no differences found

NVIDIA_H100 Comparison Summary

Summary:

You potentially added 1 lines to the logs
Reco comparison results: 251 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 146284
DQMHistoTests: Total failures: 23285
DQMHistoTests: Total nulls: 7
DQMHistoTests: Total successes: 122992
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 47 log files, 51 edm output root files, 11 DQM output files
TriggerResults: no differences found

NVIDIA_L40S Comparison Summary

There are some workflows for which there are errors in the baseline:
160.03502 step 4
The results for the comparisons for these workflows could be incomplete
This means most likely that the IB is having errors in the relvals. The error does NOT come from this pull request

Summary:

You potentially removed 220646 lines from the logs
ROOTFileChecks: Some differences in event products or their sizes found
Reco comparison results: 234 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 146284
DQMHistoTests: Total failures: 24204
DQMHistoTests: Total nulls: 5
DQMHistoTests: Total successes: 122075
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 46 log files, 50 edm output root files, 11 DQM output files
TriggerResults: no differences found

NVIDIA_T4 Comparison Summary

There are some workflows for which there are errors in the baseline:
160.03502 step 4
The results for the comparisons for these workflows could be incomplete
This means most likely that the IB is having errors in the relvals. The error does NOT come from this pull request

Summary:

You potentially removed 17453 lines from the logs
ROOTFileChecks: Some differences in event products or their sizes found
Reco comparison results: 254 differences found in the comparisons
DQMHistoTests: Total files compared: 11
DQMHistoTests: Total histograms compared: 146284
DQMHistoTests: Total failures: 24974
DQMHistoTests: Total nulls: 5
DQMHistoTests: Total successes: 121305
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 10 files compared)
Checked 46 log files, 50 edm output root files, 11 DQM output files
TriggerResults: no differences found

mandrenguyen · 2025-10-03T07:31:24Z

urgent
@cms-sw/heterogeneous-l2 @cms-sw/geometry-l2 @cms-sw/reconstruction-l2 please have a look today, if possible.
The 15_1_0 build is being held up by the backport of this PR. Thank you!

jfernan2 · 2025-10-03T09:18:54Z

+1

fwyzard · 2025-10-03T12:47:31Z

+heterogeneous

Thanks Adriano for the fix and addressing the various comments.

mandrenguyen · 2025-10-03T13:22:03Z

@cms-sw/geometry-l2 ping

civanch · 2025-10-03T14:48:11Z

+1

cmsbuild · 2025-10-03T14:48:38Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @sextonkennedy, @ftenchini (and backports should be raised in the release meeting by the corresponding L2)

mandrenguyen · 2025-10-03T16:15:53Z

+1

Better handling of shared memory for pixel clustering and digi morphing

34102ee

cmsbuild added this to the CMSSW_16_0_X milestone Sep 29, 2025

cmsbuild added reconstruction-pending geometry-pending pending-signatures tests-pending orp-pending code-checks-pending trk labels Sep 29, 2025

cmsbuild added code-checks-approved and removed code-checks-pending labels Sep 29, 2025

cmsbuild added tests-started and removed tests-pending labels Sep 29, 2025

cmsbuild added the bug-fix label Sep 29, 2025

cmsbuild mentioned this pull request Sep 29, 2025

Updated SoA View accessors from raw pointers to span #48377

Merged

cmsbuild added the heterogeneous-pending label Sep 29, 2025

fwyzard reviewed Sep 29, 2025

cmsbuild added tests-approved and removed tests-started labels Sep 29, 2025

cmsbuild added tests-started and removed tests-pending labels Oct 2, 2025

cmsbuild added tests-approved and removed tests-started labels Oct 3, 2025

cmsbuild added the urgent label Oct 3, 2025

cmsbuild added reconstruction-approved and removed reconstruction-pending labels Oct 3, 2025

cmsbuild added heterogeneous-approved and removed heterogeneous-pending labels Oct 3, 2025

cmsbuild mentioned this pull request Oct 3, 2025

[NGT] Extension of CA Pixel Tracking to Phase 2 Outer Tracker barrel #48921

Merged

cmsbuild added geometry-approved fully-signed and removed geometry-pending pending-signatures labels Oct 3, 2025

cmsbuild added orp-approved and removed orp-pending labels Oct 3, 2025

cmsbuild merged commit dd7156a into cms-sw:master Oct 3, 2025
26 checks passed

mmusich mentioned this pull request Oct 5, 2025

[15_0_X] Digi Morphing for HLT #48832

Merged

AdrianoDee deleted the digimoprh_sharedmemory_160X branch October 13, 2025 09:29

jfernan2 mentioned this pull request Oct 23, 2025

[GPU] Relval 160.03502: SIGSEGV in SiPixelDigisClustersFromSoAAlpaka::produce #48885

Closed
