Fix: Add missing buffer flush for PCL to MillePedeAlignmentAlgorithm#50447
cmsbuild merged 1 commit into cms-sw:CMSSW_16_0_X
|
Pull request #50447 was updated. |
|
cms-bot internal usage |
|
Not directly relevant for the review of this PR, but in case any experts are reading this: Can you think of a reason why in 15_1_0, this issue did not result in truncated Are there any known changes between the two release families that could explain why this has become an issue now? @lpnair confirmed that even "old" workflows that used to run in |
|
@cms-sw/core-l2 do you have any hint on #50447 (comment) (see also PR description). It looks like something changed in the way calls are scheduled. |
|
@goblirsc please squash the commits to one. |
as reported elsewhere, if you run with the |
Hi, I ran |
ccf2198 to 1be9f9d
|
test parameters:
|
|
@cmsbuild, please test |
|
I confirm that a previously failing setup:
#!/bin/bash -ex
dasgoclient --limit 0 --query 'file dataset=/HLTMonitor/Run2026A-Express-v1/FEVTHLTALL run=401691' | sort -u > input_files.txt
echo '{"401691": [[36, 71]]}' > step1_lumiRanges.txt
cmsDriver.py ReAlCaHLT \
-s ALCA:TkAlHLTTracks+TkAlHLTTracksZMuMu+PromptCalibProdSiPixelAliHLTHGC \
--conditions 160X_dataRun3_Express_v1 \
--scenario pp \
--data \
--era Run3_2026 \
--datatier ALCARECO \
--eventcontent ALCARECO \
--process RECO \
--processName ReAlCa \
--filein filelist:input_files.txt \
--lumiToProcess step1_lumiRanges.txt \
-n 1000 >& ReAlCa.log
cmsDriver.py ALCAHARVDSIPIXELALIHLTHGCOMBINED \
-s ALCAHARVEST:SiPixelAliHLTHGCombined \
--conditions 160X_dataRun3_Express_v1 \
--scenario pp \
--era Run3_2026 \
--data \
-n -1 \
--filein file:PromptCalibProdSiPixelAliHLTHGC.root \
--customise Alignment/CommonAlignmentProducer/customizeLSNumberFilterForRelVals.lowerHitsPerStructure >& Harvesting.log
now produces a payload, although in the log I find: |
As a first comment, in absence of data dependencies between modules the order of modules for which |
Hi, I just tried to reproduce this, but for me, the and several warnings for inactive alignables, which however makes sense with just 1000 events |
I have run merging locally #50420 |
I think that's precisely the problem (having a valid VOMS proxy and try to run the --ibeos). things run correctly for me -- I encourage you to use this in the future. |
|
Ah, great! I didn't merge that one in my test. Then yes, the message is expected: The exit code from this run is So it should indeed warn the user about a non-nominal exit code. For a follow-up (though maybe on |
in my opinion this is very misleading. A fix is in order (to be backported). |
Hi, in this case it seems to be an |
|
type bug-fix |
|
urgent
|
|
A new Pull Request was created by @goblirsc for CMSSW_16_0_X. It involves the following packages:
@Alejandro1400, @JanChyczynski, @arunhep, @atpathak, @perrotta can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
+1 Size: This PR adds an extra 36KB to repository |
|
Thanks. On a quick look I don't see any possible data dependency relationship between If they need to be run in a specific order, the module whose endLumi needs to be run first should produce something to the On another quick look I did not see any framework changes that should alter the order of modules in endLumi transition. We did update TBB, that might have an effect or not. |
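To illustrate the point about data dependencies, here is a toy sketch (not the CMSSW framework API) of how a declared produce/consume relation pins the order of two modules. The `Module` struct, `scheduleOrder`, the module names and the `flushDone` label are all hypothetical and exist only for this example. Without a declared dependency the toy simply keeps the listed order, whereas a real scheduler is free to pick any order.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical module description: which product labels it makes and reads.
struct Module {
  std::string name;
  std::vector<std::string> produces;
  std::vector<std::string> consumes;
};

// Kahn's algorithm: a module becomes runnable once every module that
// produces something it consumes has already run.
std::vector<std::string> scheduleOrder(const std::vector<Module>& modules) {
  std::map<std::string, std::size_t> producerOf;
  for (std::size_t i = 0; i < modules.size(); ++i)
    for (const auto& label : modules[i].produces)
      producerOf[label] = i;

  std::vector<std::vector<std::size_t>> dependents(modules.size());
  std::vector<std::size_t> unmet(modules.size(), 0);
  for (std::size_t i = 0; i < modules.size(); ++i) {
    for (const auto& label : modules[i].consumes) {
      auto it = producerOf.find(label);
      if (it != producerOf.end()) {
        dependents[it->second].push_back(i);
        ++unmet[i];
      }
    }
  }

  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < modules.size(); ++i)
    if (unmet[i] == 0)
      ready.push(i);

  std::vector<std::string> order;
  while (!ready.empty()) {
    const std::size_t i = ready.front();
    ready.pop();
    order.push_back(modules[i].name);
    for (std::size_t d : dependents[i])
      if (--unmet[d] == 0)
        ready.push(d);
  }
  return order;
}
```

Listing a hypothetical `converter` that consumes `flushDone` before an `algorithm` that produces it still yields `algorithm` first, because the dependency overrides the listing order.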
|
+alca |
|
This pull request is fully signed and it will be integrated in one of the next CMSSW_16_0_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_16_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @ftenchini, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2) |
|
+1 |
PR description:
This PR adds a missing buffer flush call to MillePedeAlignmentAlgorithm::endLuminosityBlock. The missing call caused the final dimuon track of a job to be truncated at the time the alignment FileBlob was written into the ALCARECO output, resulting in corrupted alignment data and failing high-granularity PCL alignment jobs.
Longer description below.
PR validation:
[ ✔️ ] Confirm that local re-running of PCL job with the change results in a successful alignment
[ ✔️ ] Unit tests pass
[ ✔️ ] runTheMatrix WF 1000,1001,1001.2,1001.3,1001.4,1002.3,1002.4,1002 (modulo 5 x "input file not found" - known and unrelated to the PR)
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
This PR is only intended as a bugfix for 16_0_X to fix the ongoing PCL high-granularity alignment involving dimuons. In master (already merged in #49963), a new writing pattern for Mille was adopted that also fixes this issue.
CC @lpnair
Detailed description of bug:
theMille: Used when a reference trajectory exists (MinBias)
theBinary: Used to talk to the GBL interface inside the algorithm, used for dimuon pairs
MillePedeAlignmentAlgorithm::endLuminosityBlock theMille, but not theBinary
MillePedeFileConverter::endLuminosityBlockProduce FileBlob into the output ALCARECO file.
MillePedeAlignmentAlgorithm::terminate theMille and theBinary - leading to any remaining buffers being flushed to disk.
theBinary before producing the FileBlob terminate) - and thus missed in the FileBlob.
Consequence:
Proposed Fix in this PR: Make sure to flush the theBinary buffer to disk in MillePedeAlignmentAlgorithm::endLuminosityBlock with a minimal change.
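The failure mode can be reproduced with a minimal, self-contained sketch. This is not the real Mille or GBL code: `BufferedWriter`, `runJob` and the record strings are hypothetical stand-ins, and the "disk" is just a string. It assumes only what the description states: two buffered writers, a snapshot of the on-disk content taken at the end of the luminosity block, and leftover buffers reaching disk only when the writers are torn down.

```cpp
#include <string>
#include <vector>

// Hypothetical buffered writer: records stay in memory until flush()
// or destruction, at which point they are appended to the "disk" string.
class BufferedWriter {
public:
  explicit BufferedWriter(std::string& disk) : disk_(disk) {}
  ~BufferedWriter() { flush(); }  // plays the role of terminate()

  void write(const std::string& record) { pending_.push_back(record); }

  void flush() {
    for (const auto& record : pending_)
      disk_ += record + ";";
    pending_.clear();
  }

private:
  std::string& disk_;
  std::vector<std::string> pending_;
};

// Runs one toy job and returns the snapshot taken at end of lumi block,
// i.e. what would end up in the blob written to the output file.
std::string runJob(bool flushBinaryAtEndLumi) {
  std::string milleFile;
  std::string binaryFile;
  std::string blob;
  {
    BufferedWriter theMille(milleFile);
    BufferedWriter theBinary(binaryFile);
    theMille.write("minbias-track");
    theBinary.write("dimuon-track");

    // End of luminosity block: theMille was always flushed here.
    theMille.flush();
    if (flushBinaryAtEndLumi)
      theBinary.flush();  // the kind of call this PR adds

    // Snapshot of what is on disk right now.
    blob = milleFile + binaryFile;
  }  // writers destroyed here: leftover buffers reach disk too late
  return blob;
}
```

In this sketch `runJob(false)` returns a snapshot without the dimuon record, while `runJob(true)` contains both records.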