[EVOLUTION_X] Disable runTheMatrix workflows reading input data #50649

Merged
cmsbuild merged 1 commit into cms-sw:CMSSW_17_0_EVOLUTION_X from makortel:evoDisableWorkflows
Apr 13, 2026

Conversation

@makortel
Contributor

@makortel makortel commented Apr 2, 2026

PR description:

This PR disables runTheMatrix.py workflows that unconditionally read input data. For MC workflows whose GEN step can be run on the fly, the reading of the input data is disabled.

All changes are marked with a "COMMENT: reads old format file" comment, the same marker we have used for disabled unit tests.
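
As a rough illustration of how such a marker can be used to audit what was disabled, the sketch below scans relval-style Python source for the marker comment and collects the commented-out workflow numbers that follow it. The sample text, the shape of the `workflows[...]` lines, the placeholder step names, and the helper name `find_disabled_workflows` are all assumptions made for this example; they are not the actual contents of Configuration/PyReleaseValidation.

```python
import re

MARKER = "COMMENT: reads old format file"
# Matches a commented-out assignment such as "# workflows[25202.0] = [...]" (assumed shape).
DISABLED = re.compile(r"^\s*#\s*workflows\[\s*([0-9.]+)\s*\]")


def find_disabled_workflows(source: str) -> list[str]:
    """Return workflow numbers commented out directly below a marker comment."""
    found = []
    armed = False  # True while inside a comment block introduced by the marker
    for line in source.splitlines():
        if MARKER in line:
            armed = True
            continue
        match = DISABLED.match(line)
        if armed and match:
            found.append(match.group(1))
        elif not line.lstrip().startswith("#"):
            armed = False  # the first non-comment line ends the marked block
    return found


# Hypothetical sample; step names are placeholders.
SAMPLE = """\
workflows[11634.0] = ['', ['step_a', 'step_b']]
# COMMENT: reads old format file
# workflows[25202.0] = ['', ['step_a', 'step_b']]
# workflows[312.0] = ['', ['step_a', 'step_b']]
workflows[12434.0] = ['', ['step_a', 'step_b']]
# workflows[99.0] = ['', ['commented out for another reason']]
"""

print(find_disabled_workflows(SAMPLE))
```

Only the two entries below the marker are reported; the last commented-out line is ignored because no marker precedes it.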

It is possible that I didn't catch everything in this PR, but it is much easier to discover the remaining cases from the IBs than by running the full runTheMatrix myself.

I'd imagine eventually many of these workflows could be removed (in CMSSW_20 or beyond), but that is not my call (I'd be happy to remove workflows instead of commenting them out if that is wanted).

Resolves cms-sw/framework-team#2112

PR validation:

runTheMatrix.py -n succeeds locally.

@makortel
Contributor Author

makortel commented Apr 2, 2026

assign core

@makortel
Contributor Author

makortel commented Apr 2, 2026

type evolution

@cmsbuild
Contributor

cmsbuild commented Apr 2, 2026

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Apr 2, 2026

@cmsbuild
Contributor

cmsbuild commented Apr 2, 2026

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

cmsbuild commented Apr 2, 2026

A new Pull Request was created by @makortel for CMSSW_16_1_EVOLUTION_X.

It involves the following packages:

  • Configuration/PyReleaseValidation (pdmv)

@AdrianoDee, @DickyChant, @Dr15Jones, @antoniovagnerini, @cmsbuild, @makortel, @miquork, @smuzaffar can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @fabiocos, @slomeo this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel
Contributor Author

makortel commented Apr 2, 2026

@cmsbuild, please test

I wouldn't be surprised by failures, e.g. because the bot asks for workflows that no longer exist with this PR.

@cmsbuild
Contributor

cmsbuild commented Apr 2, 2026

-1

Failed Tests: UnitTests RelVals AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f67d65/52450/summary.html
COMMIT: 2777862
CMSSW: CMSSW_16_1_EVOLUTION_X_2026-03-30-2300/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/50649/52450/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Unit Tests

I found 1 errors in the following unit tests:

---> test test-runTheMatrix-interactive had ERRORS

Failed RelVals

ValueError: Undefined workflows: 2500.3001, 18634.0, 17034.0, 34634.999, 13034.0, 10224.0, 25202.0, 312.0, 250202.181

Failed AddOn Tests

----- Begin Fatal Exception 02-Apr-2026 23:11:04 CEST-----------------------
An exception of category 'FallbackFileOpenError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
   [2] Calling RootInputFileSequence::initTheFile()
   [3] Calling StorageFactory::open()
   [4] Calling XrdFile::open()
Exception Message:
Failed to open the file 'root://xrootd-cms.infn.it//store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664'
   Additional Info:
      [a] Attempted to open logical file /store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root.
      [b] Failed to open file with physical name root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664. Will attempt fallback. The error was
Error type FatalRootError
Fatal Root Error: @SUB=TStreamerInfo::BuildCheck

   The StreamerInfo of class reco::Photon::PflowIDVariables read from file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root
   has the same version (=3) as the active class but a different checksum.
   You should update the version to ClassDef(reco::Photon::PflowIDVariables,4).
   Do not try to write objects with the current class definition,
   the files will not be readable.


      [c] Failed to open the file with physical name root://cms-xrd-global.cern.ch//eos/cms/store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664. Will attempt fallback.
      [d] Failed to open the file with physical name root://xrootd-cms.infn.it//store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664.
      [e] XrdCl::File::Open(name='root://xrootd-cms.infn.it//store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664', flags=0x10, permissions=0660) => error '[ERROR] Server responded with an error: [3011] No servers are available to read the file.
' (errno=3011, code=400). No additional data servers were found.
      [f] Last URL tried: root://cms-xrd-global.cern.ch:1094//store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664&tried=+1213xrootd-cms-redir-int.cr.cnaf.infn.it&xrdcl.requuid=86b9c361-19cd-4c95-a33e-90745f3cc11e
      [g] Problematic data server: cms-xrd-global.cern.ch:1094
      [h] Disabled source: cms-xrd-global.cern.ch:1094
----- End Fatal Exception -------------------------------------------------

@makortel
Contributor Author

makortel commented Apr 6, 2026

@smuzaffar I couldn't find where cms-bot would require e.g. workflow 2500.3001 to be run. I dug a bit into the cms-bot scripts and found some workflows defined there (in cmssw-pr-test-config); I'll try to add conditional logic there for EVOLUTION_X.

But am I guessing correctly that the bot establishes the default set of workflows to run based on the baseline? (That's the best explanation I could come up with for e.g. 2500.3001.) Is there a way around that for a PR that removes workflows that are run by runTheMatrix.py --limited?
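
To make the failure mode concrete: a request for workflows that the PR no longer defines ends in the "Undefined workflows" error shown in the test report. A minimal sketch of splitting a request into runnable and undefined parts, so that a caller could drop or report the latter, might look like the following. The function name `split_requested` and the sample "defined" set are hypothetical; this is not how cms-bot or runTheMatrix.py implement the check.

```python
def split_requested(requested, defined):
    """Partition requested workflow ids into (runnable, undefined), keeping order."""
    defined = set(defined)
    runnable = [wf for wf in requested if wf in defined]
    undefined = [wf for wf in requested if wf not in defined]
    return runnable, undefined


# Ids are kept as strings so that e.g. "34634.999" is compared exactly.
requested = ["11634.0", "2500.3001", "12434.0", "25202.0"]
defined = ["11634.0", "12434.0", "12834.0"]  # hypothetical set left after the PR

runnable, undefined = split_requested(requested, defined)
print("run:", ",".join(runnable))
print("skip (undefined):", ",".join(undefined))
```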

@makortel
Contributor Author

makortel commented Apr 6, 2026

@cmsbuild, please test with cms-sw/cms-bot#2715

@makortel
Contributor Author

makortel commented Apr 6, 2026

@cmsbuild, please abort

@makortel makortel changed the base branch from CMSSW_16_1_EVOLUTION_X to CMSSW_17_0_EVOLUTION_X April 6, 2026 20:32
@cmsbuild
Contributor

cmsbuild commented Apr 9, 2026

@cmsbuild
Contributor

cmsbuild commented Apr 9, 2026

Pull request #50649 was updated. @AdrianoDee, @DickyChant, @Dr15Jones, @antoniovagnerini, @cmsbuild, @makortel, @miquork, @smuzaffar can you please check and sign again.

@makortel
Contributor Author

makortel commented Apr 9, 2026

@cmsbuild, please test with cms-sw/cms-bot#2715

@smuzaffar
Contributor

smuzaffar commented Apr 10, 2026

@makortel, the JR comparison job generates too much log data like [a], which can fill the 500GB disk [b]

[a]

Error in <TBufferFile::CheckByteCount>: Byte count probably corrupted around buffer position 16388:
        -1642856448 for a possible maximum of -10
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: -18272, object skipped at offset 16386
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: -31135, object skipped at offset 16388
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: 4416, object skipped at offset 16394
Error in <TBufferFile::CheckByteCount>: object of class pat::LookupTableRecord read too many bytes: 2 instead of -1642856448
Warning in <TBufferFile::CheckByteCount>: pat::LookupTableRecord::Streamer() not in sync with data on file /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/CMSSW_17_0_X_2026-04-06-2300/12434.0_TTbar_14TeV+2023/step3_inMINIAODSIM.root, fix Streamer()
Error in <TBufferFile::CheckByteCount>: Byte count probably corrupted around buffer position 16388:
        -1642856448 for a possible maximum of -10
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: -18272, object skipped at offset 16386
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: -31135, object skipped at offset 16388
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: 4416, object skipped at offset 16394
Error in <TBufferFile::CheckByteCount>: object of class pat::LookupTableRecord read too many bytes: 2 instead of -1642856448
Warning in <TBufferFile::CheckByteCount>: pat::LookupTableRecord::Streamer() not in sync with data on file /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/CMSSW_17_0_X_2026-04-06-2300/12434.0_TTbar_14TeV+2023/step3_inMINIAODSIM.root, fix Streamer()
Error in <TBufferFile::CheckByteCount>: Byte count probably corrupted around buffer position 16388:
        -1642856448 for a possible maximum of -10
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: -18272, object skipped at offset 16386
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: -31135, object skipped at offset 16388
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: 4416, object skipped at offset 16394
Error in <TBufferFile::CheckByteCount>: object of class pat::LookupTableRecord read too many bytes: 2 instead of -1642856448
Warning in <TBufferFile::CheckByteCount>: pat::LookupTableRecord::Streamer() not in sync with data on file /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/CMSSW_17_0_X_2026-04-06-2300/12434.0_TTbar_14TeV+2023/step3_inMINIAODSIM.root, fix Streamer()
Error in <TBufferFile::CheckByteCount>: Byte count probably corrupted around buffer position 16388:
        -1642856448 for a possible maximum of -10
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: -18272, object skipped at offset 16386
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: -31135, object skipped at offset 16388
Error in <TBufferFile::ReadClassBuffer>: class: pat::LookupTableRecord, attempting to access a wrong version: 4416, object skipped at offset 16394
Error in <TBufferFile::CheckByteCount>: object of class pat::LookupTableRecord read too many bytes: 2 instead of -1642856448
Warning in <TBufferFile::CheckByteCount>: pat::LookupTableRecord::Streamer() not in sync with data on file /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/CMSSW_17_0_X_2026-04-06-2300/12434.0_TTbar_14TeV+2023/step3_inMINIAODSIM.root, fix Streamer()
Error in <TBufferFile::CheckByteCount>: Byte count probably corrupted around buffer position 16388:
        -1642856448 for a possible maximum of -10

[b]

[root@cmsbuild999 JR-comparison]# du -sh * | grep 'G\s\s*'
20G	11634.0_TTbar_14TeV+2022
77G	12434.0_TTbar_14TeV+2023
74G	12834.0_TTbar_14TeV+2024
22G	12846.0_ZEE_14+2024
20G	1306.0_SingleMuPt1_UP15
69G	1330.0_ZMM_13
53G	136.731_RunSinglePh2016B
2.5G	16834.0_TTbar_14TeV+2025
4.7G	18434.0_TTbar_14TeV+2026
45G	2025.0010001_RunJetMET02025C_10k
42G	25.0_TTbar
20G	34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121
18G	9.0_Higgs200ChargedTaus
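
For scale, the directory sizes listed in [b] can be summed and compared with the 500GB disk. The sketch below parses the `du -sh` lines quoted above; it assumes every size carries the `G` suffix, as in this listing, and the helper name `total_gigabytes` is made up for the example.

```python
DU_OUTPUT = """\
20G\t11634.0_TTbar_14TeV+2022
77G\t12434.0_TTbar_14TeV+2023
74G\t12834.0_TTbar_14TeV+2024
22G\t12846.0_ZEE_14+2024
20G\t1306.0_SingleMuPt1_UP15
69G\t1330.0_ZMM_13
53G\t136.731_RunSinglePh2016B
2.5G\t16834.0_TTbar_14TeV+2025
4.7G\t18434.0_TTbar_14TeV+2026
45G\t2025.0010001_RunJetMET02025C_10k
42G\t25.0_TTbar
20G\t34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121
18G\t9.0_Higgs200ChargedTaus
"""


def total_gigabytes(du_text: str) -> float:
    """Sum the sizes of 'du -sh' lines, assuming all sizes end in 'G'."""
    total = 0.0
    for line in du_text.splitlines():
        size, _name = line.split(None, 1)
        if not size.endswith("G"):
            raise ValueError(f"unexpected size unit in {line!r}")
        total += float(size[:-1])
    return total


total = total_gigabytes(DU_OUTPUT)
print(f"{total:.1f}G of 500G ({total / 500:.0%})")
```

The listed directories alone add up to roughly 467G, which is most of the disk.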

@makortel
Contributor Author

> The JR comparison job generates too much log data like [a], which can fill the 500GB disk [b]

Oops, sorry about that. Yeah, that is not good.

@makortel
Contributor Author

makortel commented Apr 10, 2026

Are the JR comparisons comparing the files from this PR to the master IB baseline or the EVOLUTION_X IB baseline?

@smuzaffar
Contributor

smuzaffar commented Apr 10, 2026

> Are the JR comparisons comparing the files from this PR to the master IB baseline or the EVOLUTION_X IB baseline?

master IB is used as baseline

@makortel
Contributor Author

> > Are the JR comparisons comparing the files from this PR to the master IB baseline or the EVOLUTION_X IB baseline?
>
> master IB is used as baseline

Thanks, that explains it. The ROOT files are incompatible by construction and cannot be opened by the same ROOT-using process. I think we should either use the EVOLUTION_X IB as the baseline, or disable the JR comparisons altogether for EVOLUTION_X.
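
The two options can be sketched as a small selection function. Everything here is illustrative: the function name `pick_baseline`, the `skip_special` switch, and the substring test on the queue name are assumptions for this example and do not describe cms-bot's actual logic.

```python
from typing import Optional


def pick_baseline(pr_queue: str, default_queue: str, skip_special: bool = False) -> Optional[str]:
    """Choose the IB queue whose files are compared with the PR output.

    EVOLUTION queues write files the default queue's process cannot read, so
    they are compared against themselves, or not at all if skip_special is set.
    Returning None means "do not run the comparison".
    """
    if "_EVOLUTION_" in pr_queue:
        return None if skip_special else pr_queue
    return default_queue


print(pick_baseline("CMSSW_17_0_EVOLUTION_X", "CMSSW_17_0_X"))
print(pick_baseline("CMSSW_17_0_EVOLUTION_X", "CMSSW_17_0_X", skip_special=True))
print(pick_baseline("CMSSW_17_0_X", "CMSSW_17_0_X"))
```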

@cmsbuild
Contributor

-1

Failed Tests: AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f67d65/52576/summary.html
COMMIT: 31c4b31
CMSSW: CMSSW_17_0_EVOLUTION_X_2026-04-06-2300/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/50649/52576/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed AddOn Tests

----- Begin Fatal Exception 09-Apr-2026 23:41:55 CEST-----------------------
An exception of category 'FallbackFileOpenError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
   [2] Calling RootInputFileSequence::initTheFile()
   [3] Calling StorageFactory::open()
   [4] Calling XrdFile::open()
Exception Message:
Failed to open the file 'root://xrootd-cms.infn.it//store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664'
   Additional Info:
      [a] Attempted to open logical file /store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root.
      [b] Failed to open file with physical name root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664. Will attempt fallback. The error was
Error type FatalRootError
Fatal Root Error: @SUB=TStreamerInfo::BuildCheck

   The StreamerInfo of class reco::Photon::PflowIDVariables read from file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root
   has the same version (=3) as the active class but a different checksum.
   You should update the version to ClassDef(reco::Photon::PflowIDVariables,4).
   Do not try to write objects with the current class definition,
   the files will not be readable.


      [c] Failed to open the file with physical name root://cms-xrd-global.cern.ch//eos/cms/store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664. Will attempt fallback.
      [d] Failed to open the file with physical name root://xrootd-cms.infn.it//store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664.
      [e] XrdCl::File::Open(name='root://xrootd-cms.infn.it//store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664', flags=0x10, permissions=0660) => error '[ERROR] Server responded with an error: [3011] No servers are available to read the file.
' (errno=3011, code=400). No additional data servers were found.
      [f] Last URL tried: root://cms-xrd-global.cern.ch:1094//store/relval/CMSSW_9_2_2/RelValProdTTbar_13/AODSIM/91X_mcRun2_asymptotic_v3-v1/10000/EEB99F74-DA4D-E711-A41C-0025905A48F2.root?scitag.flow=196664&tried=+1213xrootd-redic.pi.infn.it&xrdcl.requuid=11d8b597-8c42-4f2a-9de2-96724bdbf6a7
      [g] Problematic data server: cms-xrd-global.cern.ch:1094
      [h] Disabled source: cms-xrd-global.cern.ch:1094
----- End Fatal Exception -------------------------------------------------

Comparison Summary

@makortel
Contributor Author

cms-sw/cms-bot#2719 was merged, so I guess we could test again here

@makortel
Contributor Author

@cmsbuild, please test with cms-sw/cms-bot#2715

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f67d65/52634/summary.html
COMMIT: 31c4b31
CMSSW: CMSSW_17_0_EVOLUTION_X_2026-04-10-2300/el8_amd64_gcc13
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/50649/52634/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@makortel
Contributor Author

@smuzaffar It seems the comparisons were run, but the comparison summary was omitted from the message #50649 (comment). Is that expected from cms-sw/cms-bot#2719?

@makortel
Contributor Author

+core

In any case, I think this PR is good to go into EVOLUTION_X.

@mandrenguyen
Contributor

merge
