Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error code on fallback file open error with test, 12_4_X backport #42396

Merged
merged 1 commit into from
Aug 22, 2023

Conversation

wddgit
Copy link
Contributor

@wddgit wddgit commented Jul 27, 2023

PR description:

Fix error code returned by cmsRun on a FallbackFileOpenError. There was a bug where it returns FileOpenError in the fallback case.

There is more discussion and more information related to this bug in issue #42040. FYI @stlammel

Note that in original version of this PR there was a unit test included. We deleted the unit test in the 12_4_X backport only and plan to add it when the Rucio 12_4_X backport is completed. The unit test depends on the Rucio backport.

PR validation:

Verified manually and with existing unit tests.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 27, 2023

A new Pull Request was created by @wddgit (W. David Dagenhart) for CMSSW_12_4_X.

It involves the following packages:

  • IOPool/Input (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks.
@makortel this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@wddgit
Copy link
Contributor Author

wddgit commented Jul 27, 2023

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5f21b8/33944/summary.html
COMMIT: 854f81c
CMSSW: CMSSW_12_4_X_2023-07-23-0000/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/42396/33944/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 5 lines to the logs
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3766083
  • DQMHistoTests: Total failures: 7
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3766053
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 50 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 212 log files, 167 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

@perrotta
Copy link
Contributor

backport of #42249

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 2, 2023

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_4_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_13_3_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@makortel
Copy link
Contributor

makortel commented Aug 2, 2023

No, wait ...

@makortel
Copy link
Contributor

makortel commented Aug 2, 2023

So the unit tests are strictly speaking incorrectly set up for 12_4_X (starting from setting SITECONFIG_PATH), because the Rucio catalog development has not yet been backported to 12_4_X (as noted in #42040 (comment)). The fix itself is still correct.

Possible options forward

  1. backport the Rucio catalog to 12_4_X
  2. redo the tests with TFC (i.e. with storage.xml)
  3. remove the tests from this backport, and rely on the tests in later releases

Since the Rucio catalog was planned to be backported to 12_4_X anyway, the option 2 feels like a wasted effort. Let's discuss more in #42040.

@wddgit
Copy link
Contributor Author

wddgit commented Aug 21, 2023

We could just go ahead and merge this bug fix without the unit test. The backport to Rucio can continue independently. The unit test isn't that important given the level of development occurring in 12_4_X and that the test is running in master already.

Also note that even though the current version of the unit test contains lines that are nonsensical for Trivial File Catalog, the log files show the test is actually verifying proper behavior. One file open attempt leads to FileOpenError and with two file open attempts there is a FallbackFileOpenError.

@makortel
Copy link
Contributor

Hmm. I suppose the risk for the site-local-config.xml used in the IBs for not having fallbacks (that would cause the unit test to fail) would be tiny. Relying on the IB's site-local-config.xml would mean that the xrootd servers and redirectors would see file open attempts for nonexistent files. The number of such file open attempts should be fairly small (3 per IB flavor times 2 per day) though.

@makortel
Copy link
Contributor

We decided to not include the unit tests in this PR. The tests will be backported as part of the Rucio catalog backport to 12_4_X, or soon afterwards.

@cmsbuild
Copy link
Contributor

Pull request #42396 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.

@wddgit
Copy link
Contributor Author

wddgit commented Aug 21, 2023

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-5f21b8/34395/summary.html
COMMIT: 881384b
CMSSW: CMSSW_12_4_X_2023-08-20-0000/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/42396/34395/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 11 lines from the logs
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3766083
  • DQMHistoTests: Total failures: 19
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3766041
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 50 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 212 log files, 167 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor

Comparisons show #39214, which is a known issue in this release cycle.

@makortel
Copy link
Contributor

+core

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_4_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_13_3_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 88ebdac into cms-sw:CMSSW_12_4_X Aug 22, 2023
@wddgit wddgit deleted the fixErrorReturnCode124X branch March 11, 2024 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants