
Fix chaining in rules and unify prefix and rules of protocols #42530

Merged (1 commit, Aug 18, 2023)


nhduongvn
Contributor

The chaining mechanism is still supported in the new storage description, as it was in the Trivial File Catalogue (TFC):
https://twiki.cern.ch/twiki/bin/viewauth/CMS/StorageDescription
However, it was effectively turned off (chain="") while the code for the new storage description was being developed in
#37278
As a result, some sites fail to find files, for example T2_DE_DESY:
https://cms-talk.web.cern.ch/t/errors-of-missing-files-at-t2-de-desy/27838/20
(Currently three sites use 'chain': T1_DE_KIT, T2_DE_DESY, and T2_BE_IIHE.)

This PR restores the chaining feature. It also unifies the handling of protocol prefixes and rules by converting a prefix into an equivalent rule, so the same code handles both.

@cmsbuild
Contributor

cmsbuild commented Aug 9, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42530/36543

  • This PR adds an extra 16KB to repository

@cmsbuild
Contributor

cmsbuild commented Aug 9, 2023

A new Pull Request was created by @nhduongvn for master.

It involves the following packages:

  • FWCore/Catalog (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks.
@makortel, @missirol, @wddgit this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@makortel
Contributor

Could you add tests that show and ensure the chaining is working properly?

FWCore/Catalog/interface/FileLocator.h (outdated, resolved)
FWCore/Catalog/interface/FileLocator.h (outdated, resolved)
@@ -1,6 +1,5 @@
#include "FWCore/Catalog/interface/FileLocator.h"
#include "FWCore/ServiceRegistry/interface/Service.h"
#include "FWCore/Utilities/interface/Exception.h"
Contributor

Looks like this #include should be kept.

Contributor Author

Done

FWCore/Catalog/src/FileLocator.cc (outdated, resolved)
@nhduongvn
Contributor Author

> Could you add tests that show and ensure the chaining is working properly?

I tested with the file that the user in the CMS Talk post above reported could not be opened at T2_DE_DESY:

Error in TNetXNGFile::ReadBuffer: [ERROR] Server responded with an error:
[3011] Unable to read
/store/data/Run2022F/EGamma/NANOAOD/PromptNanoAODv11_v1-v2/80000/6d07f113-b587-45b3-8db6-c2928358ae0b.root;
no such file or directory

Following these steps:

  1. setenv SITECONFIG_PATH /cvmfs/cms.cern.ch/SITECONF/T2_DE_DESY
  2. edmFileUtil -d /store/data/Run2022F/EGamma/NANOAOD/PromptNanoAODv11_v1-v2/80000/6d07f113-b587-45b3-8db6-c2928358ae0b.root
     dcap://dcache-cms-dcap.desy.de/pnfs/desy.de/cms/tier2/store/data/Run2022F/EGamma/NANOAOD/PromptNanoAODv11_v1-v2/80000/6d07f113-b587-45b3-8db6-c2928358ae0b.root

     (The path is translated correctly: the "/pnfs/..." part, which results from translating the path with protocol "pnfs" chained from protocol "dcap", is included.)

  3. cmsRun test_sitelocalconfig_source_cfg_working.py

     (The config is https://drive.google.com/file/d/1sA35yXvVDa6vizKGkYGpDfmpXqwLNfC9/view?usp=sharing)

The file can be opened:
10-Aug-2023 10:36:08 CDT Initiating request to open file dcap://dcache-cms-dcap.desy.de/pnfs/desy.de/cms/tier2/store/data/Run2022F/EGamma/NANOAOD/PromptNanoAODv11_v1-v2/80000/6d07f113-b587-45b3-8db6-c2928358ae0b.root
10-Aug-2023 10:36:09 CDT Successfully opened file dcap://dcache-cms-dcap.desy.de/pnfs/desy.de/cms/tier2/store/data/Run2022F/EGamma/NANOAOD/PromptNanoAODv11_v1-v2/80000/6d07f113-b587-45b3-8db6-c2928358ae0b.root
10-Aug-2023 10:36:11 CDT Closed file dcap://dcache-cms-dcap.desy.de/pnfs/desy.de/cms/tier2/store/data/Run2022F/EGamma/NANOAOD/PromptNanoAODv11_v1-v2/80000/6d07f113-b587-45b3-8db6-c2928358ae0b.root

(When run with the MINIAOD file "/store/data/Run2016C/DoubleMuon/MINIAOD/17Jul2018-v1/00000/02B5E9A5-878D-E811-832D-0CC47A545096.root", there is no crash. The file above is NANOAOD.)

@nhduongvn
Contributor Author

For the record, here is an email from Stephan explaining how chaining works:

Hello Duong,
It's iterative. Consider the following scenario:

[
  { "protocol": "first",
    "rules": [
      { "lfn": "/test/aaa/(.*)",
        "pfn": "/test/AAA/$1"
      },
      { "lfn": "(.*)",
        "pfn": "$1"
      }
    ]
  },
  { "protocol": "second",
    "rules": [
      { "chain": "first",
        "lfn": "/+test/(.*)",
        "pfn": "/cms/$1"
      },
      { "lfn": "/+store/(.*)",
        "pfn": "/cms/store/$1"
      }
    ]
  },
  { "protocol": "root",
    "rules": [
      { "lfn": "(.*)",
        "chain": "second",
        "pfn": "root://host.domain//pnfs$1"
      }
    ]
  }
]

Then /store/mc/xxx with protocol root should translate as follows:
protocol root: lfn rule (.*) matches and triggers chain second
protocol second: lfn rule /+store/(.*) matches and translates the pfn to /cms/store/mc/xxx
protocol root: translates /cms/store/mc/xxx to root://host.domain//pnfs/cms/store/mc/xxx

and /test/aaa/xxx with protocol root should translate as follows:
protocol root: lfn rule (.*) matches and triggers chain second
protocol second: lfn rule /+test/(.*) matches and triggers chain first
protocol first: lfn rule /test/aaa/(.*) matches and translates the pfn to /test/AAA/xxx
protocol second: translates /test/AAA/xxx to /cms/AAA/xxx
protocol root: translates /cms/AAA/xxx to root://host.domain//pnfs/cms/AAA/xxx

(I hope I got things right. It still confuses me, and I get things
wrong quite often when I make an example.)
Thanks,
cheers, Stephan

@nhduongvn closed this on Aug 10, 2023
@nhduongvn reopened this on Aug 10, 2023
@makortel
Contributor

> > Could you add tests that show and ensure the chaining is working properly?
>
> I tested with the file that the user in the CMS Talk post above reported could not be opened at T2_DE_DESY:

OK, but having those implemented as unit tests would clearly demonstrate the expected behavior and ensure that it does not break in the future.

In retrospect, insufficient unit tests played a role in the hurdles that followed the merge of #37278, and I'd really like to avoid that experience.

@nhduongvn
Contributor Author

> OK, but having those implemented as unit tests would clearly demonstrate the expected behavior and ensure that it does not break in the future.
>
> In retrospect, insufficient unit tests played a role in the hurdles that followed the merge of #37278, and I'd really like to avoid that experience.

I totally agree with this. When Stephan is back, we can come up with a site-local-config.xml and storage.json that cover all possible scenarios and design unit tests for them. For now, I can add a unit test for this chaining.

@makortel
Contributor

> When Stephan is back, we can come up with a site-local-config.xml and storage.json that cover all possible scenarios and design unit tests for them.

That would be great!

> For now, I can add a unit test for this chaining.

I think that, for the chaining alone, extending the existing FileLocator_t.cpp together with either the existing storage.json or a new one would suffice.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42530/36569

  • This PR adds an extra 20KB to repository

@cmsbuild
Contributor

Pull request #42530 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.

@makortel
Contributor

@cmsbuild, please test

@makortel
Contributor

@nhduongvn A gentle ping

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42530/36633

  • This PR adds an extra 12KB to repository

@cmsbuild
Contributor

Pull request #42530 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.

@nhduongvn
Contributor Author

> @nhduongvn A gentle ping

Done! Sorry for the slow response; I was traveling.

@makortel
Contributor

Thanks!

@makortel
Contributor

@cmsbuild, please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-46ede2/34346/summary.html
COMMIT: 000b6fb
CMSSW: CMSSW_13_3_X_2023-08-17-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/42530/34346/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 10 lines from the logs
  • Reco comparison results: 1 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3152915
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3152893
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor

+core

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio
Contributor

+1
