Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new data catalogs from Rucio storage description (RucioCatalog) and use it by default instead of trivial data catalogs (TrivialCatalog) #37278

Merged
merged 1 commit into from
Aug 30, 2022

Conversation

nhduongvn
Copy link
Contributor

PR description:

New data access using catalogs (defined in ) and storage.json

PR validation:

if this PR is a backport please specify the original PR and why you need to backport that PR:

Before submitting your pull requests, make sure you followed this checklist:

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905

  • This PR adds an extra 32KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28906

  • This PR adds an extra 36KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @nhduongvn for master.

It involves the following packages:

  • FWCore/Catalog (core)
  • FWCore/Services (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks.
@makortel, @wddgit this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@Dr15Jones
Copy link
Contributor

What is the motivation for this change?

@nhduongvn
Copy link
Contributor Author

Hi Chris,
The changes are needed for transition to a new storage description at sites required by Rucio (and then we switched from XML to the more modern json format) too. So, this is effectively completing the Rucio transition.
Best

Copy link
Contributor

@makortel makortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging through old e-mails I reminded myself with these references
https://twiki.cern.ch/twiki/bin/view/CMS/StorageDescription
https://indico.cern.ch/event/919965/#11-a-new-description-for-stora
@nhduongvn Could you add these to the PR description (for future reference)?

Here are my comments from first review pass. My feeling is that some details may need further iteration. Could you also extend the tests of FileLocator and SiteLocalConfigService (here and here) to cover the new functionality?

Is there any need to backport this to earlier release cycles?

If I understood correctly this PR only adds the code, but it does not get used yet (because the default in InputFileCatalog constructor preserves current behavior). How do you foresee the new code to be enabled? Would this be done e.g. by switching the default in InputFileCatalog constructor? Or would every user of InputFileCatalog need to decide case-by-case which to use? Or something else?

How long does the reading of storage.xml need to be supported? E.g. do all the sites already provide storage.json?

#include <boost/property_tree/ptree.hpp>
#include <boost/foreach.hpp>

namespace pt = boost::property_tree;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this namespace alias to FileLocator.cc? There is only one use of that in this header (so the long name is not that big of a deal), but the shorthand feels too generic to be spread into other source files including this header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 10 to 12
#include <boost/property_tree/json_parser.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/foreach.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only boost::property_tree type appears to be used in this header. Would it be sufficient to leave the #include of its header here and move the rest to the FileLocator.cc?

Suggested change
#include <boost/property_tree/json_parser.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/foreach.hpp>
#include <boost/property_tree/ptree.hpp>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok the #include are rearranged


namespace edm {

class FileLocator {
public:
explicit FileLocator(std::string const& catUrl, unsigned iCatalog = 0);
//catType: 0 (1) to construct using data catagory from event-data (data-access)
explicit FileLocator(std::string const& catUrl, unsigned iCatalog = 0, unsigned catType = 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change the catType to an enum class with descriptive names? That would make the code more self-documenting.

With enum class you could also add a new constructor that avoids spelling the default value of iCatalog in calling code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @makortel ,
I do not understand your comment "With enum class you could also add a new constructor that avoids spelling the default value of iCatalog in calling code."
I updated the constructor:
explicit FileLocator(std::string const& catUrl, unsigned iCatalog = 0, unsigned catType = 0);
--> enum CatalogType {TrivialCatalog,RucioCatalog};
explicit FileLocator(std::string const& catUrl, unsigned iCatalog = 0, CatalogType catType = TrivialCatalog);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant you could add the following overload for the constructor

explicit FileLocator(std::string const& catUrl, CatalogType catType = TrivialCatalog);

Then the default value of iCatalog does not have to be repeated in places that need to set non-default catType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

~FileLocator();

std::string pfn(std::string const& ilfn) const;
//ruleType: 0 (1) to use rule from storage.xml (storage.json)
std::string pfn(std::string const& ilfn, unsigned ruleType = 0) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change the ruleType to an enum class as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

rule.result = result;
rule.chain = "";
rules[protocol].emplace_back(std::move(rule));
} catch (cms::Exception const& e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What code in the try block would throw cms::Exception? Maybe it should catch something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the try block. Please see further comment below

pfns.push_back(pfn);
}
}
if (catType == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (catType == 1) {
else if (catType == 1) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return m_dataCatalogs;
if (catType == 0)
return m_dataCatalogs;
if (catType == 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (catType == 1)
else if (catType == 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -30,7 +30,7 @@ namespace edm {
SiteLocalConfig() {}
virtual ~SiteLocalConfig() {}

virtual std::vector<std::string> const& dataCatalogs(void) const = 0;
virtual std::vector<std::string> const& dataCatalogs(unsigned catType = 0) const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that all callers of this function know at compile time whether catType is 0 or 1 (e.g. because of if statements outside). The returned data is also different for the two cases

  • 0 contains the url attributes of the catalog elements inside event-data elements
  • 1 contains comma-separated list of
    • site
    • sub-site if exists, otherwise site
    • site attribute of catalog element if exists, otherwise site
    • volume
    • protocol

So the calling code must handle these two cases differently. How about creating

  • a struct to store the aforementioned fields (to avoid creating and parsing comma-separated strings)
  • a new accessor function that returns std::vector of those structs

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this option without success. The code crashed with complain about "bad allocation". I do not think that 0 and 1 case the returned data is different. They can be both strings. It is just the matter of how trivial catalogs and new (Rucio) catalogs are defined in site_local_config.xml. In trivial catalog (case 0) everything is combined in an url and the codes still need to split it to know the path to storage definition (storage.xml) and the protocol:
"trivialcatalog_file:/cvmfs/cms.cern.ch/SITECONF/T1_US_FNAL/PhEDEx/storage.xml?protocol=xrd"
In the new (Rucio) catalog (case 1), things are more structured in "site", "volume" etc..

Therefore, we can make a string to include all information and the code will split it later as in case 0.
I do not see the benefit of having more complexity of a struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The struct would be easier to understand and maintain as the meaning different fields would be self-documented than serializing the data into strings. Could you share the code you tried?

I'd actually argue that also in the "case 0" the dataCatalogs() returning the full URL (directly from site_local_config.xml) is not very good, and it would have been better to do the URL parsing in SiteLocalConfigService instead of FileLocator. But that is not really worth of improving at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (m_catAttr.empty()) {
if (!localconfservice.isAvailable())
throw cms::Exception("FileCatalog", "edm::SiteLocalConfigService is not available");
if (iCatalog >= localconfservice->dataCatalogs().size())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be calling

Suggested change
if (iCatalog >= localconfservice->dataCatalogs().size())
if (iCatalog >= localconfservice->dataCatalogs(1).size())

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -50,11 +62,14 @@ namespace edm {
ProtocolRules m_directRules;
/** Inverse rules are used to do the mapping from PFN to LFN*/
ProtocolRules m_inverseRules;
/** Direct rules are used to do the mapping from LFN to PFN taken from storage.json*/
ProtocolRules m_directRules_da;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the inverse rules, for PFN->LFN mapping?

On the other hand, a quick git grep indicates tha tthe FileLocator::lfn() would not be called except in the unit tests. Should we consider removing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it, not sure why it is here in the first place

@stlammel
Copy link

Hallo Matti,

Is there any need to backport this to earlier release cycles?

yes, it would be desirable to port this to the latest versions of all CMSSW_X releases so
storage.xml can be phased out (at some point).

How long does the reading of storage.xml need to be supported? E.g. do all the sites already provide storage.json?

All sites have a storage.json file. The site support team is adding the two new site-local-config.xml sections to the remaining sites this week.

Thanks,

  • Stephan

@makortel
Copy link
Contributor

Thanks Stephan. That sounds to me like there would be no need to keep the storage.xml reading support (or, maybe keep the code for a while but change the default to storage.json).

@stlammel
Copy link

Hallo Matti,

We also came up with an additional question: is it intentional that the path to storage.json is not configurable within the site-local-config.xml file (like the path to storage.xml is currently)?

Yes, this is an intentional change: A site's storage.json and site-local-config.xml might be at different locations for different worker nodes/subsites. This way only the SITECONF directory structure is "fixed" with one entry point, SITECONFIG_PATH.

Thanks,

  • Stephan

@nhduongvn
Copy link
Contributor Author

I have received your comments and will work them out one by one.

@nhduongvn
Copy link
Contributor Author

Hi @makortel @stlammel
I will try to finish updating the codes today or tomorrow. I got "segmentation fault" when running tests last week and could not track down where. I have to do implementation step by step and run test along. Hopefully, I can resolve this, if not I will report back.

@nhduongvn
Copy link
Contributor Author

I have gone slowly to modify codes from Mati suggestions and done test along, so far no "segmentation fault" (I might did something wrong in the past when modifying all suggestions at once and did the test at the end). I will continue tomorrow.

@nhduongvn
Copy link
Contributor Author

Hi @makortel,
I made a new branch "test_struct_dataCatalog" to test storing data catalog as a struct. The struct is defined here:
https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/SiteLocalConfig.h#L35
I also made two constructors for "FileLocator"
https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/FileLocator.h#L21
https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/FileLocator.h#L22
However, when I compiled there are two errors in [1]. I am not sure how to overcome this currently. This branch contains all modifications you have suggested so if these are solved I will merge it with the branch used in the current pull request and the review can be restarted again.
As you remember, I said that I saw "crashes" when using struct or adding a new function to SiteLocalConfig.h (FWCore/Catalog/interface/SiteLocalConfig.h). I did not see them now. This could be that I implemented things in bulk and tested at the end and there were bugs somewhere that caused crashes (It could be that I used older CMSSW version). This time I carefully implement things step-by-step and doing test along. Anyway, these are not problems anymore and we can move on. Please help me to solve these compilation errors.
I have other question: where the struct can be optimally located? in SiteLocalConfig.h?
Thank you,
Duong

======================================
[1]
/uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc:77:27: error: expected constructor, destructor, or type conversion before '(' token
77 | FileLocator::FileLocator(edm::SiteLocalConfig::CatalogAtrr const& catAttr, unsigned iCatalog)
| ^
/uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc: In member function 'void edm::FileLocator::init_da(const edm::SiteLocalConfig::CatalogAttr&, unsigned int)':
/uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc:242:33: error: passing 'const edm::SiteLocalConfig::CatalogAttr' as 'this' argument discards qualifiers [-fpermissive]
242 | if (input_dataCatalog.empty()) {
| ^
In file included from /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/interface/FileLocator.h:4,
from /uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/src/FileLocator.cc:1:
/uscms_data/d3/duong/SiteOperation/CMSSW_12_3_0_pre2/src/FWCore/Catalog/interface/SiteLocalConfig.h:37:18: note: in call to 'const bool edm::SiteLocalConfig::CatalogAttr::empty()'
37 | bool const empty(){return site==""&&subSite==""&&storageSite==""&&volume==""&&protocol=="";}

@makortel
Copy link
Contributor

Thanks @nhduongvn. The compilation error comes from edm::SiteLocalConfig::CatalogAttr::empty() not being const. Basically this line
https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/SiteLocalConfig.h#L37
should be changed to

      bool empty() const {return site==""&&subSite==""&&storageSite==""&&volume==""&&protocol=="";}

(plus formatting). I'd actually prefer to use the std::string::empty(), i.e.

      bool empty() const {return site.empty() and subSite.empty() and storageSite.empty() and volume.empty() and protocol.empty();}

The SiteLocalConfig.h as a place of the struct is ok. I would, however, move it outside of SiteLocalConfig class, i.e. directly to edm namespace, and spell out the name fully, i.e. CatalogAttributes. Note that the constructor on line
https://github.com/nhduongvn/cmssw/blob/8a32692ed93f91eafa0fd1927ba5fb7c473ee777/FWCore/Catalog/interface/SiteLocalConfig.h#L36
could be just CatalogAttr() = default; or even left out (I think).

I would also make the enum CatalogType as enum class CatalogType, and move directly to edm namespace (assuming the enumeration is still needed, I've forgotten already that level of detail).

@nhduongvn
Copy link
Contributor Author

Thank you. The errors were fixed.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29278

  • This PR adds an extra 216KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@mmusich
Copy link
Contributor

mmusich commented Aug 31, 2022

I kind of suspect this PR caused issues in the DQM unit tests in the CMSSW_12_6_X_2022-08-30-2300 IB https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc10/CMSSW_12_6_X_2022-08-30-2300/unitTestLogs/DQM/Integration#/

@nhduongvn
Copy link
Contributor Author

I think since new data access is the default choice now, this --catalog file:/cvmfs/cms-ib.cern.ch/SITECONF/local/PhEDEx/storage.xml?protocol=xrootd is not appropriate or valid. If the SITECONFIG_PATH is set correctly, this one should work edmFileUtil --events /store/express/Run2022B/ExpressPhysics/FEVT/Express-v1/000/355/380/00000/b8a57fc4-5656-42b4-9b7b-2e647baf65e8.root

@makortel
Copy link
Contributor

makortel commented Sep 16, 2022

So so far the fallout of this PR has been

The backports should include the aforementioned fixes as well (plus others if anything else gets revealed by the time 12_6_0_pre2 RelVals finish, the pre-release itself has been cut). I'd be fine with a PR to 12_5_X already now if you want (or wait for the RelVals to finish first, I'd anyway wait for that before signing the backport).

@smuzaffar
Copy link
Contributor

another fallout is failure from CRAB jobs https://cms-talk.web.cern.ch/t/crab-test-cmssw-12-6-x-invalid-site-local-config/15423

@makortel
Copy link
Contributor

makortel commented Dec 1, 2022

The data taking has ended, and no further problems have surfaced. @nhduongvn @stlammel Maybe now it would be good time to prepare the backports?

I updated my comment #37278 (comment) that lists the fixes that need to backported too.

@stlammel
Copy link

stlammel commented Dec 1, 2022

Hallo Matti,
actually i was thinking about the same. I had a ping on my schedule for next Monday (to not look too pushy!)
Yes, let's round this up in the end of year shutdown. Thanks,

  • Stephan

@makortel
Copy link
Contributor

Hi @stlammel @nhduongvn just reminding the catalog change should still be backported.

@nhduongvn
Copy link
Contributor Author

Hi @stlammel @nhduongvn just reminding the catalog change should still be backported.

I am sorry I misunderstood that @makortel will do the backported. I will try to do this but actually I never done this before. Are there instructions somewhere?

@makortel
Copy link
Contributor

Ah, so we didn't really agree on who does the backports (as I assumed the opposite). There are instructions in https://cms-sw.github.io/tutorial-resolve-conflicts.html#backporting-a-pr . Alternatively, a short recipe would be to create a developer area of the latest release of a given release cycle, and cherry-pick the commits from this PR and the further fixes listed in #37278 (comment) (and fix possible conflicts on the way), test quickly, and make a PR.

@nhduongvn
Copy link
Contributor Author

Hi @makortel ,
I tried the backport but it seems something is wrong (many commits). Here are what I did:
cmsrel CMSSW_12_5_X_2023-01-24-2300
cd CMSSW_12_5_X_2023-01-24-2300/src/
cmsenv
git cms-rebase-topic nhduongvn:new-data-access
git branch -m new-data-access_126X_125X_1
git cherry-pick 9202779
git cherry-pick 0ff2e14
git cms-checkdeps -a
scram b
git push my-cmssw new-data-access_126X_125X_1

I also tried git cms-rebase-topic --old-base CMSSW_12_6_X_2023-01-25-2300 nhduongvn:new-data-access
but do not see changes in the files at all (The src area is empty after this command)

@makortel
Copy link
Contributor

By quick look your new-data-access_126X_125X_1 branch looks correct (three commits on top of CMSSW_12_5_X_2023-01-24-2300).

Alternative would be to just cherry-pick the commits of all three PRs, i.e. something along

cmsrel CMSSW_12_5_X_2023-01-24-2300
cd CMSSW_12_5_X_2023-01-24-2300/src/
cmsenv
git cms-addpkg FWCore/Catalog FWCore/Services Configuration/PyReleaseValidation
git cherry-pick 256fc9f4071e64c04c4d34a645d2cc52fff105da
git cherry-pick 9202779
git cherry-pick 0ff2e14
git cms-checkdeps -a
scram b

cmsbuild added a commit that referenced this pull request Apr 19, 2023
Backport codes for adding new data catalogs from Rucio storage description (RucioCatalog) and use it by default instead of trivial data catalogs (TrivialCatalog) #37278, and related bug fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants