Fix rucio list dataset replicas #43

Closed
dciangot opened this issue Apr 29, 2022 · 9 comments

Comments

@dciangot

dciangot commented Apr 29, 2022

The Rucio API that lists dataset replica locations has a known issue (*) that makes it return inconsistent or outdated locations. The correct response is provided by the very same API when it is called with the deep=True parameter (**).

If I follow the DAS code correctly (big if), the only place where this API is used is here (***), though I am not sure, since the name does not match what I would expect the call to do. It should then be enough to add the deep=True parameter to this call to get the correct set of dataset locations.

(*)
dmwm/CMSRucio#257

(**)
https://rucio.readthedocs.io/en/old-doc/restapi/replica.html#get--replicas-(path-scope_name)-datasets

(***)

das2go/das/das.go

Lines 641 to 644 in 12589ce

if urn == "block4dataset_size" {
	// add datasets after url which will return CMS blocks (Rucio datasets)
	furl = fmt.Sprintf("%s/datasets/", furl)
}

@dciangot
Author

By the way, the following example may be more useful than me guessing at the code workflow:

https://cmsweb.cern.ch/das/request?instance=prod/global&input=site+dataset%3D%2FDoubleMuon%2FRun2017D-17Nov2017-v1%2FAOD

This shows the T2_BR_UERJ site as having some blocks present. That is consistent with calling Rucio with deep=False (the default):

$ rucio list-dataset-replicas cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca

DATASET: cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca
+-----------------+---------+---------+
| RSE             |   FOUND |   TOTAL |
|-----------------+---------+---------|
| T1_UK_RAL_Tape  |      64 |      64 |
| T2_BR_UERJ      |       0 |      64 |
| T2_UK_London_IC |      59 |      64 |
+-----------------+---------+---------+

But with deep=True we get the correct locations, where T2_BR_UERJ is not among the sites:

$ rucio list-dataset-replicas cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca --deep

DATASET: cms:/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca
+-----------------+---------+---------+
| RSE             |   FOUND |   TOTAL |
|-----------------+---------+---------|
| T1_UK_RAL_Tape  |      64 |      64 |
| T2_UK_London_IC |      59 |      64 |
+-----------------+---------+---------+

@dciangot
Author

@vkuznet

FYI: @belforte @ericvaandering @nsmith- please correct me if I'm reporting this wrong

@ericvaandering
Member

I thought DAS already switched to using deep=True. Nevertheless, I think that's the correct approach.

@dciangot
Author

maybe a better candidate for the fix is this one:

furl = fmt.Sprintf("%s/replicas/cms/%s/datasets", RucioUrl(), url.QueryEscape(blkName))
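For illustration, a minimal sketch of what adding the parameter to that URL could look like. The `datasetReplicasURL` helper and the `base` argument are hypothetical (in das2go the base would come from `RucioUrl()`), the host below is a placeholder, and the exact query-string spelling `?deep=True` is an assumption based on the parameter name used in this thread; the actual change in DAS may differ.

```go
package main

import (
	"fmt"
	"net/url"
)

// datasetReplicasURL builds the dataset-replicas URL for a CMS block
// (a Rucio dataset) and, when deep is true, appends the deep=True
// query parameter. base is a stand-in for the value of RucioUrl().
func datasetReplicasURL(base, blkName string, deep bool) string {
	furl := fmt.Sprintf("%s/replicas/cms/%s/datasets", base, url.QueryEscape(blkName))
	if deep {
		// Assumed spelling of the query parameter; see the lead-in.
		furl += "?deep=True"
	}
	return furl
}

func main() {
	blk := "/DoubleMuon/Run2017D-17Nov2017-v1/AOD#ebd759d0-d2b0-11e7-9193-02163e018fca"
	// Placeholder host, not a real Rucio endpoint.
	fmt.Println(datasetReplicasURL("https://rucio.example.org", blk, true))
}
```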

@vkuznet
Collaborator

vkuznet commented Apr 29, 2022

Hi, I applied deep=True to the /replicas/cms/<block>/datasets API and the new server is available on cmsweb-testbed. Please inspect the results and let me know if it is ok now. For instance, here is your dataset on testbed: https://cmsweb-testbed.cern.ch/das/request?view=list&limit=50&instance=int%2Fglobal&input=site+dataset%3D%2FDoubleMuon%2FRun2017D-17Nov2017-v1%2FAOD

If you confirm, then I can apply it to the production server.

@dciangot
Author

Yes, at least the behavior is consistent with --deep.

@belforte
Member

Thanks Valentin, looks good. Here is another example, on a larger dataset:
old: https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=site+dataset%3D%2FEGamma%2FRun2018C-12Nov2019_UL2018-v2%2FAOD
new: https://cmsweb-testbed.cern.ch/das/request?instance=int/global&input=site+dataset%3D%2FEGamma%2FRun2018C-12Nov2019_UL2018-v2%2FAOD

The number of sites where the dataset is present changed from 36 to 33, but most relevant is that in the new view all disk sites have full blocks (file replica presence is always 100%), which is how we like it to be.
If file presence were not 100%, it would mean that data is in transfer or that Rucio did not work as it should, so we expect that to be very rare.

FWIW I feel much better about our data placement now!

From my side you can close and move to production. Thanks again for the super fast fix.

@vkuznet
Collaborator

vkuznet commented Apr 30, 2022

The new DAS server version is now in production. I'll need to update dasgoclient though.

@vkuznet
Collaborator

vkuznet commented Apr 30, 2022

The new dasgoclient version v02.04.48 is in the cmsdist pipeline, see cms-sw/cmsdist#7834

I'm closing this ticket.

@vkuznet vkuznet closed this as completed Apr 30, 2022

4 participants