Multiple RelVal failures due to file being registered in DAS but not present #40889
Comments
A new Issue was created by @iarspider . @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
It looks like some cleanup was done at
but now the same query returns just one inaccessible file.
Does anyone know about this cleanup? [a]
|
assign core |
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Let's add @cms-sw/pdmv-l2 in case they would know if there is or has been any general RelVal cleanup at CERN. |
I've followed this CMS Talk thread https://cms-talk.web.cern.ch/t/cannot-find-any-valid-file-inside-the-dataset/20761 where Rucio's automatic file-level cleanup has caused some unexpected behavior. |
If these data are not cached for IBs, the RelVal outputs are kept for 6 months to 1 year. |
@kskovpen, we only cache files that are actually opened during the IB/PR tests. In this case only one file |
I see now in the DAS web GUI that the
Clearly there is some inconsistency between |
The two commands I run on the Rucio side
are consistent. The file is only at T2_IN_TIFR. Maybe some cached info in DAS from CERN? In fact there was a rule which could have had it there until Saturday:
|
The dasgoclient JSON output shows that [a]
|
I see the JSON document has inside @smuzaffar Could you remind me, if DAS would have returned empty list of files for the |
The DAS Go client does not have a cache. That said, the issue is that DAS uses different Rucio APIs for one query than for the other. If you add to the DAS client
As far as I can tell, they produce different results, so this is an issue with the output of different Rucio APIs rather than with DAS per se. |
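One way to make the discrepancy between two DAS queries concrete is to run both and diff the returned file lists. The sketch below is only an illustration: the function names `run_das_query` and `diff_file_lists` are hypothetical, and the only command-line options used are the `--limit 0 --query` ones quoted in this issue.

```python
import subprocess


def run_das_query(query, run=subprocess.run):
    """Run one DAS query and return the non-empty output lines.

    Uses the same invocation quoted in this issue:
    dasgoclient --limit 0 --query '<query>'.
    `run` is injectable so the function can be exercised without dasgoclient.
    """
    proc = run(
        ["dasgoclient", "--limit", "0", "--query", query],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


def diff_file_lists(first, second):
    """Compare two lists of file names and report where they disagree."""
    a, b = set(first), set(second)
    return {
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
        "common": sorted(a & b),
    }


# Example usage (requires dasgoclient in PATH, so left commented out):
# with_site = run_das_query("file dataset=<dataset> site=T2_CH_CERN")
# no_site = run_das_query("file dataset=<dataset>")
# print(diff_file_lists(with_site, no_site))
```

A non-empty `only_in_first` or `only_in_second` entry points at files that one query path reports and the other does not.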
I suspect the first one is the equivalent of
(Note the absence of the |
@makortel, the bot keeps the old results if DAS returns an empty list or an error for a query. |
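The fallback described above can be sketched roughly as follows. This is not the actual cms-bot code; `select_results` and its arguments are hypothetical names used only to illustrate the rule "keep the old results on an empty list or an error".

```python
def select_results(previous, fresh, error=None):
    """Pick which file list to keep for a query.

    previous: list of files stored from an earlier successful query
    fresh:    list of files returned by the new query (may be empty or None)
    error:    any error raised or reported by the new query, else None
    """
    # An error or an empty answer is treated as "no new information",
    # so the previously stored list is kept unchanged.
    if error is not None or not fresh:
        return previous
    # A non-empty answer replaces the stored list, even if it is wrong.
    return fresh
```

Note the consequence that matches this issue: a non-empty but wrong answer (a single file that is not actually accessible) still replaces the previously stored list.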
To @ericvaandering , I saw this |
If you look here: https://github.com/rucio/rucio/blob/ea4d2c7e2702b85a86a8668b2d852971f42fceb4/lib/rucio/client/replicaclient.py#L326 and here: https://github.com/rucio/rucio/blob/ea4d2c7e2702b85a86a8668b2d852971f42fceb4/lib/rucio/common/utils.py#L133 the way I read it is you just append |
For now I have updated the bot to ignore the
RelVals 20834.x and 21034.x have been broken since 2022-02-26-0000 due to an inaccessible file:
This file is a part of
/RelValTTbar_14TeV/CMSSW_12_3_0_pre5-123X_mcRun4_realistic_v4_2026D88noPU-v1/GEN-SIM
dataset, and is registered in DAS as accessible on T2_CH_CERN:
$ dasgoclient --limit 0 --query 'file dataset=/RelValTTbar_14TeV/CMSSW_12_3_0_pre5-123X_mcRun4_realistic_v4_2026D88noPU-v1/GEN-SIM site=T2_CH_CERN'
/store/relval/CMSSW_12_3_0_pre5/RelValTTbar_14TeV/GEN-SIM/123X_mcRun4_realistic_v4_2026D88noPU-v1/10000/49e54274-4298-4576-b47b-866e2247eab5.root
but it is not actually present on EOS:
Previously, no files from that dataset were registered as present on T2_CH_CERN, and DAS was returning a full list of files, so RelVal was using a different file (2c4c1ca9-73fe-4648-982f-e773c9ec91e9.root), which is cached on EOS.
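A quick way to spot this kind of mismatch is to take the file names DAS reports for a site and check whether each one exists under the local storage mount. The sketch below is an assumption-laden illustration: `missing_on_storage` is a hypothetical helper, and the default prefix `/eos/cms` is assumed to be where `/store/...` paths are mounted; adjust it for the actual setup.

```python
import os


def missing_on_storage(lfns, prefix="/eos/cms"):
    """Return the logical file names that do not exist under `prefix`.

    lfns:   iterable of names such as "/store/relval/.../file.root"
    prefix: local mount point assumed to hold the /store tree (assumption)
    """
    missing = []
    for lfn in lfns:
        # Strip the leading slash so os.path.join keeps the prefix.
        local_path = os.path.join(prefix, lfn.lstrip("/"))
        if not os.path.exists(local_path):
            missing.append(lfn)
    return missing
```

Any name returned by this check is registered in the catalogue but not readable at the assumed mount point, which is the situation reported here.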