Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit test PhysicsToolsNanoAODToolsTestDriver failed: missing file #44293

Closed
iarspider opened this issue Mar 4, 2024 · 26 comments
Closed

Unit test PhysicsToolsNanoAODToolsTestDriver failed: missing file #44293

iarspider opened this issue Mar 4, 2024 · 26 comments

Comments

@iarspider
Copy link
Contributor

===== Test "PhysicsToolsNanoAODToolsTestDriver" ====
===== Test NanoAODTools functionality ====
Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3011] Unable to open file /eos/cms/store/user/cmsbuild/store/group/cat/datasets/NANOAODSIM/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/7B930101-EB91-4F4E-9B90-0861460DBD94.root; No such file or directory

Traceback (most recent call last):
 File "/data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_14_1_X_2024-03-03-0000/src/PhysicsTools/NanoAODTools/test/exampleAnalysis.py", line 56, in <module>
   p.run()
 File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/slc7_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-03-0000/src/PhysicsTools/NanoAODTools/python/postprocessing/framework/postprocessor.py", line 166, in run
   inFile = ROOT.TFile.Open(fname)
 File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02827/slc7_amd64_gcc12/lcg/root/6.30.05-09dd1c56883e4f434d3cfa5dffc18d6e/lib/ROOT/_pythonization/_tfile.py", line 103, in _TFileOpen
   raise OSError('Failed to open file {}'.format(str(args[0])))
OSError: Failed to open file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/group/cat/datasets/NANOAODSIM/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/7B930101-EB91-4F4E-9B90-0861460DBD94.root
Falure in exampleAalysis: status 1

---> test PhysicsToolsNanoAODToolsTestDriver had ERRORS
TestTime:3
^^^^ End Test PhysicsToolsNanoAODToolsTestDriver ^^^^
>> Tests for package PhysicsTools/NanoAODTools ran.

File is indeed missing from EOS. First occurrence: CMSSW_14_1_X_2024-03-03-0000 (slc7_amd64_gcc12) and CMSSW_14_1_NONLTO_X_2024-03-03-0000 (el8_amd64_gcc12)

@iarspider
Copy link
Contributor Author

assign PhysicsTools/NanoAODTools

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 4, 2024

New categories assigned: xpog

@vlimant,@hqucms you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 4, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 4, 2024

A new Issue was created by @iarspider.

@sextonkennedy, @makortel, @smuzaffar, @antoniovilela, @Dr15Jones, @rappoccio can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@vlimant
Copy link
Contributor

vlimant commented Mar 4, 2024

FYI : @clelange @lenzip @namapane

@namapane
Copy link
Contributor

namapane commented Mar 4, 2024

The file was supposed to be available permanently. I see it was renamed:

$ ls /eos/cms/store/user/cmsbuild/store/group/cat/datasets/NANOAODSIM/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/

7B930101-EB91-4F4E-9B90-0861460DBD94.root.unused

@makortel
Copy link
Contributor

makortel commented Mar 4, 2024

@iarspider The path /eos/cms/store/user/cmsbuild/... is controlled by core (for the "ibeos" cache), right? Do you know why the file was renamed? (guessing from the new file, did some component think the file would be unused?)

@iarspider
Copy link
Contributor Author

@makortel I'm unfortunately not very familiar with how eos cache works. A quick grep shows that there is a job that populates/cleans up that area, and from the logs I see that indeed this file was renamed, as it wasn't used for 150 days (unused files are deleted after 360 days):

/eos/cms/store/user/cmsbuild/store/group/cat/datasets/NANOAODSIM/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/7B930101-EB91-4F4E-9B90-0861460DBD94.root unused for last 150 days.
Renamed:  /store/group/cat/datasets/NANOAODSIM/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/7B930101-EB91-4F4E-9B90-0861460DBD94.root

The usage is based on ElasticSearch index, maybe @aandvalenzuela can add something.

@iarspider
Copy link
Contributor Author

If I'm not missing anything, the script that populates that index looks for Initiating request to open file line in log to determine which files (datasets) are used by a given unit test. Looking at successful log of the failing unit test, I don't see this line - does it use a different mechanism to load the missing root file?

@makortel
Copy link
Contributor

makortel commented Mar 4, 2024

I'm pretty sure this test does not use CMSSW and therefore would not have Initiating request to open file line.

@vlimant @clelange @lenzip @namapane Would it be easy to change the test to print such line?

@namapane
Copy link
Contributor

namapane commented Mar 4, 2024

Do I understand correctly that the test script should print "Initiating request to open file " so that it is understood as being used?
I'm a bit surprised this is the way the eos cache works, but anyhow, adding this line is surely possible, will make a PR...

@iarspider
Copy link
Contributor Author

Do I understand correctly that the test script should print "Initiating request to open file " so that it is understood as being used? I'm a bit surprised this is the way the eos cache works, but anyhow, adding this line is surely possible, will make a PR...

It should print something like this :

04-Mar-2024 01:40:15 CET  Initiating request to open file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/data/Run2011A/MinimumBias/RAW-RECO/ValSkim-08Nov2011-v1/0000/065CC727-800F-E111-B1A6-002618943986.root

(timestamp is not important, but must be present; also the path should, of course, match whatever file(s) are actually used)

@antoniovilela
Copy link
Contributor

antoniovilela commented Mar 4, 2024

Do I understand correctly that the test script should print "Initiating request to open file " so that it is understood as being used? I'm a bit surprised this is the way the eos cache works, but anyhow, adding this line is surely possible, will make a PR...

It should print something like this :

04-Mar-2024 01:40:15 CET  Initiating request to open file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/data/Run2011A/MinimumBias/RAW-RECO/ValSkim-08Nov2011-v1/0000/065CC727-800F-E111-B1A6-002618943986.root

(timestamp is not important, but must be present; also the path should, of course, match whatever file(s) are actually used)

Doesn't it seem something easy to get wrong, the formatting of the text?

Could the script that populates the index look for another string from the test script (in addition to what is already done), one that more clearly identifies its purpose, say "REGISTER FILE: file path" or something along these lines?

@iarspider
Copy link
Contributor Author

Doesn't it seem something easy to get wrong, the formatting of the text?

Maybe. Actually, the scipt searches for (space)Initiating request to open file(space), and treats anything after that and upto next space as dataset URL: https://github.com/cms-sw/cms-bot/blob/e199d812cd001646df4ac4798dc415657044ecd7/es_ibs_log.py#L79-L86

Could the script that populates the index look for another string from the test script (in addition to what is already done), one that more clearly identifies its purpose, say "REGISTER FILE: file path" or something along these lines?

Sure, I can prepare a PR for that.

@namapane
Copy link
Contributor

namapane commented Mar 5, 2024

Added the message in #44310. The code now issues a message like this:
05-Mar-2024 10:41:04 CET Initiating request to open file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/group/cat/datasets/NANOAODSIM/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/7B930101-EB91-4F4E-9B90-0861460DBD94.root

This output is redirected to logs/$SCRAM_ARCH/testing.log by scram b runtests, I hope that is where the scripts search for it.

BTW, somebody has to put back the renamed file, otherwise the test will keep failing,

@iarspider
Copy link
Contributor Author

Added the message in #44310. The code now issues a message like this: 05-Mar-2024 10:41:04 CET Initiating request to open file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/group/cat/datasets/NANOAODSIM/RunIISummer20UL18NanoAODv9-106X_upgrade2018_realistic_v16_L1v1-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/7B930101-EB91-4F4E-9B90-0861460DBD94.root

This output is redirected to logs/$SCRAM_ARCH/testing.log by scram b runtests, I hope that is where the scripts search for it.

Thanks, will check once the PR is merged.

BTW, somebody has to put back the renamed file, otherwise the test will keep failing,

If I'm reading the source correctly, it will fail once after merge, then parser will find the line and rename the file back. I can probably rename it by hand, but it will be renamed again once the sync job runs (which happens every 12h, around 10 am and 10 pm).

@namapane
Copy link
Contributor

namapane commented Mar 5, 2024

I don't think the PR can be merged if runtests fails, isn't it?

@antoniovilela
Copy link
Contributor

We can override and merge it, let us know when it is ready.

@namapane
Copy link
Contributor

namapane commented Mar 5, 2024

It is ready, #44310.

@antoniovilela
Copy link
Contributor

It is ready, #44310.

@iarspider has renamed the file, so let's wait for the tests.

@antoniovilela
Copy link
Contributor

antoniovilela commented Mar 6, 2024

The PR is merged. Could we check when possible the script log to see if it is finding this file?

Does this need to be backported? Maybe not because of the file-in-index issue? There were a couple of other fixes included though.

@namapane
Copy link
Contributor

namapane commented Mar 6, 2024

Does this need to be backported? Maybe not because of the file-in-index issue? There were a couple of other fixes included though.

I think that backporinting it to 13_3_X (and possibly also 13_2_X) would be useful to keep a consistent behaviour, also in the open message, although it's not mandatory. If you agree I can make the PRs (please let me know for which series you would recommend it)

@antoniovilela
Copy link
Contributor

antoniovilela commented Mar 6, 2024

Does this need to be backported? Maybe not because of the file-in-index issue? There were a couple of other fixes included though.

I think that backporinting it to 13_3_X (and possibly also 13_2_X) would be useful to keep a consistent behaviour, also in the open message, although it's not mandatory. If you agree I can make the PRs (please let me know for which series you would recommend it)

It is ok to backport. 13_3_X is fine. Also 14_0_X, right?

namapane added a commit to namapane/cmssw that referenced this issue Mar 7, 2024
@vlimant
Copy link
Contributor

vlimant commented Mar 15, 2024

all good here, can we close this ?

@iarspider
Copy link
Contributor Author

Do I understand correctly that the test script should print "Initiating request to open file " so that it is understood as being used? I'm a bit surprised this is the way the eos cache works, but anyhow, adding this line is surely possible, will make a PR...

It should print something like this :

04-Mar-2024 01:40:15 CET  Initiating request to open file root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/data/Run2011A/MinimumBias/RAW-RECO/ValSkim-08Nov2011-v1/0000/065CC727-800F-E111-B1A6-002618943986.root

(timestamp is not important, but must be present; also the path should, of course, match whatever file(s) are actually used)

Doesn't it seem something easy to get wrong, the formatting of the text?

Could the script that populates the index look for another string from the test script (in addition to what is already done), one that more clearly identifies its purpose, say "REGISTER FILE: file path" or something along these lines?

@smuzaffar

@smuzaffar
Copy link
Contributor

Could the script that populates the index look for another string from the test script (in addition to what is already done), one that more clearly identifies its purpose, say "REGISTER FILE: file path" or something along these lines?

it will be trivial to teach bot to look for a additional message e.g. REGISTER_LFN:lfn or OPEN_LFN:lfn and cache those lfns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants