Problem overwriting/unlink simulation output file #44369
Comments
A new Issue was created by @fabferro. @rappoccio, @makortel, @Dr15Jones, @smuzaffar, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign core |
New categories assigned: core @Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
I'm not able to reproduce on Could you give more details, e.g. on what filesystem you are running? Are you using https://github.com/cms-sw/cmssw/blob/master/SimPPS/Configuration/test/pg_step1_GEN_SIM_2021.py exactly as it is, or do you change |
I ran it as it is. I tried modifying it but things don't change. |
Trying some differential analysis: |
One more piece of information: it works fine with "pure" AFS, so it seems to be related to some bad interplay between EOS and CMSSW_14_0_0 |
The last working release is CMSSW_14_0_0_pre1; _pre2 is the first one showing this issue |
I can reproduce this when running the job in a directory on EOS (via the FUSE mount). A major difference between 14_0_0_pre1 and pre2 is that pre1 used ROOT 6.26, and pre2 uses ROOT 6.30. Here is a stack trace for the exception
The cmssw/IOPool/TFileAdaptor/src/TStorageFactoryFile.cc Lines 167 to 169 in d389c1c
and our TStorageFactorySystem::Unlink() is indeed implemented as cmssw/IOPool/TFileAdaptor/src/TStorageFactorySystem.cc Lines 45 to 48 in d389c1c
The
cmssw/IOPool/TFileAdaptor/src/TFileAdaptor.cc Lines 60 to 61 in d389c1c
As for why the underlying filesystem makes a difference, I have no clue at the moment. |
Two possible workarounds
|
With
is root://eoshome-m.cern.ch/${PWD}<filename>. I'd bet this somehow makes ROOT's TUnixSystem not unlink the file, leading to our TStorageFactorySystem::Unlink() being called.
I checked the behavior on 14_0_0_pre1, and the |
type root |
@pcanal Did ROOT gain the ability to detect that a local file is on (CERN) EOS and, in that case, prepend the file path with |
This workaround seems to work too: process.add_(cms.Service("AdaptorConfig", native=cms.untracked.vstring("root"))) |
Setting output file as |
I found root-project/root#11644. It pointed to another workaround: adding
to |
Yes, in v6.28 (the PR you found). |
@pcanal Is there a way to choose the behavior per |
If you know it is a local file and want to stay local, you use |
I'm running the PPS Full Simulation with a particle gun, but when I run it for the second time I get the following error:
----- Begin Fatal Exception 11-Mar-2024 15:47:36 CET-----------------------
An exception of category 'FatalRootError' occurred while
[0] Calling EventProcessor::runToCompletion (which does almost everything after beginJob and before endJob)
Additional Info:
[a] Fatal Root Error: @sub=TStorageFactorySystem::Unlink
Unsupported
----- End Fatal Exception -------------------------------------------------
The error disappears if I delete the output root file and re-run the simulation.
It started to happen a few weeks ago, never happened before.
It happens in CMSSW_14_0_0 but also in other releases.
It happens both with lxplus and lxplus7.
The file I'm running is https://github.com/cms-sw/cmssw/blob/master/SimPPS/Configuration/test/pg_step1_GEN_SIM_2021.py
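Since deleting the output file before re-running avoids the error, that manual step could be scripted as a stopgap. The following is only a minimal sketch using the Python standard library: the helper name `remove_stale_output` and the file name in the usage note are hypothetical, and it assumes the file can be removed through the normal POSIX interface of the mounted filesystem, as a manual `rm` does.

```python
import os


def remove_stale_output(path):
    """Delete a leftover output file from a previous run.

    Returns True if a file was removed, False if there was nothing to remove.
    Hypothetical helper: it only automates the manual deletion described above.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # No previous output, so the job can create the file from scratch.
        return False
```

For example, calling `remove_stale_output("step1_output.root")` (a placeholder file name) before launching the job would mimic deleting the file by hand. It does not address the underlying Unlink() behavior discussed in the comments.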