Question on how TFileService is supposed to interact with eos
#46024
A new Issue was created by @mmusich. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign CommonTools/UtilAlgos |
New categories assigned: reconstruction @jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Since
calls
With the It may be worth noting here that writing to (CERN) EOS through the FUSE mount has an "interesting" behavior as well #44369 (ROOT internally transforms the local-looking path into a |
assign core |
New categories assigned: core @Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Could you check whether adding process.add_(cms.Service("AdaptorConfig", native=cms.untracked.vstring("root"))) to the job configuration changes the behavior? (this prevents CMSSW from registering the |
type root |
@pcanal Could there be some error condition (or other assumption) in |
I can see two possibilities. One is that the ROOT build being used does not have the code from root-project/root#13842. The other, more likely, is that writing to a file opened in read-only mode might not be failing gracefully .... i.e.
when/if the file was opened with "RECREATE" indicates that something 'bad' happened during the One possibility is that the file is seen/thought-of as non-writeable (for example an issue with permissions) and that some part of the logic in or around |
Indeed, adding this line to the configuration file prevents the segmentation fault. Thank you. |
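For reference, the workaround discussed above could be placed in a job configuration roughly as follows. This is only a sketch of a config fragment and needs a CMSSW environment to load; the process name and the output file name are placeholders, not taken from the reproducer. Only the AdaptorConfig line comes from this thread.

```python
# Config fragment (sketch): needs a CMSSW environment, not the reproducer's actual config.
import FWCore.ParameterSet.Config as cms

process = cms.Process("DEMO")  # placeholder process name (assumption)

# TFileService output; the file name here is a placeholder (assumption).
process.TFileService = cms.Service(
    "TFileService",
    fileName=cms.string("output.root"),
)

# Workaround quoted in this thread.
process.add_(cms.Service("AdaptorConfig", native=cms.untracked.vstring("root")))
```

Whether this behavior is a design feature or a bug is not settled in the thread, so the fragment should be read as a workaround rather than a recommended setup.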
@mmusich is this issue solved? Thanks |
I am not sure. With the workaround at #46024 (comment) this particular instance of the problem is solved, though I can't say if that's a design feature or a bug. |
I have a naive question concerning the expected behavior of TFileService when it's configured to (over-)write files on eos.
While trying to re-run some alignment-related jobs, @henriettepetersen reported a segmentation fault in SplitVertexResolution, stack trace below:
(a reproducer is available at /afs/cern.ch/work/h/hpeterse/public/splitV_seg_fault: copy the folder locally into any recent cmssw release and then run cmsRun validation_cfg.py config=validation.json).
The issue seems to be related to the fact that the file that we're trying to write already exists with the same name at the same location.
In particular the segmentation fault originates here:
cmssw/Alignment/OfflineValidation/plugins/SplitVertexResolution.cc
Line 970 in e54f434
I can circumvent the issue by commenting out that line, but then when running I see the following warning:
Warning in <TStorageFactoryFile::Write>: file root://eoscms.cern.ch//eos/cms/store/group/alca_trackeralign/AlignmentValidation/AlignmentValidation/2024_CDE_ReReco_mp3949_splitV_379525/SplitV/single/GT/compare2024/379525/SplitV.root not opened in write mode
What's somewhat puzzling to me is that when the output file location is local (e.g. the $PWD), there is no issue whatsoever, even if the file already exists there.
Also I would have thought that due to this:
cmssw/CommonTools/UtilAlgos/src/TFileService.cc
Line 22 in 5e10089
the file would have been overwritten anyway.
Also when trying to prepare a reproducer via a simple ROOT script:
I have found out that with this I can overwrite the remote file as many times as I want.
Am I missing something trivial?
Cc: @TomasKello