Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redirect TFile::Open() calls to XRootD protocol if possible #11644

Merged
merged 1 commit into from
Nov 11, 2022

Conversation

amadio
Copy link
Member

@amadio amadio commented Oct 28, 2022

In a meeting in IT I saw that ~200PB/year are being read from the eoshome instance via fuse, and I thought that maybe ROOT at least could try to do something smart and just redirect fuse traffic to XRootD if possible. The patch in this pull request does just that. When I talked to Axel about making this work, we wanted the redirection to not happen when using file://file.root, but it created some inconsistencies when using TFile::Open vs using TChain so the final implementation always attempts to redirect.

Below is a demo session of ROOT opening a file on EOS with (a former version of) the patch:

$ pwd
/eos/home-a/amadio
$ ls *.root
Run2012BC_DoubleMuParked_Muons.root
$ root.exe
   ------------------------------------------------------------------
  | Welcome to ROOT 6.27/01                        https://root.cern |
  | (c) 1995-2022, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Oct 28 2022, 09:49:15                 |
  | From heads/redirect-xrootd@v6-09-01-24773-gd85df4c5e9            |
  | With c++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-15)                 |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------

root [0] auto f1 = TFile::Open("file://Run2012BC_DoubleMuParked_Muons.root");
root [1] auto f2 = TFile::Open("Run2012BC_DoubleMuParked_Muons.root");
root [2] f1->GetName()
(const char *) "Run2012BC_DoubleMuParked_Muons.root"
root [3] f2->GetName()
(const char *) "root://eoshome-a.cern.ch//eos/user/a/amadio/Run2012BC_DoubleMuParked_Muons.root"
root [4] .q
$

and with a slightly modified df102_NanoAODDimuonAnalysis tutorial to avoid TChain:

void df102_NanoAODDimuonAnalysis(const char* filename)
 {
    // Enable multi-threading
    ROOT::EnableImplicitMT();
 
-   // std::cout << "Using filename: " << filename << std::endl;
+   auto f = TFile::Open(filename);
+   auto t = f->Get<TTree>("Events");
 
-   ROOT::RDataFrame df("Events", filename);
+   ROOT::RDataFrame df(*t);

I got the following:

$ time ./dimuon file://Run2012BC_DoubleMuParked_Muons.root 
Info in <TCanvas::Print>: pdf file dimuon_spectrum.pdf has been created
Events with exactly two muons: pass=31104343   all=61540413   -- eff=50.54 % cumulative eff=50.54 %
Muons with opposite charge: pass=24067843   all=31104343   -- eff=77.38 % cumulative eff=39.11 %
13.34
$ time ./dimuon Run2012BC_DoubleMuParked_Muons.root 
Info in <TCanvas::Print>: pdf file dimuon_spectrum.pdf has been created
Events with exactly two muons: pass=31104343   all=61540413   -- eff=50.54 % cumulative eff=50.54 %
Muons with opposite charge: pass=24067843   all=31104343   -- eff=77.38 % cumulative eff=39.11 %
8.81
$

Note: With the final version of the patch, instead of using file://, which now does not disable the redirection anymore, one should add TFile.CrossProtocolRedirects: no to their rootrc configuration file in order to disable this feature (I checked that it works).

@amadio amadio self-assigned this Oct 28, 2022
@amadio amadio requested a review from pcanal as a code owner October 28, 2022 15:29
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@amadio amadio changed the title WIP: Redirect TFile::Open() calls to XRootD protocol if possible (DO NOT MERGE) Draft: Redirect TFile::Open() calls to XRootD protocol if possible (DO NOT MERGE) Oct 28, 2022
@amadio amadio marked this pull request as draft October 28, 2022 15:31
@amadio amadio changed the title Draft: Redirect TFile::Open() calls to XRootD protocol if possible (DO NOT MERGE) Redirect TFile::Open() calls to XRootD protocol if possible (DO NOT MERGE) Oct 28, 2022
io/io/src/TFile.cxx Outdated Show resolved Hide resolved
@phsft-bot
Copy link
Collaborator

Build failed on mac1015/cxx17.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-10-28T16:17:33.522Z] FAILED: io/io/CMakeFiles/RIO.dir/src/TFile.cxx.o
  • [2022-10-28T16:17:33.781Z] /Volumes/HD2/build/workspace/root-pullrequests-build/root/io/io/src/TFile.cxx:4042:14: error: no matching function for call to 'getxattr'

@phsft-bot
Copy link
Collaborator

Build failed on mac11/cxx14.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-10-28T16:20:59.168Z] FAILED: io/io/CMakeFiles/RIO.dir/src/TFile.cxx.o
  • [2022-10-28T16:20:59.489Z] /Users/sftnight/build/workspace/root-pullrequests-build/root/io/io/src/TFile.cxx:4042:14: error: no matching function for call to 'getxattr'

@eguiraud
Copy link
Member

it just so happens that the file:// gets stripped if you create an RDataFrame("MyTree", "file://file.root") and then the redirection always happens

Guilherme tells me that it's actually TChain that strips the file:// from the filename. @pcanal could it not do that? 😄

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@amadio
Copy link
Member Author

amadio commented Oct 31, 2022

Please clang-format as well...

@amadio
Copy link
Member Author

amadio commented Oct 31, 2022

Guilherme tells me that it's actually TChain that strips the file:// from the filename. @pcanal could it not do that? smile

The first call to TFile::Open happens here:

#0  0x00007ffff580b7c0 in TFile::Open(char const*, char const*, char const*, int, int)@plt ()
   from libTree.so
#1  0x00007ffff5856d3a in TChain::LoadTree (this=0x1027b50, entry=0) at core/base/inc/TString.h:244
#2  0x00007ffff58545d2 in TChain::GetListOfBranches (this=0x1027b50) at tree/tree/src/TChain.cxx:1116
#3  0x00007ffff3ee03a6 in (anonymous namespace)::GetBranchNamesImpl (t=..., bNamesReg=std::set with 0 elements, 
    bNames=std::vector of length 0, capacity 0, analysedTrees=std::set with 1 element = {...}, friendName="", 
    allowDuplicates=true) at tree/dataframe/src/RLoopManager.cxx:142
#4  0x00007ffff3ee0e7c in ROOT::Internal::RDF::GetBranchNames[abi:cxx11](TTree&, bool) (t=..., 
    allowDuplicates=<optimized out>) at tree/dataframe/src/RLoopManager.cxx:341
#5  0x00007ffff3ee0f1d in ROOT::Detail::RDF::RLoopManager::GetBranchNames[abi:cxx11]() (this=0x16502a0)
    at tree/dataframe/src/RLoopManager.cxx:1015
#6  0x000000000046121e in ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::Filter(std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >) ()
#7  0x000000000045e376 in df102_NanoAODDimuonAnalysis(char const*) ()
#8  0x000000000045eb60 in main ()

by this time, the file:// prefix is already stripped.

@eguiraud
Copy link
Member

eguiraud commented Oct 31, 2022

The file:// prefix is lost here:

ParseTreeFilename(name, basename, tn, query, suffix, kFALSE);

(this is in TChain::AddFile but TChain::Add also ends up here). basename is the file name stripped of the file:// prefix and it is what's used to form the file name that's passed to the TChainElement constructor later (which is what stores the information about each sub-tree in the chain).

It looks like special-casing file:// is a deliberate choice, here's where just for file:// we take the URL without the protocol:

filename = (strcmp(url.GetProtocol(), "file")) ? url.GetUrl() : url.GetFileAndOptions();

@amadio
Copy link
Member Author

amadio commented Oct 31, 2022

Thanks for the detailed analysis, @eguiraud! I will see what happens when remove the special case to not strip the protocol part. Should be fine, since this goes into TFile::Open but better get confirmation from @pcanal as well.

@amadio
Copy link
Member Author

amadio commented Oct 31, 2022

Removing the special casing leads to this:

execve("./df", ["./df", "Run2012BC_DoubleMuParked_Muons.r"...], 0x7ffd482bdf58 /* 49 vars */) = 0
getxattr("file:Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = -1 ENOENT (No such file or directory)
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", F_OK) = 0
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", R_OK) = 0
openat(AT_FDCWD, "/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", O_RDONLY) = 4
...

So that breaks the redirection, since now file: gets added somewhere... 😞

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-ubuntu2004/python3.
Running on root-ubuntu-2004-1.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-11-02T09:07:09.867Z] FAILED: io/io/CMakeFiles/RIO.dir/src/TFile.cxx.o
  • [2022-11-02T09:07:10.125Z] /home/sftnight/build/workspace/root-pullrequests-build/root/io/io/src/TFile.cxx:4044:54: error: invalid conversion from ‘const void*’ to ‘void*’ [-fpermissive]

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-debian10-i386/soversion.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-11-02T09:10:31.230Z] /home/sftnight/build/workspace/root-pullrequests-build/root/io/io/src/TFile.cxx:4044:54: error: invalid conversion from ‘const void*’ to ‘void*’ [-fpermissive]

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@amadio
Copy link
Member Author

amadio commented Nov 10, 2022

@Axel-Naumann I've updated the patch to use TUrl and always redirect. I updated the code comment and release notes to match. The way to disable the redirection then will be to set TFile.CrossProtocolRedirects to 0 in rootrc. Now the behavior should be consistent between plain TFile::Open calls and TChain.

@pcanal I tried moving the code after line 4057 and using expandedUrl but it started showing errors like this when using TChain:

Error in <TNetXNGFile::Open>: [ERROR] Server responded with an error: [3001] Required argument not present

So I moved it back where it was before and use only the input url to TFile::Open as is. I tested that this now works both for plain TFile::Open calls as well as with TChain and the redirection always happens as long as the file being opened is on EOS. The problem was that the exapandedUrl sometimes contained only the base EOS management URL, which broke it.

@amadio amadio requested a review from pcanal November 10, 2022 10:24
@pcanal
Copy link
Member

pcanal commented Nov 10, 2022

The problem was that the exapandedUrl sometimes contained only the base EOS management URL, which broke it.

Can you give a concrete illustration?

io/io/src/TFile.cxx Outdated Show resolved Hide resolved
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Collaborator

Build failed on mac12/noimt.
Running on macphsft18.dyndns.cern.ch:/Users/sftnight/build/jenkins/workspace/root-pullrequests-build
See console output.

Warnings:

  • [2022-11-11T10:43:29.869Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups
  • [2022-11-11T10:43:37.506Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups
  • [2022-11-11T10:43:39.679Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups
  • [2022-11-11T10:43:51.484Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups
  • [2022-11-11T10:44:05.475Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups

Failing tests:

And 58 more

@phsft-bot
Copy link
Collaborator

Build failed on ROOT-ubuntu2004/python3.
Running on root-ubuntu-2004-1.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@amadio
Copy link
Member Author

amadio commented Nov 11, 2022

The problem was that the exapandedUrl sometimes contained only the base EOS management URL, which broke it.

Can you give a concrete illustration?

Here is a session with strace with the redirection disabled:

$ strace ./dimuon Run2012BC_DoubleMuParked_Muons.root 2>&1 | grep Muons
execve("./dimuon", ["./dimuon", "Run2012BC_DoubleMuParked_Muons.r"...], 0x7ffd2e39de38 /* 49 vars */) = 0
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", F_OK) = 0
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", R_OK) = 0
openat(AT_FDCWD, "/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", O_RDONLY) = 4
lstat("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0
lstat("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", F_OK) = 0
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", R_OK) = 0
openat(AT_FDCWD, "/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", O_RDONLY) = 5
lstat("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0
lstat("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", F_OK) = 0
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", R_OK) = 0
openat(AT_FDCWD, "/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", O_RDONLY) = 5
lstat("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0
lstat("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0
write(1, "Muons with opposite charge: pass"..., 97Muons with opposite charge: pass=24067843   all=31104343   -- eff=77.38 % cumulative eff=39.11 %

one with the redirection enabled:

$ strace ./dimuon Run2012BC_DoubleMuParked_Muons.root 2>&1 | grep Muons
execve("./dimuon", ["./dimuon", "Run2012BC_DoubleMuParked_Muons.r"...], 0x7ffe7c400bd8 /* 49 vars */) = 0
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch//eos/us"..., 79) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch//eos/us"..., 79) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch//eos/us"..., 79) = 79
write(1, "Muons with opposite charge: pass"..., 97Muons with opposite charge: pass=24067843   all=31104343   -- eff=77.38 % cumulative eff=39.11 %

and finally, one with the code moved to the place you suggested (using expandedUrl):

$ strace ./dimuon Run2012BC_DoubleMuParked_Muons.root 2>&1 | grep Muons
execve("./dimuon", ["./dimuon", "Run2012BC_DoubleMuParked_Muons.r"...], 0x7ffccf746158 /* 49 vars */) = 0
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 25
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch/", 25) = 25
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", F_OK) = 0
access("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", R_OK) = 0
openat(AT_FDCWD, "/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", O_RDONLY) = 12
lstat("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0
lstat("/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch//eos/us"..., 79) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch//eos/us"..., 79) = 79
write(1, "Muons with opposite charge: pass"..., 97Muons with opposite charge: pass=24067843   all=31104343   -- eff=77.38 % cumulative eff=39.11 %

It is a bit puzzling (doesn't always happen), but as you can see, in the case using expandedUrl, sometimes the URL returned is only the base management URL, and the file may be opened via FUSE. This doesn't always happen, but I thought I'd be on the safe side and avoid potential problems.

@amadio
Copy link
Member Author

amadio commented Nov 11, 2022

More puzzling, here are two consecutive runs without changing anything (using expandedUrl):

lcgapp-centos8-physical amadio $ cat test.C
#include "TFile.h"

void test(const char* filename, const char* options)
{
  auto f = TFile::Open(filename, options);
}

int main(int argc, char **argv)
{
   test(argv[1], argv[2]);
}

$ cat ~/.rootrc && strace ./test test.root 2>&1 | grep "test.root"
TFile.CrossProtocolRedirects: yes
execve("./test", ["./test", "test.root"], 0x7fffd10a02b8 /* 49 vars */) = 0
getxattr("test.root", "eos.url.xroot", NULL, 0) = 54
getxattr("test.root", "eos.url.xroot", "root://eoshome-a.cern.ch//eos/us"..., 54) = 54
access("/eos/home-a/amadio/test.root", F_OK) = 0
access("/eos/home-a/amadio/test.root", R_OK) = 0
openat(AT_FDCWD, "/eos/home-a/amadio/test.root", O_RDONLY) = 11
lstat("/eos/home-a/amadio/test.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0
lstat("/eos/home-a/amadio/test.root", {st_mode=S_IFREG|0644, st_size=2244449133, ...}) = 0

$ cat ~/.rootrc && strace ./test test.root 2>&1 | grep "test.root"
TFile.CrossProtocolRedirects: yes
execve("./test", ["./test", "test.root"], 0x7fff277b6488 /* 49 vars */) = 0
getxattr("test.root", "eos.url.xroot", NULL, 0) = 53
getxattr("test.root", "eos.url.xroot", "root://eoshome-a.cern.ch//eos/us"..., 53) = 53

So better keep the code where it is, as I never observe this sort of thing in that case.

@amadio
Copy link
Member Author

amadio commented Nov 11, 2022

For the record, the build will fail because some tests based on reference files now have some unexpected warning:

+ld: warning: -undefined dynamic_lookup may not work with chained fixups
+ld: warning: -undefined dynamic_lookup may not work with chained fixups

These are not caused by my changes, so I will merge this if those are the only failures.

@amadio amadio requested a review from pcanal November 11, 2022 11:53
@Axel-Naumann
Copy link
Member

Yup sorry I'm on it.

@pcanal
Copy link
Member

pcanal commented Nov 11, 2022

This doesn't always happen, ...

This actually quite concerning ... In the use case you show, there should be not practical difference as the filename should be strictly the same before and after the expand. (As opposed to load "/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root" vs "$MY_EOS_DIR/Run2012BC_DoubleMuParked_Muons.root" where the redirection would work only for the 1st one without the expansion).

In the first strace, I see:

getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch//eos/us"..., 79) = 79

vs

getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 25
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch/", 25) = 25

which "seems" to mean that goven the exact same input getxattr returns different result. Why would that be?

More puzzling, here are two consecutive runs without changing anything (using expandedUrl):

That seems "consistent" with the "inconsistent" behavior I just talked about. So there is something wrong about expandfilename that we need to fix since:

(a) it is semantically necessary to call it
(b) it behaves oddly in this case but also probably also in other case without us knowing.

so since we have a running failing example of the odd behavior we may as well investigate it now.

@amadio
Copy link
Member Author

amadio commented Nov 11, 2022

This doesn't always happen, ...

This actually quite concerning ... In the use case you show, there should be not practical difference as the filename should be strictly the same before and after the expand. (As opposed to load "/eos/home-a/amadio/Run2012BC_DoubleMuParked_Muons.root" vs "$MY_EOS_DIR/Run2012BC_DoubleMuParked_Muons.root" where the redirection would work only for the 1st one without the expansion).

In the first strace, I see:

getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 79
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch//eos/us"..., 79) = 79

vs

getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", NULL, 0) = 25
getxattr("Run2012BC_DoubleMuParked_Muons.root", "eos.url.xroot", "root://eoshome-a.cern.ch/", 25) = 25

which "seems" to mean that given the exact same input getxattr returns different result. Why would that be?

Good question, and I'm not sure. This seems to be a problem with EOS (see below). @apeters1971 Do you know what could cause this? To reproduce, all you need is to copy a file with a different name, then call getfattr on it and it will return the wrong URL, at least the first time (but potentially several times):

$ cp hsimple.root mytest.root
$ getfattr -n eos.url.xroot hsimple.root && getfattr -n eos.url.xroot mytest.root
# file: hsimple.root
eos.url.xroot="root://eoshome-a.cern.ch//eos/user/a/amadio/hsimple.root"

# file: mytest.root
eos.url.xroot="root://eoshome-a.cern.ch/"

So I guess we could move the code after the expansion after all, as the problem is not coming from ROOT, but from EOS. When the wrong URL is returned, all that happens is that the opening via XRootD fails and it falls back to use FUSE, so I guess it should still be fine to merge this, if only for allowing people to test more easily once ROOT master is installed somewhere in CVMFS.

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Collaborator

Build failed on mac12/noimt.
Running on macphsft18.dyndns.cern.ch:/Users/sftnight/build/jenkins/workspace/root-pullrequests-build
See console output.

Warnings:

  • [2022-11-11T15:32:39.611Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups
  • [2022-11-11T15:32:45.647Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups
  • [2022-11-11T15:32:47.228Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups
  • [2022-11-11T15:33:00.691Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups
  • [2022-11-11T15:33:15.753Z] ld: warning: -undefined dynamic_lookup may not work with chained fixups

Copy link
Member

@pcanal pcanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@amadio amadio dismissed pcanal’s stale review November 11, 2022 16:39

I've moved the code to the suggested place as the error I saw is something in EOS (can be reproduced without ROOT being involved). I'd like to merge to let others test more widely before the release by using an installation of ROOT master from CVMFS.

@amadio
Copy link
Member Author

amadio commented Nov 11, 2022

Ah! Perfect timing, Philippe! Thanks! Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants