-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a backend for rucio #300
Conversation
rucio.py
Outdated
@@ -0,0 +1,41 @@ | |||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you accidentally uploaded this file twice. Can you remove this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks for the remark
strax/storage/rucio.py
Outdated
dirname = str(dirname) | ||
prefix = dirname_to_prefix(dirname) | ||
metadata_json = f'{prefix}-metadata.json' | ||
fn = rucio_path(metadata_json, dirname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you raise the same error as here if the md is not available. Strax may rely on this later:
https://github.com/AxFoundation/strax/blob/master/strax/storage/files.py#L215
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, furthermore, I had much more outputs with this error than without 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright thanks for adding it
strax/storage/rucio.py
Outdated
return json.loads(f.read()) | ||
|
||
def _read_chunk(self, dirname, chunk_info, dtype, compressor): | ||
#print('yes') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you forgot to remove this line ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry 😅
strax/storage/rucio.py
Outdated
"Cannot save directly into rucio, upload with admix instead") | ||
|
||
|
||
def rucio_path(filename, dirname): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit is a bit hard to follow without comments (sorry I'm not an expert in the rucio naming convention). What do the paths look like and are we sure its always these hard-coded conversions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Te path look like ('/dali/lgrandi/rucio/xnt_008710/1f/fb/peaklets-b7dgmtzaef-000047'), It is a hard-coded convention on the Rucio code, you can see it here https://github.com/rucio/rucio/blob/671fe6253981eb632aae3c9ddfe54eb83e63fd1a/lib/rucio/rse/protocols/protocol.py#L114
strax/storage/rucio.py
Outdated
|
||
|
||
def rucio_path(filename, dirname): | ||
root_path ='/dali/lgrandi/rucio' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't hard-code it like this as this might change to e.g. '/dali/lgrandi/xenonnt/rucio' I guess you want to get the info from the dirname.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, dirname does not include the root path 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm okay. So how about this:
You pass it on at initialization:
https://github.com/XENONnT/straxen/blob/5b762f32eec7d3fb8a1a96e31ae504fe8c8cf3cd/straxen/rundb.py#L110
and set the default to /dali/lgrandi/rucio at the init of the rundb here:
https://github.com/XENONnT/straxen/blob/5b762f32eec7d3fb8a1a96e31ae504fe8c8cf3cd/straxen/rundb.py#L42
You would also have to change these lines:
Line 15 in 6404ae5
Line 25 in 6404ae5
fn = rucio_path(chunk_info['filename'], dirname) |
If you want some help I can also commit this idea to prevent hardcodes?
Thanks Sid, this would be an absolute wonderful feature to add! Can you address the comments and/or provide some more info to this PR (e.g. the rucio naming convention would be useful) |
File uploaded twice
Remonving a commented test
Thank you very much @jorana for reviewing my pr, I hope that I answered to all your remarks 🙂 |
If you'd use SidAhmedMa#1 you can prevent specifying the rootdir |
raise data not available if folder does not exist
fix codefactor
Hi Sid, I tested your implementation on dali and found some things about data not being available that are fixed by 52d6a99. Since this is working and a really nice addition I'm going to merge this PR. |
I see I thought that the test to see if the directory exists or not was done upstream before going to the backend, but it is probably also useful to do a new test. |
Hello everyone, given the particular structure of rucio, which for each file matches a sub-sub-folder, it was mandatory to create a new backend to manage this new structure,
In the new structure, we use strax.rucio as the backend and the database frontend, you will find a notebook below which shows the tests I performed.
Notebook: /dali/lgrandi/sahmedma/rucio-backend-test.ipynb