N5Store support of cloud buckets #540
Comments
Funny this came up in conversation today. Also relates to issue ( #395 ) and issue ( zarr-developers/n5py#9 ). |
I'm also affected by this issue -- I have n5 data on s3 (because that's what neuroglancer wants) but I'd also like to use zarr locally to access these data via s3fs. Concretely, what needs to be done to get this working? |
This is probably not a difficult thing to solve if you are interested. As implemented A similar thing may be needed for If we decide that we would rather not modify Thoughts? 🙂 |
as a non-maintainer, either one looks great to me :) I will start with the first strategy and see how far I get. |
Sounds good. Please let us know if there are other ways we can help 🙂 |
I like the first strategy, as well. Meta-question: what sorts of stores do you expect? The reason I ask is that the various implementations of fsspec could be used in a single FilesystemStore, which detects which fsspec implementation to use. This avoids having an S3Store, GcsStore, LocalFsStore, etc. Is this a worthwhile approach? |
Indeed, see @rabernat PR: #373 (comment) -- I was going to try a quick implementation this weekend, happy to collaborate though! |
Let me know what you come up with @mzjp2 . It's also on my radar, but I don't know when. |
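For readers following along, the single-store idea raised above comes down to choosing a backend from the URL protocol. The sketch below is only an illustration using the standard library: `LocalBackend`, `S3Backend`, `BACKENDS`, and `open_backend` are hypothetical names and not part of zarr or fsspec. A real implementation would presumably hand this lookup to fsspec rather than keep its own registry.

```python
from urllib.parse import urlsplit


class LocalBackend:
    """Hypothetical stand-in for a local filesystem backend."""

    def __init__(self, path):
        self.path = path


class S3Backend:
    """Hypothetical stand-in for an S3 backend."""

    def __init__(self, path):
        self.path = path


# Hypothetical registry mapping URL protocols to backend classes.
BACKENDS = {"file": LocalBackend, "s3": S3Backend}


def open_backend(url):
    """Pick a backend from the URL protocol; bare paths are treated as local."""
    parts = urlsplit(url)
    protocol = parts.scheme or "file"
    if protocol not in BACKENDS:
        raise ValueError(f"no backend registered for protocol {protocol!r}")
    # Rejoin bucket/host and path so "s3://bucket/key" becomes "bucket/key".
    path = parts.netloc + parts.path
    return BACKENDS[protocol](path)
```

With one entry point like this, the caller never names a store class; `open_backend("s3://bucket/data.n5")` and `open_backend("/tmp/data.n5")` go through the same function and differ only in the backend that comes back.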
Has anyone made progress on n5+http? I like @jakirkham's suggested approach, but figure my time would be better spent testing/tweaking someone else's code if it's hiding in a branch somewhere. |
It is on my list, but also happy to review rather than write! |
it's on my list too :) but i haven't gotten to it yet |
With #546 in the works, let me get my head around what we want here. @alimanfoo has the good idea of re-writing If I understand correctly, The code would look like:
|
Sure, that would work fine. However, it would also be really nice to have top level |
Side note: Neuroglancer now directly supports zarr files, so that might be a good option for you! |
Thanks for the heads up. I'd like to migrate to that, as it'd be a bit more efficient than my current setup! I have a cloudvolume+igneous deployment going now -- the existing igneous code+config sets up downscaling pyramids out-of-the-box, which is super helpful and I'd have to replicate (maybe not hard... just time). |
@martindurant : sorry, I missed the comment in #546 -- this isn't fully implemented yet, correct? But rather "NestedDirectoryStore support of cloud buckets", right? |
Correct, FSStore allows nesting by specifying the key separator. I don't know what more N5 needs, but presumably it could use one of these stores. |
Ok. Re-opening this issue then. |
I got FSStore-backed n5 writing working in my own repo: https://github.com/janelia-cosem/fibsem-tools/blob/master/src/fibsem_tools/io/storage.py I created a class
Beyond these two changes, I could recycle a lot of the existing key transformation logic from I need this for my own work so I haven't put together a PR yet. Would the changes I made to |
👍
I don't know the impact of this offhand, but if tests are passing...
From my POV, having them would be great!
Since NestedDirectoryStore can't deal with remote access, I think we minimally need an FSStore version which can handle N5. (I had some hope that we wouldn't need to subclass and instead could use composition, but I haven't looked into it further.) I think the question is whether or not we maintain NDS or if it gets deprecated. |
I originally wrote FSStore to optionally contain N5 also, but it proved too unwieldy. I don't think there's a problem to have multiple store implementations. I haven't had a look at the changes yet, but passing the tests is a good indication. |
speaking of passing tests... I noticed that some of the tests for fsstore are incomplete, and you can't actually read from a "/"-separated fsstore without the aforementioned changes to |
Please flesh out as you see fit; if this justifies the changes in FSStore, all the better. |
I found myself trying to open a remote N5 this morning. I had completely forgotten about this issue. 🤦 |
@perlman see https://github.com/janelia-cosem/fibsem-tools/blob/master/src/fibsem_tools/io/storage.py#L223 Ultimately I want to take this code out of my own repo and get it here... |
This has been mentioned elsewhere, but it may be better, instead of subclassing, to have a layered approach, i.e., an N5 storage class that takes another storage class as an argument. |
And to be clear: code that works now is a great thing to have! |
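The layered approach suggested above could look roughly like the sketch below: a thin mapping that only translates keys and delegates all storage to whatever store it is given. `N5KeyLayer` and `to_n5_key` are hypothetical names, not zarr APIs. The key rule (reverse the chunk coordinates and nest them with "/") is a simplified assumption about the N5 layout, and the metadata translation a real N5 store needs is left out entirely.

```python
from collections.abc import MutableMapping


def to_n5_key(key):
    """Turn a zarr-style chunk key such as "arr/0.1.2" into "arr/2/1/0".

    Simplified assumption: a final path segment made only of digits and
    dots is a chunk key; everything else passes through unchanged.
    """
    prefix, _, last = key.rpartition("/")
    parts = last.split(".")
    if all(p.isdigit() for p in parts):
        last = "/".join(reversed(parts))
    return f"{prefix}/{last}" if prefix else last


class N5KeyLayer(MutableMapping):
    """Hypothetical wrapper: translates keys, delegates storage to `inner`."""

    def __init__(self, inner):
        # `inner` can be any mutable mapping: a dict, a local store, a remote store.
        self.inner = inner

    def __getitem__(self, key):
        return self.inner[to_n5_key(key)]

    def __setitem__(self, key, value):
        self.inner[to_n5_key(key)] = value

    def __delitem__(self, key):
        del self.inner[to_n5_key(key)]

    def __iter__(self):
        # Yields keys as stored (N5 layout); the reverse mapping is omitted.
        return iter(self.inner)

    def __len__(self):
        return len(self.inner)


backing = {}  # a plain dict stands in for any underlying store
store = N5KeyLayer(backing)
store["arr/0.1.2"] = b"chunk"
```

Because the wrapper never touches the filesystem itself, swapping the `dict` for a store that talks to a cloud bucket would not require changes to the key logic, which is the main appeal of composition over subclassing here.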
Thanks @d-v-b, I'll give it a whirl. I might have a few cycles to work on moving this over (...I would certainly use it).
@martindurant: Is there an example to follow, style-wise, for doing this within python-zarr? |
Not as far as I know. cc: @jakirkham who mentioned something similar to me as well in case he has an example. |
@d-v-b: would you like to do the honors? |
this should be addressed by #793 |
💯 |
The N5Store works great for storing local files. What are the technical challenges in moving to support something like GCS for storage? Other pieces of the zarr library seamlessly work if I move to gs:// prefixes (with the right packages installed), but N5Store ends up writing to the local filesystem when using that prefix.
Is it a matter of going through n5.py and storage.py and changing uses of the os library to uses of fsspec? Or are there operations that fundamentally won't work on cloud objects?