Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support ome-ngff on s3 #209

Open
d-v-b opened this issue Feb 13, 2024 · 5 comments
Open

support ome-ngff on s3 #209

d-v-b opened this issue Feb 13, 2024 · 5 comments
Labels
enhancement New feature or request NGFF OME-NGFF (OME-Zarr format)
Milestone

Comments

@d-v-b
Copy link

d-v-b commented Feb 13, 2024

It would be great if this library could access ome-ngff data stored on AWS S3, google GCS, etc. Since you have fsspec in your dependencies, I think this would be doable without drastic changes.

  • this logic uses os.path, so you would need to use something that's agnostic to the storage backend. But it's not clear why you even need to check if the path exists -- Zarr will do that for you, once you choose a store class.
  • you are using DirectoryStore. There are two issues with this: first, DirectoryStore is not recommended for anything because it puts all the chunks in a single directory and, with too many chunks, that directory becomes unwieldy. Consider using NestedDirectoryStore as a replacement. Second, if you have fsspec installed along with zarr, then zarr can transparently store stuff on cloud backends with FSStore. FSStore will even automatically parse a string like s3://bucket-name/foo.zarr and pick the correct backend (s3, in this case).

If there's appetite, I could submit a PR (I didn't see a pre-existing one, but maybe I didn't look too closely).

@ziw-liu
Copy link
Collaborator

ziw-liu commented Feb 13, 2024

Hi @d-v-b and thanks for your suggestions!

  1. Checking that the path exists manually is only to provide a more readable error message (and a shorter traceback). This can also be achieved by catching storage errors.

  2. We are using DirectoryStore with dimension separator "/". In practice this should be equivalent as NestedDirectoryStore, since that is exactly what it does under the hood (subclassing DirectoryStore).

  3. We would love to have support for FSStore! The last time I tried it I had difficulties with the dimension separator (which may or may not be related to other "/" issues we noticed during development of this library, e.g. Resizing an array overwrites the dimension separator metadata zarr-developers/zarr-python#1533). We also didn't have a real use case for cloud storage (and endpoints to test against). If you want to submit a PR, note that we plan to release the first 'stable' version very soon (I was planning for today actually) so it will likely not be included. But of course cloud storage support is well worth dedicating a v0.2 version number. Also note that we are undergoing a major API refactor with breaking changes to implement a universal API across file formats. The refactor for the NGFF part will start in a few weeks.

@ziw-liu ziw-liu added enhancement New feature or request NGFF OME-NGFF (OME-Zarr format) labels Feb 13, 2024
@d-v-b
Copy link
Author

d-v-b commented Feb 13, 2024

We are using DirectoryStore with dimension separator "/". In practice this should be equivalent as NestedDirectoryStore, since that is exactly what it does under the hood (subclassing DirectoryStore).

Perfect! I actually forgot that DirectoryStore takes a dimension_separator kwarg 😅

As for testing against cloud storage, I think you can look at the zarr tests for an example. Personally I don't have experience with this part of the test suite, so I can't be more concrete about how to mock the s3 resources (i'm guessing it's via moto? but I don't know for sure).

And thank your for reminding me about this zarr issue: zarr-developers/zarr-python#1533, I will look into whether I can figure out a solution for that. I will hold off on a PR here until your refactor lands. With that being said, should I leave this open until FSStore support gets in?

@ziw-liu
Copy link
Collaborator

ziw-liu commented Feb 13, 2024

And thank your for reminding me about this zarr issue: zarr-developers/zarr-python#1533, I will look into whether I can figure out a solution for that.

I fixed that in zarr-developers/zarr-python#1540. You reviewed and merged it (Thanks!). (edit: it is still broken for iohub because there has been no zarr release after that.)

With that being said, should I leave this open until FSStore support gets in?

👍

@d-v-b
Copy link
Author

d-v-b commented Feb 14, 2024

You reviewed and merged it (Thanks!). (edit: it is still broken for iohub because there has been no zarr release after that.)

Lol, I totally forgot about this. 😆

I'm not sure when the next Zarr release will be, but if you make some noise over in the zarr issue tracker you might get an answer / motivate people to prepare a release.

@ziw-liu
Copy link
Collaborator

ziw-liu commented Feb 15, 2024

New Zarr release is out! Will bump soon.

@ziw-liu ziw-liu added this to the 0.2.0 milestone Feb 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request NGFF OME-NGFF (OME-Zarr format)
Projects
None yet
Development

No branches or pull requests

2 participants