Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code separation for storage location and storage format #10

Closed
tischi opened this issue Oct 16, 2020 · 9 comments
Closed

Code separation for storage location and storage format #10

tischi opened this issue Oct 16, 2020 · 9 comments

Comments

@tischi
Copy link

tischi commented Oct 16, 2020

@axtimwalde

@joshmoore and I were thinking whether it would be possible (and make sense) to separate the code in terms of storage location and storage format.

So for example, one could have something like

new N5Reader( StorageLocation storageLocation, StorageFormat storageFormat )

with, e.g.,

enum StorageLocation {
   FileSystem
   AWSS3,
   GoogleCloud
}

enum StorageFormat {
   N5,
   Zarr
}

This would of course mainly make sense if one would manage within the N5Reader class to be able to mix and match the two things, thereby avoiding code duplication.

Not sure I am making sense?

Maybe @joshmoore can formulate this more professionally?

@bogovicj
Copy link
Contributor

@axtimwalde and I talked about this recently.

This would involve big changes, and so won't happen soon, but I agree its desirable and should be on the roadmap.

@joshmoore
Copy link

@bogovicj : do you have any idea of a workaround? Our hope was to be able to access IDR data in OME-Zarr on S3 via the N5 stack for I2K? (Wow. The acronyms abound!) My best guess would be to make a copy/fork of n5-aws-s3, add a dependency on n5-zarr, and migrate the N5ZarrReader use of DType etc. into https://github.com/saalfeldlab/n5-aws-s3/blob/master/src/main/java/org/janelia/saalfeldlab/n5/s3/N5AmazonS3Reader.java (or a subclass).

@bogovicj
Copy link
Contributor

Wow. The acronyms abound!

😂

make a copy/fork of n5-aws-s3, add a dependency on n5-zarr,

Either that or the reverse - fork n5-zarr and depend on n5-aws-s3. Not sure which would be less work in the end...

Were I to try to hack something together quickly, I think I might start with N5ZarrReader, rip out any filesystem calls, and try to replace with the appropriate stuff in N5AmazonS3Reader

@joshmoore
Copy link

Gotcha. The only other suggestion that has also been discussed on the Zarr side (zarr-developers/zarr-python#540) would be a wrapper strategy roughly of the form:

n5 = new N5ZarrWrapper(new N5AmazonS3Reader())

@axtimwalde
Copy link
Collaborator

axtimwalde commented Oct 16, 2020

I think the most mature path forward is to introduce a separate interface for KeyValueStores implement the N5Reader and N5Writer interfaces for various dialects (N5, Zarr, BrainMaps, Boss, ...) on top of this. This could capture file-system and cloud storage but would not be appropriate for HDF5 which is fine but noteworthy. Also, while this sounds trivial at first glance it becomes a bit icky when considering the differences in how the various backends lock (or don't lock), i.e. the currently straight forward file-system logic will become clunkier. This is mainly why I haven't touched it yet. Skipping file-system and doing it only for cloud-storage interfaces defeats the purpose...

@joshmoore
Copy link

@axtimwalde : hmmm.... could you sketch out a bit of the inheritance hierarchy that you'd see? Or is the design of that hierarchy part of the problem and therefore needs more time?

Is there any short-term path forward that you wouldn't consider monsterous?

@axtimwalde
Copy link
Collaborator

As a short term hack, I would plug the AWS S3 access logic into a copy of n5-zarr as @bogovicj sugested, and add alternative readers and writers to the n5-zarr repo. It's not too much work and will be consistent with people using N5 through the N5Reader and N5Writer interfaces which they should. This would also cover all practical use cases that we currently have.

joshmoore added a commit to joshmoore/n5-zarr that referenced this issue Oct 20, 2020
…bclass

Until there are interfaces, wrappers, or similar composable objects
which allow having both S3 access and Zarr reading, the N5S3ZarrReader
allows a client to explicitly choose the combination of both. This
commit copies methods in order to having a working version of the
reader. Future clean up includes extracting methods to a helper class,
introducing abstract methods to reduce repeated code blocks, and similar.

see: saalfeldlab/n5-aws-s3#10 (comment)
@joshmoore
Copy link

I've opened saalfeldlab/n5-zarr#5 as a draft. It's possible that this should never be merged and instead should exist in a separate branch and/or repository.

@bogovicj
Copy link
Contributor

Closing this issue now that AmazonS3KeyValueAccess exists for the purpose of achieving this separation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants