[Feature]: Support zarr-python v3 #202

Open
3 tasks done
rly opened this issue Jun 18, 2024 · 6 comments

Labels
category: enhancement improvements of code or code behavior


@rly
Contributor

rly commented Jun 18, 2024

What would you like to see added to HDMF-ZARR?

Zarr-Python is moving closer to a v3 release which has significant changes in the API that result in errors in hdmf-zarr. It is not clear how stable the current alpha release is right now. But we should add support for Zarr Python v3 soon. See #200 for more details.

Is your feature request related to a problem?

No response

What solution would you like?

Same as above.

Do you have any interest in helping implement the feature?

Yes, but I would need guidance.

Code of Conduct

@rly rly added the category: enhancement improvements of code or code behavior label Jun 18, 2024
@rly rly added this to the Future milestone Jun 18, 2024
@bendichter
Contributor

Zarr-Python v3 is slated for release today (source)!

I was looking through the release notes for v3 and noticed some major changes, in particular this section of the Zarr v3 spec:

Scope reduction
In order to facilitate implementation of the core specification across different programming languages, the scope of the specification has been reduced relative to Zarr v2. Note in particular the following:

  • The set of core data types is reduced relative to Zarr v2. Only fixed size integer and floating point data types are defined, in addition to a Boolean data type and a fixed length raw data type. Any data types relating to storage of textual data have been left for definition within an extension specification. The “object” data type, which was never included within the Zarr v2 spec but which was implemented in the Python implementation, is also not included – it is suggested that a better approach be found for supporting variable length data types, and defined within a data type extension specification.
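To make the scope reduction concrete, here is a rough sketch (not part of hdmf-zarr or zarr-python) that checks a NumPy dtype against the core data types named in the quoted spec text. The helper name `is_v3_core_dtype` is hypothetical. Treating complex64/complex128 as core, and treating an unstructured NumPy void dtype (`V<n>`) as the analogue of the fixed-length raw type, are assumptions made for illustration.

```python
import numpy as np

# Hypothetical helper: does this NumPy dtype map onto a Zarr v3 core data type?
# Allowed item sizes (in bytes) per NumPy dtype kind. Assumption: complex64 and
# complex128 count as core, alongside bool, fixed-size integers, and floats.
_CORE_ITEMSIZES = {
    "b": {1},            # bool
    "i": {1, 2, 4, 8},   # int8 .. int64
    "u": {1, 2, 4, 8},   # uint8 .. uint64
    "f": {2, 4, 8},      # float16 .. float64
    "c": {8, 16},        # complex64, complex128
}


def is_v3_core_dtype(dtype) -> bool:
    dt = np.dtype(dtype)
    if dt.fields is not None:
        # Structured (compound) dtypes are not core types.
        return False
    if dt.kind in _CORE_ITEMSIZES:
        return dt.itemsize in _CORE_ITEMSIZES[dt.kind]
    # Assumption: an unstructured void dtype ("V<n>") stands in for the
    # fixed-length raw type. Strings ("U", "S"), objects ("O"), and
    # datetimes ("M", "m") all fall through to False.
    return dt.kind == "V" and dt.itemsize > 0


for name in ["int16", "float64", "bool", "V8", "U10", "S5", "object", "datetime64[ns]"]:
    print(f"{name:>16}: {is_v3_core_dtype(name)}")
```

This only reflects the spec's data-type list; it says nothing about what a particular zarr-python release actually implements.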

The Zarr team has also provided a guide for migrating from v2 to v3, with the following section:

The following features that were supported by Zarr-Python 2 have not been ported to Zarr-Python 3 yet:

  • Structured arrays / dtypes (#2134)
  • Fixed-length string dtypes (#2347)
  • Datetime and timedelta dtypes (#2616)
  • Object dtypes (#2617)
  • Ragged arrays (#2618)
  • Groups and Arrays do not implement __enter__ and __exit__ protocols (#2619)
  • Big Endian dtypes (#2324)
  • Default filters for object dtypes for Zarr format 2 arrays (#2627)

Any data types relating to storage of textual data have been left for definition within an extension specification

So if I am reading this right, does that mean that all strings must be in attributes? That's going to be a pretty big breaking change for us, as there are several places where we have string datasets which are required. I suppose we could use the raw type and use magic attributes? Either that or define an "extension specification." If they have a good formal way of creating these specifications we might also be able to use it for links and references, thus making our brand of zarr a bit more in-line with the official zarr standard, but this seems like it could be a nontrivial amount of work.
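A minimal sketch of the "raw type plus magic attributes" idea follows. It is a hypothetical convention, not something hdmf-zarr or the Zarr spec defines: strings are UTF-8 encoded into a zero-padded 2-D `uint8` array (a core integer type, used here as a stand-in for the raw type), and attributes with made-up names (`x_string_encoding`, `x_string_padding`) record how to decode it. Only NumPy is used, so the sketch does not depend on any zarr-python API.

```python
import numpy as np

# Hypothetical attribute names, invented for this sketch.
ENCODING_ATTR = "x_string_encoding"
PADDING_ATTR = "x_string_padding"


def encode_strings(strings):
    """Pack strings into a zero-padded (n, max_bytes) uint8 array plus attributes."""
    encoded = [s.encode("utf-8") for s in strings]
    width = max((len(b) for b in encoded), default=0)
    buf = np.zeros((len(encoded), width), dtype=np.uint8)
    for i, b in enumerate(encoded):
        buf[i, : len(b)] = list(b)  # remaining bytes in the row stay zero
    attrs = {ENCODING_ATTR: "utf-8", PADDING_ATTR: "zero"}
    return buf, attrs


def decode_strings(buf, attrs):
    """Reverse encode_strings; refuse arrays not tagged with this convention."""
    if attrs.get(ENCODING_ATTR) != "utf-8" or attrs.get(PADDING_ATTR) != "zero":
        raise ValueError("array is not tagged as zero-padded UTF-8 strings")
    return [row.tobytes().rstrip(b"\x00").decode("utf-8") for row in buf]


data, attrs = encode_strings(["abc", "héllo", ""])
print(data.shape, attrs)
print(decode_strings(data, attrs))
```

The trade-offs are visible even in this toy version: a string that genuinely ends in `\x00` cannot round-trip, every row is as wide as the longest string, and any other reader has to know the private attribute convention to interpret the bytes.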

@oruebel
Contributor

oruebel commented Jan 9, 2025

I was looking through the release notes for v3 and noticed some major changes, in particular this section of the Zarr v3 spec:

I agree. That reduction in scope will make things tricky for string datasets, compound datasets, and references. In particular for strings, it would be great not to have to roll our own custom solution, both to maintain interoperability and to avoid having too much custom functionality.

Either that or define an "extension specification."

Yes, if "extension specifications" are possible then I think that would be nice. Strings are a common need, so one would hope that there will be a standard extension for this.

If they have a good formal way of creating these specifications we might also be able to use it for links and references, thus making our brand of zarr a bit more in-line with the official zarr standard, but this seems like it could be a nontrivial amount of work.

Yes, that would be nice.

@oruebel
Contributor

oruebel commented Jan 13, 2025

@alxmrs

alxmrs commented Jan 23, 2025

Hey Ryan, hey Oliver. I'd be happy to join working on this. After reading a few issues, it seems like a Zarr v3 migration is planned before hdmf-zarr merges with hdmf? I'd be happy to take a look at the migration issue to support "string datasets, compound datasets, and references".

WRT the extension option -- it would be cool to extend the nwb protocol to create a "NeuroZarr" extension! I'd be happy to assist with this as well (should it be the design choice we opt for).

Are either of you available to meet? Maybe it would be helpful to discuss the outstanding technical challenges synchronously.

@rly
Contributor Author

rly commented Jan 24, 2025

Hi @alxmrs, thanks for your interest! I'll reach out to you by email to set up a quick meeting with myself, @oruebel, and @mavaylon1 to discuss the migration and how best we can work together. I see an email address on your GitHub profile - should I reach you there?

@alxmrs

alxmrs commented Jan 24, 2025

Yes! Thanks, that is perfect :)


5 participants