[v3] Fixed-width unicode string support in zarr v3 #2347
Comments
I'm not sure if I'll be able to work on this, but here are some notes on the V2 behavior:
>>> import numpy as np
>>> import zarr
>>> import json
>>> b = np.array([b'a', b'bb', b'ccc'])
>>> u = np.array(['a', 'bb', 'ccc'])
>>> store = {}
>>> zarr.array(b, store=store, path="bytes", compressor=None)
>>> zarr.array(u, store=store, path="unicode", compressor=None)
>>> print(json.loads(store['bytes/.zarray'])['dtype'])
# |S3
>>> print(json.loads(store['unicode/.zarray'])['dtype'])
# <U3
>>> assert store['bytes/0'] == b.tobytes()
>>> assert store['unicode/0'] == u.tobytes()
NumPy uses 32-bit UCS-4 codepoints for Unicode data (ref). (I think that
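To make the byte layout concrete, here is a minimal pure-Python sketch of what an uncompressed `<U3` chunk looks like: each item is padded with null code points to the fixed width and each code point takes 4 bytes, little-endian. The helper names `encode_fixed_width_unicode` and `decode_fixed_width_unicode` are hypothetical and only for illustration; they are not part of zarr or NumPy.

```python
def encode_fixed_width_unicode(strings, width):
    """Hypothetical helper: encode strings as null-padded little-endian UCS-4,
    `width` code points per item (the layout of a NumPy `<U{width}` array)."""
    chunks = []
    for s in strings:
        if len(s) > width:
            raise ValueError(f"{s!r} is longer than {width} code points")
        padded = s + "\x00" * (width - len(s))
        # "utf-32-le" emits 4 bytes per code point and no byte-order mark
        chunks.append(padded.encode("utf-32-le"))
    return b"".join(chunks)


def decode_fixed_width_unicode(raw, width):
    """Hypothetical inverse: split into fixed-size items and strip null padding."""
    item_size = 4 * width
    return [
        raw[i:i + item_size].decode("utf-32-le").rstrip("\x00")
        for i in range(0, len(raw), item_size)
    ]


raw = encode_fixed_width_unicode(["a", "bb", "ccc"], width=3)
print(len(raw))  # 3 items * 3 code points * 4 bytes = 36
print(decode_fixed_width_unicode(raw, width=3))
```

On a little-endian machine these bytes should match `u.tobytes()` from the session above, which is what the V2 store holds for `unicode/0` when no compressor is set.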
Given it doesn't look like this functionality will get into 3.0, it looks like this breaking change is something else to add to #2596
Leaving a link to this here as I did not find this on the page: https://hackmd.io/@ivirshup/SkdO2szas
Zarr version
v3
Numcodecs version
na
Python Version
na
Operating System
na
Installation
na
Description
As mentioned in #2323 (comment), right now we can't create a fixed-width string dtype in zarr v3.
We would want the NumPy dtype of that array to be U3, a fixed-width unicode string dtype. We'd want to support this in addition to the variable-width strings being used currently. Some initial questions I don't know the answer to:
data_type shows up in the metadata?
Steps to reproduce
.
Additional output
No response