As mentioned in #2323 (comment), we currently can't create a fixed-width string dtype in zarr v3.
In [1]: import zarr
In [2]: arr = zarr.create(shape=(3,), dtype="U3")
In [3]: arr[:] = ['a', 'bb', 'ccc']
In [4]: arr[:]
Out[4]: array(['a', 'bb', 'ccc'], dtype=StringDType())
We would want the NumPy dtype of that array to be U3, a fixed-width Unicode string dtype, rather than the variable-width StringDType() returned here. We'd want to support this in addition to the variable-width strings being used currently. Some initial questions I don't know the answer to:
What data_type shows up in the metadata?
What codecs are needed?
How are the actual bytes stored? In parquet, fixed_len_byte_array is one of the primitive types.
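To make the storage question concrete, here is a minimal sketch of one possible approach: treat each element as a fixed-length run of bytes, much like parquet's fixed_len_byte_array. This is not how zarr stores anything today; `encode_fixed_width` and `decode_fixed_width` are hypothetical helper names, and the little-endian `<U3` dtype is an assumption made to keep the byte layout platform-independent.

```python
import numpy as np


def encode_fixed_width(arr: np.ndarray) -> bytes:
    """Hypothetical helper: dump a fixed-width array as one flat byte string."""
    # Every element occupies exactly arr.dtype.itemsize bytes, so no
    # per-element length prefix or offsets table is needed.
    return np.ascontiguousarray(arr).tobytes()


def decode_fixed_width(buf: bytes, dtype: str, shape: tuple) -> np.ndarray:
    """Hypothetical helper: rebuild the array from the flat byte string."""
    # The dtype string carries the width, so it would have to be
    # recorded somewhere in the array metadata.
    return np.frombuffer(buf, dtype=dtype).reshape(shape)


original = np.array(["a", "bb", "ccc"], dtype="<U3")
raw = encode_fixed_width(original)
restored = decode_fixed_width(raw, "<U3", original.shape)

print(restored.dtype == np.dtype("<U3"))  # the fixed-width dtype survives
print((restored == original).all())       # and so do the values
```

Under this assumption the chunk size is simply the number of elements times the item size, which is what makes fixed-width dtypes simpler to store than variable-width strings.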
Steps to reproduce
.
Additional output
No response
NumPy uses 32-bit UCS-4 code points for Unicode data (ref). I think that len(u.tobytes()) for a single U3 element u is 4 bytes per character * 3 characters = 12 bytes, since that's the fixed width. For bytes data, it stores the ASCII values, padded with null bytes to the fixed width.
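A quick way to check the sizes described above using plain NumPy (no zarr involved). The explicit little-endian `<U3` is only there so the byte layout does not depend on the platform.

```python
import numpy as np

# Fixed-width Unicode: each element is 3 code points * 4 bytes (UCS-4).
u = np.array(["a", "bb", "ccc"], dtype="<U3")
print(u.dtype.itemsize)   # 12
print(len(u.tobytes()))   # 36 (3 elements * 12 bytes)

# Fixed-width bytes: each element is 3 bytes, padded with trailing nulls.
s = np.array([b"a", b"bb", b"ccc"], dtype="S3")
print(s.dtype.itemsize)   # 3
print(s.tobytes())        # b'a\x00\x00bb\x00ccc'
```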
Zarr version
v3
Numcodecs version
na
Python Version
na
Operating System
na
Installation
na