Can't create big endian dtypes in V3 array #2324

rabernat · 2024-10-09T14:53:55Z

This works with V2 data:

```python
import zarr

zarr.create(shape=10, dtype=">i2", zarr_version=2)
# -> <Array memory://4413530368 shape=(10,) dtype=>i2>
```

But raises an error for V3:

```python
zarr.create(shape=10, dtype=">i2", zarr_version=3)
```

```pytb
File ~/gh/zarr-developers/zarr-python/src/zarr/codecs/__init__.py:40, in _get_default_array_bytes_codec(np_dtype)
     37 def _get_default_array_bytes_codec(
     38     np_dtype: np.dtype[Any],
     39 ) -> BytesCodec | VLenUTF8Codec | VLenBytesCodec:
---> 40     dtype = DataType.from_numpy(np_dtype)
     41     if dtype == DataType.string:
     42         return VLenUTF8Codec()

File ~/gh/zarr-developers/zarr-python/src/zarr/core/metadata/v3.py:599, in DataType.from_numpy(cls, dtype)
    581     return DataType.bytes
    582 dtype_to_data_type = {
    583     "|b1": "bool",
    584     "bool": "bool",
   (...)
    597     "<c16": "complex128",
    598 }
--> 599 return DataType[dtype_to_data_type[dtype.str]]

KeyError: '>i2'
```

In the V3 spec, endianness is now handled by a codec: https://zarr-specs.readthedocs.io/en/latest/v3/codecs/bytes/v1.0.html
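For illustration, here is roughly what v3 metadata for a 10-element big-endian int16 array could look like, with the byte order carried by the `bytes` codec instead of by `data_type`. This is a hand-written sketch built as a Python dict, not output captured from zarr-python; the chunk grid, chunk key encoding, and fill value are arbitrary assumptions.

```python
import json

# Hand-written sketch of a Zarr v3 zarr.json document (not produced by zarr-python).
# "data_type" carries no byte-order prefix; the byte order is set by the
# "endian" entry in the "bytes" codec configuration.
metadata = {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [10],
    "data_type": "int16",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [10]}},
    "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
    "fill_value": 0,
    "codecs": [{"name": "bytes", "configuration": {"endian": "big"}}],
}

print(json.dumps(metadata, indent=2))
```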

Xarray tests create data with big endian dtypes, and Zarr needs to know how to handle them.


d-v-b · 2024-10-09T15:06:18Z

If the codecs are unspecified, then I think we could automatically parametrize the BytesCodec based on the dtype. If the codecs are specified and the BytesCodec endianness doesn't match the endianness of the data, then we raise an exception.
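A minimal sketch of that rule, assuming NumPy-style type strings such as `">i2"` (the form `dtype.str` produces) as input. The function names are hypothetical and not part of zarr-python; the return value is what would go into the `bytes` codec's `endian` setting.

```python
import sys
from typing import Optional

# Hypothetical helpers, not zarr-python API.
# Maps the byte-order prefix of a NumPy type string to a bytes-codec "endian" value.
_PREFIX_TO_ENDIAN = {"<": "little", ">": "big", "=": sys.byteorder, "|": None}


def dtype_endian(typestr: str) -> Optional[str]:
    """Return "little", "big", or None for single-byte types ("|" prefix)."""
    try:
        return _PREFIX_TO_ENDIAN[typestr[0]]
    except (KeyError, IndexError):
        raise ValueError(f"expected a byte-order prefix in {typestr!r}") from None


def resolve_bytes_endian(typestr: str, codec_endian: Optional[str] = None) -> Optional[str]:
    """Pick the bytes-codec endianness for a dtype, or reject a conflicting one."""
    data_endian = dtype_endian(typestr)
    if codec_endian is None:
        # Codecs left unspecified: parametrize the bytes codec from the dtype.
        return data_endian
    if data_endian is not None and codec_endian != data_endian:
        # Codecs given explicitly but disagreeing with the dtype: raise.
        raise ValueError(
            f"bytes codec endian={codec_endian!r} does not match dtype {typestr!r}"
        )
    return codec_endian
```

Under this sketch, `dtype=">i2"` with no codecs yields `"big"`, while `dtype=">i2"` combined with an explicit little-endian bytes codec raises `ValueError`.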

But a bigger problem is that, by making endianness a serialization detail, the Zarr dtype model has diverged from the NumPy dtype model. If our array object uses Zarr v3 data type semantics, then zarr.create(..., dtype=">i2") will return an array with dtype <i2 plus a special bytes codec. From the POV of functions like np.zeros_like, this Zarr array will not have its "real" dtype; users might be surprised to see that zarr.create(..., dtype=">i2") and zarr.create(..., dtype="<i2") return arrays with the same dtype. I don't see an easy solution to this.
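To make the divergence concrete, a small hypothetical sketch: splitting a NumPy type string into a v3 data type name and a separate byte order maps both `">i2"` and `"<i2"` to `int16`, so only the codec's endian setting tells them apart. The mapping is a hand-written subset of types, and the function names are illustrative, not zarr-python API.

```python
import sys
from typing import Optional, Tuple

# Hand-written subset: NumPy kind+itemsize -> Zarr v3 data type name.
_V3_NAMES = {
    "b1": "bool",
    "i1": "int8", "i2": "int16", "i4": "int32", "i8": "int64",
    "u1": "uint8", "u2": "uint16", "u4": "uint32", "u8": "uint64",
    "f2": "float16", "f4": "float32", "f8": "float64",
    "c8": "complex64", "c16": "complex128",
}
_NAME_TO_BODY = {name: body for body, name in _V3_NAMES.items()}
_ENDIAN = {"<": "little", ">": "big", "=": sys.byteorder, "|": None}
_PREFIX = {"little": "<", "big": ">"}
_SINGLE_BYTE = {"bool", "int8", "uint8"}


def split_typestr(typestr: str) -> Tuple[str, Optional[str]]:
    """'>i2' -> ('int16', 'big'): the byte order leaves the type name."""
    return _V3_NAMES[typestr[1:]], _ENDIAN[typestr[0]]


def join_typestr(name: str, endian: Optional[str]) -> str:
    """Rebuild a NumPy type string; the endian must come from the bytes codec."""
    prefix = "|" if name in _SINGLE_BYTE or endian is None else _PREFIX[endian]
    return prefix + _NAME_TO_BODY[name]
```

Recovering the original NumPy dtype requires `join_typestr(name, endian)` with the endian taken from the codec, which is why `Array.dtype` can no longer be read off the v3 data type alone.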

rabernat · 2024-10-12T12:22:25Z

One solution could be to always translate the endianness of the on-disk data to the endianness of the in-memory data. This could be done within BytesCodec. However, it would be hard, since endianness is not part of ArraySpec.
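A minimal sketch of the byte-level translation described here, written as a standalone hypothetical function rather than as zarr's actual BytesCodec. It takes decoded chunk bytes, the swap unit size, the on-disk byte order, and the requested in-memory byte order. In zarr-python, that requested order would have to reach the codec through something like ArraySpec, which is the missing piece noted above.

```python
import struct


def translate_endianness(raw: bytes, word_size: int, src: str, dst: str) -> bytes:
    """Reorder each `word_size`-byte word in `raw` from byte order `src` to `dst`.

    `src` and `dst` are "little" or "big". Hypothetical helper, not zarr API.
    For complex dtypes pass the component size (e.g. 8 for complex128), since
    the real and imaginary parts are byte-swapped separately.
    """
    if len(raw) % word_size != 0:
        raise ValueError("buffer length is not a multiple of word_size")
    if src == dst or word_size == 1:
        return raw  # byte order already matches, or is irrelevant
    out = bytearray(len(raw))
    for start in range(0, len(raw), word_size):
        # Reverse the bytes of one word, e.g. b"\x01\x2c" -> b"\x2c\x01".
        out[start:start + word_size] = raw[start:start + word_size][::-1]
    return bytes(out)


# Example: three int16 values stored big-endian on disk, wanted little-endian in memory.
on_disk = struct.pack(">3h", 1, -2, 300)
in_memory = translate_endianness(on_disk, 2, "big", "little")
print(struct.unpack("<3h", in_memory))  # (1, -2, 300)
```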

dstansby · 2024-12-30T17:44:16Z

Looks like this needs either resolving or documenting as a breaking change in #2596 for Zarr 3.

normanrz · 2025-01-07T18:48:02Z

Should we put endianness in the new runtime ArrayConfig? We could parse the dtype to set it.

jhamman · 2025-01-08T00:07:17Z

I've moved this to "After 3.0.0" and will be adding this to the work in progress section of the v3 migration docs.

astrofrog · 2025-02-01T14:08:01Z

I'm running into this too. Just to check: is this something that is going to be fixed in the 3.0.x series of releases, or is it a permanent breaking change that we should adapt existing code to?

d-v-b · 2025-02-01T14:17:00Z

I think we intend to fix this, but it will force us to revise the semantics of the Array.dtype attribute. The alternative to handling endianness the way users expect is unacceptable IMO.

rabernat mentioned this issue Oct 9, 2024: Fill value fixes for V3 TomAugspurger/xarray#1 (Merged)

This was referenced Nov 1, 2024: Monthly issue metrics report #2455 (Closed); Monthly issue metrics report sanketverma1704/zarr-python#3 (Open)

rabernat mentioned this issue Nov 1, 2024: Invalid Datatype ('>f8') when trying to convert kerchunk reference to icechunk reference. earth-mover/icechunk#367 (Closed)

jbusecke mentioned this issue Nov 1, 2024: Tracking issue for Nov presentation jbusecke/esgf-virtual-zarr-data-access#15 (Open)

LDeakin mentioned this issue Nov 5, 2024: (feat): minimum working codec pipeline ilan-gold/zarrs-python#19 (Merged)

dstansby added the bug (Potential issues with the zarr-python library) label Dec 28, 2024

dstansby added this to the 3.0.0 milestone Dec 28, 2024

jhamman modified the milestones: 3.0.0, After 3.0.0 Jan 8, 2025

bendichter mentioned this issue Jan 9, 2025: [Feature]: Support zarr-python v3 hdmf-dev/hdmf-zarr#202 (Open)

nenb mentioned this issue Jan 23, 2025: Prototype of new DType interface #2750 (Draft)

abarciauskas-bgse mentioned this issue Feb 10, 2025: Manifest arrays use arrayv3metadata zarr-developers/VirtualiZarr#429 (Merged)
