[v3] V2 Codec pipeline is not consistent with legacy usage of filters
#2325
Comments
Also cc @normanrz, who knows the v2 pipeline path best.
Thanks for reporting this. I wasn't aware that zarr-python 2 supported using compressors, such as
In v2, there is no conceptual difference between filters and codecs. To reiterate, the pipeline actually run is
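As an illustration of the point that v2 draws no distinction between filters and codecs, here is a simplified model. It assumes each filter and the compressor expose numcodecs-style `encode`/`decode` methods that take and return raw buffers. `ZlibCodec`, `v2_encode` and `v2_decode` are hypothetical names, and dtype, shape and fill-value handling are ignored. Because no step checks a codec's kind, `zlib` behaves identically whether it is listed under `filters` or used as the `compressor`.

```python
import zlib
from typing import Optional, Sequence


class ZlibCodec:
    """Hypothetical numcodecs-style codec: raw buffer in, raw buffer out."""

    def __init__(self, level: int = 1) -> None:
        self.level = level

    def encode(self, buf) -> bytes:
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf) -> bytes:
        return zlib.decompress(bytes(buf))


def v2_encode(chunk: bytes, filters: Optional[Sequence], compressor) -> bytes:
    # Filters run in order, then the compressor. Nothing checks whether a codec
    # "expects" an array or bytes: every step just receives the previous buffer.
    buf = chunk
    for f in filters or []:
        buf = f.encode(buf)
    if compressor is not None:
        buf = compressor.encode(buf)
    return bytes(buf)


def v2_decode(data: bytes, filters: Optional[Sequence], compressor) -> bytes:
    # Decoding mirrors encoding: compressor first, then filters in reverse order.
    buf = data
    if compressor is not None:
        buf = compressor.decode(buf)
    for f in reversed(filters or []):
        buf = f.decode(buf)
    return bytes(buf)


payload = b"zarr" * 256
# zlib listed as a filter and zlib used as the compressor give identical output.
as_filter = v2_encode(payload, [ZlibCodec()], None)
as_compressor = v2_encode(payload, None, ZlibCodec())
assert as_filter == as_compressor
assert v2_decode(as_filter, [ZlibCodec()], None) == payload
```

In this model the only thing that separates a "filter" from a "compressor" is its position in the chain. That is exactly the assumption the v3 `V2Filters`/`V2Compressor` split does not preserve.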
Should we bring this API back to V3 as a convenience/compatibility layer? We could inspect the `filters` and `compressor` arguments and translate them into a proper V3 codec pipeline.
I'm not sure it is generally possible to map a v2
In terms of breaking API changes, this is one of the biggest ones from V2 to V3. Folks who depend on zarr-python have existing code that creates compressors and filters.
We could try! `codecs = [v2_to_v3_codec(filt) for filt in filters] + [v2_to_v3_codec(compressor)]`
Let's leave that out of scope.
(with a check for `filters` or `compressor` being `None`). This will work. If some dataset defines these such that an array codec tries to act on the output of a bytes codec, we could warn, and understand that the data was relying on bytes buffers even where we think a codec produces arrays.
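The translation proposed above, including the `None` checks and the warning, might look roughly like the following. `v2_to_v3_codec` is the helper name from the comment, given a hypothetical body here. `CodecInfo`, `KNOWN_CODECS` and `v2_to_v3_codecs` are also hypothetical, and the kinds assigned in the table are assumptions for illustration. Filters are passed as codec id strings rather than the config dicts that v2 metadata actually stores.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CodecInfo:
    """Hypothetical v3-style descriptor: a codec name plus its input/output kinds."""
    name: str
    input_kind: str   # "array" or "bytes"
    output_kind: str  # "array" or "bytes"


# Hypothetical lookup table keyed by v2 codec id; a real translation would need
# an entry for every codec that may appear in v2 metadata.
KNOWN_CODECS = {
    "delta": CodecInfo("delta", "array", "array"),
    "zlib": CodecInfo("zlib", "bytes", "bytes"),
    "gzip": CodecInfo("gzip", "bytes", "bytes"),
}


def v2_to_v3_codec(codec_id: str) -> CodecInfo:
    """Hypothetical translation of one v2 codec id; raises KeyError if unknown."""
    return KNOWN_CODECS[codec_id]


def v2_to_v3_codecs(
    filters: Optional[Sequence[str]], compressor: Optional[str]
) -> list[CodecInfo]:
    # Filters first, in order, then the compressor; either may be None in v2 metadata.
    codecs = [v2_to_v3_codec(f) for f in (filters or [])]
    if compressor is not None:
        codecs.append(v2_to_v3_codec(compressor))

    # Warn when an array codec would act on the output of a bytes codec: such
    # data was presumably relying on v2 treating every buffer the same way.
    bytes_seen = False
    for codec in codecs:
        if codec.input_kind == "array" and bytes_seen:
            warnings.warn(f"array codec {codec.name!r} follows a bytes-producing codec")
        if codec.output_kind == "bytes":
            bytes_seen = True
    return codecs
```

For example, `v2_to_v3_codecs(["zlib"], None)` returns a single `zlib` entry without warning, while `v2_to_v3_codecs(["zlib", "delta"], "gzip")` warns once because `delta` would receive bytes. A complete v3 pipeline would additionally need exactly one array-to-bytes codec, which this sketch does not insert.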
Closed by #2325.
Zarr version: 3.0.0a8
Numcodecs version: 0.13.0
Python version: 3.11
Operating system: Mac
Installation: pip in virtual environment
Description
I am reading a kerchunk reference filesystem as a Zarr v2 store with zarr-python. The entire reference file is attached, but an example `.zarray` is as follows:

Notably, `filters` contains `zlib`, which is a `BytesBytesCodec` in zarr-python v3+. The issue arises when we look at the codec pipeline created for V2 arrays, in zarr-python/src/zarr/core/array.py (lines 105 to 113 in 9bce890):
This defines the pipeline as two codecs, `filters` and `compressor`. The problem is that `V2Filters` is defined as an `ArrayArrayCodec` and `V2Compressor` as an `ArrayBytesCodec`. Because of this, every codec listed under `filters` in the metadata is expected to be an `ArrayArrayCodec` and is applied once the buffer has been converted to an array. Further, the `compressor` can define only one codec, which is treated as an `ArrayBytesCodec`, so there is no place in v2 metadata to define a `BytesBytesCodec`.

With the `.zarray` above, the pipeline crashes because the `zlib` codec outputs bytes, not the array that is expected.

Attached: test_dict.json
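The ordering constraint behind this crash can be written as a small check. This is an illustrative sketch only: `Kind`, `check_v3_order`, `current_v2_slots` and `slot_mismatches` are hypothetical helpers, not zarr-python API. It assumes the v3 rule that a codec list consists of zero or more array-to-array codecs, exactly one array-to-bytes codec, then zero or more bytes-to-bytes codecs. It models the mapping described above by giving each filter its own array-to-array position, which simplifies the single `V2Filters` wrapper.

```python
from enum import Enum


class Kind(Enum):
    ARRAY_ARRAY = "array->array"
    ARRAY_BYTES = "array->bytes"
    BYTES_BYTES = "bytes->bytes"


def check_v3_order(kinds: list[Kind]) -> None:
    """Raise ValueError unless kinds match ArrayArray* ArrayBytes BytesBytes*."""
    n_array_bytes = kinds.count(Kind.ARRAY_BYTES)
    if n_array_bytes != 1:
        raise ValueError(f"expected exactly one array->bytes codec, got {n_array_bytes}")
    i = kinds.index(Kind.ARRAY_BYTES)
    if any(k is not Kind.ARRAY_ARRAY for k in kinds[:i]):
        raise ValueError("only array->array codecs may precede the array->bytes codec")
    if any(k is not Kind.BYTES_BYTES for k in kinds[i + 1:]):
        raise ValueError("only bytes->bytes codecs may follow the array->bytes codec")


def current_v2_slots(n_filters: int) -> list[Kind]:
    # Sketch of the mapping described above: each filter sits in an
    # array->array position and the compressor fills the single array->bytes
    # position. There is no bytes->bytes position at all.
    return [Kind.ARRAY_ARRAY] * n_filters + [Kind.ARRAY_BYTES]


def slot_mismatches(filter_kinds: list[Kind]) -> list[int]:
    """Indices of filters whose real kind differs from the slot they are placed in."""
    slots = current_v2_slots(len(filter_kinds))
    return [i for i, (real, slot) in enumerate(zip(filter_kinds, slots)) if real is not slot]
```

The wrapper layout itself passes `check_v3_order`, which is why the mismatch only surfaces at runtime. With `zlib` as the only filter, `slot_mismatches([Kind.BYTES_BYTES])` flags index 0. Listing its real kind ahead of the array-to-bytes codec, `[Kind.BYTES_BYTES, Kind.ARRAY_BYTES]`, is rejected. The only valid position for `zlib` is after the array-to-bytes codec, and the current mapping has no slot there.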
cc @jhamman @martindurant
Steps to reproduce
Additional output
No response