Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treatment of final smaller chunk in the zarr model #38

Open
TomNicholas opened this issue Mar 15, 2024 · 2 comments
Open

Treatment of final smaller chunk in the zarr model #38

TomNicholas opened this issue Mar 15, 2024 · 2 comments
Labels
Kerchunk Relating to the kerchunk library / specification itself zarr-specs Requires adoption of a new ZEP

Comments

@TomNicholas
Copy link
Member

TomNicholas commented Mar 15, 2024

The ManifestArray code right now might not be treating carefully enough the possibility of the last chunk being smaller than the rest. For example if you concatenate two arrays that each have uniform chunks except for the final chunk, you could end up with an array that has variable-length chunks in the middle somewhere 😕

One solution to this might be to generalise ManifestArray.chunks to be able to represent rectilinear arrays of chunks (a step towards #33), but just have a check on the constructor to forbid creating an array that actually has any chunk other than the last be of different length. This check could then be relaxed in future once we have variable-length chunks supported upstream in zarr.

@TomNicholas TomNicholas added zarr-specs Requires adoption of a new ZEP Kerchunk Relating to the kerchunk library / specification itself labels Mar 26, 2024
@TomNicholas
Copy link
Member Author

We need some test cases to start explicitly checking behaviour for these cases. Unfortunately kerchunk doesn't seem to have any (see fsspec/kerchunk#436). This is a little worrying because this seems like an important set of edge cases to check, so we should try to find / create some files like this ourselves and add them to the test suite of this library.

@TomNicholas
Copy link
Member Author

have a check on the constructor to forbid creating an array that actually has any chunk other than the last be of different length.

A better place to have this check would probably be in the serialization step. It makes perfect sense to have a ManifestArray that points to variable-length chunks, the problem is just that neither of the available on-disk formats (kerchunk or zarr) support variable-length chunks yet. (A warning instead of an error in the constructor would be more appropriate.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Kerchunk Relating to the kerchunk library / specification itself zarr-specs Requires adoption of a new ZEP
Projects
None yet
Development

No branches or pull requests

1 participant