key space for chunk manifest #71

Open
d-v-b opened this issue Apr 3, 2024 · 1 comment
Labels
zarr-python Relevant to zarr-python upstream


d-v-b commented Apr 3, 2024

If I understand correctly, the keys of the chunk manifest are chunk ids, which are strings like "0.0.0". If a zarr array has a chunk size of [10, 10, 10], then we can think of "0.0.0" as denoting the cube of indices [[0...9], [0...9], [0...9]], which in turn denotes the region in the virtual array associated with the data in a chunk.
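To make the mapping concrete, here is a minimal sketch of how a chunk id could be turned into the index region it denotes. The helper name `chunk_key_to_region` is hypothetical (it is not part of VirtualiZarr or zarr-python), and the sketch assumes a "." separator, equally sized chunks, and no clipping at the array edge.

```python
def chunk_key_to_region(key: str, chunk_shape: tuple[int, ...]) -> tuple[slice, ...]:
    """Map a chunk id such as "0.0.0" to the half-open index region it covers.

    Hypothetical helper: assumes regular chunks and a "." separator.
    """
    indices = tuple(int(part) for part in key.split("."))
    if len(indices) != len(chunk_shape):
        raise ValueError("chunk id and chunk shape must have the same number of axes")
    # Chunk i along an axis with chunk length n covers indices [i * n, (i + 1) * n).
    return tuple(slice(i * n, (i + 1) * n) for i, n in zip(indices, chunk_shape))


region = chunk_key_to_region("0.0.0", (10, 10, 10))
print(region)
```

With a chunk size of [10, 10, 10], the id "0.0.0" comes out as three `slice(0, 10)` objects, i.e. indices 0 through 9 on every axis.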

If a Zarr array is sliced arbitrarily, then some of its chunks might be sliced before being inserted into the Zarr array, in which case the region associated with that (chunk, slice) combination is not the full cube of indices associated with the string chunk id, but a subset of those indices (and the space of the subsets is determined by the slicing operations that are allowed).
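A one-dimensional sketch of that (chunk, slice) relationship, assuming regular chunks and a contiguous selection with explicit bounds and step 1. The function name `sliced_chunk_contribution` is made up for illustration.

```python
from typing import Optional


def sliced_chunk_contribution(
    chunk_index: int, chunk_size: int, selection: slice
) -> Optional[tuple[slice, slice]]:
    """Return (slice inside the chunk, slice in the output array), or None.

    Hypothetical 1-D helper: `selection` must have explicit start/stop and step 1.
    """
    chunk_start = chunk_index * chunk_size
    chunk_stop = chunk_start + chunk_size
    # Overlap between the chunk's full region and the requested selection.
    start = max(chunk_start, selection.start)
    stop = min(chunk_stop, selection.stop)
    if start >= stop:
        return None  # this chunk contributes nothing to the selection
    chunk_local = slice(start - chunk_start, stop - chunk_start)
    output = slice(start - selection.start, stop - selection.start)
    return chunk_local, output


for index in range(4):
    print(index, sliced_chunk_contribution(index, 10, slice(5, 25)))
```

For a selection of `slice(5, 25)` over chunks of length 10, only the middle chunk is used whole; the first and last chunks each contribute a strict subset of their indices.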

So I wonder what would break in VirtualiZarr if the name of a chunk was changed from the chunk ID string to a sliceable expression of the output region associated with that chunk (and if the slicing operation was also expressible as the value associated with that key). This would define an abstract representation of the relationship between output region and chunk that could be used internally in Zarr for lazy slicing, which I would like a lot :) If it's not workable to make this change, it would be great to know why so we can avoid it.
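As a rough illustration of the proposal, and not anything VirtualiZarr implements, a region-keyed manifest for a 1-D array might look like the sketch below. `Region`, `SlicedChunkRef`, and `region_keyed_manifest` are all hypothetical names; regions are stored as `(start, stop)` tuples so they can be used as dict keys on any recent Python version.

```python
from dataclasses import dataclass

# Per-axis (start, stop) pairs, half-open. Hypothetical type for this sketch.
Region = tuple[tuple[int, int], ...]


@dataclass(frozen=True)
class SlicedChunkRef:
    chunk_id: str        # the conventional chunk id, e.g. "1"
    chunk_slice: Region  # which part of that chunk is used


def region_keyed_manifest(chunk_size: int, start: int, stop: int) -> dict[Region, SlicedChunkRef]:
    """Key each contributing chunk by the output region it fills (1-D, step 1)."""
    manifest: dict[Region, SlicedChunkRef] = {}
    if stop <= start:
        return manifest
    first, last = start // chunk_size, (stop - 1) // chunk_size
    for index in range(first, last + 1):
        chunk_start = index * chunk_size
        lo = max(start, chunk_start)
        hi = min(stop, chunk_start + chunk_size)
        key = ((lo - start, hi - start),)  # region in the sliced (output) array
        manifest[key] = SlicedChunkRef(str(index), ((lo - chunk_start, hi - chunk_start),))
    return manifest


for region, ref in region_keyed_manifest(10, 5, 25).items():
    print(region, ref)
```

The key says where the data lands in the output, and the value carries both the original chunk id and the slice to take from it, so slicing the array again would only need to rewrite keys and values rather than touch any chunk data.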

Member

TomNicholas commented Apr 29, 2024

I'm still thinking about this suggestion @d-v-b, but one problem is that VirtualiZarr's array type should really expose a .chunks property that supports variable-length chunks (see #38).

Your statement

If a zarr array has a chunk size of [10, 10, 10], then we can think of "0.0.0" as denoting the cube of indices [[0...9], [0...9], [0...9]]

is only true if you assume chunks are all the same length.
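A small sketch of why: with variable-length chunks the region for a chunk id depends on the lengths of all preceding chunks along each axis, not just on a single chunk size. The per-axis tuples of chunk lengths below follow the dask-style `.chunks` convention, which is an assumption here, and `chunk_key_to_region_variable` is a hypothetical helper.

```python
from itertools import accumulate


def chunk_key_to_region_variable(
    key: str, chunks: tuple[tuple[int, ...], ...]
) -> tuple[slice, ...]:
    """Map a chunk id to its index region when chunk lengths vary per axis.

    Hypothetical helper: `chunks` holds one tuple of chunk lengths per axis.
    """
    indices = tuple(int(part) for part in key.split("."))
    if len(indices) != len(chunks):
        raise ValueError("chunk id and chunks must have the same number of axes")
    region = []
    for i, lengths in zip(indices, chunks):
        # The stop of chunk i is the cumulative sum of lengths up to and including i.
        stop = list(accumulate(lengths))[i]
        region.append(slice(stop - lengths[i], stop))
    return tuple(region)


chunks = ((10, 5, 20), (10, 10))
print(chunk_key_to_region_variable("1.0", chunks))
print(chunk_key_to_region_variable("2.1", chunks))
```

Here chunk "1.0" starts at index 10 on the first axis but is only 5 long, so chunk "2.1" starts at 15 rather than 20; multiplying the chunk index by a fixed size would give the wrong region.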
