If I understand correctly, the keys of the chunk manifest are chunk IDs, which are strings like "0.0.0". If a Zarr array has a chunk size of [10, 10, 10], then we can think of "0.0.0" as denoting the cube of indices [[0...9], [0...9], [0...9]], which in turn denotes the region in the virtual array associated with the data in that chunk.
If a Zarr array is sliced arbitrarily, then some of its chunks might be sliced before being inserted into the Zarr array. In that case the region associated with a given (chunk, slice) combination is not the full cube of indices associated with the string chunk ID, but a subset of those indices (and the space of possible subsets is determined by the slicing operations that are allowed).
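To make the key-to-region mapping concrete, here is a minimal sketch, assuming uniform chunks and a "." separator. The function name `chunk_key_to_region` is hypothetical and is not part of Zarr or VirtualiZarr; regions are written as half-open slices, so `slice(0, 10)` covers indices 0 through 9.

```python
from typing import Tuple


def chunk_key_to_region(
    key: str, chunk_shape: Tuple[int, ...], sep: str = "."
) -> Tuple[slice, ...]:
    """Map a chunk key such as "0.0.0" to the index region it covers.

    Hypothetical helper: assumes every chunk has the same shape.
    """
    grid_index = tuple(int(part) for part in key.split(sep))
    if len(grid_index) != len(chunk_shape):
        raise ValueError("key and chunk_shape must have the same number of axes")
    # Chunk i along an axis with chunk length c covers [i * c, (i + 1) * c).
    return tuple(slice(i * c, (i + 1) * c) for i, c in zip(grid_index, chunk_shape))


region = chunk_key_to_region("0.0.0", (10, 10, 10))
```

Under these assumptions, `"0.0.0"` maps to three copies of `slice(0, 10)`, and `"1.0.2"` maps to `(slice(10, 20), slice(0, 10), slice(20, 30))`.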
So I wonder what would break in VirtualiZarr if the name of a chunk were changed from the chunk ID string to a sliceable expression of the output region associated with that chunk (and if the slicing operation were also expressible as the value associated with that key). This would define an abstract representation of the "output region : chunk" relationship that could be used internally in Zarr for lazy slicing, which I would like a lot :) If it's not workable to make this change, it would be great to know why so we can avoid it.
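As a rough illustration of the idea, the sketch below builds a region-keyed mapping from an ordinary chunk-ID-keyed one. Everything here is an assumption made for illustration: the `lazy_slice` function, the `(start, stop)` tuple encoding of regions, and the placeholder paths are hypothetical, and this is not how VirtualiZarr or Zarr actually represent manifests. It also assumes uniform chunks and contiguous, step-1 selections.

```python
from typing import Dict, Tuple

# Half-open (start, stop) bounds per axis. Tuples are used instead of slice
# objects because slices are not hashable before Python 3.12.
Region = Tuple[Tuple[int, int], ...]


def lazy_slice(
    manifest: Dict[str, str],
    chunk_shape: Tuple[int, ...],
    selection: Region,
) -> Dict[Region, Tuple[str, Region]]:
    """Re-key a manifest by output region for a contiguous selection.

    Each value pairs the chunk's path with the sub-region of that chunk
    that lands in the output. No chunk data is read.
    """
    out: Dict[Region, Tuple[str, Region]] = {}
    for key, path in manifest.items():
        grid_index = [int(part) for part in key.split(".")]
        out_region, inner = [], []
        for i, c, (lo, hi) in zip(grid_index, chunk_shape, selection):
            start, stop = max(i * c, lo), min((i + 1) * c, hi)
            if start >= stop:
                break  # this chunk does not overlap the selection
            out_region.append((start - lo, stop - lo))  # output coordinates
            inner.append((start - i * c, stop - i * c))  # chunk coordinates
        else:
            out[tuple(out_region)] = (path, tuple(inner))
    return out


# Two 10x10 chunks side by side; select rows 0..9 and columns 5..14.
manifest = {"0.0": "a.bin", "0.1": "b.bin"}  # placeholder paths
sliced = lazy_slice(manifest, (10, 10), ((0, 10), (5, 15)))
```

In this example the output columns 0..4 come from the last five columns of the first chunk, and output columns 5..9 come from the first five columns of the second chunk, so neither key corresponds to a full chunk-ID cube any more.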
I'm still thinking about this suggestion @d-v-b, but one problem is that VirtualiZarr's array type should really expose a .chunks property that supports variable-length chunks (see #38).
Your statement

> If a zarr array has a chunk size of [10, 10, 10], then we can think of "0.0.0" as denoting the cube of indices [[0...9], [0...9], [0...9]]

is only true if you assume chunks are all the same length.
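To show why the uniform-length assumption matters, here is a small sketch of the same key-to-region lookup with variable-length chunks. It assumes chunk lengths are given per axis as a tuple of tuples (the convention dask uses for its `.chunks`); the function name `chunk_key_to_region_variable` is hypothetical. The region can no longer be computed by multiplying the grid index by a single chunk length, so it is derived from cumulative offsets instead.

```python
from itertools import accumulate
from typing import Tuple


def chunk_key_to_region_variable(
    key: str, chunks: Tuple[Tuple[int, ...], ...]
) -> Tuple[Tuple[int, int], ...]:
    """Map a chunk key to half-open (start, stop) bounds per axis.

    Hypothetical helper: chunks[axis] lists every chunk length on that axis.
    """
    grid_index = [int(part) for part in key.split(".")]
    region = []
    for i, lengths in zip(grid_index, chunks):
        # offsets[i] is where chunk i starts; offsets[i + 1] is where it ends.
        offsets = [0, *accumulate(lengths)]
        region.append((offsets[i], offsets[i + 1]))
    return tuple(region)


# Axis 0 has chunks of length 10, 5, 10; axis 1 has two chunks of length 10.
chunks = ((10, 5, 10), (10, 10))
region = chunk_key_to_region_variable("1.0", chunks)
```

Here `"1.0"` covers rows 10..14 and columns 0..9, whereas a fixed chunk length of 10 would have given rows 10..19.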