Nested storage detection in Zarr V2 #707
Comments
Thoughts on the name of the separator: Therefore the following suggestions:
One issue with the detection protocol is that it does not work when writing. While auto-detection is nice, it might be better to just assume "." unless the metadata key is present: any existing arrays could be fixed up, and hopefully all future arrays would have the metadata key. That avoids the added complexity (and inefficiency) of auto-detection.
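The "assume `.` unless the key is present" strategy from the comment above can be sketched in a few lines. This is only an illustration: `chunk_key` is a hypothetical helper, not part of zarr-python, and the metadata strings are made-up examples.

```python
import json


def chunk_key(zarray_json: str, chunk_index: tuple) -> str:
    """Build a chunk key from the text of a .zarray file and a chunk grid index.

    Hypothetical helper: no probing of the store, just a metadata lookup.
    """
    meta = json.loads(zarray_json)
    # Fall back to "." when the optional key is absent (legacy arrays).
    sep = meta.get("dimension_separator", ".")
    return sep.join(str(i) for i in chunk_index)


# Made-up metadata for illustration only.
legacy = '{"zarr_format": 2, "shape": [100, 100], "chunks": [10, 10]}'
nested = '{"zarr_format": 2, "shape": [100, 100], "chunks": [10, 10], "dimension_separator": "/"}'

print(chunk_key(legacy, (2, 3)))  # 2.3
print(chunk_key(nested, (2, 3)))  # 2/3
```

Because this never touches chunk data, it behaves the same for reading and writing, which is the point made in the comment.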
Hi @jbms. You mean a In the case of a
Guess there are some edge cases I worry about here. Maybe there are a number of strategies that can be enabled:
The real question is likely to be what should be the default.
@SabineEmbacher, @axtimwalde suggests "dimension separator".
I used
I did not make my suggestions regarding the name in order to enforce them. Please make a decision and I will put it in asap according to the specification.
No worries, @SabineEmbacher. The suggestions were definitely useful. It's more just a matter of https://martinfowler.com/bliki/TwoHardThings.html ... Since
On the community call last night, there were no objections to moving forward with the .zarray addition, so I'll open a v2 spec PR now. I'm a bit more hesitant about defining the heuristic as part of the specification (cf. @jbms's comment). I'll leave this open for a discussion of where first-chunk writing falls on the MAY/SHOULD/MUST spectrum.
* v2 spec: add optional dimension_separator (see #707)

  Various implementations allow for defining the separator between the dimension indexes when writing chunks:

  * n5-zarr defines a `dimensionSeparator` parameter;
  * zarr-python's NestedDirectoryStore does so by default
  * and FSStore provides a `key_separator` parameter;
  * tensorstore has a `key_encoding` parameter; and
  * jzarr is looking to add the same functionality.

  When writing an array, it is straightforward to set this separator and have arrays properly configured. Consumers of such arrays, however, must either know *a priori* whether their arrays use a non-default separator or must loop through all possible chunk keys searching for the right one.

  By adding an optional metadata key to the .zarray, we:

  * preserve the efficient configuration of arrays
  * while keeping the v2 spec backwards compatible.

  The primary downsides are that this will be the first optional metadata value in the v2 spec, so we don't have a strong understanding of how that will play out, and that datasets previously written with non-default separators will need updating in order to enable the detection, though that is no worse than the current situation.

* Update dim. sep. description after feedback
* Remove `MUST NOT` restriction for other keys
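The "will need updating" step for datasets written earlier with a non-default separator might look roughly like the sketch below. It assumes a plain directory store where `.zarray` is a JSON file on disk; `add_dimension_separator` is a hypothetical helper, and restricting the value to `"."` or `"/"` is an assumption based on the two separators discussed in this issue.

```python
import json
import tempfile
from pathlib import Path


def add_dimension_separator(array_dir, separator="/"):
    """Record the separator in an existing array's .zarray file.

    Hypothetical fix-up helper; chunk files themselves are not touched.
    """
    if separator not in (".", "/"):
        raise ValueError(f"unsupported separator: {separator!r}")
    zarray = Path(array_dir) / ".zarray"
    meta = json.loads(zarray.read_text())
    meta["dimension_separator"] = separator  # the new optional key
    zarray.write_text(json.dumps(meta, indent=4))
    return meta


# Demo on a throwaway directory standing in for a nested array.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".zarray").write_text(json.dumps({"zarr_format": 2, "shape": [4, 4]}))
    updated = add_dimension_separator(tmp, "/")
    print(updated["dimension_separator"])  # /
```

Readers that do not know the key simply ignore it, so the array stays readable by older implementations that were already told about the separator out of band.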
I consider the
see: ome/ngff#29 and bcdev/jzarr#17
In order to better handle Zarr arrays created with `NestedDirectoryStore` or `FSStore(key_separator="/")`, SabineEmbacher and I have been working on a "protocol heuristic" that V2 implementations can use to detect nested chunking rather than requiring the user to specify it correctly.

tl;dr: This proposes a new key for `.zarray`, on which it would be good to have feedback.

Proposal

When creating a zarr array:
`{"dimension_separator": "/"}`

When opening an array:
`["/", "."]`

Points for discussion:
`dimension_separator` differs from the code implementation's `key_separator` to reduce confusion about whether every separator in the key name is affected.
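Read together with the comments above, the opening-time heuristic could be sketched as follows. This is an interpretation, not the specified protocol: `detect_separator` is a hypothetical function, the store is modelled as a plain mapping from keys to bytes, and probing only the all-zero chunk assumes that the first chunk was actually written.

```python
from typing import Mapping, Optional

# Candidate separators, in the order listed in the proposal.
KNOWN_SEPARATORS = ("/", ".")


def detect_separator(store: Mapping[str, bytes], prefix: str, meta: dict) -> Optional[str]:
    """Return the separator for the array under `prefix`, or None if undetectable."""
    # 1. An explicit metadata key always wins.
    explicit = meta.get("dimension_separator")
    if explicit is not None:
        return explicit
    # 2. With fewer than two dimensions the key is "0" for every separator,
    #    so there is nothing to detect; use the default.
    ndim = len(meta["shape"])
    if ndim < 2:
        return "."
    # 3. Otherwise probe for the first chunk under each known separator.
    for sep in KNOWN_SEPARATORS:
        if prefix + sep.join(["0"] * ndim) in store:
            return sep
    return None  # no first chunk found: caller must pick a default


flat = {"a/.zarray": b"{}", "a/0.0": b""}
nested = {"a/.zarray": b"{}", "a/0/0": b""}
print(detect_separator(flat, "a/", {"shape": [4, 4]}))    # .
print(detect_separator(nested, "a/", {"shape": [4, 4]}))  # /
```

The `None` branch is where the concern about writing shows up: an array with no chunks yet gives the probe nothing to find, so only the metadata key can settle the question.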