Add Zarr compatibility functions #478
kerchunk/utils.py
Outdated
@@ -116,9 +140,32 @@ def rename_target_files(
ujson.dump(new, f)


def _encode_for_JSON(store):
def zarr_init_group_and_store(store=None, zarr_version=None, overwrite=True):
Worth checking whether these new functions are appropriate for the public API of kerchunk (IMO, no). I'd recommend prefixing the function names with underscores or putting them in a _utils.py.
kerchunk/utils.py
Outdated
@@ -116,9 +140,32 @@ def rename_target_files(
ujson.dump(new, f)


def _encode_for_JSON(store):
def zarr_init_group_and_store(store=None, zarr_version=None, overwrite=True):
zarr_version = zarr_version or 2
Can you clarify the semantics of zarr_version here (and in zarr_open)? What's the behavior of each of these cases?
zarr-python-library-version | zarr_version | behavior |
---|---|---|
2.x | None | write zarr v2 |
2.x | 2 | write zarr v2 |
2.x | 3 | error |
3.x | None | write zarr v2 or v3? |
3.x | 2 | write zarr v2 |
3.x | 3 | write zarr v3 |
Really it's just the case of zarr_version=None with zarr-python 3.x that I'm unsure about, i.e. what's the default behavior: write zarr v2, or write whatever version is the default for that version of zarr-python?
Also worth confirming that we do error for zarr_version=3 with zarr-python=2, to not silently ignore that keyword argument.
Surprisingly, if you have zarrv2 installed and you set zarr_version=3, then zarr will accept that, issue a warning, and give you a valid group. I need to re-run these tests with zarrv3 installed and see what happens in those cases.
Zarr2 (i.e., mainline released version) has had v3 internally for quite a while, but the implementation is different. It should still conform to the v3 spec, though!
Thank you for putting this together, you have clearly done a careful job (and thanks to @jhamman for the earlier attempt). This will take some time to go through. I would encourage you to also look at the implementation in fsspec.implementations.reference, where I expect there will be more zarr2-specific assumptions, particularly in the parquet implementation.
Following @TomAugspurger's comments, it may well make sense to have None mean V2 for now, whichever zarr version is installed.
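One way to pin down the table of cases above is a small resolver. This is only a sketch under two assumptions that the thread has not settled: that None means zarr v2 regardless of the installed library, and that zarr_version=3 under zarr-python 2.x should raise instead of warning. The name resolve_zarr_version and its installed_major parameter are hypothetical and are not part of kerchunk or zarr.

```python
def resolve_zarr_version(installed_major, zarr_version=None):
    """Hypothetical helper: pick the zarr format version to write.

    installed_major: major version of the installed zarr-python (2 or 3).
    zarr_version: the caller's request (None, 2, or 3).
    """
    if zarr_version is None:
        # Assumption: default to v2 whichever zarr-python is installed.
        return 2
    if zarr_version not in (2, 3):
        raise ValueError(f"unsupported zarr_version: {zarr_version!r}")
    if zarr_version == 3 and installed_major < 3:
        # Assumption: fail loudly instead of silently ignoring the keyword.
        raise ValueError("zarr_version=3 requires zarr-python 3.x")
    return zarr_version
```

Taking the installed major version as an argument keeps the policy testable without having either zarr-python release installed.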
In However in ZarrV3, you need to pass a
No, this is not the model.
Sorry I didn't grok that. Here I see

store = RemoteStore(url="abfs://...")
zarr.open(store=store)

Unless I'm missing it, passing a URL to

def make_store_path(store_like: StoreLike | None, *, mode: OpenMode | None = None) -> StorePath:
    if isinstance(store_like, StorePath):
        if mode is not None:
            assert mode == store_like.store.mode
        return store_like
    elif isinstance(store_like, Store):
        if mode is not None:
            assert mode == store_like.mode
        return StorePath(store_like)
    elif store_like is None:
        if mode is None:
            mode = "w"  # exception to the default mode = 'r'
        return StorePath(MemoryStore(mode=mode))
    elif isinstance(store_like, str):
        return StorePath(LocalStore(Path(store_like), mode=mode or "r"))
    raise TypeError

Did this work in ZarrV2? zarr.open(store="abfs://...")?
Yes, this most certainly worked, and if it no longer does, I consider this a major regression and worse user experience. The optional argument
I suppose you are right and this works too - the lazy mapper is also dict-like. It's probably a case of my laziness in the tests; it only works where the "files" are stored as raw binary, not as references (which is the case for the tests).
cc @jhamman
It doesn't work in ZarrV3 yet:

# Missing storage_options but it's irrelevant
# this eventually tries to construct a StorePath(LocalStore, 'file://abfs:/daymet-zarr/daily/hi.zarr')
# and will throw because it's not writable
zarr.open(store="abfs://daymet-zarr/daily/hi.zarr")

Laziness can be good, and your tests could probably use the
Here probably - explicitly expects .zarray in every path containing data.
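On the URL question raised earlier in the thread (a string such as "abfs://..." falling through to the LocalStore branch of the quoted make_store_path), a caller-side check might look like the sketch below. It uses only the standard library; looks_remote is a hypothetical name, and treating "file" as the only local scheme is an assumption.

```python
from urllib.parse import urlsplit


def looks_remote(store_like: str) -> bool:
    """Hypothetical check: True if the string carries a non-local URL scheme."""
    # Require "://" so that Windows paths such as "C:\\data" are not
    # mistaken for URLs with scheme "c".
    if "://" not in store_like:
        return False
    # Assumption: "file" is the only scheme that still means a local path.
    return urlsplit(store_like).scheme not in ("", "file")
```

A compatibility layer could use such a check to decide whether to build a remote store explicitly before handing it to zarr, instead of passing the raw string through.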
This is a continuation of #475 because I identified that there are several more behavior changes in zarr v3. Going forward, the utils.py file will abstract away the differences between ZarrV2 and ZarrV3 depending on what the user has installed.
Note that @jhamman's PR #292 may now be outdated, though I'll merge those commits into this PR if it's desired to expose a zarr_version to kerchunk callers.
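Branching on "what the user has installed" needs some way to read the installed zarr major version. The following is a minimal sketch using the standard library's importlib.metadata; the function names major_version and installed_zarr_major are hypothetical, and this is not necessarily how the PR itself detects the version.

```python
from importlib import metadata


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '3.0.0b1'."""
    return int(version_string.split(".", 1)[0])


def installed_zarr_major():
    """Major version of the installed zarr package, or None if it is absent."""
    try:
        return major_version(metadata.version("zarr"))
    except metadata.PackageNotFoundError:
        return None
```

Helpers in utils.py could then branch once on installed_zarr_major() instead of scattering version checks across call sites.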