Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

open() fails for path to a v2 group in a RemoteStore #2444

Closed
agoodm opened this issue Oct 29, 2024 · 3 comments
Closed

open() fails for path to a v2 group in a RemoteStore #2444

agoodm opened this issue Oct 29, 2024 · 3 comments
Labels
bug Potential issues with the zarr-python library

Comments

@agoodm
Copy link
Contributor

agoodm commented Oct 29, 2024

Zarr version

v3.0.0b

Numcodecs version

n/a

Python Version

3.11

Operating System

Mac

Installation

pip install -e .

Description

Using the standard open() to open a v2 group inside a RemoteStore fails.

Steps to reproduce

In [1]: import zarr

In [2]: zarr.open("s3://mur-sst/zarr/", zarr_version=2, storage_options=dict(anon=True))
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[2], line 1
----> 1 zarr.open("s3://mur-sst/zarr/", zarr_version=2, storage_options=dict(anon=True))

File ~/dev/zarr-python/src/zarr/_compat.py:43, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     41 extra_args = len(args) - len(all_args)
     42 if extra_args <= 0:
---> 43     return f(*args, **kwargs)
     45 # extra_args > 0
     46 args_msg = [
     47     f"{name}={arg}"
     48     for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:], strict=False)
     49 ]

File ~/dev/zarr-python/src/zarr/api/synchronous.py:77, in open(store, mode, zarr_version, zarr_format, path, **kwargs)
     67 @_deprecate_positional_args
     68 def open(
     69     store: StoreLike | None = None,
   (...)
     75     **kwargs: Any,  # TODO: type kwargs as valid args to async_api.open
     76 ) -> Array | Group:
---> 77     obj = sync(
     78         async_api.open(
     79             store=store,
     80             mode=mode,
     81             zarr_version=zarr_version,
     82             zarr_format=zarr_format,
     83             path=path,
     84             **kwargs,
     85         )
     86     )
     87     if isinstance(obj, AsyncArray):
     88         return Array(obj)

File ~/dev/zarr-python/src/zarr/core/sync.py:141, in sync(coro, loop, timeout)
    138 return_result = next(iter(finished)).result()
    140 if isinstance(return_result, BaseException):
--> 141     raise return_result
    142 else:
    143     return return_result

File ~/dev/zarr-python/src/zarr/core/sync.py:100, in _runner(coro)
     95 """
     96 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
     97 exception, the exception will be returned.
     98 """
     99 try:
--> 100     return await coro
    101 except Exception as ex:
    102     return ex

File ~/dev/zarr-python/src/zarr/api/asynchronous.py:308, in open(store, mode, zarr_version, zarr_format, path, storage_options, **kwargs)
    305     return await open_group(store=store_path, zarr_format=zarr_format, mode=mode, **kwargs)
    307 try:
--> 308     return await open_array(store=store_path, zarr_format=zarr_format, **kwargs)
    309 except (KeyError, NodeTypeValidationError):
    310     # KeyError for a missing key
    311     # NodeTypeValidationError for failing to parse node metadata as an array when it's
    312     # actually a group
    313     return await open_group(store=store_path, zarr_format=zarr_format, **kwargs)

File ~/dev/zarr-python/src/zarr/api/asynchronous.py:1066, in open_array(store, zarr_version, zarr_format, path, storage_options, **kwargs)
   1063 zarr_format = _handle_zarr_version_or_format(zarr_version=zarr_version, zarr_format=zarr_format)
   1065 try:
-> 1066     return await AsyncArray.open(store_path, zarr_format=zarr_format)
   1067 except FileNotFoundError:
   1068     if store_path.store.mode.create:

File ~/dev/zarr-python/src/zarr/core/array.py:670, in AsyncArray.open(cls, store, zarr_format)
    647 """
    648 Async method to open an existing Zarr array from a given store.
    649 
   (...)
    667 <AsyncArray memory://... shape=(100, 100) dtype=int32>
    668 """
    669 store_path = await make_store_path(store)
--> 670 metadata_dict = await get_array_metadata(store_path, zarr_format=zarr_format)
    671 # TODO: remove this cast when we have better type hints
    672 _metadata_dict = cast(ArrayV3MetadataDict, metadata_dict)

File ~/dev/zarr-python/src/zarr/core/array.py:135, in get_array_metadata(store_path, zarr_format)
    131     zarray_bytes, zattrs_bytes = await gather(
    132         (store_path / ZARRAY_JSON).get(), (store_path / ZATTRS_JSON).get()
    133     )
    134     if zarray_bytes is None:
--> 135         raise FileNotFoundError(store_path)
    136 elif zarr_format == 3:
    137     zarr_json_bytes = await (store_path / ZARR_JSON).get()

FileNotFoundError: <RemoteStore(S3FileSystem, mur-sst/zarr)>

So the issue here is the open() generally defaults to open via open_array() and then switches to open_group() if the correct exception is handled. Sure enough this works if using open_group() or open_consolidated():

In [3]: zarr.open_consolidated("s3://mur-sst/zarr/", zarr_version=2, storage_options=dict(anon=True))
Out[3]: <Group <RemoteStore(S3FileSystem, mur-sst/zarr)>>

In [4]: zarr.open_group("s3://mur-sst/zarr/", zarr_version=2, storage_options=dict(anon=True))
Out[4]: <Group <RemoteStore(S3FileSystem, mur-sst/zarr)>>

Additional output

No response

@agoodm agoodm added the bug Potential issues with the zarr-python library label Oct 29, 2024
@d-v-b
Copy link
Contributor

d-v-b commented Oct 29, 2024

the short-term fix is to handle the FileNotFound error here:

try:
return await open_array(store=store_path, zarr_format=zarr_format, **kwargs)
except (KeyError, NodeTypeValidationError):
# KeyError for a missing key
# NodeTypeValidationError for failing to parse node metadata as an array when it's
# actually a group
return await open_group(store=store_path, zarr_format=zarr_format, **kwargs)

Longer term, we should decide if "zarr.json wasn't found" amounts to a KeyError or FileNotFoundError, and then ensure that this exception is used consistently.

I started working on fixing this, it's turning out to be a bit thornier than I expected, mostly because we were not properly testing this behavior previously. I will have a PR up in a few days.

@jhamman
Copy link
Member

jhamman commented Oct 29, 2024

I've changed some of this logic in #2442. Once that is a bit more baked, I can see if this is still an issue.

@d-v-b
Copy link
Contributor

d-v-b commented Oct 30, 2024

@agoodm I tested your script in #2447 and it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

No branches or pull requests

3 participants