Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zarr.open should fall back to opening a group #2310

Merged
merged 8 commits into from
Oct 10, 2024

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Oct 8, 2024

Closes #2309

I'm not happy with this implementation. I broke / fixed this in my consolidated metadata PR as well.

Relying on the exception type from open_array to decide whether to fall back to opening a group seems pretty fragile. I might spend a few minutes trying out an alternative way of doing things that'll be a bit more robust (potentially at the cost of some unnecessary I/O; but I think that's OK for open since open_group and open_array offer more direct ways of doing things).

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

except KeyError:
except (KeyError, ValueError):
# KeyError for a missing key
# ValueError for failing to parse node metadata as an array when it's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should create a specific exception for this case, since "valid document but the wrong node type" is different from "invalid document"

@TomAugspurger TomAugspurger marked this pull request as ready for review October 8, 2024 15:23
@TomAugspurger
Copy link
Contributor Author

I might spend a few minutes trying out an alternative way of doing things that'll be a bit more robust

https://github.com/TomAugspurger/zarr-python/blob/tom/fix/batched-io/src/zarr/core/metadata/_io.py has the start of that. I like the idea of using match statements and an explicit list of files to read. There's still a decent chunk of work to do (figure out which cases we know we files to read and error if any are missing, then discover the actual zarr_format and node_type based on what we read and parse into the correct objects, and finally updating open and hopefully open_array and open_group to use that or something similar). I don't think I'll pursue this now.

@TomAugspurger TomAugspurger added the downstream Downstream libraries using zarr label Oct 9, 2024
Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One test suggestion but otherwise this looks good to me.

tests/v3/test_api.py Show resolved Hide resolved
@jhamman jhamman added this to the 3.0.0.beta milestone Oct 9, 2024
@jhamman jhamman added the V3 label Oct 9, 2024
@jhamman jhamman mentioned this pull request Oct 9, 2024
6 tasks
@jhamman jhamman merged commit 395604d into zarr-developers:v3 Oct 10, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
downstream Downstream libraries using zarr
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Zarr v3.0.0a7 fail to open group dataset
3 participants