Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Top level V3 API #1884

Merged
merged 18 commits into from
Jun 7, 2024
Merged

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented May 16, 2024

Marking as Work In Progress for now. Currently waiting on #1670 to merge.

Closes #1598
Fixes #1019

Open questions I could use feedback on:

  1. I've combined zarr.creation and zarr.convenience into a common API module. This makes sense to me because they are all basically wrappers that help us construct Group/Array objects. What people think about this?
  2. I've created parallel synchronous and asynchronous modules with basically identical APIs. All the synchronous functions simply call their asynchronous cousin. I like this much more than having two versions of each function (e.g. ones and async_ones). Thoughts?

More soon.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

cc @d-v-b, @aldenks

@jhamman jhamman mentioned this pull request May 16, 2024
4 tasks
@jhamman jhamman marked this pull request as ready for review May 30, 2024 06:55
@jhamman jhamman requested a review from d-v-b June 1, 2024 05:46
@jhamman
Copy link
Member Author

jhamman commented Jun 1, 2024

This is ready for an initial review. To set expectations, I don't plan to address the following optional arguments in this PR:

  • chunk_store
  • cache_attrs
  • synchronizer
  • meta_array
  • storage_options
  • dimension_separator
  • write_empty_chunks

These will need to be handled in later PRs.

@@ -35,6 +35,7 @@
Selection = slice | SliceSelection
ZarrFormat = Literal[2, 3]
JSON = None | str | int | float | Enum | dict[str, "JSON"] | list["JSON"] | tuple["JSON", ...]
MEMORY_ORDER = Literal["C", "F"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MEMORY_ORDER = Literal["C", "F"]
MemoryOrder = Literal["C", "F"]

make_store_path,
)

ShapeLike = tuple[int, ...] # TODO: support int for shape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use ChunkCoords here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but maybe not for long. I went ahead and changed it but ShapeLike was intended to be int | tuple[int, ...]... Perhaps that will work for ChunkCoords too?

return shape, chunks


def _like_args(a: ArrayLike, kwargs: dict[str, Any]) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of mutating kwargs, can we copy it, and make changes to the copy, and return it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. This function was coped from v2 but is now refactored along the lines of what you were suggesting.

@d-v-b
Copy link
Contributor

d-v-b commented Jun 4, 2024

I added a few minor comments, none of which are blockers. A few high-level thoughts:

A lot of the routines zarr.convenience seem strange to me, e.g. save has a very weird function signature that we should probably give a second thought when the dust settles from the v3 release. But changes like that are out of scope for this work.

I've combined zarr.creation and zarr.convenience into a common API module. This makes sense to me because they are all basically wrappers that help us construct Group/Array objects. What people think about this?

This works for me.

I've created parallel synchronous and asynchronous modules with basically identical APIs. All the synchronous functions simply call their asynchronous cousin. I like this much more than having two versions of each function (e.g. ones and async_ones). Thoughts?

Definitely prefer what we have here over async_ones. I don't see any drawbacks to this organization, since presumably we want everything to rest on the async foundation, but I don't have much experience with libraries that expose async and sync apis.

@will-moore
Copy link

Hi, thanks for all the work on this.
I'm looking at using this from https://github.com/ome/ome-zarr-py and seeing this error when accessing via Dask:

  File "/Users/wmoore/Desktop/ZARR/ome-zarr-py/ome_zarr/io.py", line 177, in load
    return da.from_zarr(self.__store, subpath)
  File "/Users/wmoore/opt/anaconda3/envs/zarr_v3/lib/python3.10/site-packages/dask/array/core.py", line 3597, in from_zarr
    z = zarr.Array(url, read_only=True, path=component, **kwargs)
TypeError: Array.__init__() got an unexpected keyword argument 'read_only'

Is read_only another argument to add to the list above at #1884 (comment) ?

@jhamman
Copy link
Member Author

jhamman commented Jun 6, 2024

Hi @will-moore -- interesting to find out that Dask is using the array constructor directly. I would have expected to see zarr.array(...). This PR is covering the API methods formerly found in zarr.convenience and zarr.creation. We'll have to do a bit of work to handle the case you've shown here.

btw, zarr.array(url, read_only_true, path=component) should work with this PR.

@jhamman
Copy link
Member Author

jhamman commented Jun 6, 2024

I'm planning to merge this in the morning unless I get further reviews.

@jhamman jhamman merged commit ee30347 into zarr-developers:v3 Jun 7, 2024
18 checks passed
dcherian added a commit to dcherian/zarr-python that referenced this pull request Jun 25, 2024
* v3: (22 commits)
  [v3] `Buffer` ensure correct subclass based on the `BufferPrototype` argument (zarr-developers#1974)
  Fix doc build (zarr-developers#1987)
  Fix doc build warnings (zarr-developers#1985)
  Automatically generate API reference docs (zarr-developers#1918)
  Update `RemoteStore.__str__` and add UPath tests (zarr-developers#1964)
  [v3] Elevate codec pipeline (zarr-developers#1932)
  0 dim arrays: indexing (zarr-developers#1980)
  `parse_shapelike` allows 0 (zarr-developers#1979)
  Clean up typing and docs for indexing (zarr-developers#1961)
  add json indentation to config (zarr-developers#1952)
  chore: update pre-commit hooks (zarr-developers#1973)
  Bump pypa/gh-action-pypi-publish in the actions group (zarr-developers#1969)
  chore: update pre-commit hooks (zarr-developers#1957)
  Update release.rst (zarr-developers#1960)
  doc: update release notes for 3.0.0.alpha (zarr-developers#1959)
  Basic working FsspecStore (zarr-developers#1785)
  Feature: Top level V3 API (zarr-developers#1884)
  Buffer Prototype Argument (zarr-developers#1910)
  Create issue-metrics.yml
  fixes bug in transpose (zarr-developers#1949)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

port top level api to v3 branch. [v3] consider using position only and keyword only arguments
4 participants