Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

port top level api to v3 branch. #1598

Closed
4 tasks done
jhamman opened this issue Dec 7, 2023 · 8 comments · Fixed by #1884
Closed
4 tasks done

port top level api to v3 branch. #1598

jhamman opened this issue Dec 7, 2023 · 8 comments · Fixed by #1884
Assignees
Labels
design discussion help wanted Issue could use help from someone with familiarity on the topic
Milestone

Comments

@jhamman
Copy link
Member

jhamman commented Dec 7, 2023

Zarr's top level API is the primary entrypoint for users to interact with Zarr objects. Stuff like:

  • zarr.open
  • zarr.zeros, zarr.ones, zarr.empty
  • zarr.load, zarr.save
  • zarr.array, zarr.group

This API needs to be wired up to the new Array and Group classes in the v3 dev branch. We should definitely attempt to keep these all close to the same API but v3 will give us the chance to rework these if there is a strong reason to do so.

@jhamman jhamman added help wanted Issue could use help from someone with familiarity on the topic V3 labels Dec 7, 2023
@joshmoore
Copy link
Member

joshmoore commented Dec 8, 2023

if there is a strong reason to do so

Perhaps related to #1597, there have definitely been comments in the community that the current methods taken together can be confusing. If supporting this API too substantially impacts the new API / h5py-ness, then perhaps import zarr.v3_compat as zarr with a best effort on these methods would be an option.

@aldenks
Copy link
Contributor

aldenks commented Jan 18, 2024

I'm planning to pick this up! we talked about this in the most recent zarr-python refactor bi-weekly meeting. I don't have permissions to assign myself but anyone feel free to assign me.

@aldenks
Copy link
Contributor

aldenks commented Jan 31, 2024

I'm going slower than I'd like on this but I have my dev env set up and am getting moving for real

@aldenks
Copy link
Contributor

aldenks commented Mar 7, 2024

As I've started working on this im realizing a question about the goal of these top level methods. In v3 will their main purpose be to

  1. be the primary way users interact with the library (the case now)
  2. be the smoothest on ramp for v2 library users into v3, while Array.xxx and Group.xxx become the primary entrypoints.

If it's 1, I think we'd want to make these pretty thin wrappers around the .create and .open methods on the v3 sync Array and Group. That would imply more breaking api changes (mostly different names and types of args) and bring consistency between top level methods and those on Array and Group.

If 2, and the primary/suggested api is Array.create(..), Array.open(..), Group.open(..) etc. then I think it's worth it to do as much translating from the current v2 api of these top level functions into the arguments to the new methods on v3 Array and Group and be pretty lenient (with logged warnings) if arguments are passed that are not used in v3

In the roadmap doc it says "the primary user facing API will be synchronous Array and Group" which sounds like option 2 fairly clearly. I like that personally, but it will also mean a good amount of retraining of folks who are used to reaching for zarr.xxx.

Has this already been decided? @jhamman

@aldenks
Copy link
Contributor

aldenks commented Mar 13, 2024

Organizing some questions I'll share in a few mins at the bi-weekly refactor meeting

  • In the v3 library, are the top level methods
    • the primary way users interact with the library? or
    • the smoothest on ramp for v2 library users into v3? (and Array.xxx and Group.xxx become primary)
    • something else?
  • The kwargs to any method that can create an array are currently v2 specific (filters vs codecs, etc), plus there are a number of performance/behavior modifying args (cache_[metadata|attrs], partial_decompress, write_empty_chunks, dim separator). Do we
    • try not to change the api at all and try to translate into spec V3 land
    • make the kwargs actually match those of the Array.xxx v3 library methods, but also take in **v2_kwargs and translate where possible, checking for conflicts with v3 kwargs if provided
    • make the top level methods kwargs align with those of Array.create, etc
  • Currently zarr.open and similar will return an array if it exists, even if the existing array's dtype, codecs, etc don't match those provided.
    • Keep this?

@aldenks
Copy link
Contributor

aldenks commented Mar 13, 2024

From our discussion today

  • Alden / top level API
    • port top level api to v3 branch. #1598 (comment)
    • In the v3 library, are the top level methods
      • the primary way users interact with the library? or
      • the smoothest on ramp for v2 library users into v3? (and Array.xxx and Group.xxx become primary)
      • something else?
      • Notes:
        • Joe's thought: We want to provide a pretty similar interface, help massage or raise errors when args not compatible. We can start deprecating and changing behavior
        • Norman: like Array. and Group entrypoints, but we need to have these top level entry points. promote the Array and Group classmethods in the docs
        • Joe: don't love polymophism of .open, but it exists
    • The kwargs to any method that can create an array are currently v2 specific (filters vs codecs, etc), plus there are a number of performance/behavior modifying args (cache_[metadata|attrs], partial_decompress, write_empty_chunks, dim separator). Do we
      • try not to change the api at all and try to translate into spec V3 land
      • make the kwargs actually match those of the Array.xxx v3 library methods, but also take in **v2_kwargs and translate where possible, checking for conflicts with v3 kwargs if provided
      • make the top level methods kwargs align with those of Array.create, etc
        • Norman: for open: make it compatible, you get back your method, for create could make
          • Runtime parameters: many of these didn't know exist, can debate case by case
          • Joe: if run time param provided that v3 don't use, raise error
    • Currently zarr.open and similar will return an array if it exists, even if the existing array's dtype, codecs, etc don't match those provided.
      • Keep this?
      • Joe: think we should raise an error, wide agreement on this

@jhamman jhamman moved this to Todo in Zarr-Python - 3.0 Apr 5, 2024
@jhamman jhamman added this to the 3.0.0.alpha milestone Apr 5, 2024
@aldenks
Copy link
Contributor

aldenks commented Apr 23, 2024

I'm really failing to find the time to make progress on this, I apologize. I would like to work on it and will keep trying to find time but also would not be offended if someone else picked it up -- I don't want to hold v3 alpha up.

(I'll also be traveling and not able to make the biweekly v3 refactor meeting tomorrow but I will be there next time)

@jhamman jhamman self-assigned this May 16, 2024
@jhamman jhamman mentioned this issue May 16, 2024
6 tasks
@jhamman
Copy link
Member Author

jhamman commented May 16, 2024

Update here: I've opened #1884 with a big step forward on this. Currently waiting on #1670 to land before doing the hardest parts.

@jhamman jhamman moved this from Todo to In Progress in Zarr-Python - 3.0 May 17, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Zarr-Python - 3.0 Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design discussion help wanted Issue could use help from someone with familiarity on the topic
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants