Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sharding storage transformer for v3 #1111

Merged

Conversation

jstriebel
Copy link
Member

@jstriebel jstriebel commented Aug 22, 2022

Description

This PR

  • adds a sharding storage transformer:
    • based on the current spec proposal from ZEP2
    • supports partial reads
    • writes are limited so far: partial writes are not efficient yet, shards are always completely rewritten
      (a more efficient implementation might be a good follow-up)
  • adapts the v3 fsspec store to allow partial reads
  • uses partial reads for uncompressed v3 arrays
    (some refactoring might be needed in core.py to simplify the code-paths, but I'd rather defer this to a separate PR to keep the diff readable)
  • added test cases
  • resolves parts of v3 stores: Implement efficient get/set_partial_values #1106

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst (v3 is not yet documented)
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)
  • Consider adding a short-cut for sharded array creation (e.g. having a chunks_per_shard argument directly)

@jstriebel jstriebel changed the title Sharding storage transformers Sharding storage transformer for v3 Aug 22, 2022
@codecov
Copy link

codecov bot commented Aug 22, 2022

Codecov Report

Merging #1111 (a2eb332) into main (6f11ae7) will not change coverage.
The diff coverage is 100.00%.

❗ Current head a2eb332 differs from pull request most recent head dbf9fff. Consider uploading reports for the commit dbf9fff to get more accurate results

@@            Coverage Diff             @@
##              main     #1111    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           35        36     +1     
  Lines        14410     14739   +329     
==========================================
+ Hits         14410     14739   +329     
Impacted Files Coverage Δ
zarr/_storage/v3.py 100.00% <100.00%> (ø)
zarr/_storage/v3_storage_transformers.py 100.00% <100.00%> (ø)
zarr/core.py 100.00% <100.00%> (ø)
zarr/meta.py 100.00% <100.00%> (ø)
zarr/tests/test_core.py 100.00% <100.00%> (ø)
zarr/tests/test_storage_v3.py 100.00% <100.00%> (ø)
zarr/util.py 100.00% <100.00%> (ø)
zarr/storage.py 100.00% <0.00%> (ø)
zarr/tests/test_storage.py 100.00% <0.00%> (ø)

@lgtm-com
Copy link

lgtm-com bot commented Aug 22, 2022

This pull request introduces 2 alerts when merging 06ce675 into 44de0e4 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes
  • 1 for Use of the return value of a procedure

@jstriebel
Copy link
Member Author

@joshmoore I just added a ZARR_V3_SHARDING flag for sharding usage, as discussed previously.

joshmoore added a commit that referenced this pull request Jan 16, 2023
* add storage_transformers and get/set_partial_values

* formatting

* add docs and release notes

* add test_core testcase

* Update zarr/creation.py

Co-authored-by: Gregory Lee <[email protected]>

* apply PR feedback

* add comment that storage_transformers=None is the same as storage_transformers=[]

* use empty tuple as default for storage_transformers

* make mypy happy

* better coverage, minor fix, adding rmdir

* add missing rmdir to test

* increase coverage

* improve test coverage

* fix TestArrayWithStorageTransformersV3

* Update zarr/creation.py

Co-authored-by: Gregory Lee <[email protected]>

* pick generic storage transformer changes from #1111

* increase coverage

* fix order of storage transformers

* retrigger CI

* minor fixes

* make flake8 happy

* apply PR feedback

Co-authored-by: Gregory Lee <[email protected]>
Co-authored-by: Josh Moore <[email protected]>
@joshmoore
Copy link
Member

@jstriebel : ok to leave you to push the update here? (If not, I'll dig in)

@jstriebel
Copy link
Member Author

@jstriebel : ok to leave you to push the update here? (If not, I'll dig in)

Sure, done 👍

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment I've been struggling with that's a bit outwith this PR. Getting this merged and then I'll prep a 2.14.0 (including ensure_bytes) unless there are any concrete versioning suggestions.

.github/workflows/minimal.yml Show resolved Hide resolved
zarr/core.py Show resolved Hide resolved
zarr/core.py Show resolved Hide resolved
@jstriebel
Copy link
Member Author

Getting this merged

🚀

I'll prep a 2.14.0 (including ensure_bytes) unless there are any concrete versioning suggestions.

Seems fine to me 👍

@jstriebel
Copy link
Member Author

@joshmoore I don't see any further actionables for this PR, I'd be glad if we could merge it soon.

@joshmoore
Copy link
Member

Yessirs.

@joshmoore joshmoore merged commit c714d2b into zarr-developers:main Feb 2, 2023
@rabernat
Copy link
Contributor

rabernat commented Feb 2, 2023

This is huge! 🙌

@jstriebel
Copy link
Member Author

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants