merge consecutive reads in shards #1358

Closed

Conversation

jstriebel
Member

This PR adds a feature to the v3 sharding implementation. Previously, when multiple chunks were read at once from a store that supports partial reads, each chunk was requested separately, even if the chunks lived in the same shard. With this PR, requests that are consecutive on a byte level within the same shard are merged into a single request.
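For illustration, here is a minimal sketch of the merging idea; this is not the PR's actual code, and `ByteRange` and `merge_consecutive_ranges` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ByteRange:
    # hypothetical helper: a byte span within one shard, end-exclusive
    start: int
    end: int


def merge_consecutive_ranges(ranges: List[ByteRange]) -> List[ByteRange]:
    """Coalesce ranges that touch (end == next start), so that several
    chunks of the same shard can be served by one partial read."""
    if not ranges:
        return []
    ordered = sorted(ranges, key=lambda r: r.start)
    merged = [ordered[0]]
    for r in ordered[1:]:
        if r.start == merged[-1].end:
            # consecutive on a byte level: extend the previous request
            merged[-1] = ByteRange(merged[-1].start, r.end)
        else:
            merged.append(r)
    return merged


# Example: three chunk ranges collapse into two requests.
print(merge_consecutive_ranges(
    [ByteRange(0, 100), ByteRange(100, 200), ByteRange(300, 400)]
))
# [ByteRange(start=0, end=200), ByteRange(start=300, end=400)]
```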

TODO:

  • Add unit tests and/or doctests in docstrings
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes label (Automatically applied to PRs which haven't added release notes) Feb 27, 2023
@codecov

codecov bot commented Feb 27, 2023

Codecov Report

Merging #1358 (a98c7b8) into main (5ece3e6) will decrease coverage by 0.04%.
The diff coverage is 83.33%.

@@             Coverage Diff             @@
##              main    #1358      +/-   ##
===========================================
- Coverage   100.00%   99.96%   -0.04%     
===========================================
  Files           36       36              
  Lines        14775    14804      +29     
===========================================
+ Hits         14775    14799      +24     
- Misses           0        5       +5     
Impacted Files                             Coverage Δ
zarr/_storage/v3_storage_transformers.py   97.97% <83.33%> (-2.03%) ⬇️

@rabernat
Contributor

rabernat commented Mar 7, 2023

Thanks for working on this important optimization! @martindurant implemented something similar at the fsspec level in fsspec/filesystem_spec#1063. I'm curious how this implementation would interact with that one. Where in the stack should this optimization live?

@normanrz
Contributor

normanrz commented Mar 7, 2023

Good point. We weren't aware of that optimization. I almost think it makes more sense in fsspec. @jstriebel wdyt?

@rabernat
Contributor

rabernat commented Mar 7, 2023

I almost think it makes more sense in fsspec.

That's a reasonable point of view. However, currently not all zarr I/O goes through fsspec. This is a rather major architectural issue that we need to confront for zarr-python: how dependent do we want to be on fsspec? If we require fsspec as a core dependency, we could remove quite a bit of code from zarr-python itself. Currently, we're somewhere in the middle--we rely on fsspec for some things, but we still have many of our own implementations for different stores.

We need to discuss this as a team.

@martindurant
Member

Totally agree with @rabernat; of course, you might know which way I lean.

For merging the requests via fsspec, this should be available via the cat_ranges() method; it is intended but not implemented yet.
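As a rough sketch of how reading several chunk ranges of one shard through fsspec's cat_ranges() could look (not code from this PR; the filesystem, bucket, shard path, and byte offsets are made up, and whether adjacent ranges actually get merged depends on the filesystem implementation):

```python
import fsspec

# Any fsspec filesystem works; "s3" with anonymous access is just an example.
fs = fsspec.filesystem("s3", anon=True)

shard_path = "some-bucket/array/0.0.shard"  # hypothetical shard object
starts = [0, 100, 300]                      # chunk byte offsets within the shard
ends = [100, 200, 400]

# cat_ranges takes parallel lists of paths, starts, and ends; here every
# range targets the same shard file, so the consecutive ranges (0-100 and
# 100-200) are candidates for merging into one request.
chunk_bytes = fs.cat_ranges([shard_path] * len(starts), starts, ends)
```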

@jhamman
Member

jhamman commented Dec 7, 2023

Closing: now that the sharding implementation is in a codec, this would need to be reworked. This feature conversation is also happening as part of #1583

@jhamman jhamman closed this Dec 7, 2023