Feat: improves delete_dir for s3fs-backed FsspecStore #2661

Merged
merged 20 commits into from
Feb 14, 2025

Conversation

carshadi
Contributor

@carshadi carshadi commented Jan 6, 2025

Improves the performance of FsspecStore.delete_dir when the underlying filesystem is s3fs by passing the file paths to be removed as a single list (bulk deletion) instead of removing them one by one.

Resolves #2659

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

carshadi and others added 2 commits January 6, 2025 14:38
- Override the default Store.delete_dir method, which deletes keys one by one, to support bulk deletion for fsspec implementations whose fs._rm method accepts a list of paths.
- This can greatly reduce the number of requests to S3, which reduces the likelihood of throttling errors and improves delete performance.
- Currently, only s3fs is supported.
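
To illustrate the request savings described in the commit message, here is a minimal sketch that uses a hypothetical in-memory `CountingStore` rather than s3fs or zarr. The class, the helper names, and the key layout are assumptions made for illustration only; the one external fact relied on is that S3's DeleteObjects call accepts at most 1000 keys per request.

```python
from __future__ import annotations

from typing import Iterable

# S3's DeleteObjects API accepts at most 1000 keys per request.
S3_MAX_KEYS_PER_DELETE = 1000


class CountingStore:
    """Hypothetical in-memory stand-in for an object store that counts requests."""

    def __init__(self, keys: Iterable[str]) -> None:
        self.keys = set(keys)
        self.requests = 0

    def delete_one(self, key: str) -> None:
        # One request per key.
        self.requests += 1
        self.keys.discard(key)

    def delete_many(self, keys: list[str]) -> None:
        # One request for a whole batch of keys.
        if len(keys) > S3_MAX_KEYS_PER_DELETE:
            raise ValueError("too many keys in one request")
        self.requests += 1
        self.keys.difference_update(keys)


def delete_dir_one_by_one(store: CountingStore, prefix: str) -> None:
    # Snapshot the matching keys first, because deleting mutates the set.
    for key in [k for k in store.keys if k.startswith(prefix)]:
        store.delete_one(key)


def delete_dir_bulk(store: CountingStore, prefix: str) -> None:
    doomed = sorted(k for k in store.keys if k.startswith(prefix))
    # Send the keys in batches no larger than the per-request limit.
    for start in range(0, len(doomed), S3_MAX_KEYS_PER_DELETE):
        store.delete_many(doomed[start:start + S3_MAX_KEYS_PER_DELETE])


keys = [f"array/c/{i}" for i in range(2500)] + ["other/zarr.json"]

slow = CountingStore(keys)
delete_dir_one_by_one(slow, "array/")

fast = CountingStore(keys)
delete_dir_bulk(fast, "array/")

print(slow.requests, fast.requests)
```

With 2500 keys under the prefix, the one-by-one path issues 2500 requests in this sketch, while the batched path issues 3.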
@jhamman jhamman requested a review from martindurant January 6, 2025 21:55
Member

@martindurant martindurant left a comment

I have reservations about this code. It seems to me that calling ._rm() should be all that's required, leaving fsspec to handle everything else.
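
As a rough illustration of that suggestion, the sketch below delegates directory deletion to a single recursive `_rm` call. `FakeAsyncFS` and `SketchStore` are hypothetical stand-ins written for this example, not fsspec or zarr classes, and the code is not the implementation that was merged; it only assumes a filesystem object exposing an async `_rm(path, recursive=...)` method.

```python
import asyncio


class FakeAsyncFS:
    """Hypothetical stand-in for an async filesystem; not a real fsspec class."""

    def __init__(self, paths):
        self.paths = set(paths)
        self.rm_calls = 0

    async def _rm(self, path, recursive=False):
        self.rm_calls += 1
        if recursive:
            # Remove the path itself and everything beneath it.
            prefix = path.rstrip("/") + "/"
            self.paths = {
                p for p in self.paths if p != path and not p.startswith(prefix)
            }
        else:
            self.paths.discard(path)


class SketchStore:
    """Hypothetical store that leaves all deletion logic to the filesystem."""

    def __init__(self, fs, root):
        self.fs = fs
        self.root = root.rstrip("/")

    async def delete_dir(self, prefix):
        target = f"{self.root}/{prefix.strip('/')}"
        # A single call; the filesystem decides how to batch the deletes.
        await self.fs._rm(target, recursive=True)


fs = FakeAsyncFS(
    ["bucket/root/a/c/0", "bucket/root/a/c/1", "bucket/root/b/zarr.json"]
)
store = SketchStore(fs, "bucket/root")
asyncio.run(store.delete_dir("a"))
print(fs.rm_calls, sorted(fs.paths))
```

The point of the design is that the store stays free of backend-specific branching: whether deletes are batched is left to the filesystem implementation.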

@carshadi carshadi requested review from martindurant and d-v-b January 8, 2025 18:00
@dstansby dstansby added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jan 9, 2025
@github-actions github-actions bot removed the "needs release notes" label Jan 10, 2025
Member

@jhamman jhamman left a comment

Thanks @carshadi for your continued efforts here. I'd like to see some fsspec-specific tests here if possible.

@github-actions github-actions bot added the "needs release notes" label Jan 24, 2025
@carshadi
Contributor Author

Thanks @carshadi for continued efforts here. I'd like to see some fsspec specific tests here if possible.

@jhamman, I added a few test cases. Let me know if there are others that would be useful.

Member

@jhamman jhamman left a comment

Looks good here!

@carshadi - the only thing this needs is a release note.

@martindurant - care to do the final review and/or merge?

@github-actions github-actions bot removed the "needs release notes" label Feb 14, 2025
@carshadi
Contributor Author

Hi @martindurant, @jhamman, let me know if there's anything else I can do here. Thanks!

@martindurant
Member

+1

@dcherian dcherian enabled auto-merge (squash) February 14, 2025 18:52
@dcherian dcherian merged commit 48f7c9a into zarr-developers:main Feb 14, 2025
30 checks passed
@carshadi carshadi deleted the feat-fsspecstore-bulk-delete branch February 14, 2025 19:06
Development

Successfully merging this pull request may close these issues.

FsspecStore directory deletion performance improvements
6 participants