Feat/concurrent members #2519

d-v-b · 2024-11-26T14:38:53Z

Makes AsyncGroup.members() fetch keys and fetch metadata concurrently, which provides a big performance win for high-latency storage backends. The number of concurrent operations is limited by the zarr-wide configuration setting.

In main, Group.members() requires ~O(num_members) time to complete, because it does not perform IO concurrently. In this PR, Group.members() runs in constant time (until the number of concurrent requests exceeds the concurrency limit).
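The effect described above can be illustrated without zarr at all. The following is a minimal sketch, not the code in this PR: `fetch_metadata`, `Tracker`, `fetch_all`, `run_demo`, and the `LATENCY` value are all hypothetical names chosen for illustration. A single shared `asyncio.Semaphore` caps the number of in-flight reads while `asyncio.gather` runs them concurrently.

```python
import asyncio

LATENCY = 0.01  # simulated per-request latency in seconds (assumed value)


class Tracker:
    """Records how many simulated requests are in flight at once."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0


async def fetch_metadata(key: str, semaphore: asyncio.Semaphore, tracker: Tracker) -> str:
    # Hypothetical stand-in for one high-latency store read.
    # The semaphore bounds how many of these run at the same time.
    async with semaphore:
        tracker.in_flight += 1
        tracker.peak = max(tracker.peak, tracker.in_flight)
        await asyncio.sleep(LATENCY)
        tracker.in_flight -= 1
        return f"metadata for {key}"


async def fetch_all(keys: list, limit: int) -> tuple:
    # One semaphore instance is shared by every task, so the limit is global.
    semaphore = asyncio.Semaphore(limit)
    tracker = Tracker()
    results = await asyncio.gather(
        *(fetch_metadata(k, semaphore, tracker) for k in keys)
    )
    return list(results), tracker.peak


def run_demo(num_keys: int = 20, limit: int = 5) -> tuple:
    keys = [f"group/{i}" for i in range(num_keys)]
    return asyncio.run(fetch_all(keys, limit))
```

In this simulation, 20 keys with a limit of 5 complete in roughly 4 latency periods rather than 20, and the recorded peak never exceeds the limit.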

d-v-b · 2024-11-26T14:39:48Z

note: this PR depends on #2474

TomAugspurger · 2025-01-01T01:03:23Z

src/zarr/core/group.py

+            # as opposed to a prefix, in the store under the prefix associated with this group
+            # in which case `key` cannot be the name of a sub-array or sub-group.
+            warnings.warn(
+                f"Object at {e.args[0]} is not recognized as a component of a Zarr hierarchy.",


It's technically possible for KeyError.args to be empty, so this [0] would raise an IndexError:

IndexError                                Traceback (most recent call last)
Cell In[10], line 1
----> 1 KeyError().args[0]

If we're comfortable assuming / requiring that things raising a KeyError here will populate that with the key, then I think this is fine to ignore. Otherwise, we might want to catch that IndexError.

d-v-b · 2025-01-01

I'm pretty sure that args will always be populated, because we are handling exceptions coming downstream of a function that necessarily takes a concrete key to query from storage. For that reason, even if I did add code to handle the case when e.args is not populated, I have no idea how we would test this case with our current Group API, since there's no way that I know of to reach this line without some concrete keys.
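For reference, the defensive variant discussed in this thread could look like the sketch below. It is not what the PR does (the PR indexes `e.args[0]` directly); `key_from_error` and `warn_unrecognized` are hypothetical helpers, and the `"<unknown key>"` placeholder is an assumption.

```python
import warnings


def key_from_error(e: KeyError, default: str = "<unknown key>") -> str:
    # KeyError.args is an ordinary tuple and may be empty, so guard the lookup
    # instead of indexing it unconditionally.
    return str(e.args[0]) if e.args else default


def warn_unrecognized(e: KeyError) -> None:
    # Same message as in the diff above, but safe when e.args is empty.
    warnings.warn(
        f"Object at {key_from_error(e)} is not recognized as a component of a Zarr hierarchy.",
        UserWarning,
        stacklevel=2,
    )
```

Because the helper takes the exception as an argument, the empty-args case can be exercised directly with `KeyError()` without going through the Group API.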

TomAugspurger · 2025-01-01T01:05:09Z

src/zarr/core/group.py

+    manager provided by that semaphore. If the semaphore parameter is None, then getitem is invoked
+    without a context manager.
+    """
+    if semaphore is not None:


Style nitpick: I like contextlib.nullcontext for cases where you may or may not have a context manager: https://docs.python.org/3/library/contextlib.html#contextlib.nullcontext

Python 3.10 added support for using nullcontext as an asynchronous context manager, so this should be usable here:

semaphore = semaphore or contextlib.nullcontext()
async with semaphore:
    ....

And I wonder whether getitem_semaphore should be the one to look up async.concurrency from the zarr config? Maybe we want that to be the default so that we don't miss it anywhere (and have some other way to indicate unbounded concurrency?)

d-v-b · 2025-01-01

I tried contextlib.nullcontext but I didn't feel like the LOC saved was worth the added indirection in this case. I assume that contextlib.nullcontext doesn't add notable performance overhead. Still, I find the current if... else construction easier to reason about, and it requires one fewer import.

> And I wonder whether getitem_semaphore should be the one to look up async.concurrency from the zarr config? Maybe we want that to be the default so that we don't miss it anywhere (and have some other way to indicate unbounded concurrency?)

As I understand it, we only query the concurrency limit when the semaphore is created, which is necessarily before getitem_semaphore gets called. All invocations of getitem_semaphore have to use the same Semaphore instance, because otherwise there would be no coordination mechanism for rate-limiting. So I think that means getitem_semaphore shouldn't know anything about the config, or even concurrency limits.

d-v-b · 2025-01-07T14:05:00Z

@dstansby how can I re-run the docs build? And do you have any idea why it failed?

dstansby · 2025-01-07T14:23:17Z

Do you have a readthedocs account? If you send me the email or username for that, I can add you as a maintainer to the project on readthedocs.org. Then you'll get a "Rebuild this build" button.

I'm not sure why this one failed... I will restart!

d-v-b · 2025-01-07T14:31:43Z

thanks for fixing it @dstansby!

d-v-b added 21 commits November 8, 2024 14:45

feat: add wrapperstore

8407c64

feat: add latencystore

5e9ffb8

rename noisysetter -> noisygetter

5d7abf4

rename _wrapped to _store

c486351

loggingstore inherits from wrapperstore

f97b27c

initial commit

ffca710

working members traversal

d33cb7d

Merge branch 'main' into feat/latency-store

5ba51af

bolt concurrent members implementation onto async group

8f87977

update scratch file

87e0b83

use metadata / node builders for v3 node creation

502ad5e

Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…

53c8738

…at/latency-store

Merge branch 'feat/latency-store' of github.com:d-v-b/zarr-python int…

70a4ff5

…o feat/concurrent-members

fix key/name handling in recursion

d10d805

add latency-based test

4c624e1

add latency-based concurrency tests for group.members

f23ee85

improve comments for test

cba42f3

add concurrency limit

9691102

add test for concurrency limiting

d790379

docstrings

aadbece

Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…

1156859

…at/concurrent-members

d-v-b requested review from jhamman and TomAugspurger and removed request for jhamman November 26, 2024 14:42

d-v-b added 3 commits November 26, 2024 15:59

remove function that was only calling itself

e19238d

docstrings

db74205

relax timing requirement for concurrency test

46ba0cb

jhamman added the V3 label Nov 29, 2024

Merge branch 'main' into feat/concurrent-members

180ca9b

dcherian reviewed Dec 12, 2024

src/zarr/core/group.py (outdated)

dcherian reviewed Dec 12, 2024

src/zarr/core/group.py

dcherian reviewed Dec 12, 2024

src/zarr/core/group.py

dcherian reviewed Dec 12, 2024

tests/test_group.py

d-v-b and others added 2 commits December 12, 2024 11:04

Update src/zarr/core/group.py

a48efa8

Co-authored-by: Deepak Cherian

Merge branch 'main' into feat/concurrent-members

e53f847

dstansby removed the V3 label Dec 12, 2024

d-v-b added 6 commits December 18, 2024 15:28

Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…

824ff03

…at/concurrent-members

Merge branch 'feat/concurrent-members' of github.com:d-v-b/zarr-pytho…

9a17dfd

…n into feat/concurrent-members

exists_ok -> overwrite

ebfc200

simplify group_members_perf test, just require that the duration is l…

39ec6b5

…ess than the number of groups * latency

update test docstring

f25bda3

Merge branch 'main' into feat/concurrent-members

3f8de76

d-v-b requested a review from dcherian December 20, 2024 08:43

dcherian mentioned this pull request Dec 30, 2024

More efficient Group.arrays() for v3 stores #1721

Closed

TomAugspurger reviewed Jan 1, 2025

Merge branch 'main' of https://github.com/zarr-developers/zarr-python …

d920a9d

…into feat/concurrent-members

TomAugspurger approved these changes Jan 2, 2025

dcherian approved these changes Jan 2, 2025

d-v-b added 3 commits January 7, 2025 14:33

Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…

191ec75

…at/concurrent-members

Merge branch 'feat/concurrent-members' of github.com:d-v-b/zarr-pytho…

76341db

…n into feat/concurrent-members

remove vestigial test

08440e5

d-v-b merged commit bc5877b into zarr-developers:main Jan 7, 2025
28 checks passed

d-v-b deleted the feat/concurrent-members branch January 7, 2025 14:33
