-
-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/concurrent members #2519
Feat/concurrent members #2519
Conversation
…o feat/concurrent-members
…at/concurrent-members
note: this PR depends on #2474 |
Co-authored-by: Deepak Cherian <[email protected]>
…at/concurrent-members
…n into feat/concurrent-members
…ess than the number of groups * latency
# as opposed to a prefix, in the store under the prefix associated with this group | ||
# in which case `key` cannot be the name of a sub-array or sub-group. | ||
warnings.warn( | ||
f"Object at {e.args[0]} is not recognized as a component of a Zarr hierarchy.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's technically possible for KeyError.args
to be empty, so this [0]
would raise an IndexError:
IndexError Traceback (most recent call last)
Cell In[10], line 1
----> 1 KeyError().args[0]
If we're comfortable assuming / requiring that things raising a KeyError here will populate that with the key, then I think this is fine to ignore. Otherwise, we might want to catch that IndexError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that args
will always be populated, because we are handling exceptions coming downstream of a function that necessarily takes a concrete key to query from storage. For that reason, even if I did add code to handle the case when e.args
is not populated, I have no idea how we would test this case with our current Group
api, since there's no way that I know of to reach this line without some concrete keys.
manager provided by that semaphore. If the semaphore parameter is None, then getitem is invoked | ||
without a context manager. | ||
""" | ||
if semaphore is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nitpick: I like contextlib.nullcontext
for cases where you may or may not have a context manager: https://docs.python.org/3/library/contextlib.html#contextlib.nullcontext
3.10 added support for asynchronous context managers, so this should be usable here:
semaphore = semaphore or contextlib.nullcontext()
async with semaphore:
....
And I wonder whether getitem_semaphore
should be the one to look up async.concurrency
from the zarr config? Maybe we want that to be the default so that we don't miss it anywhere (and have some other way to indicate unbounded concurrency?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried contextlib.nullcontext
but I didn't feel like the LOC saved was worth the added indirection in this case. I assume that contextlib.nullcontext
doesn't add notable performance overhead, by contrast, I find the current if... else
construction easier to reason about, and it requires 1 fewer import.
And I wonder whether getitem_semaphore should be the one to look up async.concurrency from the zarr config? Maybe we want that to be the default so that we don't miss it anywhere (and have some other way to indicate unbounded concurrency?)
As I understand it, we only query the concurrency limit when the semaphore is created, which is necessarily before getitem_semaphore
gets called. All invocations of getitem_semaphore
have to use the same Semaphore
instance, because otherwise there would be no coordination mechanism for rate-limiting. So I think that means getitem_semaphore
shouldn't know anything about the config, or even concurrency limits.
…into feat/concurrent-members
…at/concurrent-members
…n into feat/concurrent-members
@dstansby how can I re-rerun the docs build? and do you have any idea why it failed? |
thanks for fixing it @dstansby! |
Makes
AsyncGroup.members()
fetch keys and fetch metadata concurrently, which provides a big performance win for high-latency storage backends. The number of concurrent operations is limited by the zarr-wide configuration setting.in
main
,Group.members()
requires~O(num_members)
time to complete, because it does not perform IO concurrently. In this PR,Group.members
runs in constant time (until the number of concurrent requests exceeds the concurrency limit).