Ensure adaptive scaling is properly awaited and closed #4720
@@ -19,43 +20,6 @@
)


@pytest.mark.asyncio
async def test_simultaneous_scale_up_and_down(cleanup):
This test was added in #1608 to disallow simultaneous up-/down-scaling, but the current implementation doesn't work at all since the API changed. Instead, it only produces error logs.
77b5a88 to 6b32e66
Thanks @fjetter! I left a few small comments, but overall the changes here look good to me
@pytest.mark.asyncio
async def test_adaptive_stopped():
Given that this uses an async cluster, do we need the async_wait_for + timeout, or can we just assert the various attributes directly? For example, when I make the following changes locally, this test still passes:
diff --git a/distributed/deploy/tests/test_adaptive.py b/distributed/deploy/tests/test_adaptive.py
index 39284302..8cf4f381 100644
--- a/distributed/deploy/tests/test_adaptive.py
+++ b/distributed/deploy/tests/test_adaptive.py
@@ -466,16 +466,8 @@ async def test_adaptive_stopped():
     """
     async with LocalCluster(n_workers=0, asynchronous=True) as cluster:
         async with Client(cluster, asynchronous=True) as client:
-            instance = cluster.adapt(interval="10ms")
+            pc = cluster.adapt(interval="10ms").periodic_callback
+            assert pc is not None
+            assert pc.is_running() is not None
-            await async_wait_for(
-                lambda: instance.periodic_callback is not None, timeout=5
-            )
-
-            await async_wait_for(
-                lambda: instance.periodic_callback.is_running() is not None, timeout=5
-            )
-
-            pc = instance.periodic_callback
-
-            await async_wait_for(lambda: pc.is_running() is not None, timeout=5)
+            assert pc.is_running() is not None
Well, at the very least we'll need one wait in between that actually waits for the entire thing to start. If the PC was never started, the conditions are trivially true. I'll add a comment and see what I can remove while keeping the test working.
All of the is not None checks were useless, which is why the test did not fail. I now assert on the bool properly, and with that change I do need to wait.
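To illustrate why those checks could never fail, here is a minimal sketch. PeriodicCallbackStub is a hypothetical stand-in written for this example, not the Tornado class or anything from distributed; it only assumes that is_running() returns a bool.

```python
class PeriodicCallbackStub:
    """Hypothetical stand-in for a periodic callback (illustration only)."""

    def __init__(self) -> None:
        self._running = False

    def start(self) -> None:
        self._running = True

    def is_running(self) -> bool:
        return self._running


pc = PeriodicCallbackStub()  # deliberately never started at this point

# A bool is never None, so this holds whether or not the callback is running.
weak_check = pc.is_running() is not None

# Asserting on the boolean itself distinguishes the two states.
strict_before_start = pc.is_running()
pc.start()
strict_after_start = pc.is_running()
```

Because the strict form depends on the callback having actually started, a test that uses it has to wait for startup first, which is why the wait comes back.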
# Need to call stop here before we close all servers to avoid having
# dangling tasks in the ioloop
with suppress(AttributeError):
    self._adaptive.stop()
This is a bit unfortunate since I need to call it up here and not down in Cluster. If I do this only in Cluster, the event loop seems to close too soon and we still have pending tasks from AdaptiveCore.adapt.
I'm wondering if we ever considered adding PYTHONASYNCIODEBUG=1 to our test suite, which would raise in these instances. Not sure how much would break or if this is a bad idea in general.
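As a rough sketch of both points (stopping periodic work before the loop closes, and running under asyncio's debug mode), the snippet below uses only the standard library. PeriodicStub is a hypothetical stand-in, not the Adaptive or AdaptiveCore implementation, and debug=True is used here as the programmatic counterpart of setting PYTHONASYNCIODEBUG=1. As far as I know, debug mode mostly logs extra diagnostics rather than raising, so how strict it would be for the test suite is an open question.

```python
import asyncio


class PeriodicStub:
    """Hypothetical periodic job (illustration only, not distributed's Adaptive)."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.ticks = 0
        self._task = None

    def start(self) -> None:
        # Schedule the background loop on the currently running event loop.
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def _run(self) -> None:
        while True:
            self.ticks += 1
            await asyncio.sleep(self.interval)

    async def stop(self) -> None:
        # Cancel and await the task so nothing is left pending at shutdown.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None


async def main():
    periodic = PeriodicStub(interval=0.01)
    periodic.start()
    await asyncio.sleep(0.05)
    await periodic.stop()  # stop *before* the loop is torn down
    current = asyncio.current_task()
    pending = [t for t in asyncio.all_tasks() if t is not current and not t.done()]
    return asyncio.get_running_loop().get_debug(), periodic.ticks, len(pending)


debug_enabled, ticks, pending_count = asyncio.run(main(), debug=True)
```

If stop() were skipped, the background task would still be pending when asyncio.run tears the loop down, which is the situation the comment above describes for the tasks coming from AdaptiveCore.adapt.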
* Ensure adaptive scaling is properly awaited and closed
* review comments
* Ensure no tasks are pending when closing adaptive cluster
* remove assert in stop
* break cyclic ref in adaptive core
sync method is used which is more common than the loop.add_callback.