Round-robin worker selection makes poor choices with worker-saturation > 1.0
#7197
Comments
I'm not a big fan of the round robin in general, xref #6974. I'm a bit nervous about the change to `is_rootish` since I'm having a hard time estimating the impact. FWIW I'm wondering if this test even makes any sense the way it is written. I'd like to see a permutation in the order in which we submit these tasks, and I bet main would be failing as well.
Yeah, I'm also not a fan of the round robin. The …
For posterity, here's why this test currently fails with `worker-saturation` > 1.0:

We have 3 tasks, and a cluster with 3 threads: 2 on one worker, 1 on another. Even though the tasks are clearly root tasks (they have no dependencies, #7274), there aren't enough of them to be considered root-ish (#7273). So we don't use the queuing-related code path, even though queuing is enabled. Instead, we select workers for them via this old round-robin logic. Then this happens: …
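For context on the root-ish check mentioned above, here is a rough sketch of the heuristic, paraphrased rather than quoted from `scheduler.py`, so treat the exact constants as approximate. With 3 dependency-free tasks on a 3-thread cluster, the group-size condition fails, so the tasks are not classified as root-ish:

```python
# Rough paraphrase of the root-ish heuristic, not the exact source:
# a task is root-ish when its TaskGroup is much larger than the cluster
# and has (almost) no dependencies.
def is_rootish(self, ts) -> bool:
    if ts.resource_restrictions or ts.worker_restrictions or ts.host_restrictions:
        return False
    tg = ts.group
    return (
        len(tg) > self.total_nthreads * 2      # here: 3 tasks vs 3 threads, 3 > 6 is False
        and len(tg.dependencies) < 5
        and sum(map(len, tg.dependencies)) < 5
    )
```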
Why does this even pass with queuing off, or with `worker-saturation: 1.0`?

I'd argue this is simply a bug in the round-robin code path, by not sorting by quite the right criteria: it ranks workers by raw occupancy rather than occupancy per thread, which is inconsistent with how workers are ranked elsewhere. If I make this change:

```diff
diff --git a/distributed/scheduler.py b/distributed/scheduler.py
index eb5828bf..fd487cc8 100644
--- a/distributed/scheduler.py
+++ b/distributed/scheduler.py
@@ -2212,7 +2212,7 @@ class SchedulerState:
         wp_vals = cast("Sequence[WorkerState]", worker_pool.values())
         n_workers: int = len(wp_vals)
         if n_workers < 20:  # smart but linear in small case
-            ws = min(wp_vals, key=operator.attrgetter("occupancy"))
+            ws = min(wp_vals, key=lambda ws: ws.occupancy / ws.nthreads)
             assert ws
             if ws.occupancy == 0:
                 # special case to use round-robin; linear search
```

then the test passes at 1.0, 1.1, and inf.
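To see why normalizing by thread count matters, consider a toy example with made-up numbers and a minimal stand-in for the worker state (not the scheduler's actual class):

```python
from dataclasses import dataclass

@dataclass
class ToyWorker:
    name: str
    nthreads: int
    occupancy: float  # seconds of queued work

# Hypothetical numbers: A has more total work queued, but more threads to chew through it.
workers = [
    ToyWorker("A", nthreads=2, occupancy=1.0),
    ToyWorker("B", nthreads=1, occupancy=0.6),
]

# Sorting by raw occupancy picks B, even though its single thread is the busiest.
print(min(workers, key=lambda ws: ws.occupancy).name)                 # B
# Sorting by occupancy per thread picks A, whose threads free up sooner.
print(min(workers, key=lambda ws: ws.occupancy / ws.nthreads).name)   # A
```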
Another issue with this round-robin code path: it doesn't take memory into consideration. This test, to be added in #7248, fails on main right now both with queuing on and off: distributed/tests/test_scheduler.py, lines 484 to 505 at 6f0bbc5.
This is the case @crusaderky cares about. In an idle cluster, if some workers have keys in memory and others don't, we shouldn't keep adding more tasks to the workers that already have data in memory.
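To illustrate the point (this is not what the scheduler does, just a sketch reusing a toy stand-in with an added `nbytes` field): folding stored bytes into the key breaks ties toward workers that hold less data.

```python
from dataclasses import dataclass

@dataclass
class ToyWorker:
    name: str
    nthreads: int
    occupancy: float  # seconds of queued work
    nbytes: int       # bytes of task results held in memory

# Hypothetical idle cluster: nobody has queued work, but A already holds data.
workers = [
    ToyWorker("A", nthreads=2, occupancy=0.0, nbytes=200_000_000),
    ToyWorker("B", nthreads=2, occupancy=0.0, nbytes=0),
]

# Occupancy alone can't distinguish them; adding nbytes as a tie-break prefers B.
print(min(workers, key=lambda ws: (ws.occupancy / ws.nthreads, ws.nbytes)).name)  # B
```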
Finally, the >20 workers case (which would be heavily used in most real-world non-local clusters, especially with the Futures API) is only covered by one test: #7275. Obviously, both this "pick the worker with open threads" and "pick the worker with less memory" go out the window with >20 workers, since we're not taking the `min` over all workers in that branch. Given that selecting the … That's why eliminating this round-robin code path and consolidating it into the … A contrast with a pure counter-based round-robin is sketched below.
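For contrast, here is a minimal sketch of what a pure counter-based round-robin looks like (illustrative names, not the scheduler's code). It never inspects occupancy or memory, which is the concern raised above for the large-cluster branch:

```python
# Illustrative only: pick workers purely by a running task counter.
# Nothing here looks at occupancy or nbytes, so a worker that is already
# busy, or already holds lots of data, is just as likely to be chosen.
def round_robin_pick(workers: list, n_tasks_so_far: int):
    return workers[n_tasks_so_far % len(workers)]
```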
Note that this was alleviated, but not entirely fixed, by #7278. (It's still not sorting by a great objective function, but totally-full candidates are now removed regardless of …)
test_wait_first_completed is failing in #7191, with the `worker-saturation` value set to 1.1: distributed/tests/test_client.py, lines 732 to 746 at 0983731. It works fine with 1.0, but because of the round-up logic (#7116) allowing workers to be oversaturated, it fails for 1.1.
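The embedded test snippet isn't reproduced here; as a rough reconstruction of the scenario described (not the actual lines 732 to 746 of test_client.py), it looks something like this:

```python
from distributed import Event, wait
from distributed.utils_test import gen_cluster, inc


@gen_cluster(client=True, nthreads=[("127.0.0.1", 1), ("127.0.0.1", 2)])
async def test_wait_first_completed(c, s, a, b):
    event = Event()

    def block_on_event(ev):
        ev.wait()

    x = c.submit(block_on_event, event)  # blocks until the event is set
    y = c.submit(block_on_event, event)  # blocks until the event is set
    z = c.submit(inc, 1)                 # should be able to finish immediately

    # With good placement, `inc` lands on a free thread and completes first.
    # With the bad placement described below, every thread is blocked on the
    # event and this wait never returns.
    done, not_done = await wait([x, y, z], return_when="FIRST_COMPLETED")
    assert done == {z}
    assert not_done == {x, y}

    await event.set()
```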
It blocks forever because the worker with 1 thread gets assigned `[block_on_event, inc]`, and the worker with 2 threads gets assigned `[block_on_event]`. It should be the other way around.

The culprit has something to do with the round-robin logic that only applies to rare situations like this, where the cluster is small but larger than the TaskGroup being assigned: distributed/scheduler.py, lines 2210 to 2236 at 0983731.
If I update `is_rootish` like so: … the test passes.
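The diff mentioned above isn't captured in this text. Purely as a hypothetical illustration of that kind of tweak (not the author's actual change), one could short-circuit the heuristic sketched earlier so that dependency-free tasks always count as root-ish:

```python
# Hypothetical variant of the root-ish heuristic sketched earlier
# (not the actual change from the issue): any task with no dependencies
# is treated as root-ish, regardless of how big its TaskGroup is
# relative to the cluster.
def is_rootish(self, ts) -> bool:
    if ts.resource_restrictions or ts.worker_restrictions or ts.host_restrictions:
        return False
    if not ts.dependencies:
        return True
    tg = ts.group
    return (
        len(tg) > self.total_nthreads * 2
        and len(tg.dependencies) < 5
        and sum(map(len, tg.dependencies)) < 5
    )
```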
cc @fjetter @crusaderky