Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use cases for work stealing #6600

Open
fjetter opened this issue Jun 20, 2022 · 3 comments
Open

Use cases for work stealing #6600

fjetter opened this issue Jun 20, 2022 · 3 comments
Labels
adaptive All things relating to adaptive scaling discussion Discussing a topic with no specific actions yet performance scheduling stability Issue or feature related to cluster stability (e.g. deadlock) stealing

Comments

@fjetter
Copy link
Member

fjetter commented Jun 20, 2022

Work stealing is a fairly complex machinery intended to redistribute tasks on a cluster to achieve a homogeneous occupancy, i.e. all workers will be busy for approximately the same time.

I'm currently aware of two use cases for it

A) Adaptive scaling or generally upscaling scenarios, i.e. adding more workers to a cluster require some kind of load balancing. Without work stealing, a newly added worker would sit idle until the scheduler might assign a task which is not even guaranteed or might work very poorly (e.g. #4471)
By having work stealing enabled, we are automatically ensuring that any newly added worker is able to start working since it gets tasks assigned via the stealing mechanism. However, this is known to not work well (list not exhaustive)

B) Another application would be a workload with vastly different runtimes in a TaskGroup. This is particularly concerning if there are few tasks in this task group or the runtime distribution is asymmetrical such that even after running many tasks the runtime differences would not cancel themselves and we'd have few workers with very large queues, effectively extending overall runtime by having a large tail in the computation.

I am not entirely sure if this usecase is actually very relevant and would appreciate some additional information around it. If this is indeed relevant we may benefit from an improved runtime tracking, e.g. with error measurement (e.g. #4028) in combination with a simpler, more selective algorithm.

The current work stealing algorithm has a couple of issues. Currently open issues can be filtered by the label stealing

Stealing also is known to be a trigger for deadlocks (at least four have been reported and fixed by now) since it requires a handshake that can cause timing issues (see e.g. https://github.com/dask/distributed/pulls?q=is%3Apr+is%3Aclosed+stealing+label%3Astealing+label%3Adeadlock)

There are even cases where work stealing is known to cause harm by reverting smart scheduler decisions, e.g. #6573

I'm currently trying to estimate whether we should pursue work stealing and try to make it robust or abandon this extension in favor of a less general but more robust solution for A and possibly B.

Thoughts?

cc @mrocklin @crusaderky @gjoseph92

@fjetter fjetter added performance discussion Discussing a topic with no specific actions yet stability Issue or feature related to cluster stability (e.g. deadlock) scheduling stealing labels Jun 20, 2022
@fjetter
Copy link
Member Author

fjetter commented Jun 20, 2022

There is currently a proposal open to introduce a third use case

C) Redistribute tasks that are currently assigned to paused workers (see #3761)

@gjoseph92
Copy link
Collaborator

In practice, I feel like work stealing mostly exists to rebalance the 10s or 100s of thousands of root tasks in processing, especially when a new worker arrives.

This is a pretty different use case than moving a handful of downstream tasks to a new worker to increase parallelism.

I think if these root vs downstream cases were separate, we could have better algorithms for both.

Holding back root tasks on the scheduler (and implementing an alternative load-balancing approach for those held-back tasks) would do that. I think that with #6560, we might be okay without work stealing at all. I think we'd still want it eventually for some scenarios, but the immediate need for it would be much less.

@mrocklin
Copy link
Member

mrocklin commented Jul 1, 2022

Sorry for missing this issue (I stopped tracking Dask issues a couple weeks ago while on vacation)

Some thoughts:

  1. (for A) Upscaling is pretty common. We'll need to have a solution for it
  2. (for B) It's not just "vastly different runtimes within a TaskGroup". When you get to the end of a long computation there are stragglers. The unbalance here can be non-trivial.
  3. However, this becomes less of an issue if we're allocating tasks non-eagerly as @gjoseph92 suggests. The value of work-stealing goes down significantly if we don't have a bunch of stuff on the workers
  4. The deadlocks will likely still be around. Tasks will get moved around for other reasons I think. I could be wrong here though.

Anyway, I agree with @gjoseph92 that if we hold back tasks on the scheduler then work stealing becomes more optional than it is today. I'd be curious about up-scaling, but it's probably not a huge issue.

@fjetter fjetter added the adaptive All things relating to adaptive scaling label Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adaptive All things relating to adaptive scaling discussion Discussing a topic with no specific actions yet performance scheduling stability Issue or feature related to cluster stability (e.g. deadlock) stealing
Projects
None yet
Development

No branches or pull requests

3 participants