Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beyond Scheduler.is_rootish #8934

Open
fjetter opened this issue Nov 13, 2024 · 1 comment
Open

Beyond Scheduler.is_rootish #8934

fjetter opened this issue Nov 13, 2024 · 1 comment

Comments

@fjetter
Copy link
Member

fjetter commented Nov 13, 2024

The scheduler currently relies on a crude heuristic to infer topologies that may suggest that certain tasks are "root-ish". If the tasks are detected as such, they are "queued" to avoid memory pressure on the workers due to various scheduling artifacts.

There are a couple of known problems with this

  • The heuristic itself is known to be problematic. Only recently we added a configuration parameter to bypass some of the brittleness for specific cases, see Add configurations for rootish taskgroup threshold #8898
  • For P2P to work reliably, we already had to introduce an internal-only flag _queueable that is being set by the P2P extension to explicitly disable queuing in the rare cases where the P2P topologies would be simple enough to fall into the heuristics classification.
  • We also ran into a subtle but severe regression in Remove alias resolving to fix queuing #8933

Instead of relying on this heuristic to control queuing I suggest to leverage the newly introduced task spec classes.

I propose to add an attribute to the Task class that is flagging whether or not a task is considered rootish (Instead of root-ish we may actually want to flag the task as IO but this is still TBD)

This flag would work very similar to how annotations worked before with the important difference that we wouldn't loose any annotations when moving from HLG to Low level graph representation. Therefore, to make this work we'd also have to reimplement low level fusion code (which from all we know is still very important for array like workloads, see also dask/dask#11458)

@hendrikmakait
Copy link
Member

Note that there is a difference between tasks being rootish and tasks getting not getting queued. In #8873, we had to move from flagging a task as non-rootish to being non-queuable due to the difference in scheduling behavior for rootish and non-rootish tasks while queueing is disabled. This may not be relevant but should be taken into account when replacing Scheduler.is_rootish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants