Multi-scheduler Job Starvation #585

Closed
thinkharderdev opened this issue Jan 5, 2023 · 0 comments · Fixed by #586
Labels
bug Something isn't working

@thinkharderdev
Contributor

Describe the bug

When multiple schedulers run concurrently, the following sequence of events can occur:

  1. Job A is submitted to scheduler A and is scheduled on all available task slots.
  2. Job B is submitted to scheduler B while no task slots are available, so none of its tasks can be scheduled.
  3. All task status updates from Job A go back to scheduler A, which cannot schedule any tasks for Job B because that job is owned by scheduler B.
  4. Because no task updates ever land on scheduler B, nothing prompts it to try again, and Job B is never scheduled anywhere, even after Job A's slots are freed.
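
The starvation sequence above can be reproduced with a small simulation. This is a hypothetical model written in Python, not the project's scheduler code: the `Scheduler` class, its `on_job_submitted` and `on_task_completed` handlers, and the shared `cluster` slot counter are illustrative assumptions. The one behavior it borrows from the issue is that a scheduler only tries to place work when it receives an event, and it only considers jobs it owns.

```python
from collections import deque


class Scheduler:
    """Hypothetical event-driven scheduler: it only places work when an event arrives."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster  # shared slot pool, e.g. {"free_slots": 4}
        self.queue = deque()    # (job id, unscheduled tasks) for jobs THIS scheduler owns
        self.launched = {}      # job id -> tasks launched so far

    def schedule(self):
        # Only jobs owned by this scheduler are ever considered.
        while self.queue and self.cluster["free_slots"] > 0:
            job, remaining = self.queue.popleft()
            n = min(remaining, self.cluster["free_slots"])
            self.cluster["free_slots"] -= n
            self.launched[job] = self.launched.get(job, 0) + n
            if remaining > n:
                self.queue.appendleft((job, remaining - n))

    def on_job_submitted(self, job, tasks):
        self.queue.append((job, tasks))
        self.schedule()

    def on_task_completed(self):
        # The freed slot returns to the shared pool, but only this scheduler
        # (the owner of the finished task) gets a chance to reuse it.
        self.cluster["free_slots"] += 1
        self.schedule()


cluster = {"free_slots": 4}
a = Scheduler("A", cluster)
b = Scheduler("B", cluster)
a.on_job_submitted("job-a", 4)  # step 1: Job A takes every slot
b.on_job_submitted("job-b", 2)  # step 2: no slots left, Job B waits on scheduler B
for _ in range(4):
    a.on_task_completed()       # step 3: every update for Job A goes to scheduler A
print(cluster["free_slots"], b.launched)  # 4 {}
```

Once all four of Job A's tasks have finished, every slot is free again, yet Job B is still sitting in scheduler B's queue because no event ever reached scheduler B.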

To Reproduce
Steps to reproduce the behavior:

  1. Start a cluster with two schedulers.
  2. Submit a job to scheduler 1 that consumes all available executor slots.
  3. Before any task of job 1 completes, submit a job to scheduler 2.
  4. Job 2 never runs.

Expected behavior

Job 2 should start running as soon as executor task slots become available.

Additional context

The fix is simple: in the event loop, if a job is submitted and no task slots are available, resubmit the job to the event loop after a small delay (the delay keeps the loop from spinning and consuming excessive CPU).
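
A minimal sketch of the proposed fix, written as a hypothetical Python simulation rather than the project's real event loop. It assumes a shared virtual clock (`Sim`) so that delays can be modeled without actually sleeping; `Sim`, `Scheduler`, `job_submitted`, `task_completed`, and the `RETRY_DELAY` value are illustrative names and values, not taken from the codebase. When a submitted job cannot get enough slots, the scheduler re-posts the unscheduled remainder to its own event queue after `RETRY_DELAY`, so it no longer depends on a task update that may never arrive.

```python
import heapq
import itertools

# Hypothetical back-off before a starved job is re-posted (virtual seconds).
RETRY_DELAY = 0.05


class Sim:
    """Shared virtual clock, slot pool, and event queue (illustrative only)."""

    def __init__(self, free_slots):
        self.free_slots = free_slots
        self.now = 0.0
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so handlers are never compared

    def post(self, delay, handler, *args):
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), handler, args))

    def run(self, until):
        # Process events in timestamp order up to the given virtual time.
        while self._heap and self._heap[0][0] <= until:
            self.now, _, handler, args = heapq.heappop(self._heap)
            handler(*args)


class Scheduler:
    def __init__(self, name, sim):
        self.name = name
        self.sim = sim
        self.launched = {}  # job id -> tasks launched so far
        self.retries = 0    # times a job was re-posted for lack of slots

    def job_submitted(self, job, tasks):
        n = min(tasks, self.sim.free_slots)
        self.sim.free_slots -= n
        if n:
            self.launched[job] = self.launched.get(job, 0) + n
            for _ in range(n):
                # Each task runs for 1 virtual second; its status update comes back here.
                self.sim.post(1.0, self.task_completed)
        if tasks > n:
            # The fix: instead of waiting for an event that may never reach this
            # scheduler, re-post the unscheduled remainder to our own queue after a delay.
            self.retries += 1
            self.sim.post(RETRY_DELAY, self.job_submitted, job, tasks - n)

    def task_completed(self):
        self.sim.free_slots += 1


sim = Sim(free_slots=4)
a = Scheduler("A", sim)
b = Scheduler("B", sim)
sim.post(0.0, a.job_submitted, "job-a", 4)   # Job A takes every slot
sim.post(0.01, b.job_submitted, "job-b", 2)  # Job B arrives while the cluster is full
sim.run(until=2.0)
print(b.launched)  # {'job-b': 2}
```

Job B now launches shortly after Job A's tasks finish at t = 1.0 instead of waiting forever. The delay matters: with `RETRY_DELAY = 0`, each retry would be re-queued at the same instant it was handled, so `run()` would never advance the clock and would spin indefinitely, the simulated equivalent of the CPU burn the issue warns about.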

@thinkharderdev thinkharderdev added the bug Something isn't working label Jan 5, 2023