core: reschedule evicted batch job when resources become available #13205
using the same test setup in #9890
This PR fixes a bug where an evicted batch job would not be rescheduled once resources become available. Closes #9890
LGTM! Nice debugging work!
Thanks team! This is really useful for some of our workloads. Great change!
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
This PR fixes a bug where an evicted batch job would not be rescheduled
once resources become available.
Intuition: previously the scheduler would filter out an alloc that was evicted but whose tasks had completed successfully. The problem is that the tasks were stopped because of the eviction stop signal, not because they had run to completion. Such an alloc needs to be queued to run again.
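
To make the intuition concrete, here is a minimal sketch of the decision in Go. It is not the code from this PR: the `Alloc` type, its fields, the status strings, and the `needsReschedule` helper are all hypothetical stand-ins, and it assumes the alloc's tasks have already stopped.

```go
package main

import "fmt"

// Alloc is a hypothetical, simplified stand-in for a batch allocation.
// It is not Nomad's actual struct; the field names are assumptions.
type Alloc struct {
	Name          string
	DesiredStatus string // assumed values: "run", "stop", "evict"
	TasksExitedOK bool   // every task reported a successful exit
}

// needsReschedule reports whether a stopped batch alloc should be queued
// to run again. Hypothetical helper for illustration only.
func needsReschedule(a Alloc) bool {
	if a.DesiredStatus == "evict" {
		// The tasks stopped because of the eviction stop signal, so a
		// clean exit does not mean the work ran to completion.
		return true
	}
	// Otherwise a clean exit means the batch work is done.
	return !a.TasksExitedOK
}

func main() {
	allocs := []Alloc{
		{Name: "evicted-clean-exit", DesiredStatus: "evict", TasksExitedOK: true},
		{Name: "finished", DesiredStatus: "run", TasksExitedOK: true},
		{Name: "failed", DesiredStatus: "run", TasksExitedOK: false},
	}
	for _, a := range allocs {
		fmt.Printf("%s: reschedule=%v\n", a.Name, needsReschedule(a))
	}
}
```

The first case is the one described above: under the old behavior it would have been treated like the second case and filtered out.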
Closes #9890