core: reschedule evicted batch job when resources become available #13205

Merged (1 commit) on Jun 2, 2022

shoenig (Member) commented Jun 2, 2022

This PR fixes a bug where an evicted batch job would not be rescheduled
once resources become available.

Intuition: previously the scheduler would filter out an alloc that was evicted but whose tasks completed successfully. The problem is that the tasks were stopped because of the eviction stop signal, not because they had run to completion. Such an alloc needs to be queued to run again.
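The distinction can be illustrated with a minimal sketch. This is written in Python for brevity and is not Nomad's actual scheduler code: the `Alloc` class, its field names, and the functions `counts_as_done_before`, `counts_as_done_after`, and `allocs_to_requeue` are all hypothetical, and the real change may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Alloc:
    """Hypothetical, simplified allocation record (not Nomad's struct)."""
    alloc_id: str
    desired_status: str  # e.g. "run", "stop", "evict"
    client_status: str   # e.g. "running", "complete", "failed"


def counts_as_done_before(alloc: Alloc) -> bool:
    # Assumed pre-fix behavior: any alloc whose tasks exited
    # successfully is treated as finished batch work.
    return alloc.client_status == "complete"


def counts_as_done_after(alloc: Alloc) -> bool:
    # Assumed post-fix behavior: an evicted alloc is never treated as
    # finished, because its tasks exited due to the eviction stop
    # signal rather than by running to completion.
    if alloc.desired_status == "evict":
        return False
    return alloc.client_status == "complete"


def allocs_to_requeue(allocs, counts_as_done):
    # Evicted allocs that are not considered finished must be queued again.
    return [
        a.alloc_id
        for a in allocs
        if a.desired_status == "evict" and not counts_as_done(a)
    ]


# One evicted-but-"complete" alloc next to a normally running one.
allocs = [
    Alloc("0f968375", "evict", "complete"),
    Alloc("3cc484a6", "run", "running"),
]
requeue_before = allocs_to_requeue(allocs, counts_as_done_before)
requeue_after = allocs_to_requeue(allocs, counts_as_done_after)
```

Under these assumptions, the evicted alloc is dropped by the first predicate and picked up for rescheduling by the second.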

Closes #9890

shoenig (Member, Author) commented Jun 2, 2022

Using the same test setup as in #9890:

➜ nomad job status low
ID            = low
Name          = low
Submit Date   = 2022-06-02T13:32:47-05:00
Type          = batch
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
group       0       0         3        0       1         0     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created  Modified
af37bf7d  81e51b03  group       0        run      running   11s ago  8s ago
0f968375  81e51b03  group       0        evict    complete  59s ago  34s ago
3cc484a6  81e51b03  group       0        run      running   59s ago  55s ago
af8f1ff6  81e51b03  group       0        run      running   59s ago  55s ago

@shoenig shoenig force-pushed the b-batch-preempt2 branch from 92b0696 to 682dbaa Compare June 2, 2022 19:04
@shoenig shoenig marked this pull request as ready for review June 2, 2022 19:51
@shoenig shoenig requested review from DerekStrickland and tgross June 2, 2022 19:51
@shoenig shoenig added the backport/1.1.x, backport/1.2.x, and backport/1.3.x labels Jun 2, 2022
tgross (Member) left a comment

LGTM! Nice debugging work!

Fuco1 (Contributor) commented Aug 7, 2022

Thanks team! This is really useful for some of our workloads. Great change!

github-actions bot commented Dec 6, 2022

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 6, 2022
Development

Successfully merging this pull request may close these issues.

Preempted dispatch alloc is not replaced after resources become available
4 participants