Preempted dispatch alloc is not replaced after resources become available #9890
Comments
What I observed is that it doesn't even need to be a dispatched job; any batch job will do. I have now started two batch jobs with count = 1400, one at a higher priority and one 10 lower. The allocations from the low-priority job were quickly preempted, but they never returned after the high-priority job finished (and about 300 allocations went straight into a failed state and never recovered). In the end Nomad reported 1000 queued tasks, but nothing was happening. Nomad version is 1.2.6.
Indeed I'm able to reproduce with just a normal batch job:

nomad.hcl

    client {
      enabled = true
    }

    server {
      enabled = true

      default_scheduler_config {
        preemption_config {
          service_scheduler_enabled = true
          batch_scheduler_enabled   = true
        }
      }
    }

low.nomad

    job "low" {
      datacenters = ["dc1"]
      priority    = 50
      type        = "batch"

      group "group" {
        count = 3

        task "sleep" {
          driver = "exec"

          config {
            command = "/bin/sleep"
            args    = ["10000"]
          }

          resources {
            cpu    = 500
            memory = 10000
          }
        }
      }
    }

high.nomad

    job "low" {
      datacenters = ["dc1"]
      priority    = 50
      type        = "batch"

      group "group" {
        count = 3

        task "sleep" {
          driver = "exec"

          config {
            command = "/bin/sleep"
            args    = ["30"]
          }

          resources {
            cpu    = 500
            memory = 10000
          }
        }
      }
    }
This PR fixes a bug where an evicted batch job would not be rescheduled once resources become available. Closes #9890
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
When a dispatch job allocation is displaced by pre-emption, it is never replaced even after resources become available.
Users have reported they expect the dispatch job to only be temporarily displaced. The evicted allocation should be replaced, with the replacement held in a blocked state while waiting for resources to become available, just as happens with placement failures.

This is borderline bug/enhancement because the behavior is not well-defined in the documentation, but it's certainly surprising to users.
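To make the difference between the observed and the expected behavior concrete, here is a minimal single-node toy model in Python. It is only a sketch under stated assumptions, not Nomad's scheduler: the `ToyScheduler` and `Alloc` classes, the status strings, the capacity figures, and the `requeue_evicted` flag are all hypothetical names introduced for illustration. With `requeue_evicted=False` the evicted work is simply dropped, which mirrors the report; with `requeue_evicted=True` it waits in a blocked state and is placed again once the higher-priority work completes.

```python
from dataclasses import dataclass, field


@dataclass
class Alloc:
    job: str
    priority: int
    memory_mb: int
    status: str = "pending"  # pending | running | blocked | evicted | complete


@dataclass
class ToyScheduler:
    """Single-node toy scheduler; hypothetical, not Nomad's implementation."""

    capacity_mb: int
    requeue_evicted: bool  # True models the behavior users expect
    allocs: list = field(default_factory=list)

    def free_mb(self) -> int:
        used = sum(a.memory_mb for a in self.allocs if a.status == "running")
        return self.capacity_mb - used

    def submit(self, alloc: Alloc) -> None:
        self.allocs.append(alloc)
        # Candidate victims: running allocations with lower priority, lowest first.
        victims = sorted(
            (a for a in self.allocs
             if a.status == "running" and a.priority < alloc.priority),
            key=lambda a: a.priority,
        )
        reclaimable = sum(v.memory_mb for v in victims)
        if self.free_mb() + reclaimable < alloc.memory_mb:
            # Even evicting every victim would not help, so just wait.
            alloc.status = "blocked"
            return
        for victim in victims:
            if self.free_mb() >= alloc.memory_mb:
                break
            # Expected: evicted work waits as "blocked"; observed: it is dropped.
            victim.status = "blocked" if self.requeue_evicted else "evicted"
        alloc.status = "running"

    def complete(self, job: str) -> None:
        for a in self.allocs:
            if a.job == job and a.status == "running":
                a.status = "complete"
        # Freed resources unblock waiting allocations, highest priority first.
        for a in sorted(self.allocs, key=lambda a: -a.priority):
            if a.status == "blocked" and self.free_mb() >= a.memory_mb:
                a.status = "running"


def final_status_of_low(requeue_evicted: bool) -> str:
    """Run low, preempt it with high, finish high, report low's final status."""
    sched = ToyScheduler(capacity_mb=1000, requeue_evicted=requeue_evicted)
    low = Alloc("low", priority=50, memory_mb=800)
    sched.submit(low)
    sched.submit(Alloc("high", priority=70, memory_mb=800))
    sched.complete("high")
    return low.status
```

Calling `final_status_of_low(False)` leaves the low-priority allocation evicted for good, while `final_status_of_low(True)` ends with it running again, which is the outcome the issue asks for.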
To reproduce, run Nomad and enable batch preemption:
Verify the resources available:
Low-priority job:
High-priority job, with memory requirements that force pre-emption:
Register both jobs.
Dispatch the low priority job and note that it's running:
While that job is still running, dispatch the high priority job and note that the low-priority dispatched job is now dead because it's been evicted:
Wait for the high-priority job to complete and note that the low-priority job is not replaced:
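The reproduction only triggers eviction if the high-priority job cannot fit next to the running low-priority job and its priority is far enough above it. The following Python sketch captures those two conditions. The function names are hypothetical, and the default minimum priority gap of 10 is an assumption based on my reading of Nomad's preemption documentation (it also matches the "10 lower" setup in the first comment); verify it against the version you run.

```python
def priority_allows_preemption(new_priority: int, running_priority: int,
                               min_delta: int = 10) -> bool:
    """True when the new job outranks the running one by at least min_delta.

    min_delta=10 is an assumed default, not read from a Nomad API.
    """
    return new_priority - running_priority >= min_delta


def memory_forces_preemption(node_memory_mb: int, running_mb: int,
                             requested_mb: int) -> bool:
    """True when the request fits on the node only if running work is evicted."""
    fits_on_empty_node = requested_mb <= node_memory_mb
    fits_alongside = running_mb + requested_mb <= node_memory_mb
    return fits_on_empty_node and not fits_alongside


def should_preempt(node_memory_mb: int, running_mb: int, running_priority: int,
                   requested_mb: int, new_priority: int) -> bool:
    """Combine both checks for a single node and a single running job."""
    return (priority_allows_preemption(new_priority, running_priority)
            and memory_forces_preemption(node_memory_mb, running_mb, requested_mb))
```

For example, on a hypothetical node with 16000 MB of memory where the low-priority job already holds 10000 MB at priority 50, a 10000 MB request at priority 70 satisfies both conditions, while the same request at priority 55 does not.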