System job keeps running after I try to remove it from a DC #11373
Comments
Hi @mikehardenize, Thanks for using Nomad! Would you mind posting your full job file (without any secrets) for me to take a look at?
job "traefik" {
type = "system"
datacenters = ["us-east4-a"]
constraint {
attribute = "${node.class}"
value = "job"
}
group "traefik" {
network {
port "http" {
static = 80
}
port "https" {
static = 443
}
}
volume "traefik" {
type = "host"
read_only = false
source = "traefik"
}
task "traefik" {
driver = "docker"
service {
name = "traefik-http"
port = "http"
check {
type = "http"
path = "/ping"
interval = "5s"
timeout = "2s"
}
}
volume_mount {
volume = "traefik"
destination = "/host"
read_only = false
}
config {
image = "traefik:2.5"
cap_add = ["net_raw"]
ports = ["http", "https"]
network_mode = "host"
dns_servers = ["127.0.0.1"]
auth_soft_fail = true
}
}
}
} |
Thanks @mikehardenize for reporting the bug. I was able to reproduce it and identify the causes. We'll have a fix PR soon.
The system scheduler should leave allocs on draining nodes as-is, but stop allocs on nodes that are no longer part of the job's datacenters. Previously, the scheduler did not make that distinction and left system job allocs intact if they were already running. I've added a failing test first, which you can see in https://app.circleci.com/jobs/github/hashicorp/nomad/179661 . Fixes #11373
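For readers following along, here is a minimal sketch in Go of the rule described above. This is not Nomad's actual scheduler code; the Node, Job, and decide names are invented for illustration, and the logic is reduced to the one distinction the fix introduces: draining nodes are left alone, while nodes whose datacenter was removed from the job have their allocs stopped.

// A hypothetical, simplified illustration of the reconciliation rule.
// Not Nomad source code; types and function names are invented.
package main

import "fmt"

// Node is a simplified view of a client node.
type Node struct {
	ID         string
	Datacenter string
	Draining   bool
}

// Job is a simplified view of a system job.
type Job struct {
	Datacenters []string
}

// Decision for an existing allocation on a node.
type Decision string

const (
	Keep   Decision = "keep"   // node still matches the job's datacenters
	Ignore Decision = "ignore" // node is draining, leave the alloc as-is
	Stop   Decision = "stop"   // node is no longer in the job's datacenters
)

// decide applies the rule from the fix: draining nodes are ignored,
// in-datacenter nodes keep their alloc, everything else is stopped.
func decide(job Job, node Node) Decision {
	if node.Draining {
		return Ignore
	}
	for _, dc := range job.Datacenters {
		if dc == node.Datacenter {
			return Keep
		}
	}
	return Stop
}

func main() {
	job := Job{Datacenters: []string{"us-east4-a"}}
	nodes := []Node{
		{ID: "node-a", Datacenter: "us-east4-a"},                 // still in the job spec
		{ID: "node-b", Datacenter: "us-east4-b"},                 // datacenter removed from the job
		{ID: "node-c", Datacenter: "us-east4-b", Draining: true}, // draining, left alone
	}
	for _, n := range nodes {
		fmt.Printf("%s (%s): %s\n", n.ID, n.Datacenter, decide(job, n))
	}
}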
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v1.1.5 (117a23d)
Operating system and Environment details
CentOS 7
Issue
I have two nomad agents in different DCs. One in us-east4-a and another in us-east4-b.
I created a system job, but it only had
datacenters = ["us-east4-a"]
so it only ran on one of the agents. I then updated the job to contain
datacenters = ["us-east4-a", "us-east4-b"]
and re-ran it. It then started running on both agents (as expected). However, I then switched it back to
datacenters = ["us-east4-a"]
and re-ran the job, and it unexpectedly continued running on the us-east4-b agent. When I do a "nomad status jobname" it has "Datacenters = us-east4-a" in the output, but it also lists an allocation for each agent.