Nomad performs in-place updates when datacenter is changed. #10746
I've tested this against a cluster with 3 DCs and found that even if we update the job to a DC that exists, the update still happens in-place and the allocation isn't moved.
One of the tricky bits here is that the Datacenters field is set at the job level, while tasksUpdated does its change detection at the task group level. The following patch triggers a destructive update whenever the set of datacenters changes:

diff --git a/scheduler/util.go b/scheduler/util.go
index 3bf944373..b9a220acc 100644
--- a/scheduler/util.go
+++ b/scheduler/util.go
@@ -7,6 +7,7 @@ import (
log "github.com/hashicorp/go-hclog"
memdb "github.com/hashicorp/go-memdb"
+ "github.com/hashicorp/nomad/helper"
"github.com/hashicorp/nomad/nomad/structs"
)
@@ -346,6 +347,10 @@ func tasksUpdated(jobA, jobB *structs.Job, taskGroup string) bool {
a := jobA.LookupTaskGroup(taskGroup)
b := jobB.LookupTaskGroup(taskGroup)
+ if !helper.CompareSliceSetString(jobA.Datacenters, jobB.Datacenters) {
+ return true
+ }
+
// If the number of tasks do not match, clearly there is an update
if len(a.Tasks) != len(b.Tasks) {
return true

But unfortunately that will cause unnecessary destructive allocation updates. For example, if our job starts with a single datacenter and we update it to add a second one, the set of datacenters changes, so every allocation is destroyed and recreated even though the datacenter it is already running in is still eligible.
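To see why, it helps to pin down what CompareSliceSetString does. The sketch below is a standalone reimplementation for illustration only (the real helper lives in the hashicorp/nomad helper package): it treats the two slices as sets, so reordering compares as equal, but adding or removing any datacenter does not.

package main

import "fmt"

// compareSliceSetString is an illustrative stand-in for the helper the
// patch calls: it reports whether a and b contain the same members,
// ignoring order and duplicates.
func compareSliceSetString(a, b []string) bool {
	setA := make(map[string]struct{}, len(a))
	for _, s := range a {
		setA[s] = struct{}{}
	}
	setB := make(map[string]struct{}, len(b))
	for _, s := range b {
		setB[s] = struct{}{}
	}
	if len(setA) != len(setB) {
		return false
	}
	for s := range setA {
		if _, ok := setB[s]; !ok {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(compareSliceSetString([]string{"dc1", "dc2"}, []string{"dc2", "dc1"})) // true: reordering is not a change
	fmt.Println(compareSliceSetString([]string{"dc1"}, []string{"dc1", "dc2"}))        // false: an added DC forces a destructive update
}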
But if we hack in a check of the datacenters in the calling functions:

diff --git a/scheduler/util.go b/scheduler/util.go
index 3bf944373..865803720 100644
--- a/scheduler/util.go
+++ b/scheduler/util.go
@@ -7,6 +7,7 @@ import (
log "github.com/hashicorp/go-hclog"
memdb "github.com/hashicorp/go-memdb"
+ "github.com/hashicorp/nomad/helper"
"github.com/hashicorp/nomad/nomad/structs"
)
@@ -700,6 +701,11 @@ func inplaceUpdate(ctx Context, eval *structs.Evaluation, job *structs.Job,
continue
}
+ // The alloc is on a node that's now in an ineligible DC
+ if !helper.SliceStringContains(job.Datacenters, node.Datacenter) {
+ continue
+ }
+
// Set the existing node as the base set
stack.SetNodes([]*structs.Node{node})
@@ -983,6 +989,11 @@ func genericAllocUpdateFn(ctx Context, stack Stack, evalID string) allocUpdateTy
return false, true, nil
}
+ // The alloc is on a node that's now in an ineligible DC
+ if !helper.SliceStringContains(newJob.Datacenters, node.Datacenter) {
+ return false, true, nil
+ }
+
// Set the existing node as the base set
stack.SetNodes([]*structs.Node{node})

Let's try that one:
The alloc is in "dc2", so if we change the job to a set of datacenters that still includes "dc2", the update happens in place. But if we change to a set that no longer includes "dc2", the allocation is destroyed and replaced, as expected.
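For reference, here is the eligibility test those two hunks share, written as a standalone sketch; the toy function below only illustrates the decision it feeds. An alloc can stay in place only if its node's datacenter is still in the updated job's list: in inplaceUpdate the patch skips the alloc as an in-place candidate, and in genericAllocUpdateFn it returns false, true, nil, which (judging from the surrounding code) marks the alloc for a destructive update instead of an in-place one.

package main

import "fmt"

// stillEligible is a sketch of the check both hunks add: an existing
// alloc may be updated in place only if the datacenter of the node it
// runs on is still listed in the updated job.
func stillEligible(nodeDC string, jobDCs []string) bool {
	for _, dc := range jobDCs {
		if dc == nodeDC {
			return true
		}
	}
	return false
}

func main() {
	// An alloc running on a node in "dc2":
	fmt.Println(stillEligible("dc2", []string{"dc1", "dc2"})) // true: update in place
	fmt.Println(stillEligible("dc2", []string{"dc1"}))        // false: destroy and replace
}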
I'm not wild about where that function lands in terms of the design of the scheduler and testability, but I'll open a PR with this patch for further discussion of those details.
Nomad version
Nomad v1.1.1 (7feec97)
Reproduced in Nomad 0.11.8 as well
Issue
Nomad always performs an in-place update if the datacenter is the only change to a job. This can make an allocation misreport the datacenter it is running in, and it can allow a user to update a job to an invalid datacenter. Either change would only take effect if the allocations were restarted.
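To make the root cause concrete, here is a deliberately simplified sketch (not Nomad's actual structs or logic) of how update detection can miss this: if the change check only compares task-group-level fields, a job that differs only in the job-level datacenters list looks unchanged, and the scheduler picks an in-place update.

package main

import "fmt"

// Job is a toy stand-in for structs.Job: Datacenters is a job-level
// field, while the task list stands in for task-group-level fields.
type Job struct {
	Datacenters []string
	Tasks       []string
}

// tasksUpdated is a sketch of change detection that inspects
// task-group-level fields only, so job-level fields such as
// Datacenters never influence the result.
func tasksUpdated(a, b Job) bool {
	if len(a.Tasks) != len(b.Tasks) {
		return true
	}
	for i := range a.Tasks {
		if a.Tasks[i] != b.Tasks[i] {
			return true
		}
	}
	return false
}

func main() {
	v0 := Job{Datacenters: []string{"dc1"}, Tasks: []string{"redis"}}
	v1 := Job{Datacenters: []string{"🤯"}, Tasks: []string{"redis"}}
	// Prints false: the datacenter change is invisible to the check,
	// so the scheduler chooses an in-place update.
	fmt.Println(tasksUpdated(v0, v1))
}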
Reproduction steps
Start a Nomad dev agent (nomad agent -dev)
Run e1.nomad (nomad job run e1.nomad)
(Output elided.)
Plan e2.nomad (nomad job plan e2.nomad)
Notice that it says it's going to change the datacenter via an in-place update.
Run e2.nomad
The allocation says that it's v1 now.
(Output elided.)
If you stop the alloc...
Expected Result
Because "🤯" was invalid in my case (I did test with an invalid plaintext datacenter identifier too), I expected either an error or a create/destroy update which then became a blocked alloc.
Actual Result
Job files
e1.nomad
e2.nomad