Datacenter balance constraint #2168
Conversation
Hey,
Thanks for the great PR. I have left specific feedback in the code but the larger comment is that the way it is implemented right now will not work as expected.
Currently the iterator acts a bit like a round-robin between the various DCs by restricting the count difference from exceeding 1. However, that breaks down if the input nodes aren't ordered round-robin :)
We need the iterator to return true as long as the count does not exceed the desired balance. For example, if we want to spread 9 instances across 3 DCs, it knows each DC should have 3 instances, so as long as there aren't more than 3 in a DC it should return true.
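To make that concrete, here is a minimal Go sketch of the check described above. It is not the PR's implementation; the names (dcFeasible, allocsPerDC, desiredCount) are illustrative assumptions.

```go
// Minimal sketch of the desired check, not Nomad's actual code: a node stays
// feasible only while its datacenter holds fewer allocations than its even
// share of the group's desired count.
package main

import "fmt"

// dcFeasible reports whether a node in nodeDC may still receive an allocation.
// The remainder case (count not divisible by the number of DCs) is discussed
// in a later comment on this PR.
func dcFeasible(nodeDC string, allocsPerDC map[string]int, desiredCount, numDCs int) bool {
	if numDCs == 0 {
		return false
	}
	share := desiredCount / numDCs
	return allocsPerDC[nodeDC] < share
}

func main() {
	// Spreading 9 instances across 3 DCs: each DC should end up with 3.
	counts := map[string]int{"dc1": 3, "dc2": 2, "dc3": 1}
	fmt.Println(dcFeasible("dc1", counts, 9, 3)) // false: dc1 already has its 3
	fmt.Println(dcFeasible("dc2", counts, 9, 3)) // true: dc2 only has 2
}
```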
nomad/structs/structs.go
Outdated
@@ -2864,6 +2864,7 @@ func (ta *TaskArtifact) Validate() error {
}

const (
ConstraintBalance = "balance_datacenter"
Can we change it to balance_datacenters, as it is balancing among multiple DCs?
nomad/structs/structs.go
Outdated
@@ -2864,6 +2864,7 @@ func (ta *TaskArtifact) Validate() error {
}

const (
ConstraintBalance = "balance_datacenter"
ConstraintBalance -> ConstraintBalanceDataCenters
@@ -114,6 +114,19 @@ constraint {
}
```

- `"balance_datacenter"` - Instructs the scheduler to force an equal spread across
all datacenters specified in the Job. When specified as a job constraint, it
in the job's [datacenter list](https://www.nomadproject.io/docs/job-specification/job.html#datacenters)
scheduler/feasible.go
Outdated
@@ -157,6 +159,9 @@ type ProposedAllocConstraintIterator struct {
// they don't have to be calculated every time Next() is called.
tgDistinctHosts bool
jobDistinctHosts bool

tgBalance bool
Can you put a comment above these fields?
scheduler/feasible.go
Outdated
@@ -157,6 +159,9 @@ type ProposedAllocConstraintIterator struct {
// they don't have to be calculated every time Next() is called.
tgDistinctHosts bool
jobDistinctHosts bool

tgBalance bool
jobBalance bool
tgBalance -> tgBalanceDCs
jobBalance -> jobBalanceDCs
scheduler/feasible.go
Outdated
jobCollision := alloc.JobID == iter.job.ID
taskCollision := alloc.TaskGroup == iter.tg.Name

// skip jobs not in this job or taskgroup (for jobBalance/tgBalance)
// Skip allocations on the node that are not for this job or for the task group when using the balance constraint on the group
scheduler/feasible.go
Outdated
taskCollision := alloc.TaskGroup == iter.tg.Name

// skip jobs not in this job or taskgroup (for jobBalance/tgBalance)
if !(jobCollision && (iter.jobBalance || taskCollision)) {
// It has to also collide on the job because there is nothing forcing the group
// name to be unique across the different jobs that may be on the node.
groupCollision := alloc.TaskGroup == iter.tg.Name && jobCollision
if !jobCollision && !groupCollision || !iter.tgBalance {
    continue
}
scheduler/feasible_test.go
Outdated
Datacenters: []string{"dc1", "dc2"},
TaskGroups: []*structs.TaskGroup{tg1, tg2},
}

I think what you want to test is adding an allocation to the proposed allocations and seeing that the iterator returns nothing.
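A rough sketch of that kind of test, assuming it lives in scheduler/feasible_test.go next to the existing helpers (testContext, mock.Node, NewStaticIterator, collectFeasible) and uses the ConstraintBalance operand added by this PR; the exact setup is an assumption, not a drop-in test.

```go
// Rough sketch only: helper names are assumed from the existing tests in this
// file, and ConstraintBalance is the operand added by this PR.
func TestProposedAllocConstraint_BalanceDatacenters(t *testing.T) {
	_, ctx := testContext(t)
	n1, n2 := mock.Node(), mock.Node()
	n1.Datacenter, n2.Datacenter = "dc1", "dc2"
	static := NewStaticIterator(ctx, []*structs.Node{n1, n2})

	tg := &structs.TaskGroup{
		Name:        "example",
		Count:       2,
		Constraints: []*structs.Constraint{{Operand: structs.ConstraintBalance}},
	}
	job := &structs.Job{
		ID:          "foo",
		Datacenters: []string{"dc1", "dc2"},
		TaskGroups:  []*structs.TaskGroup{tg},
	}

	// Propose one allocation of this group in each DC; with a count of 2
	// spread over two DCs, both DCs have reached their share.
	plan := ctx.Plan()
	plan.NodeAllocation[n1.ID] = []*structs.Allocation{{JobID: "foo", TaskGroup: "example"}}
	plan.NodeAllocation[n2.ID] = []*structs.Allocation{{JobID: "foo", TaskGroup: "example"}}

	proposed := NewProposedAllocConstraintIterator(ctx, static)
	proposed.SetJob(job)
	proposed.SetTaskGroup(tg)

	// No node should remain feasible.
	if out := collectFeasible(proposed); len(out) != 0 {
		t.Fatalf("Bad: %#v", out)
	}
}
```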
scheduler/feasible.go
Outdated
}
}

min := math.MaxInt32
I think this logic should be to detect the desired count of the task group and then divide it by the number of DCs. For any DC that has fewer than that max you would return true, and for any DC over it you would return false.
There is an edge case to consider: if the count is 10 and len(dcs) == 3, the code would have to detect that only one DC should be allowed to get to a count of 4.
There is a problem with the current implementation. Imagine we have count = 4, with 2 nodes in DC1 and one each in DC2 and DC3. You would expect this to work, but if the input order was [1, 1, 2, 3], it would return false on the second node in DC1.
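A small sketch of the cap calculation being described (illustrative names, not part of the PR): with count = 10 and three DCs the caps come out as 4, 3, 3, so exactly one DC may reach 4; with count = 4 and three DCs the caps are 2, 1, 1, which also removes the ordering problem in the [1, 1, 2, 3] example.

```go
// Illustrative only: how many allocations each datacenter may hold when
// spreading `count` allocations as evenly as possible over `dcs`.
package main

import "fmt"

func dcCaps(count int, dcs []string) map[string]int {
	caps := make(map[string]int, len(dcs))
	base := count / len(dcs)
	remainder := count % len(dcs)
	for i, dc := range dcs {
		caps[dc] = base
		if i < remainder {
			caps[dc]++ // the first `remainder` DCs absorb one extra allocation
		}
	}
	return caps
}

func main() {
	fmt.Println(dcCaps(10, []string{"dc1", "dc2", "dc3"})) // map[dc1:4 dc2:3 dc3:3]
	fmt.Println(dcCaps(4, []string{"dc1", "dc2", "dc3"}))  // map[dc1:2 dc2:1 dc3:1]
}
```

Which DCs absorb the remainder is just a deterministic choice here; the feasibility check would then compare each DC's current allocation count against its cap rather than against the other DCs.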
if len(out) != 2 {
t.Fatalf("Bad: %#v", out)
}
}
Can you add a test to stack_test.go? That is where you can really test that the behavior is working correctly.
Interesting feature @Crypto89!
* Add Balance constraint
* Fix constraint checker
* fix node iteration
* Add proposed allocs in the balancing
* Add balancing on tg and job level
* Add some more tests
* to rely on currently allocated allocations rather than on the RR nature of the iterator. The number of allocations per datacenter is now calculated deterministically based on the number of allocations and on the number of datacenters.
Force-pushed from a4151ad to ede929f
So I've changed a couple of things here:
This last change does mean that if one of the datacenters dies with more allocations than the others, the overflow of allocations will not be rescheduled to the other datacenters. Since I personally think this is an edge case, I doubt it will cause any problems in real-world use cases, but it is worth mentioning. Last but not least, the test cases do test something, but probably not everything. Since it requires a little more knowledge of the scheduler, I would appreciate someone giving me some pointers/appending to this PR.
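To make the caveat concrete (numbers assumed for illustration): with 9 allocations deterministically split 3/3/3 across dc1, dc2, and dc3, a failure of dc3 leaves dc1 and dc2 already at their calculated share of 3, so the 3 lost allocations stay unplaced rather than overflowing into the surviving datacenters.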
Closing this because the work in 0.9 with the spread stanza achieves this.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
This patch implements a dynamic constraint to force balancing of a job/task across multiple datacenters.
For some tasks, such as applications requiring quorum, it is desirable (or even required) to spread them equally across all datacenters to prevent a loss of quorum when too many allocations are running in the same DC.
Implements #517