-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix system sched constraint errors #5631
Conversation
// Register some nodes | ||
// the tag "aaaaaa" is hashed so that the nodes are processed | ||
// in an order other than good, good, bad | ||
for _, tag := range []string{"aaaaaa", "foo", "foo", "foo"} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the last tag be "bar" or something else that doesn't match?
nvmd I see that you mark the last node as ineligible below
scheduler/system_sched_test.go
Outdated
|
||
// Mark the last node as ineligible | ||
node.SchedulingEligibility = structs.NodeSchedulingIneligible | ||
// node.ComputeClass() // should only need to be updated at registration? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment
scheduler/system_sched_test.go
Outdated
node.SchedulingEligibility = structs.NodeSchedulingIneligible | ||
// node.ComputeClass() // should only need to be updated at registration? | ||
|
||
// Make a job with a partially matching constraint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested rewording for this - make a job with a constraint that matches a subset of nodes
@@ -297,13 +297,28 @@ func (s *SystemScheduler) computePlacements(place []allocTuple) error { | |||
desired := s.plan.Annotations.DesiredTGUpdates[missing.TaskGroup.Name] | |||
desired.Place -= 1 | |||
} | |||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to add a comment on why we continue here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@preetapan I put the comment up at the top of the option == nil
where it applies to all 3 of the continue
statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably worth commenting why why we don't increment failedTGAllocs if NodeFiltered > 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that ^ is what i was getting at - documenting that we don't increment failedTGAllocs because nodeFiltered is > 0 only when there's a constraint that didn't match, and a failed constraint condition should not be counted as a failure to place that task group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if the additional comments provide enough context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
scheduler/system_sched.go
Outdated
alloc.PreviousAllocation = missing.Alloc.ID | ||
} | ||
// If the new allocation is replacing an older allocation then we | ||
// set the record the older allocation id so that they are chained |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove "set the". Looks like an old grammar error in the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small suggestions, looks good otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - but i think a comment about failedTGAllocs
updates when NodeFiltered > 0
is still useful.
@@ -297,13 +297,28 @@ func (s *SystemScheduler) computePlacements(place []allocTuple) error { | |||
desired := s.plan.Annotations.DesiredTGUpdates[missing.TaskGroup.Name] | |||
desired.Place -= 1 | |||
} | |||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably worth commenting why why we don't increment failedTGAllocs if NodeFiltered > 0
.
83ae367
to
1a49487
Compare
1a49487
to
b122853
Compare
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fix the system scheduler so that filtered nodes are not reported to the user as failures.
Fixes #2381 and #5169