fix system sched constraint errors #5631

Merged
merged 5 commits into from
May 6, 2019

langmartin
Contributor

Fix the system scheduler so that filtered nodes are not reported to the user as failures.
Fixes #2381 and #5169

// Register some nodes
// the tag "aaaaaa" is hashed so that the nodes are processed
// in an order other than good, good, bad
for _, tag := range []string{"aaaaaa", "foo", "foo", "foo"} {
Contributor

@preetapan preetapan Apr 30, 2019

Shouldn't the last tag be "bar" or something else that doesn't match?
Never mind, I see that you mark the last node as ineligible below.


// Mark the last node as ineligible
node.SchedulingEligibility = structs.NodeSchedulingIneligible
// node.ComputeClass() // should only need to be updated at registration?
Contributor

remove comment

node.SchedulingEligibility = structs.NodeSchedulingIneligible
// node.ComputeClass() // should only need to be updated at registration?

// Make a job with a partially matching constraint
Contributor

Suggested rewording for this - make a job with a constraint that matches a subset of nodes
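
To make the test setup under discussion concrete, here is a minimal, self-contained sketch of a constraint that matches only a subset of nodes. It does not use Nomad's packages: the `node` type, the `partition` helper, and the choice of "foo" as the matching tag are assumptions made for illustration, and real scheduling eligibility is handled differently from constraint filtering.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a registered client node.
type node struct {
	ID       string
	Tag      string
	Eligible bool // false models a node marked scheduling-ineligible
}

// partition splits nodes into those where a job constrained to wantTag
// could be placed and those that are excluded, either because the tag
// does not match or because the node is ineligible.
func partition(nodes []node, wantTag string) (placeable, excluded []string) {
	for _, n := range nodes {
		if n.Eligible && n.Tag == wantTag {
			placeable = append(placeable, n.ID)
		} else {
			excluded = append(excluded, n.ID)
		}
	}
	return placeable, excluded
}

func main() {
	// Same tag list as the test under review.
	tags := []string{"aaaaaa", "foo", "foo", "foo"}
	nodes := make([]node, len(tags))
	for i, tag := range tags {
		nodes[i] = node{ID: fmt.Sprintf("node-%d", i), Tag: tag, Eligible: true}
	}
	// Mark the last node as ineligible, as the test does.
	nodes[len(nodes)-1].Eligible = false

	placeable, excluded := partition(nodes, "foo")
	fmt.Println("placeable:", placeable)
	fmt.Println("excluded:", excluded)
}
```

In this sketch only two of the four nodes can take a placement; the other two are excluded, and the pull request's point is that such exclusions should not be reported as failures.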

@@ -297,13 +297,28 @@ func (s *SystemScheduler) computePlacements(place []allocTuple) error {
desired := s.plan.Annotations.DesiredTGUpdates[missing.TaskGroup.Name]
desired.Place -= 1
}
continue
Contributor

Would be good to add a comment on why we continue here

Contributor Author

@preetapan I put the comment at the top of the option == nil block, where it applies to all three of the continue statements.

Contributor

probably worth commenting why we don't increment failedTGAllocs if NodeFiltered > 0.

Contributor

yeah that ^ is what i was getting at - documenting that we don't increment failedTGAllocs because nodeFiltered is > 0 only when there's a constraint that didn't match, and a failed constraint condition should not be counted as a failure to place that task group.

Contributor Author

Let me know if the additional comments provide enough context.

Contributor

lgtm :shipit:
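
The rule this thread settles on can be sketched as follows: when no placement option is found, the attempt counts toward failedTGAllocs only if no node was filtered by a constraint. This is a simplified illustration rather than the merged code; `allocMetric`, `shouldRecordFailure`, and `tallyFailures` are hypothetical names, and only the filtered-node count mentioned in the review is modeled.

```go
package main

import "fmt"

// allocMetric is a hypothetical stand-in for the scheduler's per-placement
// metrics. Only the filtered-node count discussed in the review is modeled.
type allocMetric struct {
	NodesFiltered int // nodes rejected because a constraint did not match
}

// shouldRecordFailure reports whether a placement that found no option
// should be counted as a failed task group allocation. A non-zero filtered
// count means a constraint excluded the node, which is expected behavior
// for a constrained system job and not a failure.
func shouldRecordFailure(m allocMetric) bool {
	return m.NodesFiltered == 0
}

// tallyFailures counts, for one task group, the unplaced attempts that
// should be surfaced to the user as failures.
func tallyFailures(taskGroup string, unplaced []allocMetric) map[string]int {
	failed := make(map[string]int)
	for _, m := range unplaced {
		if !shouldRecordFailure(m) {
			// Constraint mismatch: skip without recording a failure.
			continue
		}
		failed[taskGroup]++
	}
	return failed
}

func main() {
	unplaced := []allocMetric{
		{NodesFiltered: 1}, // filtered by a constraint
		{NodesFiltered: 0}, // could not be placed for another reason
		{NodesFiltered: 1}, // filtered by a constraint
	}
	fmt.Println(tallyFailures("web", unplaced))
}
```

Of the three unplaced attempts here, only the one with no filtered nodes is tallied as a failure for the task group.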

alloc.PreviousAllocation = missing.Alloc.ID
}
// If the new allocation is replacing an older allocation then we
// set the record the older allocation id so that they are chained
Contributor

nit: remove "set the". Looks like an old grammar error in the comment

Contributor

@preetapan preetapan left a comment

Some small suggestions, looks good otherwise

@langmartin langmartin requested a review from preetapan April 30, 2019 20:38
Contributor

@notnoop notnoop left a comment

lgtm - but i think a comment about failedTGAllocs updates when NodeFiltered > 0 is still useful.

@langmartin langmartin force-pushed the b-system-sched-constraint-errors branch from 83ae367 to 1a49487 Compare May 1, 2019 14:22
@langmartin langmartin force-pushed the b-system-sched-constraint-errors branch from 1a49487 to b122853 Compare May 1, 2019 16:25
@langmartin langmartin merged commit c75357c into master May 6, 2019
@langmartin langmartin deleted the b-system-sched-constraint-errors branch May 6, 2019 16:04
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 11, 2023
Successfully merging this pull request may close these issues.

Output of "nomad run" seems wrong for system job with constraints.
3 participants