Don't cache NodeInfo for recently Ready nodes #4641
Conversation
/assign @yaroslava-serdiuk

@x13n: GitHub didn't allow me to assign the following users: yaroslava-serdiuk. Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Overall lgtm, left a few minor comments.
@@ -56,6 +59,7 @@ func (p *MixedTemplateNodeInfoProvider) Process(ctx *context.AutoscalingContext,
  // TODO(mwielgus): Review error policy - sometimes we may continue with partial errors.
  result := make(map[string]*schedulerframework.NodeInfo)
  seenGroups := make(map[string]bool)
+ now := time.Now()
nit: This is called early enough in the CA loop that it probably doesn't matter much, but maybe passing currentTime from RunOnce() would be more consistent? Since the CA loop operates on a snapshot, we generally use the timestamp of the loop start as 'now' for this type of check.
Done.
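A minimal sketch of the pattern being suggested here, assuming a simplified Process-style method; the type and parameter names below are illustrative stand-ins, not the real autoscaler signatures:

```go
package main

import (
	"fmt"
	"time"
)

// templateProvider is a stand-in for MixedTemplateNodeInfoProvider.
type templateProvider struct{}

// Process takes the loop-start timestamp as an argument instead of calling
// time.Now() itself, so every check within one autoscaler iteration uses the
// same notion of "now".
func (p *templateProvider) Process(nodeNames []string, now time.Time) {
	for _, name := range nodeNames {
		fmt.Printf("evaluating %s against %s\n", name, now.Format(time.RFC3339))
	}
}

func main() {
	// RunOnce-style caller: capture "now" once per loop and pass it down.
	currentTime := time.Now()
	(&templateProvider{}).Process([]string{"node-1", "node-2"}, currentTime)
}
```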
@@ -90,7 +94,7 @@ func (p *MixedTemplateNodeInfoProvider) Process(ctx *context.AutoscalingContext,
  for _, node := range nodes {
    // Broken nodes might have some stuff missing. Skipping.
-   if !kube_util.IsNodeReadyAndSchedulable(node) {
+   if !isNodeGoodForCaching(node, now) {
You're using this check to see if a node can be used as a template node for this loop (the processNode call that you're skipping adds the node to the result), not just if it can be cached. That may actually be the right thing to do (a node without DS pods doesn't make for a good template), but the function name is misleading. I'd either rename the function or, if we think this check should only apply to caching, move the check to L104 where we actually handle caching.
Done.
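For context, a hedged sketch of what a recently-Ready check of this shape could look like. The function name comes from the diff above, but the body and the one-minute constant are assumptions based on the PR description, not the merged implementation:

```go
package main

import (
	"fmt"
	"time"

	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// readyGracePeriod is an assumed constant; the PR description says one minute
// should be more than enough.
const readyGracePeriod = 1 * time.Minute

// isNodeGoodForCaching rejects nodes whose Ready condition turned True less
// than readyGracePeriod ago, giving DaemonSet pods time to be scheduled before
// the node's NodeInfo is used as a template for future nodes.
func isNodeGoodForCaching(node *apiv1.Node, now time.Time) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == apiv1.NodeReady {
			return cond.Status == apiv1.ConditionTrue &&
				now.Sub(cond.LastTransitionTime.Time) >= readyGracePeriod
		}
	}
	// No Ready condition at all: not a good caching candidate.
	return false
}

func main() {
	node := &apiv1.Node{
		Status: apiv1.NodeStatus{
			Conditions: []apiv1.NodeCondition{{
				Type:               apiv1.NodeReady,
				Status:             apiv1.ConditionTrue,
				LastTransitionTime: metav1.NewTime(time.Now().Add(-30 * time.Second)),
			}},
		},
	}
	// Ready for only 30s, so the node is still skipped.
	fmt.Println(isNodeGoodForCaching(node, time.Now()))
}
```

The real check presumably also keeps the existing readiness-and-schedulability test it replaces; this sketch only shows the grace-period part.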
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel, x13n

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Which component does this PR apply to?
cluster-autoscaler
What type of PR is this?
/kind bug
What this PR does / why we need it:
There's a race condition between DaemonSet pods getting scheduled to a
new node and Cluster Autoscaler caching that node for the sake of
predicting future nodes in a given node group. We can reduce the risk of
missing some DaemonSet pods by providing a grace period before accepting nodes
into the cache. One minute should be more than enough, except for some
pathological edge cases.
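To make the race concrete, here is a toy illustration (assumed numbers, plain Go, not autoscaler code) of how a template cached before the DaemonSet pod lands under-counts the resources every future node in the group will actually reserve:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	allocatable := resource.MustParse("2000m")
	dsPodCPU := resource.MustParse("200m")

	// Template cached seconds after the node became Ready: the DaemonSet pod
	// hasn't been scheduled yet, so nothing is subtracted.
	earlyTemplateFree := allocatable.DeepCopy()

	// Template cached after the grace period: the DaemonSet pod's requests
	// are accounted for.
	lateTemplateFree := allocatable.DeepCopy()
	lateTemplateFree.Sub(dsPodCPU)

	fmt.Printf("free CPU assumed for future nodes (early cache): %s\n", earlyTemplateFree.String())
	fmt.Printf("free CPU assumed for future nodes (after grace period): %s\n", lateTemplateFree.String())
}
```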
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/assign @MaciekPytel