-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NodeInfo processor to refine template-based NodeInfos #3761
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bpineau The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
068bcf8
to
060e78d
Compare
A general comment - this strongly assumes that two NodeGroups that are "similar" according to NodeGroupSet processor are identical for scheduling purposes (except for location related labels). I think this is likely to be true in most cases, but it's still a pretty significant assumption. |
cluster-autoscaler/processors/nodeinfos/refine_node_infos_processor.go
Outdated
Show resolved
Hide resolved
cluster-autoscaler/processors/nodeinfos/refine_node_infos_processor.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this PR tries to find real or similar real nodeinfo as much as possible. Could we try the following routine.
- If it is a real nodeinfo, update nodecache.
- Else,
- If nodecache has cached real nodeinfo, use cached nodeinfo.
- Else, use the most similar real nodeinfo and update the cache.
060e78d
to
7b92a87
Compare
@MaciekPytel agreed, added a (default false) flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for misunderstanding. I list my understanding as follows. Please point out if anything wrong.
This PR tries to do,
If a NodeInfo is a template, find the most "similar" RealNode to replace it. But instead of comparing template node with real nodes, it compare template node with template nodes. Right?
IIUC, if this logic only influences "BalanceSimilarNodeGroups", would it be better to make it closer to "BalanceSimilarNodeGroups" logic, e.g. put this logic into processors.NodeGroupSetProcessor.FindSimilarNodeGroups(context, bestOption.NodeGroup, nodeInfos)
This logic takes place at the very beginning of RunOnce logic. As @MaciekPytel said,
this strongly assumes that two NodeGroups that are "similar" according to NodeGroupSet processor are identical for scheduling purposes
If the assumption is incorrect, it will lead to incorrect scale-up decision.
WDYT? Also @MaciekPytel .
@umialpha Yes, that PR logic description is correct. We're following suggestions from previous attempts at solving this (in particular #2892 (comment) and #3608 (comment) ). You're right about the assumption, hence the new flag to be enabled when using scale-up-from-zero (otherwise we already have real nodes and this change is no-op) and where nodeGroups having similar nodeInfos templates are similar nodeGroups. Having more accurately evaluated nodeGroups capacities (the intent of this PR) is useful in other situations than just keeping nodegroups balanced with FindSimilarNodeGroups during upscale; for instance when evaluating options to re-pack undersubscribed nodes. |
Agreed, this goes beyond just balancing. The fact that TemplateNodeInfo generally doesn't get resources exactly right is a problem that can have more impact that not matching NodeGroups for balancing purposes. Consider an example where TemplateNodeInfo overestimates memory by a little bit and a pod that won't really fit on the node would look like it fits - in this case CA would trigger an incorrect scale-up. NodeGroupSetProcessor doesn't really apply to this situation. Using a template from a real node is generally more likely to be correct and avoid problems like this if you are sure the new node will really be the same (except for location-related stuff). As per my previous comment I don't think we should make this assumption for the users, but if you happen to know that it's true in your env what this PR does makes a lot of sense to me. As such I'm ok with adding this as an opt-in feature (note: I haven't done a detailed review, just talking about general approach). |
7b92a87
to
449879e
Compare
@mwielgus (and/or @MaciekPytel ) mind taking an other look? it's updated according to the comments |
449879e
to
32d07a0
Compare
cluster-autoscaler/processors/nodeinfos/refine_node_infos_processor.go
Outdated
Show resolved
Hide resolved
Sorry for delay, I left a few minor comments. Overall, looks good, they're all pretty minor comments. |
32d07a0
to
5451133
Compare
Thanks for the review @MaciekPytel ! Updated the PR accordingly, please take an other look |
Comparing synthetic NodeInfos obtained from nodegroup's TemplateInfo() to NodeInfos obtained from real-world nodes is bound to fail, even with kube reservations provided through nodegroups labels/annotations (for instance: kernel mem reservation is hard to predict). This makes `balance-similar-node-groups` likely to misbehave when `scale-up-from-zero` is enabled (and a first nodegroup gets a real node), for instance. Following [Maciek Pytel suggestion](kubernetes#3608 (comment)) (from discussions on a previous attempt at solving this), we can implement a NodeInfo Processor that would improve template-generated NodeInfos whenever a node was created off a similar nodegroup. We're storing node's virtual origin through machineid, which works fine but is a bit ugly (suggestions welcome). Tested this solves balance-similar-node-groups + scale-up-from-zero, with various instance types on AWS and GCP. Previous attempts to solve that issue/discussions: * kubernetes#2892 (comment) * kubernetes#3608 (comment)
5451133
to
a153405
Compare
@MaciekPytel you guessed right; I think we can close this one |
Comparing synthetic NodeInfos obtained from nodegroup's TemplateInfo() to NodeInfos obtained from real-world nodes is bound to fail, even with kube reservations provided through nodegroups labels/annotations (for instance: kernel mem reservation is hard to predict, in particular on AWS instances).
This makes
balance-similar-node-groups
likely to misbehave whenscale-up-from-zero
is enabled (and a first nodegroup gets a real node), for instance.Following Maciek suggestion (from discussions on a previous attempt at solving this), we can implement a NodeInfo Processor that would improve template-generated NodeInfos whenever a node was created off a similar nodegroup.
We're storing node's virtual origin through machineid, which works fine but is a bit ugly (suggestions welcome).
Tested this solves balance-similar-node-groups + scale-up-from-zero, with various instance types on AWS and GCP.
Previous attempts to solve that issue/discussions: