Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NodeInfo processor to refine template-based NodeInfos #3761

Closed
wants to merge 1 commit into from

Conversation

bpineau
Copy link
Contributor

@bpineau bpineau commented Dec 14, 2020

Comparing synthetic NodeInfos obtained from nodegroup's TemplateInfo() to NodeInfos obtained from real-world nodes is bound to fail, even with kube reservations provided through nodegroups labels/annotations (for instance: kernel mem reservation is hard to predict, in particular on AWS instances).

This makes balance-similar-node-groups likely to misbehave when scale-up-from-zero is enabled (and a first nodegroup gets a real node), for instance.

Following Maciek suggestion (from discussions on a previous attempt at solving this), we can implement a NodeInfo Processor that would improve template-generated NodeInfos whenever a node was created off a similar nodegroup.

We're storing node's virtual origin through machineid, which works fine but is a bit ugly (suggestions welcome).

Tested this solves balance-similar-node-groups + scale-up-from-zero, with various instance types on AWS and GCP.

Previous attempts to solve that issue/discussions:

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 14, 2020
@bpineau bpineau changed the title NodeInfo processor to Refine synthetic NodeInfos NodeInfo processor to refine template-based NodeInfos Dec 14, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bpineau
To complete the pull request process, please assign feiskyer after the PR has been reviewed.
You can assign the PR to them by writing /assign @feiskyer in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MaciekPytel
Copy link
Contributor

A general comment - this strongly assumes that two NodeGroups that are "similar" according to NodeGroupSet processor are identical for scheduling purposes (except for location related labels). I think this is likely to be true in most cases, but it's still a pretty significant assumption.
If the NodeGroups in question are not identical this assumption can lead to incorrect scale-up decision (since the template used in binpacking would differ from actual node). This makes me think that enabling the processor should probably be controlled via a flag. WDYT?

Copy link
Contributor

@umialpha umialpha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this PR tries to find real or similar real nodeinfo as much as possible. Could we try the following routine.

  1. If it is a real nodeinfo, update nodecache.
  2. Else,
    • If nodecache has cached real nodeinfo, use cached nodeinfo.
    • Else, use the most similar real nodeinfo and update the cache.

@bpineau bpineau force-pushed the refine-nodeinfo-processor branch from 060e78d to 7b92a87 Compare December 23, 2020 13:30
@bpineau
Copy link
Contributor Author

bpineau commented Dec 23, 2020

@MaciekPytel agreed, added a (default false) flag.

Copy link
Contributor

@umialpha umialpha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for misunderstanding. I list my understanding as follows. Please point out if anything wrong.

This PR tries to do,
If a NodeInfo is a template, find the most "similar" RealNode to replace it. But instead of comparing template node with real nodes, it compare template node with template nodes. Right?

IIUC, if this logic only influences "BalanceSimilarNodeGroups", would it be better to make it closer to "BalanceSimilarNodeGroups" logic, e.g. put this logic into processors.NodeGroupSetProcessor.FindSimilarNodeGroups(context, bestOption.NodeGroup, nodeInfos)

This logic takes place at the very beginning of RunOnce logic. As @MaciekPytel said,

this strongly assumes that two NodeGroups that are "similar" according to NodeGroupSet processor are identical for scheduling purposes

If the assumption is incorrect, it will lead to incorrect scale-up decision.

WDYT? Also @MaciekPytel .

@bpineau
Copy link
Contributor Author

bpineau commented Dec 24, 2020

@umialpha Yes, that PR logic description is correct. We're following suggestions from previous attempts at solving this (in particular #2892 (comment) and #3608 (comment) ).

You're right about the assumption, hence the new flag to be enabled when using scale-up-from-zero (otherwise we already have real nodes and this change is no-op) and where nodeGroups having similar nodeInfos templates are similar nodeGroups.

Having more accurately evaluated nodeGroups capacities (the intent of this PR) is useful in other situations than just keeping nodegroups balanced with FindSimilarNodeGroups during upscale; for instance when evaluating options to re-pack undersubscribed nodes.

@MaciekPytel
Copy link
Contributor

Agreed, this goes beyond just balancing. The fact that TemplateNodeInfo generally doesn't get resources exactly right is a problem that can have more impact that not matching NodeGroups for balancing purposes. Consider an example where TemplateNodeInfo overestimates memory by a little bit and a pod that won't really fit on the node would look like it fits - in this case CA would trigger an incorrect scale-up. NodeGroupSetProcessor doesn't really apply to this situation.

Using a template from a real node is generally more likely to be correct and avoid problems like this if you are sure the new node will really be the same (except for location-related stuff). As per my previous comment I don't think we should make this assumption for the users, but if you happen to know that it's true in your env what this PR does makes a lot of sense to me. As such I'm ok with adding this as an opt-in feature (note: I haven't done a detailed review, just talking about general approach).

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 14, 2021
@bpineau bpineau force-pushed the refine-nodeinfo-processor branch from 7b92a87 to 449879e Compare January 18, 2021 15:10
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2021
@bpineau
Copy link
Contributor Author

bpineau commented Jan 18, 2021

@mwielgus (and/or @MaciekPytel ) mind taking an other look? it's updated according to the comments

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 21, 2021
@bpineau bpineau force-pushed the refine-nodeinfo-processor branch from 449879e to 32d07a0 Compare January 25, 2021 11:32
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 25, 2021
@MaciekPytel
Copy link
Contributor

Sorry for delay, I left a few minor comments. Overall, looks good, they're all pretty minor comments.

@bpineau bpineau force-pushed the refine-nodeinfo-processor branch from 32d07a0 to 5451133 Compare February 26, 2021 09:29
@bpineau
Copy link
Contributor Author

bpineau commented Feb 26, 2021

Thanks for the review @MaciekPytel ! Updated the PR accordingly, please take an other look

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2021
Comparing synthetic NodeInfos obtained from nodegroup's TemplateInfo()
to NodeInfos obtained from real-world nodes is bound to fail, even with
kube reservations provided through nodegroups labels/annotations
(for instance: kernel mem reservation is hard to predict).

This makes `balance-similar-node-groups` likely to misbehave when
`scale-up-from-zero` is enabled (and a first nodegroup gets a real
node), for instance.

Following [Maciek Pytel suggestion](kubernetes#3608 (comment))
(from discussions on a previous attempt at solving this), we can
implement a NodeInfo Processor that would improve template-generated
NodeInfos whenever a node was created off a similar nodegroup.

We're storing node's virtual origin through machineid, which works
fine but is a bit ugly (suggestions welcome).

Tested this solves balance-similar-node-groups + scale-up-from-zero,
with various instance types on AWS and GCP.

Previous attempts to solve that issue/discussions:
* kubernetes#2892 (comment)
* kubernetes#3608 (comment)
@bpineau bpineau force-pushed the refine-nodeinfo-processor branch from 5451133 to a153405 Compare April 19, 2021 13:40
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 19, 2021
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 2, 2021
@MaciekPytel
Copy link
Contributor

@bpineau I suspect the motivation for #4191 is to replace this with a version based on a new interface. Is that correct? Or do you still want to include this one?

@bpineau
Copy link
Contributor Author

bpineau commented Aug 16, 2021

@MaciekPytel you guessed right; I think we can close this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants