Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add expire time for nodeInfo cache items #4669

Merged
merged 1 commit into from
Feb 16, 2022

Conversation

yaroslava-serdiuk
Copy link
Contributor

Which component this PR applies to?

cluster-autoscaler

What type of PR is this?

/kind bug

What this PR does / why we need it:

Adding expire time for nodeInfo cache items. Expire time could be set via flag --node-info-cache-expire-time, by default there is no expire time (old behavior).

Currently, item in the node info cache is updated only if there are any nodes in the node group. If node group is scaled down to 0, node info cache will store last node and use it in scale up algorithm. Thus if the last node is a wrong representative of the node group (for example it has custom taints), then Cluster Autoscaler may struggle to scale up this node group.

Added `--node-info-cache-expire-time` flag for Cluster Autoscaler to specify the time to store node info cache items.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 7, 2022
@yaroslava-serdiuk
Copy link
Contributor Author

/assign @x13n

@yaroslava-serdiuk yaroslava-serdiuk force-pushed the node_info_cache branch 2 times, most recently from fd7c94b to 336301e Compare February 8, 2022 09:33
}

func (p *MixedTemplateNodeInfoProvider) isCacheItemExpired(added time.Time) bool {
if p.ttl == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Why do you use a pointer at all? If ttl was a regular value you wouldn't have to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left pointer for test that mocks MixedTemplateNodeInfoProvider, so there is no need to pass a specific parameter there. However it should be handled during provider creation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you have different state for tests and a different one for actual execution. If you used a const like veryLongTime = 123456 * time.Hour in the tests, you wouldn't need special casing nil.

Copy link
Contributor Author

@yaroslava-serdiuk yaroslava-serdiuk Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say that state for test is different, a nil pointer correspond to default value — the behavior is identical as if --node-info-cache-expire-time flag is not set. Anyway we need to handle a nil pointer at same place, and I think this is the best place: https://github.com/kubernetes/autoscaler/pull/4669/files#diff-ad9d6ea8aa127e0dc3b271f9b5761df7ba18f9fd63ffd0a7f69a6886c86ef317R54
I think that if new test is introduced that doesn't test nodeInforCache directly but use it, then the writer shouldn't come up with a new veryLongTime variable.
Please, explain if my line of thought is wrong.
(Also, after your argument I used nil pointer for initialising in DefaultProcessor(), which is used not only for test.)

@x13n
Copy link
Member

x13n commented Feb 9, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 9, 2022
@yaroslava-serdiuk
Copy link
Contributor Author

/assign @MaciekPytel

@towca
Copy link
Collaborator

towca commented Feb 16, 2022

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: towca, yaroslava-serdiuk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 16, 2022
@k8s-ci-robot k8s-ci-robot merged commit 69b49ff into kubernetes:master Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants