Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get capi targetsize from cache #5025

Merged
merged 1 commit into from
Jul 18, 2022

Conversation

enxebre
Copy link
Member

@enxebre enxebre commented Jul 13, 2022

Which component this PR applies to?

Cluster autoscaler - cluster api.

What type of PR is this?

This ensured that access to replicas during scale down operations were never stale by accessing the API server #3104.

This honoured that behaviour while moving to unstructured client #3312.

This regressed that behaviour while trying to reduce the API server load #4443.

This put back the never stale replicas behaviour at the cost of loading back the API server #4634.

Currently on e.g a 48 minutes cluster with no scaling activity it does 1.4k get request to the scale subresource.
This PR tries to satisfy both non stale replicas during scale down and prevent the API server from being overloaded. To achieve that it lets TargetSize which is called on every autoscaling cluster state loop from come from cache while getting fresh replicas at the time of perform the operations.

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.

/area provider/cluster-api

/kind bug
/kind cleanup

@k8s-ci-robot k8s-ci-robot added area/provider/cluster-api Issues or PRs related to Cluster API provider kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 13, 2022
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 13, 2022
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 13, 2022
@enxebre
Copy link
Member Author

enxebre commented Jul 13, 2022

cc @JoelSpeed @elmiko

Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense to me, thanks for the update @enxebre

i just have a quick question about

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.

did you have something specific in mind?

Copy link

@arunmk arunmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a minor comment

return 0, errors.Wrap(err, "error getting replica count")
}
if !found {
replicas = 0
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we set replicas = ng.scalableResource.MinSize() here?

Copy link
Contributor

@elmiko elmiko Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems reasonable to me, i guess we still need to default to 0 if there is an issue with minsize though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I made it an error, as I don't think it is a valid case for it to not be found.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's probably better

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arunmk, enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This ensured that access to replicas during scale down operations were never stale by accessing the API server kubernetes#3104.
This honoured that behaviour while moving to unstructured client kubernetes#3312.
This regressed that behaviour while trying to reduce the API server load kubernetes#4443.
This put back the never stale replicas behaviour at the cost of loading back the API server kubernetes#4634.

Currently on e.g a 48 minutes cluster it does 1.4k get request to the scale subresource.
This PR tries to satisfy both non stale replicas during scale down and prevent the API server from being overloaded. To achieve that it lets targetSize which is called on every autoscaling cluster state loop from come from cache.

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
@enxebre enxebre force-pushed the get-targetSize-fromcache branch from a80af8e to b2f1823 Compare July 13, 2022 18:26
@elmiko
Copy link
Contributor

elmiko commented Jul 13, 2022

LGTM, i'll let @arunmk add the label

@arunmk
Copy link

arunmk commented Jul 14, 2022

/lgtm

/approve

@JoelSpeed
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 18, 2022
@k8s-ci-robot k8s-ci-robot merged commit 028584a into kubernetes:master Jul 18, 2022
navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler area/provider/cluster-api Issues or PRs related to Cluster API provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants