
[clusterapi] Rely on replica count found in unstructuredScalableResource #4443

Merged: 2 commits merged into kubernetes:master from the fix-rate-limitting branch on Dec 14, 2021

Conversation

@codablock (Contributor):
Rely on the replica count already present in unstructuredScalableResource instead of retrieving it from the Kubernetes API server each time. The repeated retrieval easily causes client-side throttling, which in turn makes each autoscaler run take multiple seconds even when only a small number of NodeGroups is involved and nothing needs to be done.
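For context, here is a minimal sketch of the idea (not the actual patch; the type is simplified and the field names are assumptions): the replica count is read from the informer-cached unstructured object instead of issuing a GET against the scale subresource on every call.

```go
package clusterapi

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// unstructuredScalableResource is a simplified stand-in for the real type:
// it wraps the MachineDeployment/MachineSet object that the controller's
// informers already keep up to date.
type unstructuredScalableResource struct {
	unstructured *unstructured.Unstructured
}

// Replicas returns spec.replicas straight from the cached object, avoiding a
// round trip to the API server (and the client-side throttling it can cause).
func (r unstructuredScalableResource) Replicas() (int, error) {
	replicas, found, err := unstructured.NestedInt64(r.unstructured.Object, "spec", "replicas")
	if err != nil {
		return 0, err
	}
	if !found {
		return 0, fmt.Errorf("spec.replicas not found on %s", r.unstructured.GetName())
	}
	return int(replicas), nil
}
```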

@k8s-ci-robot added the cncf-cla: yes label (the PR's author has signed the CNCF CLA) and the size/L label (changes 100-499 lines, ignoring generated files) on Nov 4, 2021
```diff
@@ -119,6 +114,11 @@ func (r unstructuredScalableResource) SetSize(nreplicas int) error {

 	s.Spec.Replicas = int32(nreplicas)
 	_, updateErr := r.controller.managementScaleClient.Scales(r.Namespace()).Update(context.TODO(), gvr.GroupResource(), s, metav1.UpdateOptions{})

 	if updateErr == nil {
```
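The hunk is truncated at the success check; the review thread below discusses what that branch guards, namely writing the new size back into the cached object so later reads return it without another API call. A hedged sketch of that write-back pattern (the helper name is made up for illustration):

```go
package clusterapi

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// setCachedReplicas is not the real method name; it only illustrates the
// pattern under review: after Scales().Update() succeeds, record the new
// replica count on the informer-cached object so the next Replicas() call
// sees it without another GET against the scale subresource.
func setCachedReplicas(u *unstructured.Unstructured, nreplicas int) error {
	return unstructured.SetNestedField(u.Object, int64(nreplicas), "spec", "replicas")
}
```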
@jackfrancis (Contributor) commented on this hunk:
So I see that we're taking an accounting of the replica count and storing it every time we change the size from within cluster-autoscaler.

What if an out-of-band (i.e., not cluster-autoscaler-originating) replica change occurs? Won't our local replica count be stale in that event?

Lemme know if I'm not understanding everything here.

cc @elmiko @marwanad

@codablock (Contributor, Author) replied:

NodeGroups and their underlying objects are temporary and are re-queried/re-built every time they are needed, e.g. by calling machineController.nodeGroups(). This means that the next run of the autoscaler loop will definitely pick up up-to-date objects. Internally, watches/caches/informers are used to keep the local cache up-to-date.

This is based on my understanding after a few hours of debugging cluster-autoscaler. I'm not very familiar with the code base and am also far from being a full-blown Go coder, so take my analysis/understanding with a grain of salt :)
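To illustrate why staleness is bounded by one loop, here is a simplified sketch (real names and wiring in the provider differ): the node group wrappers are rebuilt from the informer caches on each iteration, so a replica count carried by one iteration's objects never outlives that iteration.

```go
package clusterapi

// cachedMachineSet stands in for the informer-cached object; replicas is the
// value the wrapper carries for the duration of one autoscaler iteration.
type cachedMachineSet struct {
	name     string
	replicas int
}

// nodegroup wraps one scalable resource for a single autoscaler loop.
type nodegroup struct {
	resource cachedMachineSet
}

// nodeGroups is called at the start of every loop (compare
// machineController.nodeGroups() in the real provider), so the wrappers, and
// any replica count they hold, are discarded and rebuilt each time from the
// informer-maintained cache.
func nodeGroups(cache []cachedMachineSet) []nodegroup {
	groups := make([]nodegroup, 0, len(cache))
	for _, r := range cache {
		groups = append(groups, nodegroup{resource: r})
	}
	return groups
}
```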

@elmiko (Contributor) commented Nov 8, 2021:

thanks for the ping @jackfrancis, i'm just starting to take a look at this.

also, thanks for the contribution @codablock =)

@elmiko (Contributor) left a review:

i think this change generally makes sense to me, although i found the unit tests a little confusing at first.

i have some concerns similar to what @jackfrancis raised, but i would like to test drive this patch a little to see how it behaves.

@codablock (Contributor, Author):
@elmiko I agree on the confusing nature of the unit test changes...but I couldn't find a better way to make the tests reliable, especially with my limited knowledge about the code base and architecture.

@elmiko (Contributor) commented Nov 23, 2021:

i've been running some local tests with a capi-kubemark cluster and i think this PR is probably safe for us to merge, and it will also reduce calls to the API server, as @codablock noted.

here's what i did:

  1. create a machinedeployment for inclusion in autoscaling, min size 1, max size 5
  2. manually increase replicas to 100
  3. watch: in this case the autoscaler became very unhappy; it did see that the node group was too large, but had trouble scaling it down given the size differential and the amount of time i was willing to wait. this probably bears more investigation in a scalability sense.

so, i ran a second test with more reasonable limits:

  1. create a machinedeployment for inclusion in autoscaling, min size 1, max size 5
  2. manually increase replicas to 6
  3. watch: the autoscaler properly detected the replica count change (on the next node group acquisition), and then brought the replica count down to 5.

i also tried the same procedure with an unmodified autoscaler and the behaviour is very similar, albeit the patched autoscaler produces far fewer client-side throttling events.

i'm fine to accept this change, thanks for giving us a chance to test it @codablock. i would like to get @jackfrancis to take another look as well.
/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Nov 23, 2021
@elmiko (Contributor) commented Nov 23, 2021:

fwiw, i think we have a bigger issue when it comes to a user changing the replicas outside of the autoscaler. it appears that if the user creates a large enough difference, the autoscaler stops being able to properly set the desired replicas because the desired count is beyond the maximum setting.

for example:

E1123 21:43:51.183228  644237 scale_down.go:1146] Problem with empty node deletion: failed to delete km-wl-kubemark-md-0-gh8jm: size increase too large - desired:99 max:5
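For reference, the error above comes from a bounds check of this general shape (a sketch, not the actual cluster-autoscaler code): a requested size outside the configured [min, max] range is rejected rather than clamped, which matches the observation that a large enough out-of-band change keeps the autoscaler from setting the desired replicas.

```go
package clusterapi

import "fmt"

// checkTargetSize is a sketch of the kind of guard that produces errors like
// "size increase too large - desired:99 max:5": a target outside the node
// group's [min, max] bounds is rejected instead of being clamped to the bound.
func checkTargetSize(desired, min, max int) error {
	if desired > max {
		return fmt.Errorf("size increase too large - desired:%d max:%d", desired, max)
	}
	if desired < min {
		return fmt.Errorf("size decrease too large - desired:%d min:%d", desired, min)
	}
	return nil
}
```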

@hardikdr (Member) commented:
Thanks @codablock for the PR, looks good.

/lgtm

@hardikdr (Member) commented:
/approve

@k8s-ci-robot (Contributor) commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: codablock, hardikdr

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Dec 14, 2021
@k8s-ci-robot merged commit 12efcce into kubernetes:master on Dec 14, 2021
@codablock deleted the fix-rate-limitting branch on December 15, 2021 09:54
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
kubernetes#3104 ensured that access to replicas during scale-down operations was never stale by going to the API server.
kubernetes#3312 honoured that behaviour while moving to the unstructured client.
kubernetes#4443 regressed that behaviour while trying to reduce the API server load.
kubernetes#4634 put the never-stale replicas behaviour back at the cost of loading the API server again.

Currently, on e.g. a 48-minute cluster, it issues 1.4k GET requests to the scale subresource.

This PR tries to satisfy both goals: non-stale replicas during scale down, and not overloading the API server. To achieve that, it lets targetSize, which is called on every autoscaling cluster-state loop, come from the cache.

Also note that the scale down implementation has changed: https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown
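A sketch of the split that follow-up describes (the interface and method names here are hypothetical, for illustration only): the frequently polled target size is served from the informer cache, while the scale-down path keeps reading fresh data so it never acts on a stale replica count.

```go
package clusterapi

// replicaSource is a hypothetical interface used only to illustrate the
// trade-off described in the commit message above.
type replicaSource interface {
	CachedReplicas() int         // served from the informer cache, cheap
	FreshReplicas() (int, error) // GET against the scale subresource
}

type nodegroup struct {
	resource replicaSource
}

// TargetSize is called on every cluster-state loop, so it answers from the
// cache and keeps the API server out of the hot path.
func (ng *nodegroup) TargetSize() (int, error) {
	return ng.resource.CachedReplicas(), nil
}

// replicasForScaleDown sketches the "never stale" read that the scale-down
// path still pays for, since acting on stale data there could delete nodes
// that should be kept.
func (ng *nodegroup) replicasForScaleDown() (int, error) {
	return ng.resource.FreshReplicas()
}
```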
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022