
[clusterapi] Rely on replica count found in unstructuredScalableResource #4443

Merged: 2 commits merged into kubernetes:master from the fix-rate-limitting branch on Dec 14, 2021

Conversation

@codablock (Contributor):
Rely on the replica count already present in unstructuredScalableResource instead of retrieving it from the Kubernetes API server each time. The repeated retrieval easily causes client-side throttling, which in turn makes each autoscaler run take multiple seconds even when only a small number of NodeGroups is involved and nothing needs to be done.
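For context, here is a minimal sketch of the idea (not the actual patch; the type is simplified and the field names are assumptions): the replica count is read from the informer-cached unstructured object instead of issuing a GET against the scale subresource on every call.

```go
package clusterapi

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// unstructuredScalableResource is a simplified stand-in for the real type:
// it wraps the MachineDeployment/MachineSet object that the controller's
// informers already keep up to date.
type unstructuredScalableResource struct {
	unstructured *unstructured.Unstructured
}

// Replicas returns spec.replicas straight from the cached object, avoiding a
// round trip to the API server (and the client-side throttling it can cause).
func (r unstructuredScalableResource) Replicas() (int, error) {
	replicas, found, err := unstructured.NestedInt64(r.unstructured.Object, "spec", "replicas")
	if err != nil {
		return 0, err
	}
	if !found {
		return 0, fmt.Errorf("spec.replicas not found on %s", r.unstructured.GetName())
	}
	return int(replicas), nil
}
```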

@k8s-ci-robot added the cncf-cla: yes label (the PR's author has signed the CNCF CLA) and the size/L label (changes 100-499 lines, ignoring generated files) on Nov 4, 2021
```diff
@@ -119,6 +114,11 @@ func (r unstructuredScalableResource) SetSize(nreplicas int) error {

 	s.Spec.Replicas = int32(nreplicas)
 	_, updateErr := r.controller.managementScaleClient.Scales(r.Namespace()).Update(context.TODO(), gvr.GroupResource(), s, metav1.UpdateOptions{})

 	if updateErr == nil {
```
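The hunk is truncated at the success check; the review thread below discusses what that branch guards, namely writing the new size back into the cached object so later reads return it without another API call. A hedged sketch of that write-back pattern (the helper name is made up for illustration):

```go
package clusterapi

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// setCachedReplicas is not the real method name; it only illustrates the
// pattern under review: after Scales().Update() succeeds, record the new
// replica count on the informer-cached object so the next Replicas() call
// sees it without another GET against the scale subresource.
func setCachedReplicas(u *unstructured.Unstructured, nreplicas int) error {
	return unstructured.SetNestedField(u.Object, int64(nreplicas), "spec", "replicas")
}
```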
@jackfrancis (Contributor) commented on this hunk:
So I see that we're taking an accounting of the replica count and storing it every time we change the size from within cluster-autoscaler.

What if an out-of-band (i.e., not cluster-autoscaler-originating) replica change occurs? Won't our local replica count be stale in that event?

Lemme know if I'm not understanding everything here.

cc @elmiko @marwanad

@codablock (Contributor, Author) replied:

NodeGroups and their underlying objects are temporary and are re-queried/re-built every time they are needed, e.g. by calling machineController.nodeGroups(). This means that the next run of the autoscaler loop will definitely pick up up-to-date objects. Internally, watches/caches/informers are used to keep the local cache up-to-date.

This is based on my understanding after a few hours of debugging cluster-autoscaler. I'm not very familiar with the code base and am also far from being a full-blown Go coder, so take my analysis/understanding with a grain of salt :)
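To illustrate why staleness is bounded by one loop, here is a simplified sketch (real names and wiring in the provider differ): the node group wrappers are rebuilt from the informer caches on each iteration, so a replica count carried by one iteration's objects never outlives that iteration.

```go
package clusterapi

// cachedMachineSet stands in for the informer-cached object; replicas is the
// value the wrapper carries for the duration of one autoscaler iteration.
type cachedMachineSet struct {
	name     string
	replicas int
}

// nodegroup wraps one scalable resource for a single autoscaler loop.
type nodegroup struct {
	resource cachedMachineSet
}

// nodeGroups is called at the start of every loop (compare
// machineController.nodeGroups() in the real provider), so the wrappers, and
// any replica count they hold, are discarded and rebuilt each time from the
// informer-maintained cache.
func nodeGroups(cache []cachedMachineSet) []nodegroup {
	groups := make([]nodegroup, 0, len(cache))
	for _, r := range cache {
		groups = append(groups, nodegroup{resource: r})
	}
	return groups
}
```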

@elmiko (Contributor) commented Nov 8, 2021:

thanks for the ping @jackfrancis, i'm just starting to take a look at this.

also, thanks for the contribution @codablock =)

@elmiko (Contributor) left a review:

i think this change generally makes sense to me, although i found the unit tests a little confusing at first.

i have some concerns similar to what @jackfrancis raised, but i would like to test drive this patch a little to see how it behaves.

@codablock (Contributor, Author):
@elmiko I agree on the confusing nature of the unit test changes...but I couldn't find a better way to make the tests reliable, especially with my limited knowledge about the code base and architecture.

@elmiko (Contributor) commented Nov 23, 2021:

i've been running some local tests with a capi-kubemark cluster and i think this PR is probably safe for us to merge, and it will also reduce calls to the API server, as @codablock noted.

here's what i did:

  1. create a machinedeployment for inclusion in autoscaling, min size 1, max size 5
  2. manually increase replicas to 100
  3. watch: in this case the autoscaler became very unhappy; it did see that the node group was too large, but had trouble scaling it down given the size differential and the amount of time i was willing to wait. this probably bears more investigation in a scalability sense.

so, i ran a second test with more reasonable limits:

  1. create a machinedeployment for inclusion in autoscaling, min size 1, max size 5
  2. manually increase replicas to 6
  3. watch: the autoscaler properly detected the replica count change (on the next node group acquisition), and then brought the replica count down to 5.

i also tried the same procedure with an unmodified autoscaler and the behaviour is very similar, albeit the patched autoscaler produces far fewer client-side throttling events.

i'm fine to accept this change, thanks for giving us a chance to test it @codablock. i would like to get @jackfrancis to take another look as well.
/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Nov 23, 2021
@elmiko (Contributor) commented Nov 23, 2021:

fwiw, i think we have a bigger issue when it comes to a user changing the replicas outside of the autoscaler. it appears that if the user creates a large enough difference, the autoscaler stops being able to properly set the desired replicas because the desired count is beyond the maximum setting.

for example:

E1123 21:43:51.183228  644237 scale_down.go:1146] Problem with empty node deletion: failed to delete km-wl-kubemark-md-0-gh8jm: size increase too large - desired:99 max:5
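For reference, the error above comes from a bounds check of this general shape (a sketch, not the actual cluster-autoscaler code): a requested size outside the configured [min, max] range is rejected rather than clamped, which matches the observation that a large enough out-of-band change keeps the autoscaler from setting the desired replicas.

```go
package clusterapi

import "fmt"

// checkTargetSize is a sketch of the kind of guard that produces errors like
// "size increase too large - desired:99 max:5": a target outside the node
// group's [min, max] bounds is rejected instead of being clamped to the bound.
func checkTargetSize(desired, min, max int) error {
	if desired > max {
		return fmt.Errorf("size increase too large - desired:%d max:%d", desired, max)
	}
	if desired < min {
		return fmt.Errorf("size decrease too large - desired:%d min:%d", desired, min)
	}
	return nil
}
```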

@hardikdr (Member) commented:
Thanks @codablock for the PR, looks good.

/lgtm

@hardikdr (Member) commented:
/approve

@k8s-ci-robot (Contributor) commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: codablock, hardikdr

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Dec 14, 2021
@k8s-ci-robot merged commit 12efcce into kubernetes:master on Dec 14, 2021
@codablock deleted the fix-rate-limitting branch on December 15, 2021 09:54
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
kubernetes#3104 ensured that access to replicas during scale-down operations was never stale by going to the API server.
kubernetes#3312 honoured that behaviour while moving to the unstructured client.
kubernetes#4443 regressed that behaviour while trying to reduce the API server load.
kubernetes#4634 put the never-stale replicas behaviour back at the cost of loading the API server again.

Currently, on e.g. a 48-minute cluster, it issues 1.4k GET requests to the scale subresource.

This PR tries to satisfy both goals: non-stale replicas during scale down, and not overloading the API server. To achieve that, it lets targetSize, which is called on every autoscaling cluster-state loop, come from the cache.

Also note that the scale down implementation has changed: https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown
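A sketch of the split that follow-up describes (the interface and method names here are hypothetical, for illustration only): the frequently polled target size is served from the informer cache, while the scale-down path keeps reading fresh data so it never acts on a stale replica count.

```go
package clusterapi

// replicaSource is a hypothetical interface used only to illustrate the
// trade-off described in the commit message above.
type replicaSource interface {
	CachedReplicas() int         // served from the informer cache, cheap
	FreshReplicas() (int, error) // GET against the scale subresource
}

type nodegroup struct {
	resource replicaSource
}

// TargetSize is called on every cluster-state loop, so it answers from the
// cache and keeps the API server out of the hot path.
func (ng *nodegroup) TargetSize() (int, error) {
	return ng.resource.CachedReplicas(), nil
}

// replicasForScaleDown sketches the "never stale" read that the scale-down
// path still pays for, since acting on stale data there could delete nodes
// that should be kept.
func (ng *nodegroup) replicasForScaleDown() (int, error) {
	return ng.resource.FreshReplicas()
}
```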
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022