
Ensure ClusterAPI DeleteNodes accounts for out-of-band changes to scale #4634

Merged · 1 commit into kubernetes:master · Jan 21, 2022

Conversation

@JoelSpeed (Contributor)

Which component does this PR apply to?

cluster-autoscaler

What type of PR is this?

/kind bug

What this PR does / why we need it:

Because the autoscaler assumes it can delete nodes in parallel, it fetches a nodegroup for each node in a separate goroutine and then instructs each nodegroup to delete a single node. Because the nodegroup is not shared across goroutines, the cached replica count in the scalableResource can become stale; as a result, when the autoscaler attempts to scale down multiple nodes at a time, the Cluster API provider only actually removes a single node.

To prevent this, we must ensure we have a fresh replica count for every scale down attempt.
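
As a rough sketch of that pattern (the interface and names below are illustrative, not the provider's actual code), each removal re-reads the replica count immediately before decrementing, so deletions running in parallel goroutines observe each other's updates:

```go
package clusterapi

import (
	"context"
	"fmt"
)

// scaler is a hypothetical abstraction over the scale subresource of a
// MachineSet/MachineDeployment; only the calls this sketch needs.
type scaler interface {
	FetchReplicas(ctx context.Context) (int, error) // live GET, bypasses any cache
	SetReplicas(ctx context.Context, n int) error
	MarkForDeletion(ctx context.Context, node string) error
	Namespace() string
	Name() string
}

// removeNode re-reads the replica count from the API server before each
// decrement, instead of trusting a value cached when the nodegroup was built.
func removeNode(ctx context.Context, s scaler, node string) error {
	replicas, err := s.FetchReplicas(ctx)
	if err != nil {
		return fmt.Errorf("failed to fetch replicas for %s/%s: %w", s.Namespace(), s.Name(), err)
	}
	if replicas <= 0 {
		return fmt.Errorf("no replicas left to remove on %s/%s", s.Namespace(), s.Name())
	}
	// Mark the machine backing this node for deletion, then scale down by one.
	if err := s.MarkForDeletion(ctx, node); err != nil {
		return err
	}
	return s.SetReplicas(ctx, replicas-1)
}
```

The extra GET per removal is the price of a never-stale count; the commits referenced at the end of this thread later rebalance that cost.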

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot added labels Jan 21, 2022: kind/bug (categorizes issue or PR as related to a bug), cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), size/L (denotes a PR that changes 100-499 lines, ignoring generated files)
@elmiko (Contributor) left a comment:

thanks for this @JoelSpeed
/approve

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jan 21, 2022
@elmiko (Contributor) commented Jan 21, 2022:

/area/provider/custerapi

@mrajashree left a comment:

looks great

```go
}

if s == nil {
	return 0, fmt.Errorf("unknown %s %s/%s", r.Kind(), r.Namespace(), r.Name())
```

@mrajashree commented inline on this diff:

super nit: can this error message be a bit more elaborate? For instance, can it say that it failed to fetch the replicas?

@JoelSpeed (Author) replied:

Sure, I'll push an update shortly
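
For illustration, the kind of wording the nit is asking for; the exact message in the pushed update may differ:

```go
if s == nil {
	// Hypothetical rewording per the review: name the failed operation.
	return 0, fmt.Errorf("failed to fetch replicas: unknown %s %s/%s", r.Kind(), r.Namespace(), r.Name())
}
```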

@mrajashree commented:

/area provider/cluster-api

@k8s-ci-robot added the area/provider/cluster-api label (issues or PRs related to Cluster API provider) Jan 21, 2022
@enxebre (Member) commented Jan 21, 2022:

did we regress here somehow? #3104

@JoelSpeed (Author) commented Jan 21, 2022:

Looks like we regressed when we went from structured to unstructured 🤔

Note we only picked this up because our CI tests on OpenShift started failing, with annotations left over on machines
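
To make the failure mode concrete: with the unstructured client, the replica count is read out of a generic cached object, so a copy taken before a concurrent scale-down already carries a stale value. A sketch using the real apimachinery helper (the surrounding function is illustrative):

```go
package clusterapi

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// replicasFrom reads spec.replicas from an unstructured MachineSet or
// MachineDeployment. If u is a copy cached before a concurrent scale-down,
// the value returned here is already out of date.
func replicasFrom(u *unstructured.Unstructured) (int64, error) {
	replicas, found, err := unstructured.NestedInt64(u.Object, "spec", "replicas")
	if err != nil {
		return 0, err
	}
	if !found {
		return 0, fmt.Errorf("spec.replicas not found on %s %s/%s", u.GetKind(), u.GetNamespace(), u.GetName())
	}
	return replicas, nil
}
```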

@enxebre (Member) commented Jan 21, 2022:

/lgtm
/hold
feel free to cancel the hold as you see fit @JoelSpeed

@k8s-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Jan 21, 2022
@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jan 21, 2022
@JoelSpeed pushed an updated commit addressing the review feedback (same change, with the error message made more descriptive).
@k8s-ci-robot removed the lgtm label Jan 21, 2022
@alexander-demicev (Member) left a comment:

/lgtm

@k8s-ci-robot added the lgtm label Jan 21, 2022
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexander-demichev, elmiko, JoelSpeed, mrajashree

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoelSpeed (Author) commented:

Think I've addressed all the feedback, @mrajashree if you're happy with the error message update, would you hold cancel? :)

@mrajashree commented:

/unhold

@k8s-ci-robot removed the do-not-merge/hold label Jan 21, 2022
@k8s-ci-robot merged commit 75207a2 into kubernetes:master Jan 21, 2022
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022

kubernetes#3104 ensured that access to replicas during scale-down operations was never stale, by reading from the API server. kubernetes#3312 honoured that behaviour while moving to the unstructured client. kubernetes#4443 regressed that behaviour while trying to reduce the API server load. kubernetes#4634 put the never-stale replicas behaviour back, at the cost of loading the API server again.

Currently, on e.g. a 48-minute cluster run, this results in about 1.4k GET requests to the scale subresource. This PR tries to satisfy both goals: non-stale replicas during scale down, and not overloading the API server. To achieve that, it lets targetSize, which is called on every autoscaling cluster-state loop, come from the cache.

Also note that the scale down implementation has changed: https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown
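
A sketch of the trade-off that commit describes (the type and method names are illustrative): the frequently called TargetSize tolerates a slightly stale cached value, while DeleteNodes pays for a live read before every decrement:

```go
package clusterapi

import "context"

// replicaSource is a hypothetical wrapper around a MachineSet or
// MachineDeployment scale subresource.
type replicaSource interface {
	CachedReplicas() (int, error)                   // informer cache: cheap, may lag
	FreshReplicas(ctx context.Context) (int, error) // live GET: authoritative
	ScaleTo(ctx context.Context, replicas int, deleting string) error
}

type nodegroup struct{ scalableResource replicaSource }

// TargetSize runs on every cluster-state loop, so it reads from the cache
// rather than issuing a GET per loop iteration.
func (ng *nodegroup) TargetSize() (int, error) {
	return ng.scalableResource.CachedReplicas()
}

// DeleteNodes must never act on a stale count, so it re-fetches replicas
// before each decrement.
func (ng *nodegroup) DeleteNodes(ctx context.Context, nodes []string) error {
	for _, node := range nodes {
		replicas, err := ng.scalableResource.FreshReplicas(ctx)
		if err != nil {
			return err
		}
		if err := ng.scalableResource.ScaleTo(ctx, replicas-1, node); err != nil {
			return err
		}
	}
	return nil
}
```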
navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022 (same commit message as above)