
Ensure ClusterAPI DeleteNodes accounts for out-of-band changes to scale #4634

Merged · 1 commit into kubernetes:master · Jan 21, 2022

Conversation

@JoelSpeed (Contributor)

Which component does this PR apply to?

cluster-autoscaler

What type of PR is this?

/kind bug

What this PR does / why we need it:

Because the autoscaler assumes it can delete nodes in parallel, it fetches a nodegroup for each node in a separate goroutine and then instructs each nodegroup to delete a single node. Because the nodegroup is not shared across goroutines, the cached replica count in the scalableResource can become stale; as a result, when the autoscaler attempts to scale down multiple nodes at a time, the Cluster API provider only actually removes a single node.

To prevent this, we must ensure we have a fresh replica count for every scale down attempt.
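
As a rough sketch of that pattern (the interface and names below are illustrative, not the provider's actual code), each removal re-reads the replica count immediately before decrementing, so deletions running in parallel goroutines observe each other's updates:

```go
package clusterapi

import (
	"context"
	"fmt"
)

// scaler is a hypothetical abstraction over the scale subresource of a
// MachineSet/MachineDeployment; only the calls this sketch needs.
type scaler interface {
	FetchReplicas(ctx context.Context) (int, error) // live GET, bypasses any cache
	SetReplicas(ctx context.Context, n int) error
	MarkForDeletion(ctx context.Context, node string) error
	Namespace() string
	Name() string
}

// removeNode re-reads the replica count from the API server before each
// decrement, instead of trusting a value cached when the nodegroup was built.
func removeNode(ctx context.Context, s scaler, node string) error {
	replicas, err := s.FetchReplicas(ctx)
	if err != nil {
		return fmt.Errorf("failed to fetch replicas for %s/%s: %w", s.Namespace(), s.Name(), err)
	}
	if replicas <= 0 {
		return fmt.Errorf("no replicas left to remove on %s/%s", s.Namespace(), s.Name())
	}
	// Mark the machine backing this node for deletion, then scale down by one.
	if err := s.MarkForDeletion(ctx, node); err != nil {
		return err
	}
	return s.SetReplicas(ctx, replicas-1)
}
```

The extra GET per removal is the price of a never-stale count; the commits referenced at the end of this thread later rebalance that cost.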

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot added labels Jan 21, 2022: kind/bug (categorizes issue or PR as related to a bug), cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), size/L (denotes a PR that changes 100-499 lines, ignoring generated files)
@elmiko (Contributor) left a comment:

thanks for this @JoelSpeed
/approve

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jan 21, 2022
@elmiko (Contributor) commented Jan 21, 2022:

/area/provider/custerapi

@mrajashree left a comment:

looks great

```go
}

if s == nil {
	return 0, fmt.Errorf("unknown %s %s/%s", r.Kind(), r.Namespace(), r.Name())
```

@mrajashree commented inline on this diff:

super nit: can this error message be a bit more elaborate? For instance, can it say that it failed to fetch the replicas?

@JoelSpeed (Author) replied:

Sure, I'll push an update shortly
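
For illustration, the kind of wording the nit is asking for; the exact message in the pushed update may differ:

```go
if s == nil {
	// Hypothetical rewording per the review: name the failed operation.
	return 0, fmt.Errorf("failed to fetch replicas: unknown %s %s/%s", r.Kind(), r.Namespace(), r.Name())
}
```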

@mrajashree commented:

/area provider/cluster-api

@k8s-ci-robot added the area/provider/cluster-api label (issues or PRs related to Cluster API provider) Jan 21, 2022
@enxebre (Member) commented Jan 21, 2022:

did we regress here somehow? #3104

@JoelSpeed (Author) commented Jan 21, 2022:

Looks like we regressed when we went from structured to unstructured 🤔

Note we only picked this up because our CI tests on OpenShift started failing, with annotations left over on machines
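
To make the failure mode concrete: with the unstructured client, the replica count is read out of a generic cached object, so a copy taken before a concurrent scale-down already carries a stale value. A sketch using the real apimachinery helper (the surrounding function is illustrative):

```go
package clusterapi

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// replicasFrom reads spec.replicas from an unstructured MachineSet or
// MachineDeployment. If u is a copy cached before a concurrent scale-down,
// the value returned here is already out of date.
func replicasFrom(u *unstructured.Unstructured) (int64, error) {
	replicas, found, err := unstructured.NestedInt64(u.Object, "spec", "replicas")
	if err != nil {
		return 0, err
	}
	if !found {
		return 0, fmt.Errorf("spec.replicas not found on %s %s/%s", u.GetKind(), u.GetNamespace(), u.GetName())
	}
	return replicas, nil
}
```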

@enxebre (Member) commented Jan 21, 2022:

/lgtm
/hold
feel free to cancel the hold as you see fit @JoelSpeed

@k8s-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Jan 21, 2022
@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jan 21, 2022
@JoelSpeed pushed an updated commit addressing the review feedback (same change, with the error message made more descriptive).
@k8s-ci-robot removed the lgtm label Jan 21, 2022
@alexander-demicev (Member) left a comment:

/lgtm

@k8s-ci-robot added the lgtm label Jan 21, 2022
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexander-demichev, elmiko, JoelSpeed, mrajashree

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoelSpeed (Author) commented:

Think I've addressed all the feedback, @mrajashree if you're happy with the error message update, would you hold cancel? :)

@mrajashree commented:

/unhold

@k8s-ci-robot removed the do-not-merge/hold label Jan 21, 2022
@k8s-ci-robot merged commit 75207a2 into kubernetes:master Jan 21, 2022
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022

kubernetes#3104 ensured that access to replicas during scale-down operations was never stale, by reading from the API server. kubernetes#3312 honoured that behaviour while moving to the unstructured client. kubernetes#4443 regressed that behaviour while trying to reduce the API server load. kubernetes#4634 put the never-stale replicas behaviour back, at the cost of loading the API server again.

Currently, on e.g. a 48-minute cluster run, this results in about 1.4k GET requests to the scale subresource. This PR tries to satisfy both goals: non-stale replicas during scale down, and not overloading the API server. To achieve that, it lets targetSize, which is called on every autoscaling cluster-state loop, come from the cache.

Also note that the scale down implementation has changed: https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown
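
A sketch of the trade-off that commit describes (the type and method names are illustrative): the frequently called TargetSize tolerates a slightly stale cached value, while DeleteNodes pays for a live read before every decrement:

```go
package clusterapi

import "context"

// replicaSource is a hypothetical wrapper around a MachineSet or
// MachineDeployment scale subresource.
type replicaSource interface {
	CachedReplicas() (int, error)                   // informer cache: cheap, may lag
	FreshReplicas(ctx context.Context) (int, error) // live GET: authoritative
	ScaleTo(ctx context.Context, replicas int, deleting string) error
}

type nodegroup struct{ scalableResource replicaSource }

// TargetSize runs on every cluster-state loop, so it reads from the cache
// rather than issuing a GET per loop iteration.
func (ng *nodegroup) TargetSize() (int, error) {
	return ng.scalableResource.CachedReplicas()
}

// DeleteNodes must never act on a stale count, so it re-fetches replicas
// before each decrement.
func (ng *nodegroup) DeleteNodes(ctx context.Context, nodes []string) error {
	for _, node := range nodes {
		replicas, err := ng.scalableResource.FreshReplicas(ctx)
		if err != nil {
			return err
		}
		if err := ng.scalableResource.ScaleTo(ctx, replicas-1, node); err != nil {
			return err
		}
	}
	return nil
}
```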
navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022 (same commit message as above)