[clusterapi] Rely on replica count found in unstructuredScalableResource #4443
Conversation
Instead of retrieving the replica count from Kubernetes each time, which easily causes client-side throttling, which in turn causes each autoscaler run to take multiple seconds even when only a small number of NodeGroups is involved and there is nothing to do.
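A minimal sketch of the caching idea (the provider type is trimmed to the one field that matters here; the real `unstructuredScalableResource` carries more state): `Replicas()` answers from the locally held unstructured object instead of issuing a GET against the scale subresource on every call.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// unstructuredScalableResource is a trimmed-down stand-in for the provider's
// type; only the cached unstructured object matters for this sketch.
type unstructuredScalableResource struct {
	unstructured *unstructured.Unstructured
}

// Replicas reads spec.replicas from the locally cached object instead of
// querying the API server on every call.
func (r unstructuredScalableResource) Replicas() (int, error) {
	replicas, found, err := unstructured.NestedInt64(r.unstructured.Object, "spec", "replicas")
	if err != nil {
		return 0, err
	}
	if !found {
		return 0, fmt.Errorf("spec.replicas not found on %s", r.unstructured.GetName())
	}
	return int(replicas), nil
}

func main() {
	r := unstructuredScalableResource{unstructured: &unstructured.Unstructured{Object: map[string]interface{}{
		"metadata": map[string]interface{}{"name": "md-0"},
		"spec":     map[string]interface{}{"replicas": int64(3)},
	}}}
	n, err := r.Replicas()
	fmt.Println(n, err) // 3 <nil>
}
```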
```diff
@@ -119,6 +114,11 @@ func (r unstructuredScalableResource) SetSize(nreplicas int) error {

	s.Spec.Replicas = int32(nreplicas)
	_, updateErr := r.controller.managementScaleClient.Scales(r.Namespace()).Update(context.TODO(), gvr.GroupResource(), s, metav1.UpdateOptions{})

	if updateErr == nil {
```
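The hidden lines of the hunk are not preserved here, so the following is a hedged reconstruction of the write-through pattern the visible lines suggest, not the actual code: update the scale subresource first, and only on success mirror the accepted value into the locally cached object so later reads stay cheap.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// setSize mimics the write-through step from the hunk above: apply the new
// replica count remotely first (stubbed here), and only on success write it
// back into the locally cached unstructured object.
func setSize(cached *unstructured.Unstructured, nreplicas int, remoteUpdate func(int) error) error {
	if err := remoteUpdate(nreplicas); err != nil {
		return err
	}
	// mirror the accepted value into the local copy so later reads are cheap
	return unstructured.SetNestedField(cached.Object, int64(nreplicas), "spec", "replicas")
}

func main() {
	obj := &unstructured.Unstructured{Object: map[string]interface{}{
		"spec": map[string]interface{}{"replicas": int64(1)},
	}}
	_ = setSize(obj, 3, func(int) error { return nil }) // pretend the API call succeeded
	replicas, _, _ := unstructured.NestedInt64(obj.Object, "spec", "replicas")
	fmt.Println("cached replicas:", replicas) // 3
}
```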
So I see that we're taking an accounting of the replica count and storing it every time we change the size from within cluster-autoscaler.
What if an out-of-band (i.e., not cluster-autoscaler-originating) replica change occurs? Won't our local replica count be stale in that event?
Lemme know if I'm not understanding everything here.
NodeGroups and their underlying objects are temporary and are re-queried/re-built every time they are needed, e.g. by calling machineController.nodeGroups(). This means that the next run of the autoscaler loop will definitely pick up up-to-date objects. Internally, watches/caches/informers are used to keep the local cache up to date.
This is based on my understanding after a few hours of debugging cluster-autoscaler. I'm not very familiar with the code-base and also far away from being a full-blown go-coder, so take my analysis/understanding with a grain of salt :)
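For illustration, here is a minimal sketch of the informer-backed caching being described, using client-go's dynamic informer factory. The GVR, API version, and resync period are assumptions for the example; the real machine controller wires this up differently.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// A shared informer factory watches the resource and keeps an in-memory
	// cache current, so repeated reads do not hit the API server.
	gvr := schema.GroupVersionResource{Group: "cluster.x-k8s.io", Version: "v1beta1", Resource: "machinedeployments"}
	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 10*time.Minute)
	lister := factory.ForResource(gvr).Lister()

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Listing through the lister is served entirely from the local cache.
	objs, err := lister.List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("cached MachineDeployments: %d\n", len(objs))
}
```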
thanks for the ping @jackfrancis, i'm just starting to take a look at this. also, thanks for the contribution @codablock =)
i think this change generally makes sense to me, although i found the unit tests a little confusing at first.
i have some similar concerns to what @jackfrancis raised, but i would like to test drive this patch a little to see how it behaves.
@elmiko I agree on the confusing nature of the unit test changes...but I couldn't find a better way to make the tests reliable, especially with my limited knowledge about the code base and architecture.
i've been running some local tests with a capi-kubemark cluster and i think this PR is probably safe for us to merge, and also will reduce the calls to the API server as @codablock noted. here's what i did
so, i ran a second test with more reasonable limits
i also tried the same procedure with an unmodified autoscaler; the patched version behaves in a very similar manner, albeit with far fewer client-side throttling events. i'm fine to accept this change, thanks for giving us a chance to test it @codablock. i would like to get @jackfrancis to take another look as well.
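As background on the throttling mentioned above: client-side throttling comes from client-go's default rate limiter (QPS 5, Burst 10), which delays requests locally before they ever reach the API server. A minimal sketch of raising those limits on a rest.Config; the values are illustrative, not recommendations.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// newClient builds a clientset with a higher client-side rate limit than
// client-go's defaults (QPS 5 / Burst 10). Requests beyond the limit are
// throttled locally, which shows up as "client-side throttling" events.
func newClient(kubeconfig string) (*kubernetes.Clientset, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return nil, err
	}
	cfg.QPS = 50    // default is 5
	cfg.Burst = 100 // default is 10
	return kubernetes.NewForConfig(cfg)
}

func main() {
	if _, err := newClient("/path/to/kubeconfig"); err != nil {
		fmt.Println(err)
	}
}
```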
fwiw, i think we have a bigger issue when it comes to a user changing the replicas outside of the autoscaler. it appears that if the user creates a large enough difference, the autoscaler stops being able to properly set the desired replicas because it is beyond the maximum setting. for example
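(The concrete example output was not preserved in this thread. Below is a hypothetical sketch, not that example, of the kind of min/max guard that would produce the failure mode described: once an out-of-band edit pushes replicas outside the configured bounds, further resize requests get rejected.)

```go
package main

import "fmt"

// checkBounds is a hypothetical stand-in for the guard that produces the
// behavior described above: once replicas are pushed outside
// [minSize, maxSize] by an out-of-band edit, resize requests that fall
// outside the range are rejected instead of being clamped.
func checkBounds(desired, minSize, maxSize int) error {
	if desired < minSize || desired > maxSize {
		return fmt.Errorf("desired replicas %d outside of bounds [%d, %d]", desired, minSize, maxSize)
	}
	return nil
}

func main() {
	// e.g. a user manually scaled to 12 while the node group max is 10
	fmt.Println(checkBounds(12, 1, 10)) // desired replicas 12 outside of bounds [1, 10]
}
```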
Thanks @codablock for the PR, looks good.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: codablock, hardikdr

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
A later follow-up change summarised the history of this behaviour:

- kubernetes#3104 ensured that replica counts read during scale-down operations were never stale, by always going to the API server.
- kubernetes#3312 honoured that behaviour while moving to the unstructured client.
- kubernetes#4443 regressed that behaviour while trying to reduce the API server load.
- kubernetes#4634 put the never-stale replicas behaviour back, at the cost of loading the API server again. Currently, on e.g. a 48-minute cluster, this produces 1.4k GET requests to the scale subresource.

This PR tries to satisfy both goals: non-stale replicas during scale down, without overloading the API server. To achieve that, it lets targetSize, which is called on every autoscaling cluster-state loop, come from the cache. Also note that the scale-down implementation has changed: https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
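A minimal sketch of the compromise that commit message describes (method names assumed; the real provider code differs): serve the hot-path target size from the cache, and keep one explicit fresh-read path for the rare calls where staleness would be harmful.

```go
package main

import "fmt"

// replicaSource abstracts the two read paths described above.
type replicaSource struct {
	cached    func() (int, error) // informer-backed, cheap, possibly stale
	freshRead func() (int, error) // GET on the scale subresource, never stale
}

// TargetSize is called on every cluster-state loop, so it must be cheap.
func (s replicaSource) TargetSize() (int, error) { return s.cached() }

// ReplicasForScaleDown is called rarely and must not be stale, so it pays
// for a round trip to the API server.
func (s replicaSource) ReplicasForScaleDown() (int, error) { return s.freshRead() }

func main() {
	s := replicaSource{
		cached:    func() (int, error) { return 3, nil }, // stand-in for the cache
		freshRead: func() (int, error) { return 2, nil }, // stand-in for the API call
	}
	t, _ := s.TargetSize()
	f, _ := s.ReplicasForScaleDown()
	fmt.Println("target:", t, "fresh:", f)
}
```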