cluster-autoscaler: override by annotation node scale down time #2989
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here.
Signed-off-by: Julien Balestra <[email protected]>
Force-pushed from d7c482f to 89e8447
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Sorry, I missed that one earlier. As a go-to solution I would prefer to allow overriding those and other settings, where it makes sense, on a per-nodegroup basis. I was thinking about a similar use case recently and my idea was to extend the NodeGroup interface with a method that would allow returning a struct with values such as scaleDownUnneededTime or scaleDownUtilizationThreshold (using the top-level settings as a backup when not available or not implemented).
The downside is that making it provider-specific would result in inconsistencies between providers, but I feel it's too late for that concern - configuring CA already differs wildly between providers. Would my solution also cover your use case? Or do you actually want to be able to configure it on a per-node basis?

Somewhat unrelated to the discussion on the API - a possible problem with having different scaleDownUnneededTime for different nodes is that CA only keeps a certain number of nodes as 'scale-down candidates'. Without any preference for those with shorter scaleDownUnneededTime, there are edge cases where long-waiting candidates could effectively block scale-down. It would be worth trying to think through other edge cases that may arise from making various configs per-nodepool, but I don't think it should block adding the configuration knobs.
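For illustration, a minimal sketch of the per-nodegroup override idea described above. The names used here (NodeGroupAutoscalingOptions, OptionsProvider, GetOptions, effectiveOptions) are assumptions for the sketch, not the actual cluster-autoscaler API; the point is only that a nodegroup could optionally return its own settings while the top-level flags remain the fallback.

```go
// Package sketch: hypothetical per-nodegroup autoscaling options.
package sketch

import "time"

// NodeGroupAutoscalingOptions holds settings that could be overridden per
// nodegroup (hypothetical field set, mirroring the flags mentioned above).
type NodeGroupAutoscalingOptions struct {
	ScaleDownUnneededTime         time.Duration
	ScaleDownUtilizationThreshold float64
}

// OptionsProvider is what a NodeGroup implementation might additionally satisfy.
type OptionsProvider interface {
	// GetOptions returns per-nodegroup overrides, or nil to use the defaults.
	GetOptions(defaults NodeGroupAutoscalingOptions) (*NodeGroupAutoscalingOptions, error)
}

// effectiveOptions resolves the options for one nodegroup, using the
// top-level settings as a backup when the provider returns nothing.
func effectiveOptions(ng interface{}, defaults NodeGroupAutoscalingOptions) NodeGroupAutoscalingOptions {
	if p, ok := ng.(OptionsProvider); ok {
		if opts, err := p.GetOptions(defaults); err == nil && opts != nil {
			return *opts
		}
	}
	return defaults
}
```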
@maciaszczykm thank you for the answer and your detailed proposal. On our end, we prefer to defer the logic to Kubernetes objects like nodes, so that the behaviour stays customisable by the user (which is what this PR does). I see the following advantages in going with the original proposal:
As both make sense, could we potentially implement both, or would that be too misleading?
@MaciekPytel @JulienBalestra I was thinking about a similar use case for
than a provider-specific interpretation of autodiscovery. I believe that not all cloud providers have implementations of autodiscovery.
Sorry, I've missed your replies. I don't particularly like annotations for the following reasons:
Regarding configmap: you make good points about having a consistent implementation between providers and the fact that some don't have autodiscovery. The problems I see with this approach:
All those problems apply to any in-cluster config and I'm not sure how we can easily solve them. Maybe there is some sort of middle-ground solution here where we have a cloudprovider-specific mechanism, but provide a default implementation using a configmap? Though it seems a bit ugly.

edit: On the implementation side - I don't believe the old implementation of dynamic config has ever worked. It was contributed by an external contributor in the early days of the project and was never maintained; IIRC it had some breaking bugs that never got fixed. I don't believe CA can handle recreating StaticAutoscaler (probably even more so now that we encourage people to use stateful processors in forks). Even if we decide to go with a configmap, I think it's going to be much easier and safer to implement it by updating the settings inside RunOnce (similarly to how we update settings in cloudprovider.Refresh()).
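To make the "update settings inside RunOnce" idea concrete, here is a rough sketch. It assumes a hypothetical ConfigMap named cluster-autoscaler-config in kube-system with a scale-down-unneeded-time key; none of these names come from the discussion or the actual codebase. The idea is simply to re-read the value on every loop iteration and fall back to the flag value, instead of recreating StaticAutoscaler.

```go
package sketch

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// refreshScaleDownUnneededTime reads a hypothetical ConfigMap key at the start
// of each RunOnce iteration and falls back to the flag value when the
// ConfigMap or the key is missing or unparseable.
func refreshScaleDownUnneededTime(client kubernetes.Interface, fromFlag time.Duration) time.Duration {
	cm, err := client.CoreV1().ConfigMaps("kube-system").Get(
		context.TODO(), "cluster-autoscaler-config", metav1.GetOptions{})
	if err != nil {
		return fromFlag
	}
	if v, ok := cm.Data["scale-down-unneeded-time"]; ok {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return fromFlag
}
```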
Closing due to inactivity. Feel free to reopen.
Is there a way to achieve this (different scale-down times per nodegroup) currently?
Depends on where you're running your cluster. Core autoscaler logic supports it, but not all providers have integrated with this logic and many only added integration recently (e.g. GCE; Azure support is coming in 1.23, I think; #4238 adding AWS support is pending).
This PR allows using a specific scale-down threshold per node, to easily manage a heterogeneous set of nodegroups.
For example, when using the storage-local-static-provisioner and statefulsets, you don't want a node that is missing its stateful pods to be scaled down after the same delay as the stateless ones.
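As a hedged illustration of what such a per-node override could look like when resolving the scale-down delay, the sketch below reads a hypothetical annotation (the key name is an assumption, not necessarily the one introduced by this PR) and falls back to the global --scale-down-unneeded-time value:

```go
package sketch

import (
	"time"

	apiv1 "k8s.io/api/core/v1"
)

// scaleDownUnneededTimeForNode returns the override from a hypothetical node
// annotation when present and parseable, otherwise the global flag value.
func scaleDownUnneededTimeForNode(node *apiv1.Node, global time.Duration) time.Duration {
	// Annotation key is illustrative only; the PR may use a different name.
	if v, ok := node.Annotations["cluster-autoscaler.kubernetes.io/scale-down-unneeded-time"]; ok {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return global
}
```

With something like this, nodes backing local-storage statefulsets could be annotated with a longer duration (for example "1h"), while stateless nodes keep the default.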