-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chart: Add option to install VPA #5558
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/assign @gjtempleton
A couple of small requests.
charts/cluster-autoscaler/Chart.yaml
Outdated
@@ -11,4 +11,4 @@ name: cluster-autoscaler | |||
sources: | |||
- https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler | |||
type: application | |||
version: 9.25.0 | |||
version: 9.25.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're adding new functionality here, so can we make this a minor version bump please?
charts/cluster-autoscaler/README.md
Outdated
The chart can install a VPA for the Deployment if needed. A VPA can help minimize wasted resources when usage spikes periodically. | ||
The following example snippet can be used to install a `autoscaling.k8s.io/v1` VPA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we link out to the VPA docs please, there's a reasonable chance users will be reading this Readme disconnected from this repo, and may want to find out more about the VPA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Some small comments about VPA configuration settings, the API version is the only thing that I'd really recommend to change – the other two remarks are just to start a discussion. I wouldn't argue against merging this, even if you were to decide to keep these as-is.
@@ -0,0 +1,20 @@ | |||
{{- if .Values.vpa.enabled -}} | |||
apiVersion: "autoscaling.k8s.io/{{ .Values.vpa.apiVersion }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we need optional support for v1beta2
– do we have good reasons for allowing people to use v1beta2
instead of v1
?
The recent VPA release 0.13.0 deprecated this api version. This means it will stay around for a short while, but is on a clear track to be removed.
The v1
api version has been around since 2019, so I think it should be safe to hardcode this template to v1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sure. I was actually trying to find out when it got released, but I must've missed those release notes.
# memory: 150Mi | ||
# maxAllowed: | ||
# cpu: 100m | ||
# memory: 300Mi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how much we want this commented out bit to be a suggestion to people to use these values, or if they're just here for illustration purposes. Probably this is what users will configure as a starting point if they don't have an indication to use different values.
Given that the deployment template uses cpu: 100m
and memory: 300Mi
as starting points, it seems pretty strict to use the same values as maxAllowed
– this would mean the CA would never be scaled up by VPA. In pratice, the main reason I see people use maxAllowed
is to ensure that a workload cannot grow to a point where it cannot be scheduled on a node anymore. If we don't have any other scenario in mind, I think we could put higher values into maxAllowed
or even leave it out entirely?
For minAllowed
, there's a similar case to be made: In our systems I see CAs working fine with cpu: 20m
and memory: 50Mi
– this configuration here doesn't allow VPA to scale down beyond cpu: 50m
and memory: 150Mi
, which would be a waste of resources. It is still an improvement on the default values in the Deployment, though, but I'm not familiar enough with CA performance characteristics to say if there is a certain threshold beyond which we wouldn't want VPA to shrink the resources to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values here were meant to illustrate the possible contents of a containerPolicy. I'll add a link to the spec instead.
I just checked some deployments and the lowest usage I saw was 2m/22Mi
. I'd say 20m/50Mi
is a good compromise that still allows some headroom. I'll add these to the example in the README and leave out the maxAllowed
.
kind: Deployment | ||
name: {{ template "cluster-autoscaler.fullname" . }} | ||
updatePolicy: | ||
updateMode: "Auto" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see people run their VPA in updateMode: off
for a while to observe if anything weird would happen when they switched on updateMode: auto
– so maybe this is an option you'd want to make configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think about that. I'll add it
9ca4889
to
29e2554
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm thanks!
Thanks for addressing the feedback. /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: avorima, gjtempleton, voelzmo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
I currently install a VPA for the cluster-autoscaler alongside the chart, but it would be nice to get it all from the same source.
Does this PR introduce a user-facing change?