VPA: Support vertical scaling of injected containers #5617

jutley · 2023-03-24T16:16:50Z

Which component are you using?:

Vertical Pod Autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

There are scenarios where we inject sidecar containers into many pods. There is not a one-size-fits-all resource allocation for these pods, so I really want to be able to start with a generic value, and then use a VPA to tune these to a more appropriate value for the workload. Currently, injected containers seem to be entirely ignored, and there is no way to explicit ask for them to be configured.

Describe the solution you'd like.:

I would like to be able to configure either the VPA pods or the VPA resources in a way that allows us to include these injected containers in the VPA scaling process. I think the best option may be in the VPA pods. Since container injection is generally a cluster-admin responsibility, it makes sense that we would want to introduce this configuration at a level maintained by cluster admins.

I can imagine a new flag being introduced for the recommender like --injected-sidecar-allowlist which enumerates the container names which we want to support VPA scaling for.

Describe any alternative solutions you've considered.:

We can augment the logic of our container injections to try to set more appropriate values for their resources, but this is very hard! After all, this is the point of the VPA existing in the first place.

The text was updated successfully, but these errors were encountered:

voelzmo · 2023-03-29T09:09:18Z

Hey @jutley, thanks for the detailed description of what you're trying to achieve!

While I'm not exactly sure, why the feature to ignore injected sidecar containers was introduced in the first place, I'm sure there was a good reason for it. @jbartosik @krzysied, can you maybe remember more context about why calculating recommendations for injected sidecars can be problematic?

That being said, I think it should be possible to

either introduce a parameter for the admission-webhook and pass it into the ObservedContainersCalculator, so it can include the sidecar container's name when calculating the patch for the observedContainers annotation. This would lead to all VPAs having this container name in their annotation, even if the Pod doesn't have the injected sidecar container
or create a new annotation on the Pod or the VPA (like vpaObservedSidecarContainers) which contains a list of names to added when calculating the patch for the observedContainers annotation. With this solution, you'd have to add an additional annotation to all Pods that get the sidecar injected. That allows you to ensure only the Pods that actually have the sidecar will get an vpaObservedContainers annotation with the sidecar container's name in it – but it makes it also possible to forget adding the new annotation and end up wondering why the sidecar is there but doesn't get any resource updates.

If there's no general case to be made why we shouldn't update sidecar container resources at all, I think we can add something like sketched out above, WDYT @jbartosik?

jbartosik · 2023-05-09T08:21:41Z

The reason for #2795 (VPA ignoring injected sidecars) is that some sidecars might be injected by webhooks acting after VPA admission controller. This is a problem because in that case VPA admission controller won't change requests of the injected side car. So when the pod starts VPA will evict the pod (because the injected side car has unadjusted request), causing an eviction loop.

I have a few ideas for how to handle this:

proposed here: let user decide, give them a flag (we'd need to document the risk of eviction loop)
maybe setting reinvocationPolicy: IfNeeded for the VPA admission webhook would solve the problem (we need to test)
as part of support for in-place updates (Support in-place Pod vertical scaling in VPA #4016) we would be able to set those requests

EDIT: reinvocationPolicy was beta in 1.15, but I can't find quickly:

Why we decided not to use it,
When it went GA / if it went GA (if it wasn't GA in 2020-02 maybe that's the reason we didn't use it?).

jutley · 2023-05-15T22:51:32Z

That reasoning makes a lot of sense. The reinvocationPolicy configuration seems like it should do the trick, and I don't see any reason that adding it would cause issues. The key requirements is that the mutation functionality must be idempotent, and this webhook just changes existing fields.

I can't find any reference to when this feature graduated from beta to GA, but the official docs on admission controllers reference this feature without any mention of it being in beta. I think it is reasonable to assume that it is GA.
https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/

Mason-776 · 2023-05-16T18:18:45Z

For reference, it looks like reinvocationPolicy was added in v1.15.0-beta.2 and graduated from v1beta1 to v1 in v1.16.0-alpha.1

k8s-triage-robot · 2024-01-20T15:12:58Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2024-02-19T15:16:13Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

voelzmo · 2024-02-21T09:02:37Z

/remove-lifecycle rotten
/lifecycle frozen

This is still a thing. My understanding is that we should offer a switch to the admission-controller to disable adding the vpaObservedContainers annotation and setting reinvocationPolicy: IfNeeded.

@jutley On a sidenote: I think you can achieve the desired behavior already now, if you ensure that the vpa webhook will be called after the webhook that injects the container. This way, the container will be part of the vpaObservedContainers annotation and everything will behave as with the other containers.

Currently, the advice is the exact opposite to your situation: we want people to make sure that the vpa webhook is executed before all injecting webhooks, such that the feature of ignoring injected sidecars works properly.

jutley added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 24, 2023

voelzmo mentioned this issue May 9, 2023

Support in-place Pod vertical scaling in VPA #4016

Open

jbartosik added the area/vertical-pod-autoscaler label May 12, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 19, 2024

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Feb 21, 2024

voelzmo mentioned this issue Feb 21, 2024

Set vpa webhook reinvocationPolicy: IfNeeded gardener/gardener#9191

Merged

voelzmo mentioned this issue Apr 5, 2024

Native sidecars lead to huge amounts of errors in recommender logs #6691

Open

voelzmo mentioned this issue Dec 20, 2024

Document VPA behavior for sidecar containers and how to change it #7641

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VPA: Support vertical scaling of injected containers #5617

VPA: Support vertical scaling of injected containers #5617

jutley commented Mar 24, 2023

voelzmo commented Mar 29, 2023 •

edited

Loading

jbartosik commented May 9, 2023 •

edited

Loading

jutley commented May 15, 2023

Mason-776 commented May 16, 2023

k8s-triage-robot commented Jan 20, 2024

k8s-triage-robot commented Feb 19, 2024

voelzmo commented Feb 21, 2024

VPA: Support vertical scaling of injected containers #5617

VPA: Support vertical scaling of injected containers #5617

Comments

jutley commented Mar 24, 2023

voelzmo commented Mar 29, 2023 • edited Loading

jbartosik commented May 9, 2023 • edited Loading

jutley commented May 15, 2023

Mason-776 commented May 16, 2023

k8s-triage-robot commented Jan 20, 2024

k8s-triage-robot commented Feb 19, 2024

voelzmo commented Feb 21, 2024

voelzmo commented Mar 29, 2023 •

edited

Loading

jbartosik commented May 9, 2023 •

edited

Loading