Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AEP for support of in-place updates for VPA #5755

Merged
merged 2 commits into from
Jun 12, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,190 @@
# AEP-4016: Support for in place updates in VPA

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Context](#context)
- [Design Details](#design-details)
- [Applying Updates During Pod Admission](#applying-updates-during-pod-admission)
- [Applying Disruption-free Updates](#Applying-disruption-free-updates)
- [Applying Disruptive Updates](#applying-disruptive-updates)
- [Comparison of `UpdateMode`s](#comparison-of-updatemodes)
- [Test Plan](#test-plan)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Summary


VPA applies its recommendations with a mutating webhook, during pod creation. It can also evict
pods expecting that it will apply the recommendation when the pod is recreated. This is a
disruptive process so VPA has some mechanism to avoid too frequent disruptive updates.

This proposal allows VPA to apply its recommendations more frequently, with less disruption by
using the
[in-place update feature] which is an alpha feature [available in Kubernetes 1.27.] This proposal enables only core uses
of in place updates in VPA with intention of gathering more feedback. Any more advanced uses of in place updates in VPA
(like applying different recommendations during pod initialization) will be introduced as separate enhancement
proposals.

[in-place update feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[in-place update feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources)
[in-place update feature]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #5877

[available in Kubernetes 1.27.](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[available in Kubernetes 1.27.](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3)
[available in Kubernetes 1.27.]: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #5877


### Goals

* Allow VPA to actuate without disruption,
* Allow VPA to actuate more frequently, when it can actuate without disruption,
* Allow VPA to actuate in situations where actuation by eviction doesn't work.

### Non-Goals

* Improved handling of injected sidecars
* Separate AEP will improve VPAs handling of injected sidecars.

## Proposal

Add new supported values of [`UpdateMode`]:

* `InPlaceOnly` and
* `InPlaceOrRecreate`.

[`UpdateMode`]: https://github.com/kubernetes/autoscaler/blob/71b489f5aec3899157b37472cdf36a1de223d011/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L124

## Context

[In-place update of pod resources KEP] is available in alpha in [Kubernetes 1.27]. The feature allows changing container
resources while the container is running. It also adds [`ResizePolicy`] field to Container. This field indicates for an
individual resource if a container needs to be restarted by kubelet when the resource is changed. For example it may be
the case that a Container automatically adapts to a change in CPU, but needs to be restarted for a change in Memory to
take effect.

[In-place update of pod resources KEP]: https://github.com/kubernetes/enhancements/issues/1287
[Kubernetes 1.27]: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3
[`ResizePolicy`]: https://github.com/kubernetes/api/blob/8360d82aecbc72aa039281a394ebed2eaf0c0ccc/core/v1/types.go#L2448

## Design Details

In the update modes existing before (`Initial` and `Recreate`) only admission controller was changing pod spec (updater
was responsible for evicting pods so that admission controller could change them during admission).

In the newly added `InPlaceOnly` and `InPlaceOrRecreate` modes VPA Updater will execute in-place updates (which change
pod spec without involving admission controller).

### Applying Updates During Pod Admission

For VPAs in `InPlaceOnly` and `InPlaceOrRecreate` modes VPA Admission Controller will apply updates to starting pods,
like it does for VPAs in `Initial`, `Auto`, and `Recreate` modes.

### Applying Disruption-free Updates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing something about the case where an in-place update becomes infeasible, because the Pod doesn't fit on this Node anymore? Otherwise it could happen that using in-place updates creates more problems for people's workloads.

We could e.g. react to events on the Pod, which should be generated by kubelet

kubelet will generate Events on the Pod whenever a resize is accepted or rejected

I wasn't sure where/how to find which events should be generated in this case, though?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think VPA should react to both Infeasible (this is rather easy) and also to long-lasting Deferred status: "it fits on this node" might require evicting all other Pods and it might take a long time to actually achieve this (IIUC there is no controller which will just do it for VPA).

I'd like to make sure that a long-lasting Deferred status doesn't block VPA from actuating the recommendation to other Pods in the workload.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the "VPA updater will consider that the update failed if" part I added answer your question? Or were you thinking about something else (scale down in place but then can't scale up in place to the old size)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgolab @voelzmo if you think I should change the doc please let me know and I'll add changes to #5877


When an update only changes resources for which a container indicates that it doesn't require a restart, then VPA can
attempt to actuate the change without disrupting the pod. VPA Updater will:
* attempt to actuate such updates in place,
* attempt to actuate them if difference between recommendation and request is at least 10%
jbartosik marked this conversation as resolved.
Show resolved Hide resolved
* even if pod has been running for less than 12h,
* if necessary execute only partial updates (if some changes would require a restart).

### Applying Disruptive Updates

In both `InPlaceOnly` and `InPlaceOrRecreate` modes VPA updater will attempt to apply updates that require container
restart in place. It will update them under conditions that would trigger update with `Recreate` mode. That is it will
apply disruptive updates if:

* Any container has a request below the corresponding `LowerBound` or
* Any container has a request above the corresponding `UpperBound` or
* Difference between sum of pods requests and sum of recommendation `Target`s is more than 10% and the pod has been
running undisrupted for at least 12h.
* Successful disruptive update counts as disruption here (and prevents further disruptive updates to the pod for 12h).

In `InPlaceOrRecreate` mode (but not in `InPlaceOnly` mode) VPA updater will evict pod to actuate a recommendation if it
attempted to apply the recommendation in place and failed.
jbartosik marked this conversation as resolved.
Show resolved Hide resolved

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have the order of updater actions explicitly stated.
I expected it would attempt in-place updates first but a) it's only my assumption b) I'm not sure it always would be the case.
I can see at least the following possibilities:

  • all resources can be updated in-place and the change matches eviction criteria: attempt in-place, evict on error?
  • some resources can be updated in-place, some not and the change matches eviction criteria: would VPA evict already, w/o attempting in-place update (this seems contradictory to "if necessary execute only partial updates" earlier), does this depend on which resource matches eviction criteria, other?
  • all or some resources can be updated in-place but the change doesn't match eviction criteria: in-place only?
  • no possibility for in-place update: go to the existing updater flow.

We also have priority ordering in updater, maybe it's worth adding a sentence how it would be changed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you mean by "order of updater actions". Do you mean order of operations in updaters RunOnce?

Or do you mean since now there are up to 3 different things we could attempt for any given pod, in which of them we actually do?

The things we could do:

  1. Apply full update by eviction,
  2. Apply full update in-place, trigger some container restarts,
  3. Apply partial update in-place, don't restart any containers

We definitely need to try 2. before attempting 1. Attempt 1. only if 2. failed.

If both 2. and 3. are an option I think we should do 2. (there's some reason we didn't do the smaller disruption-free update earlier so let's move quickly).

I'm not updating AEP yet because I'm not sure what to put there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. for InPlaceOrRecreate and 3. for InPlaceOnly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. for both InPlaceOrRecreate and InPlaceOnly

VPA updater will consider that the update failed if:
* The pod has `.status.resize: Infeasible` or
* The pod has `.status.resize: Deferred` and more than 1 minute elapsed since the update or
* The pod has `.status.resize: InProgress` and more than 1 hour elapsed since the update:
* There seems to be a bug where containers that say they need to be restarted get stuck in update, hopefully it gets
fixed and we don't have to worry about this by beta.
* Patch attempt will return an error.
* If the attempt fails because it would change pods QoS:
* `InPlaceOrRecreate` will treat it as any other failure and consider evicting the pod.
* `InPlaceOnly` will consider applying request slightly lower than the limit.

Those failure modes shouldn't disrupt pod operations, only update. If there are problems that can disrupt pod operation
we should consider not implementing the `InPlaceOnly` mode.

### Comparison of `UpdateMode`s

VPA updater considers the following conditions when deciding if it should apply an update:
- [`CanEvict`]:
- Pod is `Pending` or
- There are enough running pods in the controller.
- Quick OOM:
- Any container in the pod had a quick OOM ([by default less than 10 minutes] after the container started) and
- There is difference between sum of recommendations and sum of current requests over containers (see
[`defaultPriorityProcessor.GetUpdatePriority`]).
- Long-lived pod - started enough time ago ([by default 12h])
- Significant change - difference between sum of requests over containers and sum of recommendations are different
enough ([by default 10%]).
- [Outside recommended range]:
- At least one container has at least one resource request lower than the lower bound of the corresponding
recommendation or
- At least one container has at least one resource request higher than the upper bound of the corresponding
recommendation.
- Disruption-free update - doesn't change any resources for which the relevant container specifies
`RestartPolicy: RestartContainer`.

`Auto` / `Recreate` evicts pod if:
* [`CanEvict`] returns true for the pod, and it meets at least one of the following conditions:
* Quick OOM,
* Outside recommended range,
* Long-lived pod with significant change.

`InPlaceOnly` and `InPlaceOrRecreate` will attempt to apply a disruption-free update in place if it meets at least one
of the following conditions:
* Quick OOM,
* Outside recommended range,
* Significant change.

`InPlaceOnly` and `InPlaceOrRecreate` when considering a disruption-free update in place ignore some conditions that
influence eviction decission in the `Recreate` mode:
* [`CanEvict`] won't be checked and
* Pods with significant change can be updated even if they are not long-lived.

`InPlaceOnly` and `InPlaceOrRecreate` will attempt to apply updates that are **not** disruption-free in place under
the same conditions that apply to updates in the `Recreate` mode.

`InPlaceOrRecreate` will attempt to apply updates that by eviction when:
* VPA already attempted to apply the update in-place and failed and
* it meets conditions for applying in the `Recreate` mode.

[`CanEvict`]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/eviction/pods_eviction_restriction.go#LL100C10-L100C37
[by default less than 10 minutes]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L37
[`UpdatePriorityCalculator.AddPod`]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L81
[by default 12h]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L35
[by default 10%]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L33
[Outside recommendation range]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/priority_processor.go#L73
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[Outside recommendation range]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/priority_processor.go#L73
[Outside recommended range]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/priority_processor.go#L73

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #5877


### Test Plan

The following test scenarios will be added to e2e tests. Both `InPlaceOnly` and `InPlaceOrRecreate` modes will be tested
and they should behave the same:

* Admission controller applies recommendation to pod controlled by VPA.
* Disruption-free in-place update applied to all containers of a pod (request in recommendation bounds).
* Partial disruption-free update applied to some containers of a pod, some disruptive changes skipped (request in
recommendation bounds).
* Disruptive in-place update applied to all containers of a pod (request out ouf recommendation bounds).

There will be also scenarios testing differences between `InPlaceOnly` and `InPlaceOrRecreate` modesL
* Disruptive in-place update will fail. In `InPlaceOnly` pod should not be evicted, in the `InPlaceOrRecreate` pod
should be evicted and the recommendation applied.
* VPA attempts an update that would change Pods QoS (`RequestsOnly` scaling, request initially < limit, recommendation
equal to limit). In `InPlaceOnly` pod should not be evicted, request slightly lower than the recommendation will be
applied. In the `InPlaceOrRecreate` pod should be evicted and the recommendation applied.

jbartosik marked this conversation as resolved.
Show resolved Hide resolved
## Implementation History

- 2023-05-10: initial version