Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AEP for support of in-place updates for VPA #5755

Merged
merged 2 commits into from
Jun 12, 2023

Conversation

jbartosik
Copy link
Collaborator

/kind documentation
/kind feature

Add Autoscaling Enhancement Proposal to add basic support for in place updates to VPA.

See also #5754 for changes I propose to eviction control enhancement proposal to work better with in place updates

#4016

@voelzmo @kgolab @pbetkier @wangchen615 please take a look

@jbartosik jbartosik added area/vertical-pod-autoscaler kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels May 11, 2023
@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. labels May 11, 2023
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 12, 2023
Copy link
Contributor

@voelzmo voelzmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for getting this Enhancement Proposal rolling! I have a few suggestions for changing the text, but mostly questions around the two UpdateModes and how we're handling errors and the fallback to eviction.

For VPAs in `InPlaceOnly` and `InPlaceOrRecreate` modes VPA Admission Controller will apply updates to starting pods,
like it does for VPAs in `Initial`, `Auto`, and `Recreate` modes.

### Applying Disruption-free Updates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing something about the case where an in-place update becomes infeasible, because the Pod doesn't fit on this Node anymore? Otherwise it could happen that using in-place updates creates more problems for people's workloads.

We could e.g. react to events on the Pod, which should be generated by kubelet

kubelet will generate Events on the Pod whenever a resize is accepted or rejected

I wasn't sure where/how to find which events should be generated in this case, though?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think VPA should react to both Infeasible (this is rather easy) and also to long-lasting Deferred status: "it fits on this node" might require evicting all other Pods and it might take a long time to actually achieve this (IIUC there is no controller which will just do it for VPA).

I'd like to make sure that a long-lasting Deferred status doesn't block VPA from actuating the recommendation to other Pods in the workload.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the "VPA updater will consider that the update failed if" part I added answer your question? Or were you thinking about something else (scale down in place but then can't scale up in place to the old size)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgolab @voelzmo if you think I should change the doc please let me know and I'll add changes to #5877


In `InPlaceOrRecreate` mode (but not in `InPlaceOnly` mode) VPA updater will evict pod to actuate a recommendation if it
attempted to apply the recommendation in place and failed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have the order of updater actions explicitly stated.
I expected it would attempt in-place updates first but a) it's only my assumption b) I'm not sure it always would be the case.
I can see at least the following possibilities:

  • all resources can be updated in-place and the change matches eviction criteria: attempt in-place, evict on error?
  • some resources can be updated in-place, some not and the change matches eviction criteria: would VPA evict already, w/o attempting in-place update (this seems contradictory to "if necessary execute only partial updates" earlier), does this depend on which resource matches eviction criteria, other?
  • all or some resources can be updated in-place but the change doesn't match eviction criteria: in-place only?
  • no possibility for in-place update: go to the existing updater flow.

We also have priority ordering in updater, maybe it's worth adding a sentence how it would be changed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you mean by "order of updater actions". Do you mean order of operations in updaters RunOnce?

Or do you mean since now there are up to 3 different things we could attempt for any given pod, in which of them we actually do?

The things we could do:

  1. Apply full update by eviction,
  2. Apply full update in-place, trigger some container restarts,
  3. Apply partial update in-place, don't restart any containers

We definitely need to try 2. before attempting 1. Attempt 1. only if 2. failed.

If both 2. and 3. are an option I think we should do 2. (there's some reason we didn't do the smaller disruption-free update earlier so let's move quickly).

I'm not updating AEP yet because I'm not sure what to put there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. for InPlaceOrRecreate and 3. for InPlaceOnly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. for both InPlaceOrRecreate and InPlaceOnly

@kgolab
Copy link
Collaborator

kgolab commented May 16, 2023

@jbartosik , thanks for starting this! I added some comments on top of @voelzmo reviews (thanks!) - nothing major but I think it's worth discussing some finer points to avoid misunderstanding (or getting wrong expectations).

@jbartosik
Copy link
Collaborator Author

@kgolab @voelzmo thank you for your comments. I replied to all and I think most are resolved. Please take a look

@iakat
Copy link

iakat commented Jun 11, 2023

Hi @kgolab @voelzmo, could you please see if another round of reviews is appropriate?

Copy link
Contributor

@voelzmo voelzmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks for addressing all the comments! I've just added a few comments regarding link formatting for your consideration, otherwise this looks good to me!

(like applying different recommendations during pod initialization) will be introduced as separate enhancement
proposals.

[in-place update feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[in-place update feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources)
[in-place update feature]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #5877

proposals.

[in-place update feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources)
[available in Kubernetes 1.27.](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[available in Kubernetes 1.27.](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3)
[available in Kubernetes 1.27.]: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md#api-change-3

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #5877

[`UpdatePriorityCalculator.AddPod`]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L81
[by default 12h]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L35
[by default 10%]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L33
[Outside recommendation range]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/priority_processor.go#L73
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[Outside recommendation range]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/priority_processor.go#L73
[Outside recommended range]: https://github.com/kubernetes/autoscaler/blob/114a35961a85efdf3f36859350764e5e2c0c7013/vertical-pod-autoscaler/pkg/updater/priority/priority_processor.go#L73

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #5877

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jbartosik, voelzmo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 9c3d2e0 into kubernetes:master Jun 12, 2023
@iakat
Copy link

iakat commented Jun 12, 2023

Thank you all for your work on this 🤗

Excited to try it out!

@jbartosik jbartosik deleted the vpa-in-place-aep branch June 28, 2023 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants