KEP-1287: InPlacePodVerticalScaling changes for v1.33 #5089

Open
wants to merge 11 commits into
base: master

tallclair
Member

InPlacePodVerticalScaling changes for v1.33, including:

/sig node
/milestone v1.33

@k8s-ci-robot added this to the v1.33 milestone Jan 24, 2025
@k8s-ci-robot added the sig/node, cncf-cla: yes, and kind/kep labels Jan 24, 2025
@tallclair
Member Author

/assign @dchen1107 @thockin

@k8s-ci-robot added the size/L label Jan 24, 2025
@tallclair
Member Author

/assign @vinaykul

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tallclair
Once this PR has been reviewed and has the lgtm label, please ask for approval from dchen1107. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@esotsal
esotsal commented Jan 25, 2025

/cc

@k8s-ci-robot k8s-ci-robot requested a review from esotsal January 25, 2025 00:12
@@ -216,6 +216,8 @@ PodStatus is extended to show the resources applied to the Pod and its Container
* Pod.Status.ContainerStatuses[i].Resources (new field, type
v1.ResourceRequirements) shows the **actual** resources held by the Pod and
its Containers for running containers, and the allocated resources for non-running containers.
* Pod.Status.AllocatedResources (new field) reports the aggregate pod-level allocated resources,
Contributor

Does introducing a new field and turning a feature on for beta conflict with normal API changes?

I would think that a new API would need to go through Alpha -> Beta -> GA.

Correct, the `InPlacePodVerticalScalingAllocatedStatus` feature gate was introduced in v1.32 for this reason. Please see https://github.com/kubernetes/kubernetes/blob/7140b4910c6c1179c9778a7f3bb8037356febd58/pkg/api/pod/util.go#L806

Member Author

Yes, good point. I'll update the KEP with these details.

Contributor

I read it and I think you did add it. I just missed it with this update.

Member Author

The pod-level field and the container-level field were mixed up. I dropped the pod-level field for now, and I'm proposing adding back the container-level field. We'll still need the pod-level field for resizing pod-level resources, but that will be addressed in a separate KEP update.

* NotRequired - default value; resize the Container without restart, if possible.
* RestartContainer - the container requires a restart to apply new resource values.
* `PreferNoRestart` - default value; resize the Container without restart, if possible.
* `NotRequired` - Equivalent to `PreferNoRestart`, deprecated with v1.33.

nit: Unnecessary whitespace

Member Author

This was intentional, to group it under PreferNoRestart
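
For readers following the hunk above, here is a minimal Go sketch of how the proposed values could be treated by a consumer: `NotRequired` is folded into `PreferNoRestart`, and an unset value falls back to the default. The value names are taken from the quoted diff; the Go type, constants, and `normalize` function are hypothetical illustrations and are not the Kubernetes API types.

```go
package main

import "fmt"

// restartPolicy is a hypothetical stand-in for the resize restart policy
// discussed in the hunk; it is not the real Kubernetes type.
type restartPolicy string

const (
	preferNoRestart  restartPolicy = "PreferNoRestart"  // proposed default
	notRequired      restartPolicy = "NotRequired"      // proposed as deprecated alias
	restartContainer restartPolicy = "RestartContainer" // restart to apply new values
)

// normalize maps the deprecated alias and the unset value to the default,
// and leaves every other value unchanged.
func normalize(p restartPolicy) restartPolicy {
	switch p {
	case "", notRequired:
		return preferNoRestart
	default:
		return p
	}
}

func main() {
	for _, p := range []restartPolicy{"", notRequired, preferNoRestart, restartContainer} {
		fmt.Printf("%q -> %q\n", p, normalize(p))
	}
}
```

Normalizing once at the boundary means the rest of a consumer only has to distinguish two behaviors, which is one way to read "equivalent to `PreferNoRestart`" in the proposed text.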

@tallclair tallclair mentioned this pull request Jan 30, 2025
31 tasks
`Status...Resources.Requests` values.
To compute the Node resources allocated to Pods, pending resizes must be factored in.
The scheduler will use the maximum of:
1. Desired resources, computed from container requests in the pod spec, unless the resize is marked as `Infeasible`

nit: All numbered list items use `1.`. Is this on purpose, or was the intention to number them 1, 2, 3 to indicate order?

Member Author

These are rendered as 1,2,3 in markdown.
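
The scheduler rule in the hunk under discussion (take the maximum of several per-container values, and ignore the desired value when the resize is marked `Infeasible`) can be sketched roughly as follows. This is an illustration only, not scheduler code: the excerpt shows just the first list item, so the remaining values are modeled as an opaque `Others` slice, and `milliCPU`, `resizeView`, and `schedulerRequest` are hypothetical names.

```go
package main

import "fmt"

// milliCPU is a simplified stand-in for a resource quantity (CPU millicores).
type milliCPU int64

// resizeView is a hypothetical, simplified view of one container's requests
// while a resize is pending.
type resizeView struct {
	Desired    milliCPU   // request computed from the pod spec
	Infeasible bool       // true if the resize was marked Infeasible
	Others     []milliCPU // the other values in the list (not shown in the excerpt)
}

// schedulerRequest returns the maximum of the candidate values, skipping the
// desired value when the resize is marked Infeasible.
func schedulerRequest(v resizeView) milliCPU {
	var result milliCPU
	if !v.Infeasible {
		result = v.Desired
	}
	for _, o := range v.Others {
		if o > result {
			result = o
		}
	}
	return result
}

func main() {
	pending := resizeView{Desired: 500, Others: []milliCPU{250}}
	fmt.Println(schedulerRequest(pending)) // 500: the larger desired value is counted

	rejected := resizeView{Desired: 500, Infeasible: true, Others: []milliCPU{250}}
	fmt.Println(schedulerRequest(rejected)) // 250: the infeasible desired value is ignored
}
```

Taking the maximum is the conservative choice: while a resize is in flight, the node is accounted for at the larger of the old and new footprints, so the scheduler does not overcommit it.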

- Metric name: `runtime_operations_duration_seconds{operation_type=container_update}`
- Components exposing the metric: kubelet
- Metric name: `runtime_operations_errors_total{operation_type=container_update}`
- Components exposing the metric: kubelet

* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**

- Using `kubelet_container_resize_requests_total`, `completed + infeasible + canceled` request count
should approach `proposed` request count in steady state.
- Resize requests should succeed (`apiserver_request_total{resource=pods,subresource=resize}` with non-success `code` should be low))

Unnecessary closing parenthesis.

nit: perhaps rephrase to:

Resize requests should succeed (`apiserver_request_total{resource=pods,subresource=resize}` non-success `code` rate should be low)
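
As a rough illustration of the first SLO in the hunk above, the following Go sketch computes how close `completed + infeasible + canceled` is to `proposed`. The sample counts are invented, `resizeCounts` and `settledRatio` are hypothetical names, and how the counts are scraped from `kubelet_container_resize_requests_total` is left out.

```go
package main

import "fmt"

// resizeCounts holds hypothetical per-state samples of
// kubelet_container_resize_requests_total.
type resizeCounts struct {
	Proposed, Completed, Infeasible, Canceled float64
}

// settledRatio returns (completed + infeasible + canceled) / proposed.
// In steady state the SLO expects this value to approach 1.
func settledRatio(c resizeCounts) float64 {
	if c.Proposed == 0 {
		return 1 // nothing proposed, so nothing is outstanding
	}
	return (c.Completed + c.Infeasible + c.Canceled) / c.Proposed
}

func main() {
	c := resizeCounts{Proposed: 200, Completed: 180, Infeasible: 10, Canceled: 6}
	fmt.Printf("settled ratio: %.2f\n", settledRatio(c)) // settled ratio: 0.98
}
```

The KEP text quoted here does not give a numeric target, so any alerting threshold on this ratio would be an operator's own choice.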


If we need allocated resources & limits in the pod status API, the following options have been
If we need allocated limits in the pod status API, the following options have been

Since the intention is to state the motivation, should we rephrase to:

If we need to track allocated requests and limits in the pod status API, the following options have been

Labels
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/kep - Categorizes KEP tracking issues and PRs modifying the KEP directory
sig/node - Categorizes an issue or PR as relevant to SIG Node.
sig/storage - Categorizes an issue or PR as relevant to SIG Storage.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.

8 participants