VPA: configurable resource #2387

Closed
Avanpourm opened this issue Sep 25, 2019 · 13 comments
Labels
area/vertical-pod-autoscaler kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@Avanpourm

It would be desirable to configure each VPA-managed resource separately: whether to automatically scale memory, CPU, or both, and whether to scale the Limit value proportionally.
In real business scenarios, not all services are suited to automatic scaling of both memory and CPU at the same time, so it would help to provide the corresponding configuration options. In addition, proportional scaling of the Limit is not suitable for every service; many services want to keep the "Limit" value fixed and only have the "Request" value resized automatically.

@Avanpourm Avanpourm changed the title VPS: configurable resource VPA: configurable resource Sep 25, 2019
@Avanpourm
Author

Given that the current VPA only scales CPU and memory automatically, the configuration options for each VPA would be (see the sketch after this list):

  1. Whether to automatically scale the memory "Request" value
  2. Whether to automatically scale the memory "Limit" value (only valid when auto-scaling of the memory "Request" value is enabled)
  3. Whether to automatically scale the CPU "Request" value
  4. Whether to automatically scale the CPU "Limit" value (only valid when auto-scaling of the CPU "Request" value is enabled)
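
For illustration, such per-VPA configuration could live in the resource policy. The field names below are only a hypothetical sketch of the idea, and the workload name and values are placeholders:

```yaml
# Hypothetical sketch of the requested configuration; field names are illustrative.
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                          # placeholder workload name
  resourcePolicy:
    containerPolicies:
      - containerName: my-container       # placeholder container name
        controlledResources: ["memory"]   # illustrative: scale memory only, leave CPU alone
        controlledValues: RequestsOnly    # illustrative: adjust the Request, keep the authored Limit
```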

@calixwu
Contributor

calixwu commented Sep 26, 2019

Agree!

@johanneswuerbach
Contributor

Making the limit scaling configurable would also solve our issues in #2359. So 👍 on that.

@bskiba
Member

bskiba commented Sep 26, 2019

WRT scaling just one of CPU and memory, there is currently a way to do so: if you want to turn off scaling for a given resource, you can specify MinAllowed = MaxAllowed = the desired request in the scaling policy. I know it's not ideal in terms of API, but it is possible to configure.
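
For example (illustrative only; the target and values are placeholders), pinning the CPU request at 500m while still letting memory scale would look roughly like:

```yaml
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                 # placeholder workload name
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: my-container   # placeholder container name
        minAllowed:
          cpu: 500m              # minAllowed = maxAllowed = desired request
        maxAllowed:
          cpu: 500m              # effectively freezes CPU; memory is still scaled
```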

WRT keeping the limit unchanged, I'm still not sure I understand the use case here. What is the benefit? If we want to treat the limit as an upper bound on the request, the VPA config provides MaxAllowed for capping the recommended resource request.

@Avanpourm
Author

"specify MinAllowed=MaxAllowed=desired request in the scaling policy" may work, but it also causes a lot of trouble.

For example, every time I update the "CPU Request" in a deployment, I need to update the VPA synchronously and set MinAllowed=MaxAllowed=desired. In addition, not all services are configured with VPA, so you need to maintain this mapping. In fact, if the API design is reasonable, you can avoid these problems and make it easier to use.

Regarding Limit, it is relatively dangerous to automatically change it. For example, the memory limit setting is incorrect, it is easy to cause OOM, which causes the "service" SLA to deteriorate. Some "services" want to be able to set Limit appropriately to provide redundancy for bursty traffic.

The current VPA prediction algorithm takes a percentage value, which may be reasonable for optimizing the Request, but it is too simple to calculate Limit, which can easily affect the quality of the service. So it is better to provide this configuration, allowing users to specify Limit themselves.

In fact, the "Request" value is more important when optimizing the utilization of the cluster. If it is set properly, it can help the scheduler to work better and make full use of the resources of the cluster. For the "Limit" value, the service usage limit is set on the one hand, and the ability to use resources excessively is provided on the other hand.

Before there is no better prediction algorithm for Limit, if you provide the ability to configure, you can use VPA for "services" that cannot use VPA because of the "Limit" problem.

@bskiba
Member

bskiba commented Sep 26, 2019

/cc @kgolab for visibility

@bskiba bskiba added kind/feature Categorizes issue or PR as related to a new feature. area/vertical-pod-autoscaler labels Oct 22, 2019
@bskiba
Member

bskiba commented Oct 22, 2019

@Avanpourm
What do you think would be a better calculation of limit?
The request is the amount of resources guaranteed to your workload, which it can burst above if needed. The limit is the hard bound. I am not certain how setting a constant limit protects you from OOM errors. What a limit gives you is the predictability of OOMs, which is why we recommend setting limit = request for memory: the resources are guaranteed to you, so you will not get any surprise OOM errors when the cluster is full; instead you find out that the workload is out of resources as soon as it starts using its full request, and you (or VPA) can raise those resources.
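
Concretely, for memory that recommendation amounts to something like the following (values are illustrative):

```yaml
# Illustrative container resources with limit = request for memory:
# memory is fully guaranteed, so OOM behaviour is predictable rather than
# depending on how full the node happens to be.
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    memory: 512Mi    # equal to the request
```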

I am sorry if I am missing the point from your comment. I'm trying to wrap my head around it.

@Avanpourm
Author

@bskiba
I want the utilization of the request to be close to actual usage (for example, request utilization of 100%), because kube-scheduler places pods based on their requests. Requests that are close to actual usage therefore make node scheduling more accurate and ensure that individual Nodes are not over-utilized due to scheduling estimation errors as overall cluster utilization grows.

With request utilization as high as possible, we then need to handle scenarios where a service sees a sudden traffic spike.

If the Limit is calculated proportionally, e.g. Limit = 200% of Request, then the Limit will be small when the Request is very small, which is not very reasonable. For example, a service whose memory usage grows from 1G to 2G has doubled, but one that grows from 100M to 2G has grown 20 times. The relationship between Limit and Request should not be just a fixed ratio.

Consider two cases (an illustrative spec follows the list):

  1. Service A actually uses 1G of memory. If I want the utilization rate to be 100%, then request = 1G. When traffic spikes, I want it to use more as long as the node has enough resources, i.e. to go above 100% of its request, so the Limit should be 2G.

  2. Service B averages only 200M of memory, but its usage reaches 2G at a specific time every weekend. I want the request to be 200M and the Limit to be 2G. An OOM may occur when Node resources are insufficient, but the chance is small: as long as the cluster has appropriate redundancy and most services' requests are close to real usage, individual Nodes will rarely be under pressure. If request = Limit = 200M, OOMs are inevitable; if request = Limit = 2G, resources are wasted.
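
To make case 2 concrete (values are illustrative), this is the shape I want to end up with, with VPA free to keep adjusting the request while the limit stays as authored:

```yaml
# Case 2, illustrative values: a small request close to typical usage,
# and a fixed, user-chosen limit for the weekend spike.
resources:
  requests:
    memory: 200Mi   # VPA should keep tuning this toward actual usage
  limits:
    memory: 2Gi     # headroom for bursts; should not be rescaled proportionally
```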

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 20, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

