
Karpenter should consider in-flight capacity when scaling out #1044

Closed
olemarkus opened this issue Dec 24, 2021 · 5 comments
Labels: consolidation, feature (New feature or request)

Comments

@olemarkus (Contributor)

Tell us about your request
What do you want us to build?

If two apps scale up in succession, Karpenter first reacts to app one's requirements and provisions instances accordingly. There will often be spare capacity on the provisioned nodes, especially if topology spread constraints are involved.
When app two scales up, that spare capacity is not considered, and Karpenter may provision an excessive number of additional instances.
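
To make the effect concrete, here is a minimal, purely illustrative sketch (not Karpenter code; the node size, pod size, and replica counts are made-up assumptions) comparing packing each scale-up in isolation against reusing the spare slots left on already-provisioned nodes:

```python
# Illustrative sketch only: models why ignoring spare capacity on freshly
# provisioned ("in-flight") nodes leads to excess instances.
# All sizes below are hypothetical assumptions, not Karpenter defaults.
import math

NODE_CPU = 4.0                             # hypothetical instance size (vCPU)
POD_CPU = 1.5                              # hypothetical per-pod request (vCPU)
PODS_PER_NODE = int(NODE_CPU // POD_CPU)   # 2 pods fit per node

def nodes_ignoring_spare(app1_pods: int, app2_pods: int) -> int:
    """Each scale-up is packed in isolation, as if no spare capacity exists."""
    return math.ceil(app1_pods / PODS_PER_NODE) + math.ceil(app2_pods / PODS_PER_NODE)

def nodes_using_spare(app1_pods: int, app2_pods: int) -> int:
    """App two's pods first fill the slots left over on app one's nodes."""
    app1_nodes = math.ceil(app1_pods / PODS_PER_NODE)
    spare_slots = app1_nodes * PODS_PER_NODE - app1_pods
    remaining = max(0, app2_pods - spare_slots)
    return app1_nodes + math.ceil(remaining / PODS_PER_NODE)

if __name__ == "__main__":
    print(nodes_ignoring_spare(5, 5))  # 6 nodes when each app is packed alone
    print(nodes_using_spare(5, 5))     # 5 nodes when spare slots are reused
```

Topology spread constraints amplify the gap, since spreading app one's pods across more nodes leaves more partially filled nodes whose spare slots are then ignored.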

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

The consequence of the above is a large number of under-utilised nodes.

Are you currently working around this issue?
Regularly rotating nodes will compact the cluster, but this causes a lot of unnecessary bouncing of Pods.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
olemarkus added the feature (New feature or request) label on Dec 24, 2021
@ellistarn (Contributor)

One of the benefits of Karpenter's current implementation is that it is almost entirely stateless. Introducing new concepts such as awareness of in-flight instances or delayed pod binding significantly increases Karpenter's complexity surface.

The Cluster Autoscaler has gone down this road, attempting to be smart about what it thinks might happen in the future, and it relies on various timeouts to reconcile the built-up state with what actually happened. These time periods inevitably change over time and require configuration for the performance characteristics of different environments.

There's definitely a tradeoff to be made, but just because we can do or know something doesn't mean we should. I'm open to being convinced that it's worth taking on this complexity; that's probably easiest to discuss at the working group or on Slack.

@matti

matti commented Jan 7, 2022

What about bringing https://github.com/kubernetes-sigs/descheduler into the mix? Descheduler can detect these kinds of conditions and then delete/drain those nodes.

@olemarkus (Contributor, Author)

I want to avoid pods being rescheduled multiple times during a normal deploy, so descheduler, just like defrag, is not a viable solution.

@ellistarn (Contributor)

This is also useful for Spark workloads, which do not create all of their pods at the same time. #1291

@tzneal (Contributor)

tzneal commented May 3, 2022

Fixed with #1727

@tzneal tzneal closed this as completed May 4, 2022