
min capacity draft #62

Closed · wants to merge 1 commit

Conversation

@grosser commented Nov 9, 2022

feat: allow setting minimum capacity per provisioner
fix: #749

Description
A rough draft of how I think a minimum capacity setting could work.

... alternatively this could be done via a "min nodes" setting, computing that times the highest (or average) CPU of the matching instance types, but that means more indirection.
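For illustration only, here is a rough sketch of what such a per-provisioner floor could look like on the v1alpha5 Provisioner spec. The `minCapacity` field name and shape are hypothetical (not an existing Karpenter field, and not necessarily what this draft implements); the existing `limits` ceiling is shown for contrast.

```yaml
# Hypothetical sketch only: "minCapacity" is not an existing Karpenter field;
# it illustrates the per-provisioner floor this draft is aiming for.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.4xlarge", "m5.8xlarge"]
  limits:
    resources:
      cpu: "1000"       # existing ceiling
  minCapacity:          # hypothetical floor, mirroring the shape of limits
    resources:
      cpu: "50"
```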

How was this change tested?
Not tested (this is a rough draft).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mattsre commented Nov 9, 2022

If I understand it correctly, this would only provision capacity up to whatever minimum was defined? If I have an existing provisioner that is already managing 20cpu of compute and I set minCpu to 4cpu, it would effectively do nothing, right? I think this proposal is a bit too restrictive and wouldn't easily support some of the ideas from #749, like "prewarming" (I'm of course biased, because my need wouldn't be solved by this proposal either 😅).

If we implement headroom using similar code to what you have here and reuse my example from above: with cpuHeadroom set to 4cpu and an existing capacity of 20cpu, we could check how much of the 20cpu we manage is reserved. Say 18cpu of the 20cpu capacity is reserved; from there we can determine we need to provision a new node with 2cpu, bringing total capacity to 22cpu and covering the 4cpu headroom. Something like this could also handle your minCpu use case: if the provisioner is currently managing 0cpu and we set headroom to 10cpu, it's functionally equivalent to minCpu. We'd identify the difference and provision new nodes until we had at least 10cpu of headroom. I will admit a fully-baked headroom implementation is more complicated and performance-intensive to implement than something like the minCpu strategy.

@grosser (Author) commented Nov 9, 2022

yeah, our need was to have a min capacity, not a buffer
... so unless we find some clever in-between thing we might need both ...

@ellistarn (Contributor):

Thoughts on building this capability via a separate CRD? I see this use case as fundamentally different: static capacity vs. dynamic capacity.

@grosser (Author) commented Nov 10, 2022

I'd rather not create a new CRD, since:

  • this seems like an option for the provisioner
  • the sheer overhead of managing a new CRD and getting it into the codebase seems overwhelming to me
  • I want to mirror the previous ASG workload as much as possible, so having it behave in a similar way would be nice (that applies to the min capacity, not to the overhead capacity)

@ellistarn (Contributor):

This is an interesting idea as a mechanism similar to "limits". Essentially, we have a ceiling today, but not a floor. What type of capacity would you expect for the floor? How big should the nodes be? Which zones should they be launched in?

We'd love to collaborate on this. Can you ping me on slack and we can discuss next steps?

@grosser (Author) commented Nov 10, 2022

  • type: we'd set this based on CPU cores, but it might be simpler to say "number of nodes", because then we don't need to do all the cpu/memory math
  • nodes: same as specified in the provider (if nothing is specified, the biggest instance type) ... we use a list of instance types and would want the first (preferred) type

... so maybe doing minNodes is simpler: it would create a fake pod the size of the preferred node (minus room for daemonsets etc.), as in the sketch below.
That would mirror the ASG behavior nicely.
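For illustration only, a sketch of what such a synthesized placeholder pod could look like if the preferred node were an m5.4xlarge (16 vCPU / 64 GiB). The name, image, and numbers are made up, and in practice Karpenter could keep this pod purely in-memory for its scheduling simulation rather than actually creating it.

```yaml
# Illustration only: a placeholder ("fake") pod sized to roughly fill the
# preferred instance type after subtracting daemonset/system overhead.
apiVersion: v1
kind: Pod
metadata:
  name: min-capacity-placeholder-0   # hypothetical name
spec:
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: "15"        # 16 vCPU minus ~1 vCPU of daemonsets
          memory: 58Gi     # 64 GiB minus room for daemonsets/kube-reserved
```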

@ellistarn (Contributor):

How does this interact with consolidation / expiry?

@grosser (Author) commented Nov 10, 2022

The extra nodes would expire the same as all others.
For consolidation, it would need to use the same "fake pods" logic so it does not decide to remove the extra nodes.

@ellistarn (Contributor):

One thing we discussed in working group was using negative preemption pods to solve this problem. These pods would be included in scale out calculations, but wouldn't themselves trigger scale out unless there were positive preemption pods. Similarly, we wouldn't consolidate nodes with negative preemption pods, but we would terminate a node if it only had negative preemption pods.
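As a rough illustration of the building blocks available today for this pattern (a negative-value PriorityClass with `preemptionPolicy: Never`, plus pause pods holding the reserved requests): the names and sizes below are placeholders, and the "don't trigger scale-out on their own" behavior described above would still need Karpenter-side logic.

```yaml
# Sketch of the building blocks: placeholder pods in a negative-priority class
# that never preempt real workloads and are preempted first when real pods arrive.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: capacity-placeholder        # hypothetical name
value: -10                          # below the default priority of 0
preemptionPolicy: Never             # these pods never push out real workloads
globalDefault: false
description: "Pods that hold capacity but yield it to real workloads."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-placeholder
spec:
  replicas: 2                       # how much capacity to hold
  selector:
    matchLabels:
      app: capacity-placeholder
  template:
    metadata:
      labels:
        app: capacity-placeholder
    spec:
      priorityClassName: capacity-placeholder
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```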

@grosser (Author) commented Nov 11, 2022

That sounds good to me.
Do you have a preferred direction in terms of how many/how big these pods should be?

@solidnerd:

I also thought about this problem yesterday, since we want something like pre-warming nodes on a schedule, so I can say Mon-Fri minSize=2 and Sat-Sun minSize=0.

My initial idea was to use a Kubernetes CronJob that places a pause container with the fitting CPU and memory requests for the workload, with a configurable shutdown of the pause container (this is similar to the preemption-pods idea).

To get a perfect fit, we would need the option to specify the CPU and memory for the fake pods.
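A minimal sketch of that idea, assuming a weekday schedule and made-up names and sizes; `activeDeadlineSeconds` provides the configurable shutdown.

```yaml
# Sketch of the CronJob idea: pre-warm capacity on weekday mornings by running
# pause pods with explicit requests; activeDeadlineSeconds gives the
# configurable shutdown. Schedule, sizes, and names are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: prewarm-capacity
spec:
  schedule: "0 6 * * 1-5"            # 06:00 Monday through Friday
  jobTemplate:
    spec:
      parallelism: 2                 # number of placeholder pods to run
      activeDeadlineSeconds: 43200   # tear the pods down after 12 hours
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: pause
              image: registry.k8s.io/pause:3.9
              resources:
                requests:
                  cpu: "8"
                  memory: 16Gi
```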

@grosser (Author) commented Nov 11, 2022

Karpenter knows how much CPU/memory daemonsets use, so it could schedule up to the biggest possible pods that fit onto the biggest supported instance type.
(so 50 CPUs + a 16xlarge -> just 1 pod)

@ellistarn (Contributor):

Daemonsets are definitely an interesting way to do node overhead! @tzneal

@skillcoder commented Dec 27, 2022

It is not enough to have ReservedCPU (MinCpu); we also need at least ReservedMemory, since some workloads can use lots of memory and not much CPU.
And of course it should be configurable per "application group".
This feature should also be able to work with (or like) KEDA, tracking some metrics and scaling these values accordingly, since the reserved values can and will be dynamic: for example, they should be higher during huge update waves or massive scaling events, and lower during quiet hours at night.

Without all of these requirements implemented, it is better to stay with the overprovisioning dirty hack and manage it outside of Karpenter, since for now that is more flexible.
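For the schedule-driven part, that external overprovisioning approach can already be driven by something like KEDA's cron scaler pointed at a pause-pod Deployment; a minimal sketch with placeholder names, times, and counts:

```yaml
# Sketch: a KEDA cron trigger scaling a placeholder/overprovisioning Deployment
# up during business hours and down to zero at night. Names, times, and counts
# are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: capacity-placeholder
spec:
  scaleTargetRef:
    name: capacity-placeholder       # the pause-pod Deployment holding capacity
  minReplicaCount: 0
  triggers:
    - type: cron
      metadata:
        timezone: Etc/UTC
        start: "0 6 * * 1-5"         # scale up on weekday mornings
        end: "0 20 * * 1-5"          # scale back down in the evening
        desiredReplicas: "4"
```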

@grosser (Author) commented Dec 27, 2022

  • min-memory only needs to be considered when using different instance families
  • per application group: this could be per provisioner
  • KEDA-like features were not in scope for me and don't "need" to be added

@ellistarn (Contributor) commented Dec 27, 2022

I think this change warrants a design. See https://karpenter.sh/v0.20.0/contributing/design-guide/

@njtran (Contributor) commented Mar 2, 2023

Hey @grosser, thanks a ton for discussing this with us on this PR. Unfortunately I don't think we'll be accepting this PR, due to the number of design unknowns that come with it. There are a lot of moving parts in the deprovisioning logic where adding static capacity could create some funky circumstances. All in all, if you're willing to help work on this design, please come talk with us in https://kubernetes.slack.com/archives/C04JW2J5J5P; we're more than willing to help get you up to speed on the current state of the project.

@njtran closed this Mar 2, 2023