Karpenter cannot fit workload on instance type where it should fit #1306
Comments
I printed our node overhead calculation (the amount of space reserved for system-level kube resources).
So it does look like the pod should fit. I'll have to dig into this some more. (Edit: my calculations were slightly off, but after correcting the math it still seems it should fit.)
Oh, we also dedicate a portion of memory to the VM resource (7.5% of the instance's total memory). With that in mind, for the r5.12xlarge we take 7.5% off the total memory, bringing it down to: memory = 393,216 × (1 − 0.075) = 363,724Mi. We then subtract the CPU and memory overhead, which mainly accounts for the resources the kubelet needs based on the number of vCPUs and ENIs the instance has (https://github.com/bottlerocket-os/bottlerocket#kubernetes-settings). Since the total request of your pods exceeds what remains, the memory overflows our bin-packing by 6,828Mi. The 7.5% memory overhead may be a little aggressive on our part; we want to be cautious so that pods don't get OOM-killed.
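The overhead calculation described above can be sketched as follows. This is a simplified illustration using the numbers quoted in this thread, not Karpenter's actual code; the kubelet-reserved portion depends on the instance's vCPU and ENI counts and is left out here.

```python
# Sketch of the allocatable-memory math for an r5.12xlarge.
# The 7.5% VM overhead factor is the one quoted in this thread;
# kube-reserved overhead (based on vCPUs/ENIs) is subtracted after
# this step and is not modeled here.
total_mi = 384 * 1024                    # 384 GiB = 393,216 Mi advertised by AWS
vm_reserved = total_mi * 0.075           # 29,491.2 Mi held back for the VM
after_vm = int(total_mi - vm_reserved)   # 363,724 Mi left for bin-packing
print(after_vm)  # 363724
```

The key point is that the 7.5% is taken off the instance's advertised total before any kubelet reservations are considered, so on large instances it removes a sizable absolute amount of memory (almost 29 GiB here).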
I'm going to try to remove this 7.5% memory overhead tomorrow and replace it with a static memory overhead. I believe the percentage was only there to protect packing onto really small instance types (micros and nanos).
Thanks for the information and the fixes. I believe the details about the memory overhead and how Karpenter decides whether a workload can be placed on a node would be useful to have in the Karpenter docs.
That's an excellent idea. I'll open a docs issue to add that content.
Labeled for closure due to inactivity in 10 days.
Closing this out in favor of #1329.
Version
Karpenter: v0.6.1
Kubernetes: v1.20+
Expected Behavior
I expect Karpenter to be able to schedule a deployment on an instance type where the workload's resource requests fit.
Actual Behavior
I am trying to switch from ASG-managed nodes to Karpenter. Currently it fails to fit one of our deployments (Prometheus) on the same instance type it previously ran on in one of the ASG nodes (r5.12xlarge).
Our Prometheus deployment requests around 350 GiB of memory and 40 CPUs, and an r5.12xlarge has 48 vCPUs and 384 GiB of memory per the AWS docs.
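A quick back-of-the-envelope check (a sketch using the rounded numbers above together with the 7.5% VM overhead explained in the comments, not Karpenter's exact logic) shows how tight this placement is:

```python
# Rough feasibility check: ~350 GiB of requests vs. an r5.12xlarge
# after the 7.5% VM memory overhead discussed in this issue.
GIB_MI = 1024
total = 384 * GIB_MI                     # 393,216 Mi total instance memory
allocatable = int(total * (1 - 0.075))   # 363,724 Mi after the VM overhead
requests = 350 * GIB_MI                  # 358,400 Mi requested by Prometheus
headroom = allocatable - requests
print(headroom)  # 5324 Mi
```

Only about 5,324 Mi of headroom remains before the kubelet's own vCPU/ENI-based reservations are subtracted, which is consistent with Karpenter reporting that the workload overflows its bin-packing on this instance type.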
Extract of prometheus pod spec
Karpenter fails to run it on this specific instance type saying it won't fit.
Notes:
Steps to Reproduce the Problem
Resource Specs and Logs
Pod spec (prometheus, I included relevant part only)
ASG managed node running prometheus, showing prometheus is able to fit on r5.12xlarge
Karpenter logs
Prometheus provisioner spec