
Karpenter cannot fit workload on instance type where it should fit #1306

Closed
nonoswz opened this issue Feb 9, 2022 · 7 comments
Labels
bug Something isn't working, lifecycle/stale
nonoswz commented Feb 9, 2022

Version

Karpenter: v0.6.1

Kubernetes: v1.20+

Expected Behavior

I expect Karpenter to be able to schedule a deployment on an instance type where the workload (resources) fits.

Actual Behavior

I am trying to switch from ASG-managed nodes to Karpenter. Currently it fails to fit one of our deployments (prometheus) on the same instance type it previously ran on in one of the ASG nodes (r5.12xlarge).

Our Prometheus deployment requests around 350GiB of memory and 40 CPUs, while an r5.12xlarge has 48 vCPUs and 384 GiB of memory per the AWS docs.
Extract of the prometheus pod spec:

    resources:
      limits:
        memory: 350Gi
      requests:
        cpu: "40"
        memory: 350Gi

Karpenter fails to run it on this specific instance type saying it won't fit.

2022-02-09T20:11:34.749Z	ERROR	controller.provisioning	Failed to compute packing, pod(s) [monitoring/prometheus-infrastructure-0] did not fit in instance type option(s) [r5.12xlarge]	{"commit": "df57892", "provisioner": "prometheus"}

Notes:

  • We also have multiple daemonsets running on each node (shown below in the node description), in case they factor into the calculation of whether the workload can fit.
  • I tried a bigger instance type (r5.16xlarge) and it works fine. We can use bigger nodes as a workaround, but ultimately it would be good to get the smaller instance type working.

Steps to Reproduce the Problem

  1. Create a deployment requesting 40 vCPU and 350Gi of memory.
  2. Create a provisioner with only one instance type: r5.12xlarge.
  3. Scale the deployment up to 1 replica and see whether Karpenter can fit it on an r5.12xlarge instance.

Resource Specs and Logs

Pod spec (prometheus, I included relevant part only)

spec:
  containers:
   # container 1
    ....
    resources:
      limits:
        memory: 350Gi
      requests:
        cpu: "40"
        memory: 350Gi
   # container 2
    .....
    resources:
      limits:
        cpu: 100m
        memory: 25Mi
      requests:
        cpu: 100m
        memory: 25Mi
   # container 3
   ......
    resources:
      limits:
        cpu: 100m
        memory: 25Mi
      requests:
        cpu: 100m
        memory: 25Mi
  nodeSelector:
    group: prometheus
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  terminationGracePeriodSeconds: 600
  tolerations:
  - effect: NoSchedule
    key: dedicated
    operator: Equal
    value: prometheus
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300

ASG managed node running prometheus, showing prometheus is able to fit on r5.12xlarge

Name:              ********
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=r5.12xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1b
                    group=prometheus
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=******
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=r5.12xlarge
                    topology.ebs.csi.aws.com/zone=us-east-1b
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1b
Annotations:        
                    csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"*********"}
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: **********
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 09 Feb 2022 15:11:45 -0500
Taints:             dedicated=prometheus:NoSchedule
Unschedulable:      false
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         48
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      391804104Ki
  pods:                        234
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         47750m
  ephemeral-storage:           18241637770
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      400241034854400m
  pods:                        234
System Info:
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
Non-terminated Pods:          (11 in total)
  Namespace     Name                         CPU Requests  CPU Limits  Memory Requests  Memory Limits   AGE
  ---------     ----                         ------------  ----------  ---------------  -------------   ---
  fluentd       ***********                  400m (0%)     3 (6%)      6128Mi (1%)      7024Mi (1%)     2m28s
  kube-system   aws-node-kr6qd               10m (0%)      0 (0%)      0 (0%)           0 (0%)          2m28s
  kube-system   calico-node-pfczj            20m (0%)      0 (0%)      32Mi (0%)        0 (0%)          2m28s
  kube-system   ebs-csi-node-vxjd2           0 (0%)        0 (0%)      0 (0%)           0 (0%)          2m28s
  kube-system   kube-proxy-48trm             100m (0%)     0 (0%)      0 (0%)           0 (0%)          2m28s
  kube-system   node-local-dns-f2jdl         100m (0%)     1 (2%)      100Mi (0%)       1Gi (0%)        2m28s
  logging       ***********                  1300m (2%)    2 (4%)      4224Mi (1%)      5Gi (1%)        2m28s
  monitoring    ***********                  600m (1%)     1 (2%)      800Mi (0%)       0 (0%)          2m28s
  monitoring    ***********                  200m (0%)     200m (0%)   768Mi (0%)       768Mi (0%)      2m28s
  monitoring    ***********                  110m (0%)     220m (0%)   50Mi (0%)        90Mi (0%)       2m28s
  monitoring    prometheus-infrastructure-0  40200m (84%)  200m (0%)   358450Mi (93%)   358450Mi (93%)  4m34s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests        Limits
  --------                    --------        ------
  cpu                         43040m (90%)    7620m (15%)
  memory                      370552Mi (97%)  372476Mi (97%)
  ephemeral-storage           0 (0%)          0 (0%)
  hugepages-1Gi               0 (0%)          0 (0%)
  hugepages-2Mi               0 (0%)          0 (0%)
  attachable-volumes-aws-ebs  0               0

Karpenter logs

2022-02-09T20:11:28.741Z	INFO	controller.provisioning	Waiting for unschedulable pods	{"commit": "df57892", "provisioner": "prometheus"}
2022-02-09T20:11:34.743Z	INFO	controller.provisioning	Batched 1 pods in 1.000548092s	{"commit": "df57892", "provisioner": "prometheus"}
2022-02-09T20:11:34.749Z	ERROR	controller.provisioning	Failed to compute packing, pod(s) [monitoring/prometheus-infrastructure-0] did not fit in instance type option(s) [r5.12xlarge]	{"commit": "df57892", "provisioner": "prometheus"}

Prometheus provisioner spec

spec:
  kubeletConfiguration: {}
  labels:
    group: prometheus
  limits: {}
  provider:
    apiVersion: extensions.karpenter.sh/v1alpha1
    instanceProfile: ***********
    kind: AWS
    launchTemplate: ***********
    securityGroupSelector:
      kubernetes.io/cluster/***********: '*'
    subnetSelector:
      Name: '*private*'
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - r5.12xlarge
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - us-east-1a
    - us-east-1b
    - us-east-1c
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  taints:
  - effect: NoSchedule
    key: dedicated
    value: prometheus
  ttlSecondsAfterEmpty: 30
@nonoswz nonoswz added the bug Something isn't working label Feb 9, 2022
@felix-zhe-huang felix-zhe-huang self-assigned this Feb 9, 2022
@bwagner5 bwagner5 assigned bwagner5 and unassigned felix-zhe-huang Feb 9, 2022

bwagner5 commented Feb 9, 2022

I printed out our node overhead calculation (the amount of space reserved for system-level kube resources):

[r5.12xlarge - Total]
cpu = 48,000m
memory = 393,216Mi

[r5.12xlarge - Allocatable]
cpu = 47,750m
memory = 381,700Mi

[r5.12xlarge - Overhead]
cpu = 290m 
memory = 3,029Mi 

[r5.12xlarge - Other things on the node you pasted]
cpu = 2,840
memory = 12,102Mi

[r5.12xlarge - Available for the Pod]
cpu = 44,620m
memory = 366,569Mi

[Needed for Prometheus Pod]
cpu = 40,200m
memory = 358,450Mi

So it does look like the pod should fit. I'll have to dig into this some more.

** My calcs were slightly off, but corrected the math and it still seems it should fit.
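The arithmetic above can be re-derived with a quick sketch. This is just the math from this comment, not Karpenter's actual packing code; all figures come from the numbers posted above.

```python
# Re-derive the numbers above: available = allocatable - overhead - other pods.
# Figures are taken directly from this comment; this is not Karpenter's code.

allocatable = {"cpu_m": 47_750, "mem_Mi": 381_700}   # node allocatable
overhead    = {"cpu_m": 290,    "mem_Mi": 3_029}     # system/kube reservation
other_pods  = {"cpu_m": 2_840,  "mem_Mi": 12_102}    # daemonsets etc. on the node

pod = {"cpu_m": 40_200, "mem_Mi": 358_450}           # prometheus-infrastructure-0

available = {r: allocatable[r] - overhead[r] - other_pods[r] for r in allocatable}
print(available)  # {'cpu_m': 44620, 'mem_Mi': 366569}

fits = all(pod[r] <= available[r] for r in available)
print(fits)  # True -> the pod should fit
```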

@ellistarn ellistarn added burning Time sensitive issues bug Something isn't working and removed bug Something isn't working labels Feb 10, 2022
@bwagner5

Oh, we also dedicate a certain portion of memory to the VM resource (7.5% of the total memory of the instance).

With that in mind,

[r5.12xlarge - Total]
cpu = 48,000m
memory = 393,216Mi

We subtract 7.5% of the memory, bringing it down to:

memory = 393,216 * (1 - 0.075) = 363,724Mi

and then we subtract the CPU and memory overhead, which mainly accounts for the resources the kubelet needs based on the number of vCPUs and ENIs the instance has (https://github.com/bottlerocket-os/bottlerocket#kubernetes-settings).

Since the full request of your pods is:
cpu = 43,040m
memory = 370,552Mi

the memory request overflows our bin-packing by 6,828Mi.

The 7.5% memory overhead may be a little aggressive on our part. We want to be a little cautious so that pods don't get OOM Killed.
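The effect of that reservation on this node can be sketched as follows. The figures are the ones posted above; the 7.5% factor models the behavior described in this comment, not Karpenter's source.

```python
# Apply the 7.5% VM memory reservation described above to an r5.12xlarge.

total_mem_mi = 393_216                            # instance total memory
usable_mem_mi = int(total_mem_mi * (1 - 0.075))   # after the 7.5% reservation
print(usable_mem_mi)  # 363724

requested_mem_mi = 370_552                        # sum of all pod memory requests
overflow_mi = requested_mem_mi - usable_mem_mi
print(overflow_mi)  # 6828 -> the pods no longer fit
```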

@bwagner5

I'm going to try to remove this 7.5% memory overhead tomorrow and replace it with a static memory overhead. I believe it was only there to protect packing onto really small instance types (micros and nanos).
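To illustrate why a flat reservation scales better on large instances than a percentage one: the 1 GiB static value below is a hypothetical placeholder, not a number from this thread.

```python
# Compare the percentage-based memory reservation with a hypothetical flat one.

PCT = 0.075          # current behavior described in this thread
STATIC_MI = 1_024    # hypothetical flat reservation (assumed value)

for name, total_mi in [("t3.nano", 512), ("r5.12xlarge", 393_216)]:
    pct_mi = total_mi * PCT
    print(f"{name}: percentage reserves {pct_mi:.0f}Mi, static reserves {STATIC_MI}Mi")
```

On a 384 GiB instance the 7.5% rule reserves roughly 29 GiB, while a flat value would cost the same small amount regardless of instance size.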


nonoswz commented Feb 11, 2022

Thanks for the information and the fixes. I believe the information about the memory overhead, and how Karpenter computes whether a workload can be placed on a node, would be worth having in the Karpenter docs.

@bwagner5

Thanks for the information and the fixes. I believe the information about the memory overhead, and how Karpenter computes whether a workload can be placed on a node, would be worth having in the Karpenter docs.

That's an excellent idea. I'll open a docs issue to add that content.


github-actions bot commented May 9, 2022

Labeled for closure due to inactivity in 10 days.


dewjam commented May 10, 2022

Closing this out in favor of #1329 .

6 participants