Incorrect Number of maxPods / Node Pods Capacity #6890

msvechla · 2024-08-28T12:50:26Z

Description

Observed Behavior:

Since we upgraded to Karpenter v1 we observed incorrect kubelet maxPods settings for multiple nodes. We initially only noticed the issue with m7a.medium instances, however today we also had a case with an r7a.medium instance.

The issue becomes visible when multiple pods on a node in the cluster are stuck in initializing with:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "850cdbed09a9f986b2370c7409fb9e5ee782846056ec7466fb13e863f6e225ad": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container

Checking the node, it immediately becomes obvious that too many pods have been scheduled on it, and the node is running out of IP addresses.

In the example with m7a.medium we observed multiple nodes in the same cluster (all m7a.medium) with a different status.capacity.pods specified.

We observed nodes with 8, 58 and 29 maxPods in the cluster.

According to https://github.com/awslabs/amazon-eks-ami/blob/main/templates/shared/runtime/eni-max-pods.txt#L518 the correct number should be 8. So the nodes which had a higher number specified ran into the issue mentioned above.

Logging into the nodes and checking the kubelet config revealed the following:

[root@ip]# cat /etc/kubernetes/kubelet/config.json.d/00-nodeadm.conf |grep maxPods
    "maxPods": 29,

[root@ip]# cat /etc/kubernetes/kubelet/config.json |grep maxPods
  "maxPods": 8,

So it appears that the correct value is specified in /etc/kubernetes/kubelet/config.json but overwritten in /etc/kubernetes/kubelet/config.json.d/00-nodeadm.conf.

We use AL2023 and we do not specify any value for podsPerCore in our karpenter resources or similar.

As we had different nodes of the same instance type with varying values, this could also be some kind of race condition or similar.

Expected Behavior:

Calculated maxPods matches value in https://github.com/awslabs/amazon-eks-ami/blob/main/templates/shared/runtime/eni-max-pods.txt

Reproduction Steps (Please include YAML):

Used EC2NodeClass

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
  - alias: al2023@v20240807
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      encrypted: true
      volumeSize: 30Gi
      volumeType: gp3
  detailedMonitoring: true
  instanceProfile: karpenter
  kubelet:
    kubeReserved:
      cpu: 200m
      ephemeral-storage: 1Gi
      memory: 200Mi
    systemReserved:
      cpu: 100m
      ephemeral-storage: 1Gi
      memory: 200Mi
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  securityGroupSelectorTerms:
  - name: karpenter
  subnetSelectorTerms:
  - tags:
      Name: private
  userData: |
    #!/bin/bash

    # https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/faq.md#6-minute-delays-in-attaching-volumes
    # https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1955
    echo -e "InhibitDelayMaxSec=45\n" >> /etc/systemd/logind.conf
    systemctl restart systemd-logind
    echo "$(jq ".shutdownGracePeriod=\"400s\"" /etc/kubernetes/kubelet/config.json)" > /etc/kubernetes/kubelet/config.json
    echo "$(jq ".shutdownGracePeriodCriticalPods=\"100s\"" /etc/kubernetes/kubelet/config.json)" > /etc/kubernetes/kubelet/config.json
    systemctl restart kubelet

Versions:

Chart Version: v1.0.1
Kubernetes Version (kubectl version): v1.29.6-eks-db838b0

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

The text was updated successfully, but these errors were encountered:

rschalo · 2024-08-29T18:18:03Z

It looks like this problem is related to https://karpenter.sh/v1.0/troubleshooting/#maxpods-is-greater-than-the-nodes-supported-pod-density.

I'll point out that some of the language there needs to be updated, for example I believe NodePods in Solution 2 was meant to point to NodePools and the pod density section now directs to the EC2NodeClass kubelet config section since it was moved there from NodePools in v1.

Please share an update if the problem persists after updating the kubelet spec or enabling prefix delegation.

msvechla · 2024-08-29T19:35:41Z

I'm not quite sure what you mean. I posted my kubelet spec / the entire EC2NodeClass in the original post above. We are not specifying any maxPods as is mentioned in the troubleshooting guide, so it must mean karpenter is setting an incorrect amount.

Or did I misunderstand something?

We are not using prefix delegation, and according to the docs it should also not be required.

Can you share what exactly we should update in the kubelet config?

It is also weird that karpenter sets a different pod capacity for different nodes of the same instance type in the cluster, so to me this still looks like a bug.

waihong · 2024-08-30T00:40:06Z

We are encountering a similar problem that began with the upgrade to v1.0.0. We have noticed an excessive number of pods being scheduled on t3.small/t3a.small instances. Our kubelet configuration does not specify any maxPods settings as well.

iharris-luno · 2024-08-30T14:13:01Z

We're also seeing this issue after upgrading to v1.0.0. Around 10% of new nodes have wildly high allocatable pods (eg 205 for a c6a.2xlarge), whereas mostly the calculations are correct (ie 44 for a c6a.2xlarge, as we have RESERVED_ENIS=1 in the karpenter controller).
We've had to hardcode maxPods:44 in our EC2NodeClass to prevent hundreds of pods getting stuck in FailedCreatePodSandBox status.
I can confirm that the affected nodes have an incorrect maxPods value in the # Karpenter Generated NodeConfig of the instance user-data. (So AL2023 / kubelet is doing what it's told, and the problem is in karpenter's maxPods calculations)
I can reproduce this issue on multiple AWS accounts / EKS clusters / Regions / Instance families, and it affects both AL2 and AL2023 AMI families, and both with and without RESERVED_ENIS set in the karpenter controller.

iharris-luno · 2024-08-30T17:29:31Z

It appears to be related to the presence or absence of a kubelet stanza in the EC2NodeClass...

Reproduction Steps:
Create a deployment with 50 replicas, with node anti-affinity, in a nodepool which uses the following EC2NodeClass...

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: iharris
spec:
  amiSelectorTerms:
  - alias: al2023@latest
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      encrypted: true
      kmsKeyID: <redacted>
      volumeSize: 150Gi
      volumeType: gp3
  role: karpenter-node-role.<redacted>
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: staging
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: staging

All 50 nodes have the correct .status.allocatable.pods - yay!

Change the EC2NodeClass to...

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: iharris
spec:
  amiSelectorTerms:
  - alias: al2023@latest
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      encrypted: true
      kmsKeyID: <redacted>
      volumeSize: 150Gi
      volumeType: gp3
  kubelet:
    imageGCLowThresholdPercent: 65
  role: karpenter-node-role.<redacted>
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: staging
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: staging

Around 5-10% of the 50 nodes have an incorrect .status.allocatable.pods - boo!
(Nothing special about imageGCLowThresholdPercent, it seems to be the presence of spec.kubelet that triggers the behaviour.)

I think we need that bug label back, sorry!

engedaam · 2024-08-30T17:58:23Z

Can you share your NodePool? do you have the compatibility.karpenter.sh/v1beta1-kubelet-conversion annotation set on the nodepool?

iharris-luno · 2024-08-30T18:17:21Z

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "15612137669406834936"
    karpenter.sh/nodepool-hash-version: v3
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"karpenter.sh/v1","kind":"NodePool","metadata":{"annotations":{},"name":"iharris"},"spec":{"disruption":{"budgets":[{"nodes":"100%"}],"consolidateAfter":"1m","consolidationPolicy":"WhenEmptyOrUnderutilized"},"limits":{"cpu":"500","memory":"2000Gi"},"template":{"metadata":{"labels":{"role":"iharris"}},"spec":{"expireAfter":"1h","nodeClassRef":{"group":"karpenter.k8s.aws","kind":"EC2NodeClass","name":"iharris"},"requirements":[{"key":"karpenter.k8s.aws/instance-category","operator":"In","values":["c","m","r"]},{"key":"karpenter.k8s.aws/instance-generation","operator":"In","values":["5","6"]},{"key":"karpenter.k8s.aws/instance-cpu","operator":"Gt","values":["7"]},{"key":"kubernetes.io/os","operator":"In","values":["linux"]},{"key":"kubernetes.io/arch","operator":"In","values":["amd64"]},{"key":"karpenter.sh/capacity-type","operator":"In","values":["on-demand"]}],"taints":[{"effect":"NoSchedule","key":"iharris","value":"true"}]}}}}
  creationTimestamp: "2024-08-29T15:29:45Z"
  generation: 4
  name: iharris
  resourceVersion: "864235779"
  uid: <redacted>
spec:
  disruption:
    budgets:
    - nodes: 100%
    consolidateAfter: 1m
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "500"
    memory: 2000Gi
  template:
    metadata:
      labels:
        role: iharris
    spec:
      expireAfter: 1h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: iharris
      requirements:
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - c
        - m
        - r
      - key: karpenter.k8s.aws/instance-generation
        operator: In
        values:
        - "5"
        - "6"
      - key: karpenter.k8s.aws/instance-cpu
        operator: Gt
        values:
        - "7"
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      taints:
      - effect: NoSchedule
        key: iharris
        value: "true"
status:
  conditions:
  - lastTransitionTime: "2024-08-29T15:29:45Z"
    message: ""
    reason: NodeClassReady
    status: "True"
    type: NodeClassReady
  - lastTransitionTime: "2024-08-29T15:29:45Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-08-29T15:29:45Z"
    message: ""
    reason: ValidationSucceeded
    status: "True"
    type: ValidationSucceeded
  resources:
    cpu: "0"
    ephemeral-storage: "0"
    memory: "0"
    nodes: "0"
    pods: "0"

That's a new nodepool, created to test this issue. The old nodepools that were upgraded from v0.35.7 have eg a compatibility.karpenter.sh/v1beta1-nodeclass-reference: '{"name":"default"}' annotation, but none have compatibility.karpenter.sh/v1beta1-kubelet-conversion annotations.

engedaam · 2024-08-30T21:07:43Z

Can you provide all your NodePool and EC2NodeClass in the cluster?

iharris-luno · 2024-08-31T07:58:33Z

Sure thing, here's the -oyaml from the cluster I'm currently testing in: issue-6890-resources.txt. I've reproduced the issue in both the pre-upgrade default, and the post-upgrade iharris ec2nc/nodepools.

msvechla · 2024-09-02T12:17:56Z

Could it be related to #6167 which was included in v0.37.0? It mentions data races and to me this looks like a data race, as nodes of the exact same instance type have a different value assigned. As part of the v1 upgrade we also updated from v0.36.2 to the latest v0.37.x

EDIT: Its probably unrelated, as our clusters on v0.37.x have not shown this issue so far, only clusters on v1.x

msvechla · 2024-09-03T13:28:14Z

Something else I noticed:

The NodeClaim of the affected nodes has the correct value in .status.capacity.pods, just the matching Node has an incorrect value for .status.capacity.pods

@iharris-luno what instance types have been affected in your case? Also r7a.medium and m7a.medium?

iharris-luno · 2024-09-04T08:57:43Z

We've seen the issue in c6a.2xlarge and r5a.2xlarge instances.
Good spot on the NodeClaim vs Node versions of .status.capacity.pods. However it doesn't seem that the NodeClaims are always correct... I just found a NodeClaim with an incorrect .status.capacity.pods:205.

engedaam · 2024-09-09T19:59:01Z

@iharris-luno I used you configuration and I was not able to replicate the issue. Do you think you can share the node and nodeclaims that were impacted by the issue?

caiohasouza · 2024-09-10T14:49:23Z

Hi,

I have the same issue with a t3.small instance:

nodeClaim.status.allocatable:
    Cpu:                  1930m
    Ephemeral - Storage:  35Gi
    Memory:               1418Mi
    Pods:                 11

node.status.allocatable:
    cpu:                1930m
    ephemeral-storage:  37569620724
    hugepages-1Gi:      0
    hugepages-2Mi:      0
    memory:             1483068Ki
    pods:               8

I'm using 1.0.1 version but i tested with 1.0.2 version too.

Regards

iharris-luno · 2024-09-10T15:21:53Z

I've just spun up 2000 c6a.2xlarge nodes in batches of 50, and not one of them had an incorrect NodeClaim. (If I'd realised how rare they were, compared to incorrect Nodes, I'd have grabbed the yaml of the one I found previously!). Plenty of incorrect nodes though (225 / 2000), so here's one of them and its associated nodeclaim...
node-1.zip

k24dizzle · 2024-09-11T18:03:12Z

Saw these values on a r7a.medium

node.status

Allocatable:
  cpu:                940m
  ephemeral-storage:  95551679124
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             7467144Ki
  pods:               58

nodeclaim.status

  Allocatable:
    Cpu:                        940m
    Ephemeral - Storage:        89Gi
    Memory:                     7134Mi
    Pods:                       8
    vpc.amazonaws.com/pod-eni:  4

jonathan-innis · 2024-09-16T06:41:13Z

Unsure yet if it's related but we did track down a solve for #6987 which is available here

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com

helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-2f61ca341eaf5f220a0e70ee12c5d6d6c6c00438" --namespace "kube-system" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

(see #7013)

jonathan-innis · 2024-09-16T07:17:30Z

I'm moving this to burning, given the number of different issues and folks that this is impacting.

jonathan-innis · 2024-09-16T07:45:08Z

Ok, I have some good news: I have an initial hypothesis about what's going on and it looks related to this line: https://github.com/aws/karpenter-provider-aws/blob/release-v1.0.x/pkg/providers/amifamily/resolver.go#L210.

What it seems to come down to is that this function is returning back a pointer which we are then mutating on L221 in some cases. This would be fine if we were only calling this function once and the NodeClass wasn't getting used elsewhere throughout the code, but because we are mutating the original object and not just reading it, that's most likely affecting our consistent view of the object throughout the code.

From looking at the code, I could reason about the following order of operations:

The first launch template has a kubeletConfiguration but doesn't have a maxPods defined, so we resolve it and and then update the kubeletConfiguration in place
Subsequent accesses of the kubeletConfiguration then use the same maxPods value.

This also explains why you only see this issue when you set kubeletConfig -- that's the only time when we don't create a new pointer and use the existing pointer.

Still validating, but if that's the case, should be a pretty easy fix -- just a tough thing to see :)

jonathan-innis · 2024-09-16T08:19:01Z

Confirmed, that's exactly what's happening. Added some print lines and this is what I see with the existing code (you actually see it returning the different value for the same instance type for different NodeClaims)

...
         // nolint:gosec
	// We know that it's not possible to have values that would overflow int32 here since we control
	// the maxPods values that we pass in here
	if kubeletConfig.MaxPods == nil {
		fmt.Printf("NodeClaim: %s. We should hit this every time\n", nodeClaim.Name)
		kubeletConfig.MaxPods = lo.ToPtr(int32(maxPods))
	}
	fmt.Printf("NodeClaim: %s, Generated MaxPods: %d, Used MaxPods: %d\n", nodeClaim.Name, maxPods, lo.FromPtr(kubeletConfig.MaxPods))
...

NodeClaim: nodes-default-amd64-cjrgj, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-cjrgj, Generated MaxPods: 234, Used MaxPods: 58
NodeClaim: nodes-default-amd64-9fqc5. We should hit this every time
NodeClaim: nodes-default-amd64-9fqc5, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-9fqc5, Generated MaxPods: 234, Used MaxPods: 58
NodeClaim: nodes-default-amd64-7d5tc. We should hit this every time
NodeClaim: nodes-default-amd64-7d5tc, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-7d5tc, Generated MaxPods: 234, Used MaxPods: 58
NodeClaim: nodes-default-amd64-cllx9. We should hit this every time
NodeClaim: nodes-default-amd64-cllx9, Generated MaxPods: 234, Used MaxPods: 234
NodeClaim: nodes-default-amd64-cllx9, Generated MaxPods: 58, Used MaxPods: 234
NodeClaim: nodes-default-amd64-wtbd9. We should hit this every time
NodeClaim: nodes-default-amd64-wtbd9, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-wtbd9, Generated MaxPods: 234, Used MaxPods: 58
NodeClaim: nodes-default-amd64-cj8jr. We should hit this every time
NodeClaim: nodes-default-amd64-cj8jr, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-cj8jr, Generated MaxPods: 234, Used MaxPods: 58

And when I change the pointer to be deep-copied.

...
	ret, err := utils.GetKubeletConfigurationWithNodeClaim(nodeClaim, nodeClass)
	if err != nil {
		return nil, fmt.Errorf("resolving kubelet configuration, %w", err)
	}
	kubeletConfig := &v1.KubeletConfiguration{}
	if ret != nil {
		kubeletConfig = ret.DeepCopy()
	}
         // nolint:gosec
	// We know that it's not possible to have values that would overflow int32 here since we control
	// the maxPods values that we pass in here
	if kubeletConfig.MaxPods == nil {
		fmt.Printf("NodeClaim: %s. We should hit this every time\n", nodeClaim.Name)
		kubeletConfig.MaxPods = lo.ToPtr(int32(maxPods))
	}
	fmt.Printf("NodeClaim: %s, Generated MaxPods: %d, Used MaxPods: %d\n", nodeClaim.Name, maxPods, lo.FromPtr(kubeletConfig.MaxPods))
...

NodeClaim: nodes-default-amd64-gbczg. We should hit this every time
NodeClaim: nodes-default-amd64-gbczg, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-gbczg. We should hit this every time
NodeClaim: nodes-default-amd64-gbczg, Generated MaxPods: 234, Used MaxPods: 234
NodeClaim: nodes-default-amd64-r6vqc. We should hit this every time
NodeClaim: nodes-default-amd64-r6vqc, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-r6vqc. We should hit this every time
NodeClaim: nodes-default-amd64-r6vqc, Generated MaxPods: 234, Used MaxPods: 234
NodeClaim: nodes-default-amd64-7p5hk. We should hit this every time
NodeClaim: nodes-default-amd64-7p5hk, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-7p5hk. We should hit this every time
NodeClaim: nodes-default-amd64-7p5hk, Generated MaxPods: 234, Used MaxPods: 234

jonathan-innis · 2024-09-16T08:21:34Z

We'll raise something and get some testing out for it tomorrow morning PST time but for now it looks like we can actually make progress towards a patch 🎉

jonathan-innis · 2024-09-16T16:55:27Z

PR has been raised. You should be able to try the snapshot with

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com

helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-cd04d65077eaed45e212e2140c0081768f3de547" --namespace "kube-system" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

For those willing to try -- let me know if you see the issue after the new install.

iharris-luno · 2024-09-17T13:26:30Z

Looking good! 500 nodes created so far with no maxPods issues in either node or nodeclaim resources. I'll leave it churning for a bit, just in case, but looks like the problem's fixed. 🎉 Thank you!

jonathan-innis · 2024-09-18T01:17:53Z

#7020 merged! So I think we are good to close this out now. We should have a patch that includes this soon! Please continue to post on this issue if you see any more issues with this, but from what I'm hearing, this appears to be resolved!

guitmz · 2024-09-24T12:13:31Z

@jonathan-innis when is the release expected? we are facing this issue now and karpenter can't spawn new machines. Adding/removing maxPods does not help

msvechla · 2024-09-26T10:03:46Z

@jonathan-innis v1.0.3 has been released yesterday, but it looks like this is still not part of the release. Is there a specific reason for it? It looks like this is affecting quite a few user.

caiohasouza · 2024-09-30T16:50:29Z

Version 1.0.4 was released, but we don't have this fix either.

aoi1 · 2024-10-04T01:20:02Z

After upgrading Karpenter to v1.0.1 in our environment, we encountered a significant issue. This problem has a major impact on our environment, and we cannot proceed with upgrading Karpenter to v1 until it is resolved. We would appreciate it if you could inform us in which release the fix will be included.

engedaam · 2024-10-04T02:54:36Z

@caiohasouza did you upgrade to use v1.0.4? The fix was included in that version https://github.com/aws/karpenter-provider-aws/releases/tag/v1.0.4

caiohasouza · 2024-10-07T10:52:53Z

@engedaam, I upgraded to v1.0.6 today. If the issue persists, I will update here.

Thank you!

mohammed-nazim · 2024-10-08T05:49:39Z

Hi Team,

After upgrading to 1.0.6, I am still receiving this error :

{"level":"ERROR","time":"2024-10-08T05:47:02.546Z","logger":"controller","message":"consistency error","commit":"6174c75","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"XXXXXXXXXXXXXXXXXX"},"namespace":"","name":"XXXXXXXXXXXXXXXXXX","reconcileID":"XXXXXXXXXXXXXXXXXXXXXXXXXX","error":"expected 234 of resource pods, but found 58 (24.8% of expected)"}

msvechla added bug Something isn't working needs-triage Issues that need to be triaged labels Aug 28, 2024

rschalo removed the bug Something isn't working label Aug 29, 2024

engedaam added the bug Something isn't working label Aug 30, 2024

engedaam self-assigned this Sep 9, 2024

jonathan-innis mentioned this issue Sep 16, 2024

FailedConsistencyCheck: expected 58 of resource pods, but found 18 (31.0% of expected) #7011

Closed

jonathan-innis mentioned this issue Sep 16, 2024

use-max-pods is disabled by default on AL2 and wrong max-pods is set on the node #6979

Closed

jonathan-innis added burning Time sensitive issues and removed needs-triage Issues that need to be triaged labels Sep 16, 2024

jonathan-innis mentioned this issue Sep 16, 2024

fix: Fix pointer mutation causing incorrect maxPods values #7020

Merged

3 tasks

jonathan-innis self-assigned this Sep 16, 2024

jonathan-innis closed this as completed Sep 18, 2024

irizzant mentioned this issue Sep 20, 2024

Decouple MaxPods from CNI Logic bottlerocket-os/bottlerocket#1721

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect Number of maxPods / Node Pods Capacity #6890

Incorrect Number of maxPods / Node Pods Capacity #6890

msvechla commented Aug 28, 2024

rschalo commented Aug 29, 2024 •

edited

Loading

msvechla commented Aug 29, 2024 •

edited

Loading

waihong commented Aug 30, 2024

iharris-luno commented Aug 30, 2024 •

edited

Loading

iharris-luno commented Aug 30, 2024

engedaam commented Aug 30, 2024

iharris-luno commented Aug 30, 2024 •

edited

Loading

engedaam commented Aug 30, 2024

iharris-luno commented Aug 31, 2024

msvechla commented Sep 2, 2024 •

edited

Loading

msvechla commented Sep 3, 2024 •

edited

Loading

iharris-luno commented Sep 4, 2024

engedaam commented Sep 9, 2024

caiohasouza commented Sep 10, 2024

iharris-luno commented Sep 10, 2024

k24dizzle commented Sep 11, 2024

jonathan-innis commented Sep 16, 2024

jonathan-innis commented Sep 16, 2024

jonathan-innis commented Sep 16, 2024 •

edited

Loading

jonathan-innis commented Sep 16, 2024 •

edited

Loading

jonathan-innis commented Sep 16, 2024

jonathan-innis commented Sep 16, 2024

iharris-luno commented Sep 17, 2024

jonathan-innis commented Sep 18, 2024

guitmz commented Sep 24, 2024 •

edited

Loading

msvechla commented Sep 26, 2024

caiohasouza commented Sep 30, 2024

aoi1 commented Oct 4, 2024

engedaam commented Oct 4, 2024 •

edited

Loading

caiohasouza commented Oct 7, 2024

mohammed-nazim commented Oct 8, 2024

Incorrect Number of maxPods / Node Pods Capacity #6890

Incorrect Number of maxPods / Node Pods Capacity #6890

Comments

msvechla commented Aug 28, 2024

Description

rschalo commented Aug 29, 2024 • edited Loading

msvechla commented Aug 29, 2024 • edited Loading

waihong commented Aug 30, 2024

iharris-luno commented Aug 30, 2024 • edited Loading

iharris-luno commented Aug 30, 2024

engedaam commented Aug 30, 2024

iharris-luno commented Aug 30, 2024 • edited Loading

engedaam commented Aug 30, 2024

iharris-luno commented Aug 31, 2024

msvechla commented Sep 2, 2024 • edited Loading

msvechla commented Sep 3, 2024 • edited Loading

iharris-luno commented Sep 4, 2024

engedaam commented Sep 9, 2024

caiohasouza commented Sep 10, 2024

iharris-luno commented Sep 10, 2024

k24dizzle commented Sep 11, 2024

jonathan-innis commented Sep 16, 2024

jonathan-innis commented Sep 16, 2024

jonathan-innis commented Sep 16, 2024 • edited Loading

jonathan-innis commented Sep 16, 2024 • edited Loading

jonathan-innis commented Sep 16, 2024

jonathan-innis commented Sep 16, 2024

iharris-luno commented Sep 17, 2024

jonathan-innis commented Sep 18, 2024

guitmz commented Sep 24, 2024 • edited Loading

msvechla commented Sep 26, 2024

caiohasouza commented Sep 30, 2024

aoi1 commented Oct 4, 2024

engedaam commented Oct 4, 2024 • edited Loading

caiohasouza commented Oct 7, 2024

mohammed-nazim commented Oct 8, 2024

rschalo commented Aug 29, 2024 •

edited

Loading

msvechla commented Aug 29, 2024 •

edited

Loading

iharris-luno commented Aug 30, 2024 •

edited

Loading

iharris-luno commented Aug 30, 2024 •

edited

Loading

msvechla commented Sep 2, 2024 •

edited

Loading

msvechla commented Sep 3, 2024 •

edited

Loading

jonathan-innis commented Sep 16, 2024 •

edited

Loading

jonathan-innis commented Sep 16, 2024 •

edited

Loading

guitmz commented Sep 24, 2024 •

edited

Loading

engedaam commented Oct 4, 2024 •

edited

Loading