Incorrect Number of maxPods / Node Pods Capacity #6890
Comments
It looks like this problem is related to https://karpenter.sh/v1.0/troubleshooting/#maxpods-is-greater-than-the-nodes-supported-pod-density. I'll point out that some of the language there needs to be updated. Please share an update if the problem persists after updating the kubelet spec or enabling prefix delegation.
I'm not quite sure what you mean. I posted my kubelet spec and the entire configuration above. Or did I misunderstand something? We are not using prefix delegation, and according to the docs it should not be required either. Can you share what exactly we should update in the kubelet config? It is also odd that Karpenter sets a different pod capacity for different nodes of the same instance type in the cluster, so to me this still looks like a bug.
We are encountering a similar problem that began with the upgrade to v1.0.0. We have noticed an excessive number of pods being scheduled on t3.small/t3a.small instances. Our kubelet configuration does not specify any maxPods setting either.
We're also seeing this issue after upgrading to v1.0.0. Around 10% of new nodes have wildly high allocatable pods (e.g. 205 for a c6a.2xlarge), whereas most of the time the calculation is correct (i.e. 44 for a c6a.2xlarge, since we run with RESERVED_ENIS=1 on the Karpenter controller).
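For reference, the "correct" numbers above line up with the standard ENI-based pod-density formula, maxPods = (ENIs - reservedENIs) * (IPv4 addresses per ENI - 1) + 2. A minimal sketch, assuming the c6a.2xlarge limits of 4 ENIs with 15 IPv4 addresses each (instance-limit values assumed here, not taken from this thread):

```go
package main

import "fmt"

// maxPods mirrors the ENI-based pod density formula used by the EKS AMIs:
// each counted ENI contributes its IPv4 addresses minus one (the ENI's own
// primary address), plus 2 for host-networking pods (commonly aws-node and
// kube-proxy).
func maxPods(enis, ipsPerENI, reservedENIs int) int {
	return (enis-reservedENIs)*(ipsPerENI-1) + 2
}

func main() {
	// c6a.2xlarge: 4 ENIs, 15 IPv4 addresses per ENI (assumed limits).
	fmt.Println(maxPods(4, 15, 0)) // 58 -- no reserved ENIs
	fmt.Println(maxPods(4, 15, 1)) // 44 -- with RESERVED_ENIS=1
}
```

With no reserved ENIs this yields 58 (the value that also shows up in the debug logs later in this thread), and with RESERVED_ENIS=1 it yields 44.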
It appears to be related to the presence or absence of a kubelet block in the EC2NodeClass.

Reproduction Steps:
1. With no kubelet block in the EC2NodeClass, all 50 nodes have the correct maxPods.
2. Change the EC2NodeClass to include a kubelet block, and around 5-10% of the 50 nodes have an incorrect maxPods.

I think we need that kubelet block in our setup.
Can you share your NodePool? Do you have the v1beta1 kubelet compatibility annotation on it?
That's a new NodePool, created to test this issue. The old NodePools that were upgraded from v0.35.7 do have, e.g., the kubelet compatibility annotation set.
Can you provide all of the NodePools and EC2NodeClasses in the cluster?
Sure thing, here's the -o yaml output from the cluster I'm currently testing in: issue-6890-resources.txt. I've reproduced the issue with both the pre-upgrade NodePools and the newly created one.
Could it be related to #6167, which was included in v0.37.0? It mentions data races, and to me this looks like a data race, since nodes of the exact same instance type get different values assigned. As part of the v1 upgrade we also updated from an older Karpenter version.

EDIT: It's probably unrelated, as our clusters still on the older version do not show this issue.
@iharris-luno, what instance types have been affected in your case?
We've seen the issue on c6a.2xlarge instances.
@iharris-luno I used your configuration and I was not able to replicate the issue. Do you think you can share the Nodes and NodeClaims that were impacted by the issue?
Hi, I have the same issue with a t3.small instance:
I'm using version 1.0.1, but I tested with version 1.0.2 too. Regards
I've just spun up 2000 c6a.2xlarge nodes in batches of 50, and not one of them had an incorrect maxPods.
Saw these values on a node, in node.status and nodeclaim.status:
Unsure yet if it's related, but we did track down a fix for #6987, which is available here:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com
helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-2f61ca341eaf5f220a0e70ee12c5d6d6c6c00438" --namespace "kube-system" --create-namespace \
--set "settings.clusterName=${CLUSTER_NAME}" \
--set "settings.interruptionQueue=${CLUSTER_NAME}" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait

(see #7013)
I'm moving this to burning, given the number of different issues and folks that this is impacting.
Ok, I have some good news: I have an initial hypothesis about what's going on, and it looks related to this line: https://github.com/aws/karpenter-provider-aws/blob/release-v1.0.x/pkg/providers/amifamily/resolver.go#L210. What it seems to come down to is that this function returns a pointer which we then mutate on L221 in some cases. That would be fine if we only called this function once and the NodeClass wasn't used elsewhere throughout the code, but because we are mutating the original object rather than just reading it, we most likely lose our consistent view of the object throughout the code. From looking at the code, I could reason about the following order of operations:
1. We resolve the kubelet configuration and get back a pointer to the object stored on the NodeClass rather than a copy.
2. The first resolution finds MaxPods unset and writes the maxPods value computed for that instance type onto the shared object.
3. Subsequent resolutions (for other instance types, or for later NodeClaims) see MaxPods already set and reuse that stale value instead of computing their own.
This also explains why you only see this issue when you set kubeletConfig -- that's the only case where we don't create a new pointer and instead reuse the existing one. Still validating, but if that's the case, it should be a pretty easy fix -- just a tough thing to see :)
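To illustrate the aliasing pattern described above, here is a minimal, self-contained sketch -- the type and function names are stand-ins, not the actual resolver code:

```go
package main

import "fmt"

// KubeletConfiguration stands in for the CRD field holding an optional MaxPods.
type KubeletConfiguration struct {
	MaxPods *int32
}

// NodeClass stands in for the shared, cached EC2NodeClass object.
type NodeClass struct {
	Kubelet *KubeletConfiguration
}

// resolveBuggy returns the shared pointer, so the caller's mutation leaks back
// into the NodeClass and is observed by every later resolution.
func resolveBuggy(nc *NodeClass, computedMaxPods int32) *KubeletConfiguration {
	cfg := nc.Kubelet
	if cfg.MaxPods == nil {
		cfg.MaxPods = &computedMaxPods // mutates the shared object
	}
	return cfg
}

// resolveFixed copies the config first, so each resolution gets its own value.
func resolveFixed(nc *NodeClass, computedMaxPods int32) *KubeletConfiguration {
	cfg := &KubeletConfiguration{}
	if nc.Kubelet != nil {
		copied := *nc.Kubelet // a shallow copy is enough for this one-field sketch
		cfg = &copied
	}
	if cfg.MaxPods == nil {
		cfg.MaxPods = &computedMaxPods
	}
	return cfg
}

func main() {
	shared := &NodeClass{Kubelet: &KubeletConfiguration{}}
	fmt.Println(*resolveBuggy(shared, 58).MaxPods)  // 58
	fmt.Println(*resolveBuggy(shared, 234).MaxPods) // still 58 -- stale value reused

	shared = &NodeClass{Kubelet: &KubeletConfiguration{}}
	fmt.Println(*resolveFixed(shared, 58).MaxPods)  // 58
	fmt.Println(*resolveFixed(shared, 234).MaxPods) // 234 -- fresh value per resolution
}
```

The buggy variant only misbehaves when the NodeClass actually carries a kubelet configuration, which matches the observation that the issue only appears once kubeletConfig is set.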
Confirmed, that's exactly what's happening. I added some print lines, and this is what I see with the existing code (you can actually see it returning a different value for the same instance type across different NodeClaims):

...
// nolint:gosec
// We know that it's not possible to have values that would overflow int32 here since we control
// the maxPods values that we pass in here
if kubeletConfig.MaxPods == nil {
fmt.Printf("NodeClaim: %s. We should hit this every time\n", nodeClaim.Name)
kubeletConfig.MaxPods = lo.ToPtr(int32(maxPods))
}
fmt.Printf("NodeClaim: %s, Generated MaxPods: %d, Used MaxPods: %d\n", nodeClaim.Name, maxPods, lo.FromPtr(kubeletConfig.MaxPods))
...

NodeClaim: nodes-default-amd64-cjrgj, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-cjrgj, Generated MaxPods: 234, Used MaxPods: 58
NodeClaim: nodes-default-amd64-9fqc5. We should hit this every time
NodeClaim: nodes-default-amd64-9fqc5, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-9fqc5, Generated MaxPods: 234, Used MaxPods: 58
NodeClaim: nodes-default-amd64-7d5tc. We should hit this every time
NodeClaim: nodes-default-amd64-7d5tc, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-7d5tc, Generated MaxPods: 234, Used MaxPods: 58
NodeClaim: nodes-default-amd64-cllx9. We should hit this every time
NodeClaim: nodes-default-amd64-cllx9, Generated MaxPods: 234, Used MaxPods: 234
NodeClaim: nodes-default-amd64-cllx9, Generated MaxPods: 58, Used MaxPods: 234
NodeClaim: nodes-default-amd64-wtbd9. We should hit this every time
NodeClaim: nodes-default-amd64-wtbd9, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-wtbd9, Generated MaxPods: 234, Used MaxPods: 58
NodeClaim: nodes-default-amd64-cj8jr. We should hit this every time
NodeClaim: nodes-default-amd64-cj8jr, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-cj8jr, Generated MaxPods: 234, Used MaxPods: 58

And when I change the pointer to be deep-copied:

...
ret, err := utils.GetKubeletConfigurationWithNodeClaim(nodeClaim, nodeClass)
if err != nil {
return nil, fmt.Errorf("resolving kubelet configuration, %w", err)
}
kubeletConfig := &v1.KubeletConfiguration{}
if ret != nil {
kubeletConfig = ret.DeepCopy()
}
// nolint:gosec
// We know that it's not possible to have values that would overflow int32 here since we control
// the maxPods values that we pass in here
if kubeletConfig.MaxPods == nil {
fmt.Printf("NodeClaim: %s. We should hit this every time\n", nodeClaim.Name)
kubeletConfig.MaxPods = lo.ToPtr(int32(maxPods))
}
fmt.Printf("NodeClaim: %s, Generated MaxPods: %d, Used MaxPods: %d\n", nodeClaim.Name, maxPods, lo.FromPtr(kubeletConfig.MaxPods))
...

NodeClaim: nodes-default-amd64-gbczg. We should hit this every time
NodeClaim: nodes-default-amd64-gbczg, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-gbczg. We should hit this every time
NodeClaim: nodes-default-amd64-gbczg, Generated MaxPods: 234, Used MaxPods: 234
NodeClaim: nodes-default-amd64-r6vqc. We should hit this every time
NodeClaim: nodes-default-amd64-r6vqc, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-r6vqc. We should hit this every time
NodeClaim: nodes-default-amd64-r6vqc, Generated MaxPods: 234, Used MaxPods: 234
NodeClaim: nodes-default-amd64-7p5hk. We should hit this every time
NodeClaim: nodes-default-amd64-7p5hk, Generated MaxPods: 58, Used MaxPods: 58
NodeClaim: nodes-default-amd64-7p5hk. We should hit this every time
NodeClaim: nodes-default-amd64-7p5hk, Generated MaxPods: 234, Used MaxPods: 234
We'll raise something and get some testing out for it tomorrow morning PST, but for now it looks like we can actually make progress towards a patch 🎉
PR has been raised. You should be able to try the snapshot with:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com
helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-cd04d65077eaed45e212e2140c0081768f3de547" --namespace "kube-system" --create-namespace \
--set "settings.clusterName=${CLUSTER_NAME}" \
--set "settings.interruptionQueue=${CLUSTER_NAME}" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait

For those willing to try -- let me know if you see the issue after the new install.
Looking good! 500 nodes created so far with no maxPods issues in either node or nodeclaim resources. I'll leave it churning for a bit, just in case, but looks like the problem's fixed. 🎉 Thank you!
#7020 merged! So I think we are good to close this out now. We should have a patch that includes this soon! Please continue to post on this issue if you see any more issues with this, but from what I'm hearing, this appears to be resolved!
@jonathan-innis when is the release expected? We are facing this issue now and Karpenter can't spawn new machines. Adding/removing the kubelet configuration did not help.
@jonathan-innis v1.0.3 was released yesterday, but it looks like this fix is still not part of the release. Is there a specific reason for that? It looks like this is affecting quite a few users.
Version 1.0.4 was released, but we don't see this fix in it either.
After upgrading Karpenter to v1.0.1 we encountered this issue as well. It has a major impact on our environment, and we cannot proceed with the v1 upgrade until it is resolved. We would appreciate it if you could let us know which release will include the fix.
@caiohasouza did you upgrade to v1.0.4? The fix was included in that version: https://github.com/aws/karpenter-provider-aws/releases/tag/v1.0.4
@engedaam, I upgraded to v1.0.6 today. If the issue persists, I will update here. Thank you!
Hi team, after upgrading to 1.0.6 I am still receiving this error:

{"level":"ERROR","time":"2024-10-08T05:47:02.546Z","logger":"controller","message":"consistency error","commit":"6174c75","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"XXXXXXXXXXXXXXXXXX"},"namespace":"","name":"XXXXXXXXXXXXXXXXXX","reconcileID":"XXXXXXXXXXXXXXXXXXXXXXXXXX","error":"expected 234 of resource pods, but found 58 (24.8% of expected)"}
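For anyone parsing that log line: it comes from the NodeClaim consistency controller, which compares the pod capacity Karpenter expected at launch with what the node actually registered. A rough sketch of the arithmetic behind the message (the field paths in the comments are assumptions based on the error text, not the actual controller code):

```go
package main

import "fmt"

func main() {
	expectedPods := int64(234) // e.g. from nodeclaim.status.capacity["pods"]
	foundPods := int64(58)     // e.g. from node.status.capacity["pods"]

	if foundPods != expectedPods {
		pct := float64(foundPods) / float64(expectedPods) * 100
		fmt.Printf("expected %d of resource pods, but found %d (%.1f%% of expected)\n",
			expectedPods, foundPods, pct)
	}
}
```

Note that nodes launched before the fixed version was installed may keep reporting this mismatch until they are replaced, since the expected value was recorded at launch.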
Description
Observed Behavior:
Since we upgraded to Karpenter v1 we observed incorrect kubelet maxPods settings for multiple nodes. We initially only noticed the issue with m7a.medium instances; however, today we also had a case with an r7a.medium instance.

The issue becomes visible when multiple pods on a node in the cluster are stuck in initializing. Checking the node, it immediately becomes obvious that too many pods have been scheduled on it and the node is running out of IP addresses.

In the example with m7a.medium we observed multiple nodes in the same cluster (all m7a.medium) with a different status.capacity.pods specified. We observed nodes with 8, 58 and 29 maxPods in the cluster. According to https://github.com/awslabs/amazon-eks-ami/blob/main/templates/shared/runtime/eni-max-pods.txt#L518 the correct number should be 8, so the nodes which had a higher number specified ran into the issue mentioned above.

Logging into the nodes and checking the kubelet config revealed the following:
So it appears that the correct value is specified in /etc/kubernetes/kubelet/config.json but overwritten in /etc/kubernetes/kubelet/config.json.d/00-nodeadm.conf. We use AL2023, and we do not specify any value for podsPerCore in our Karpenter resources or similar. As we had different nodes of the same instance type with varying values, this could also be some kind of race condition or similar.
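As a side note on why the value in 00-nodeadm.conf wins: kubelet drop-in configuration files under config.json.d are merged on top of the base config.json, with fields set in the drop-ins taking precedence. A minimal sketch of that precedence, assuming those drop-in semantics (this is not nodeadm's or the kubelet's actual merge code, and the field modeled here is illustrative):

```go
package main

import "fmt"

// kubeletConfig models only the field relevant to this issue.
type kubeletConfig struct {
	MaxPods *int32
}

// merge applies drop-in configs on top of a base config: any field a later
// drop-in sets overrides the base, mirroring kubelet config-dir precedence.
func merge(base kubeletConfig, dropIns ...kubeletConfig) kubeletConfig {
	out := base
	for _, d := range dropIns {
		if d.MaxPods != nil {
			out.MaxPods = d.MaxPods
		}
	}
	return out
}

func main() {
	baseValue := int32(8)    // value seen in /etc/kubernetes/kubelet/config.json
	dropInValue := int32(58) // value rendered into config.json.d/00-nodeadm.conf

	final := merge(kubeletConfig{MaxPods: &baseValue}, kubeletConfig{MaxPods: &dropInValue})
	fmt.Println(*final.MaxPods) // 58 -- the drop-in wins, which is what the nodes showed
}
```

So whatever maxPods Karpenter renders into the nodeadm drop-in is the value the kubelet ends up using.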
Expected Behavior:
Calculated maxPods matches the value in https://github.com/awslabs/amazon-eks-ami/blob/main/templates/shared/runtime/eni-max-pods.txt
Reproduction Steps (Please include YAML):
Used EC2NodeClass:
Versions:
Kubernetes Version (kubectl version): v1.29.6-eks-db838b0