EKS IP-addresses limits #1366
Comments
It works differently for every cloud provider (e.g. on GCE, each node is assigned a range of IPs for pods). If I understand correctly, in this case IPs would be a cluster-level resource: the limit doesn't cap the number of nodes, and no matter how many nodes we add, pods may still not be able to run. Currently there's no support for such resources at all. It could probably be implemented by injecting a new pod list processor, which would remove pods that won't be able to run anyway from scale-up calculations. |
I'm not entirely sure cluster-autoscaler needs to do anything here, but your instances should be configured so that the kubelet max pods setting does not exceed the number of IP addresses available for pods: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#options. A change for that was recently implemented in kops (kubernetes/kops#6058), but I don't know whether EKS does this by default. The max pods limit should also be recognised by the CA, but I'm not entirely sure whether it is. Maybe @aleksandra-malinowska knows more? |
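To make the pod list processor idea above concrete, here is a minimal sketch in Python. It is not cluster-autoscaler code (the real project is written in Go), and every name in it (`Pod`, `needs_pod_ip`, `filter_pods_for_scale_up`, `free_cluster_ips`) is hypothetical. It only illustrates the filtering step: pods that could not get an IP anyway are dropped before scale-up is calculated.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    # Assumption: host-network pods do not consume a pod IP.
    needs_pod_ip: bool = True


def filter_pods_for_scale_up(pending, free_cluster_ips):
    """Split pending pods into (kept, skipped).

    Pods are kept while cluster-level IPs remain; the rest are skipped,
    because adding nodes would not help them run.
    """
    kept, skipped = [], []
    remaining = free_cluster_ips
    for pod in pending:
        if not pod.needs_pod_ip:
            kept.append(pod)
        elif remaining > 0:
            remaining -= 1
            kept.append(pod)
        else:
            skipped.append(pod)
    return kept, skipped


if __name__ == "__main__":
    pending = [Pod("a"), Pod("b"), Pod("c", needs_pod_ip=False)]
    kept, skipped = filter_pods_for_scale_up(pending, free_cluster_ips=1)
    print([p.name for p in kept], [p.name for p in skipped])
```

Only the `kept` list would be passed on to scale-up calculations; how the free IP count would be obtained is left open here.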
@johanneswuerbach hello, thank you for the feedback. |
I believe the scheduler's predicates check the max pods per node limit. If that's not the case, it's probably a bug.
Can you verify if your nodes indeed have this set? |
I also have the following option in kubelet-config.json: |
Does CA ignore this (i.e. scale up assuming more than 17 pods will fit)? If so, any repro you have would be useful (sample pods etc.). The scheduler code looks fairly straightforward, so I'm not sure what may be wrong here :/ |
Hm. It's strange, but when I tried to deploy 100 pods to my EKS cluster, the CNI wasn't able to assign IPs to the pods, and those pods weren't in "Pending" status (I don't remember what the status was).
It seems CA works as expected. |
Oh, I've reproduced it!
|
CA only makes sure there are enough nodes to schedule pods on. In this case, it seems the pod was scheduled, but the kubelet wasn't actually able to run it. I'd look for the scheduling constraints that were supposed to prevent this and ensure they're in place. Perhaps 17 pods per node is too many in this case, or there's some global limit on the number of pods? |
Thank you for your help. |
@okgolove did you manage to solve that? When I increase --max-pods, the IPs cannot be assigned, but my machine is clearly underutilized. |
@dkuida hi! I didn't. But you can subscribe to an issue about CNI aws/amazon-vpc-cni-k8s#214 |
@dkuida When using the EKS default AWS-VPC-CNI, max-pods is set to the maximum number of IPs available for assignment to pods on that instance type. Increasing max-pods without changing the CNI to something else won't work. (Changing the CNI is possible, but possibly unsupported.) Some instance types have a higher pods-per-CPU/RAM ratio, which might help (e.g. t3.large can do 35 pods, while t3a.xlarge (double the size) and t3a.2xlarge (four times the size) can only do 58 pods). |
I think this issue should not be closed. In my experience, if a new pod cannot be placed because of the per-node pod limit, the cluster autoscaler should scale up and add nodes. Right now it says something like this:
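The per-instance figures quoted above follow from a simple formula used for the AWS VPC CNI: each network interface (ENI) gives up its primary address, and two more pods are allowed for host-network pods. A minimal sketch, assuming the ENI and per-ENI IPv4 limits below (taken from the EC2 documentation as I recall it; verify them against the current tables before relying on them):

```python
def max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Max pods under the AWS VPC CNI for a given instance shape.

    One address per ENI is the ENI's own primary IP; the +2 covers
    pods that run on the host network and need no pod IP.
    """
    return enis * (ipv4_per_eni - 1) + 2


# Assumed (ENI count, IPv4 addresses per ENI) per instance type.
ASSUMED_LIMITS = {
    "t3.large": (3, 12),
    "t3a.xlarge": (4, 15),
    "t3a.2xlarge": (4, 15),
}

if __name__ == "__main__":
    for instance_type, (enis, ips) in ASSUMED_LIMITS.items():
        print(instance_type, max_pods(enis, ips))
```

Because the larger t3a sizes share the same ENI limits, doubling CPU and memory does not raise the pod ceiling, which matches the 35 and 58 pod figures in the comment.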
|
IP exhaustion should trigger a scale-up. |
Facing the same issue here on AWS: autoscaling doesn't trigger when the "too many pods" error happens due to IP exhaustion. For those who want to know the max pods limit for each instance type: https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt |
Hello. EKS uses the AWS CNI to assign a private IP to every pod: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI
So, if there are no free IPs, your pod won't be scheduled.
Could the autoscaler somehow implement autoscaling based on IP limits?