
EKS IP-addresses limits #1366

Closed
okgolove opened this issue Oct 31, 2018 · 16 comments
Labels
area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider kind/feature Categorizes issue or PR as related to a new feature.

Comments

@okgolove

okgolove commented Oct 31, 2018

Hello. EKS uses the AWS CNI to assign a private IP to every pod: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI
So if there are no free IPs, your pod won't be scheduled.

Could the autoscaler somehow implement autoscaling based on these IP limits?
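
For context, with the AWS VPC CNI a node's pod capacity is bounded by its ENI and per-ENI IP limits. A minimal sketch of estimating that bound for an instance type (assuming a configured AWS CLI; the formula max pods = ENIs × (IPv4 per ENI − 1) + 2 is the one the EKS AMI uses to derive its defaults):

# Sketch: estimate how many pods the AWS VPC CNI can give IPs to on one node.
# Assumes the AWS CLI is configured; t3.large is just an example instance type.
INSTANCE_TYPE=t3.large
read -r ENIS IPS <<< "$(aws ec2 describe-instance-types \
  --instance-types "$INSTANCE_TYPE" \
  --query 'InstanceTypes[0].NetworkInfo.[MaximumNetworkInterfaces,Ipv4AddressesPerInterface]' \
  --output text)"
echo "$INSTANCE_TYPE: roughly $(( ENIS * (IPS - 1) + 2 )) pods"   # t3.large -> 35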

@aleksandra-malinowska
Contributor

aleksandra-malinowska commented Oct 31, 2018

It works differently for every cloud provider (e.g. on GCE, each node is assigned a range of IPs for pods). If I understand correctly, in this case IPs would be a cluster-level resource: they don't limit the number of nodes, and no matter how many nodes we add, the pods may still not be able to run. Currently there's no support for such resources at all. It could probably be implemented by injecting a new pod list processor, which would remove pods that won't be able to run anyway from scale-up calculations.

@aleksandra-malinowska aleksandra-malinowska added area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider kind/feature Categorizes issue or PR as related to a new feature. sig/aws labels Oct 31, 2018
@johanneswuerbach
Contributor

I'm not entirely sure cluster-autoscaler needs to do anything here, but your instances should actually be configured to only allow as many pods as there are assignable IP addresses, via the kubelet --max-pods option: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#options

A change for that was recently implemented in kops (kubernetes/kops#6058), but I don't know whether this is done in EKS by default.

The max pods limit should also be recognised by the CA, but I'm not entirely sure whether it is; maybe @aleksandra-malinowska knows more?
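
For illustration, on EKS this per-node value normally comes from the instance type. A rough sketch of overriding it through the standard amazon-eks-ami bootstrap script (the cluster name and the value 35 are placeholders; on the stock AMI, bootstrap.sh already derives --max-pods itself when --use-max-pods is left at its default):

# Illustrative user-data snippet; "my-cluster" and 35 are placeholder values.
# --kubelet-extra-args passes an explicit --max-pods override to the kubelet.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--max-pods=35'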

@okgolove
Author

okgolove commented Jan 8, 2019

@johanneswuerbach hello, thank you for the feedback.
I have specified that option (--max-pods). It's exactly what I meant: as soon as I exhaust the IP limit, Kubernetes can't schedule any new pods. It would be great if the CA could handle these situations.

@aleksandra-malinowska
Contributor

aleksandra-malinowska commented Jan 8, 2019

I believe the scheduler's predicates check the max-pods-per-node limit. If that's not the case, it's probably a bug.

I have specified that option (--max-pods). It's exactly that thing I meant.

Can you verify if your nodes indeed have this set? kubectl get node <node-name> -o yaml
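
(A convenience variant of that check, not part of the original suggestion: the allocatable pod count can be listed for all nodes at once.)

# List each node's allocatable pod count without dumping the full YAML.
kubectl get nodes -o custom-columns=NAME:.metadata.name,MAX_PODS:.status.allocatable.pods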

@okgolove
Author

okgolove commented Jan 8, 2019

@aleksandra-malinowska

status:
  addresses:
  - address: 10.0.1.217
    type: InternalIP
  - address: ip-10-0-1-217.eu-west-1.compute.internal
    type: InternalDNS
  - address: ip-10-0-1-217.eu-west-1.compute.internal
    type: Hostname
  allocatable:
    cpu: "2"
    ephemeral-storage: "96625420948"
    hugepages-2Mi: "0"
    memory: 3937632Ki
    pods: "17"
  capacity:
    cpu: "2"
    ephemeral-storage: 104845292Ki
    hugepages-2Mi: "0"
    memory: 4040032Ki
    pods: "17"

I also have the following option in kubelet-config.json:
"maxPods": 17

@aleksandra-malinowska
Contributor

Does CA ignore this (i.e. scale up assuming more than 17 pods will fit)? If so, any repro you have would be useful (sample pods etc.). The scheduler code looks fairly straightforward, not sure what may be wrong here :/

@okgolove
Author

okgolove commented Jan 9, 2019

Hm. It's strange, but when I previously tried to deploy 100 pods to my EKS cluster, the CNI wasn't able to assign IPs to the pods, and those pods weren't in the "Pending" status (I don't remember what status they had).
But now when I deploy 100 pods, I have a lot of pods in "Pending" status and the CA scales my nodes correctly:

Warning FailedScheduling 8s (x7 over 39s) default-scheduler 0/3 nodes are available: 3 Insufficient pods.

nginx-bucket-6f8b645d58-vg92h   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-vpkkk   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-w5px7   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-w6fv4   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-w8dxd   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-wbn5v   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-wkc26   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-wq926   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-ws7bz   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-x9k6d   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-xcgnf   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-xxp4b   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-zfcd7   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-zlpr4   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-zmsz6   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-znjjt   0/1     Pending   0          4m

It seems CA works as expected.

@okgolove
Author

okgolove commented Jan 9, 2019

Oh, I've reproduced it!
Pod has status

0/1     Running
Warning  FailedCreatePodSandBox  58s (x12 over 70s)  kubelet, ip-10-0-2-196.eu-west-1.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nginx-develop-7845f449bc-lnlqv_nginx-develop" network: add cmd: failed to assign an IP address to container
Normal   SandboxChanged          58s (x11 over 68s)  kubelet, ip-10-0-2-196.eu-west-1.compute.internal  Pod sandbox changed, it will be killed and re-created.

@aleksandra-malinowska
Contributor

aleksandra-malinowska commented Jan 9, 2019

CA only makes sure there are enough nodes to schedule pods on. In this case, it seems the pod was scheduled, but the kubelet wasn't actually able to run it. I'd look for the scheduling constraints that were supposed to prevent this and ensure they're in place. Perhaps 17 pods per node is too many in this case, or there's some global limit on the number of pods?

@okgolove
Author

okgolove commented Jan 9, 2019

Thank you for your help.
I think this is not a CA problem.
This issue may be closed if you think it should be.

@dkuida

dkuida commented Mar 26, 2019

@okgolove did you manage to solve that? When I increase --max-pods, the IPs cannot be assigned, but my machine is clearly underutilized.

@okgolove
Author

@dkuida hi! I didn't.
I've decided to ignore it and just use kops in production :)

But you can subscribe to an issue about the CNI: aws/amazon-vpc-cni-k8s#214
I hope we will get the ability to choose a CNI plugin.

@mohag

mohag commented Oct 16, 2019

@dkuida When using the EKS default AWS VPC CNI, max-pods is set to the maximum number of IPs available for assignment to pods on that instance type.

Increasing max-pods without changing the CNI to something else won't work. (Changing the CNI is possible, but possibly unsupported.)

Some instance types have a higher pods-per-CPU/RAM ratio, which might help... (e.g. t3.large can do 35 pods, while t3a.xlarge (double the size) and t3a.2xlarge (four times the size) can only do 58 pods)
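
Those figures line up with the ENI formula sketched earlier in the thread; a quick back-of-the-envelope check (the ENI and per-ENI IPv4 counts below are assumptions taken from the AWS instance-type documentation):

echo "t3.large:   $(( 3 * (12 - 1) + 2 )) pods"   # 3 ENIs, 12 IPv4 each -> 35
echo "t3a.xlarge: $(( 4 * (15 - 1) + 2 )) pods"   # 4 ENIs, 15 IPv4 each -> 58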

@runningman84

runningman84 commented Feb 6, 2020

I think this issue should not be closed... In my experience, if you cannot place a new pod due to the per-node pod limit, the cluster autoscaler should scale up to add nodes. Right now it says something like this:

I0206 14:08:34.906644       1 scale_down.go:706] No candidates for scale down
I0206 14:08:34.906918       1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"external-dns-5fcb999649-59jdp", UID:"44a7afe3-48d4-11ea-ab32-0a5855b3f258", APIVersion:"v1", ResourceVersion:"25283822", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0206 14:08:34.906956       1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"k8s-spot-termination-handler-rtk4c", UID:"76b73040-48c6-11ea-ab32-0a5855b3f258", APIVersion:"v1", ResourceVersion:"25283856", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0206 14:08:34.906976       1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"k8s-spot-termination-handler-xmtv9", UID:"29ae5cfc-48d4-11ea-ab32-0a5855b3f258", APIVersion:"v1", ResourceVersion:"25283551", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 

@dimitarshenkov

IP exhaustion should trigger scaling up.

@ydamni

ydamni commented Jul 17, 2022

Facing the same issue here on AWS: autoscaling doesn't trigger when the "too many pods" error happens due to IP exhaustion.
I had no other choice but to change the node group's instance type to a larger one.

For those who want to know the max pods limit per instance type: https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt
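
A quick way to look up a few types in that file before resizing a node group (the raw URL below is assumed from the repository path linked above):

# Print the max-pods values for a few instance types from the linked file.
curl -s https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/eni-max-pods.txt \
  | grep -E '^(t3\.large|t3a\.xlarge|t3a\.2xlarge) '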

yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024
Fix fungibility: Try next flavor if can't preempt on first