
EKS IP-addresses limits #1366

Closed
okgolove opened this issue Oct 31, 2018 · 16 comments
Labels
area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider kind/feature Categorizes issue or PR as related to a new feature.

Comments

@okgolove

okgolove commented Oct 31, 2018

Hello. EKS uses the AWS CNI to assign a private IP to every pod: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI
So if there are no free IPs, your pod won't be scheduled.

Could the autoscaler somehow implement autoscaling based on these IP limits?
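
For context, with the AWS VPC CNI a node's pod capacity is bounded by its ENI and per-ENI IP limits. A minimal sketch of estimating that bound for an instance type (assuming a configured AWS CLI; the formula max pods = ENIs × (IPv4 per ENI − 1) + 2 is the one the EKS AMI uses to derive its defaults):

# Sketch: estimate how many pods the AWS VPC CNI can give IPs to on one node.
# Assumes the AWS CLI is configured; t3.large is just an example instance type.
INSTANCE_TYPE=t3.large
read -r ENIS IPS <<< "$(aws ec2 describe-instance-types \
  --instance-types "$INSTANCE_TYPE" \
  --query 'InstanceTypes[0].NetworkInfo.[MaximumNetworkInterfaces,Ipv4AddressesPerInterface]' \
  --output text)"
echo "$INSTANCE_TYPE: roughly $(( ENIS * (IPS - 1) + 2 )) pods"   # t3.large -> 35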

@aleksandra-malinowska
Contributor

aleksandra-malinowska commented Oct 31, 2018

It works differently for every cloud provider (e.g. on GCE, each node is assigned a range of IPs for pods). If I understand correctly, in this case IPs would be a cluster-level resource: they don't limit the number of nodes, and no matter how many nodes we add, the pods may still not be able to run. Currently there's no support for such resources at all. It could probably be implemented by injecting a new pod list processor, which would remove pods that won't be able to run anyway from scale-up calculations.

@aleksandra-malinowska aleksandra-malinowska added area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider kind/feature Categorizes issue or PR as related to a new feature. sig/aws labels Oct 31, 2018
@johanneswuerbach
Contributor

I'm not entirely sure cluster-autoscaler needs to do anything here, but your instances should actually be configured to only allow as many pods as there are assignable IP addresses, via the kubelet --max-pods option: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#options

A change for that was recently implemented in kops (kubernetes/kops#6058), but I don't know whether this is done in EKS by default.

The max pods limit should also be recognised by the CA, but I'm not entirely sure whether it is; maybe @aleksandra-malinowska knows more?
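
For illustration, on EKS this per-node value normally comes from the instance type. A rough sketch of overriding it through the standard amazon-eks-ami bootstrap script (the cluster name and the value 35 are placeholders; on the stock AMI, bootstrap.sh already derives --max-pods itself when --use-max-pods is left at its default):

# Illustrative user-data snippet; "my-cluster" and 35 are placeholder values.
# --kubelet-extra-args passes an explicit --max-pods override to the kubelet.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--max-pods=35'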

@okgolove
Author

okgolove commented Jan 8, 2019

@johanneswuerbach hello, thank you for the feedback.
I have specified that option (--max-pods). It's exactly what I meant: as soon as I exhaust the IP limit, Kubernetes can't schedule any new pods. It would be great if the CA could handle these situations.

@aleksandra-malinowska
Contributor

aleksandra-malinowska commented Jan 8, 2019

I believe the scheduler's predicates check the max-pods-per-node limit. If that's not the case, it's probably a bug.

I have specified that option (--max-pods). It's exactly that thing I meant.

Can you verify if your nodes indeed have this set? kubectl get node <node-name> -o yaml
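
(A convenience variant of that check, not part of the original suggestion: the allocatable pod count can be listed for all nodes at once.)

# List each node's allocatable pod count without dumping the full YAML.
kubectl get nodes -o custom-columns=NAME:.metadata.name,MAX_PODS:.status.allocatable.pods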

@okgolove
Author

okgolove commented Jan 8, 2019

@aleksandra-malinowska

status:
  addresses:
  - address: 10.0.1.217
    type: InternalIP
  - address: ip-10-0-1-217.eu-west-1.compute.internal
    type: InternalDNS
  - address: ip-10-0-1-217.eu-west-1.compute.internal
    type: Hostname
  allocatable:
    cpu: "2"
    ephemeral-storage: "96625420948"
    hugepages-2Mi: "0"
    memory: 3937632Ki
    pods: "17"
  capacity:
    cpu: "2"
    ephemeral-storage: 104845292Ki
    hugepages-2Mi: "0"
    memory: 4040032Ki
    pods: "17"

I also have the following option in kubelet-config.json:
"maxPods": 17

@aleksandra-malinowska
Contributor

Does CA ignore this (i.e. scale up assuming more than 17 pods will fit)? If so, any repro you have would be useful (sample pods etc.). The scheduler code looks fairly straightforward, not sure what may be wrong here :/

@okgolove
Author

okgolove commented Jan 9, 2019

Hm. It's strange, but when I previously tried to deploy 100 pods to my EKS cluster, the CNI wasn't able to assign IPs to the pods, and those pods weren't in the "Pending" status (I don't remember what status they had).
But now when I deploy 100 pods, I have a lot of pods in "Pending" status and the CA scales my nodes correctly:

Warning FailedScheduling 8s (x7 over 39s) default-scheduler 0/3 nodes are available: 3 Insufficient pods.

nginx-bucket-6f8b645d58-vg92h   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-vpkkk   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-w5px7   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-w6fv4   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-w8dxd   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-wbn5v   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-wkc26   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-wq926   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-ws7bz   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-x9k6d   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-xcgnf   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-xxp4b   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-zfcd7   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-zlpr4   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-zmsz6   0/1     Pending   0          4m
nginx-bucket-6f8b645d58-znjjt   0/1     Pending   0          4m

It seems CA works as expected.

@okgolove
Author

okgolove commented Jan 9, 2019

Oh, I've reproduced it!
Pod has status

0/1     Running
Warning  FailedCreatePodSandBox  58s (x12 over 70s)  kubelet, ip-10-0-2-196.eu-west-1.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nginx-develop-7845f449bc-lnlqv_nginx-develop" network: add cmd: failed to assign an IP address to container
Normal   SandboxChanged          58s (x11 over 68s)  kubelet, ip-10-0-2-196.eu-west-1.compute.internal  Pod sandbox changed, it will be killed and re-created.

@aleksandra-malinowska
Contributor

aleksandra-malinowska commented Jan 9, 2019

CA only makes sure there are enough nodes to schedule pods on. In this case, it seems the pod was scheduled, but the kubelet wasn't actually able to run it. I'd look for the scheduling constraints that were supposed to prevent this and ensure they're in place. Perhaps 17 pods per node is too many in this case, or there's some global limit on the number of pods?

@okgolove
Author

okgolove commented Jan 9, 2019

Thank you for your help.
I think this is not a CA problem.
This issue may be closed if you think it should be.

@dkuida

dkuida commented Mar 26, 2019

@okgolove did you manage to solve that? When I increase --max-pods, the IPs cannot be assigned, but my machine is clearly underutilized.

@okgolove
Author

@dkuida hi! I didn't.
I've decided to ignore it and just use kops in production :)

But you can subscribe to an issue about the CNI: aws/amazon-vpc-cni-k8s#214
I hope we will get the ability to choose a CNI plugin.

@mohag

mohag commented Oct 16, 2019

@dkuida When using the EKS default AWS VPC CNI, max-pods is set to the maximum number of IPs available for assignment to pods on that instance type.

Increasing max-pods without changing the CNI to something else won't work. (Changing the CNI is possible, but possibly unsupported.)

Some instance types have a higher pods-per-CPU/RAM ratio, which might help... (e.g. t3.large can do 35 pods, while t3a.xlarge (double the size) and t3a.2xlarge (four times the size) can only do 58 pods)
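
Those figures line up with the ENI formula sketched earlier in the thread; a quick back-of-the-envelope check (the ENI and per-ENI IPv4 counts below are assumptions taken from the AWS instance-type documentation):

echo "t3.large:   $(( 3 * (12 - 1) + 2 )) pods"   # 3 ENIs, 12 IPv4 each -> 35
echo "t3a.xlarge: $(( 4 * (15 - 1) + 2 )) pods"   # 4 ENIs, 15 IPv4 each -> 58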

@runningman84

runningman84 commented Feb 6, 2020

I think this issue should not be closed... In my experience, if you cannot place a new pod due to the per-node pod limit, the cluster autoscaler should scale up to add nodes. Right now it says something like this:

I0206 14:08:34.906644       1 scale_down.go:706] No candidates for scale down
I0206 14:08:34.906918       1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"external-dns-5fcb999649-59jdp", UID:"44a7afe3-48d4-11ea-ab32-0a5855b3f258", APIVersion:"v1", ResourceVersion:"25283822", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0206 14:08:34.906956       1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"k8s-spot-termination-handler-rtk4c", UID:"76b73040-48c6-11ea-ab32-0a5855b3f258", APIVersion:"v1", ResourceVersion:"25283856", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 
I0206 14:08:34.906976       1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"k8s-spot-termination-handler-xmtv9", UID:"29ae5cfc-48d4-11ea-ab32-0a5855b3f258", APIVersion:"v1", ResourceVersion:"25283551", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 

@dimitarshenkov

IP exhaustion should trigger scaling up.

@ydamni

ydamni commented Jul 17, 2022

Facing the same issue here on AWS: autoscaling doesn't trigger when the "too many pods" error happens due to IP exhaustion.
I had no other choice but to change the node group's instance type to a larger one.

For those who want to know the max pods limit per instance type: https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt
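
A quick way to look up a few types in that file before resizing a node group (the raw URL below is assumed from the repository path linked above):

# Print the max-pods values for a few instance types from the linked file.
curl -s https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/eni-max-pods.txt \
  | grep -E '^(t3\.large|t3a\.xlarge|t3a\.2xlarge) '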

yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024
Fix fungibility: Try next flavor if can't preempt on first