Available ENIs left dangling after node termination #608
Thanks @krzysztof-bronk for reporting, we will have to take a look at why this is happening. ENIs attached to the terminated node should be freed by EC2 automatically.
Thank you for acknowledging this. If your cluster has high node churn, or the IP pool is small, this can quickly become an issue. The current workaround, and also an independent report of the issue, can be found here: #59 (comment)
@krzysztof-bronk Do you happen to have the ipamd logs from one such instance?
No, it's not true. Only the main (eth0) ENI is cleaned up. The additional ones are not cleaned up, just detached and become available.
Here's the situation: fresh test cluster, nothing fancy or custom (except external SNAT, but I tested both and it's not relevant), a single m5.xlarge worker node.
Private IPs: 10.250.9.56 (eth0), 10.250.11.217 (eth1)
ENIs: 3 total
aws-node logs are empty (even though they're in DEBUG mode and I've deployed a test nginx container):
Node terminated. ASG spun up a new one.
Private IPs: 10.250.19.53 (eth0), 10.250.16.192 (eth1)
ENIs: FOUR total
aws-node logs (with the successfully running nginx container):
So it looks like aws-cni is leaking the warmup ENIs.
The latest pre-release, v1.6.0-rc2, has some changes to mitigate this problem.
Good to know this is getting worked on, it's an issue for us as well. 👍 We have nightly infra testing jobs that bring up a cluster (w/ Terraform), run tests, and delete the cluster, and we've noticed that the
... or even a 1.6.0-rc4 (but without the problems of 1.5.4?) |
@robin-engineml Hey, please try v1.6.0-rc4, fresh out of the oven. 😄 |
@mogren The problem persists with v1.6.0-rc4, which we have been using for a couple of weeks. Anecdotally, it does seem to be less frequent.
@robin-engineml Thanks for the update! Glad that it has improved a bit at least. There is still a small chance that a few ENIs will leak, but they should be cleaned up as long as there are at least some nodes running in the cluster.
@mogren This occurs upon cluster termination, for us. So, there are not "some nodes running in the cluster". Would this stop occurring if we were to allow some cluster nodes to remain alive longer? (We destroy the EKS cluster via the AWS API, via Terraform.)
@robin-engineml The issue is that the EC2 API requires you to first detach an ENI, then wait 2-5 seconds before deleting it. If the instance gets terminated after the ENI is detached, but before we delete it, it will stay around. With the v1.6 branch, we try to clean up when the CNI starts up, or in the background once per hour.
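To make the race concrete, here is a minimal sketch (not the CNI's actual code) of the detach-then-delete sequence the EC2 API forces, using aws-sdk-go; the ENI and attachment IDs are placeholders. A node termination that lands between the two calls leaves the ENI in the "available" state.

```go
// Sketch of the two-step EC2 sequence described above: detach the ENI, wait
// for the detach to complete, then delete it. Terminating the instance
// between the two calls leaves the ENI "available" (leaked).
package main

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func freeENI(svc *ec2.EC2, eniID, attachmentID string) error {
	// Step 1: detach. The ENI starts transitioning towards "available".
	if _, err := svc.DetachNetworkInterface(&ec2.DetachNetworkInterfaceInput{
		AttachmentId: aws.String(attachmentID),
	}); err != nil {
		return fmt.Errorf("detach %s: %w", eniID, err)
	}

	// Step 2: retry the delete until the detach has completed (typically a
	// few seconds). This is the window in which a node termination leaks it.
	var err error
	for i := 0; i < 10; i++ {
		time.Sleep(2 * time.Second)
		if _, err = svc.DeleteNetworkInterface(&ec2.DeleteNetworkInterfaceInput{
			NetworkInterfaceId: aws.String(eniID),
		}); err == nil {
			return nil
		}
	}
	return fmt.Errorf("delete %s: %w", eniID, err)
}

func main() {
	svc := ec2.New(session.Must(session.NewSession()))
	// Placeholder IDs for illustration only.
	if err := freeENI(svc, "eni-0123456789abcdef0", "eni-attach-0123456789abcdef0"); err != nil {
		fmt.Println(err)
	}
}
```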
Could this be integrated with ASG Lifecycle hooks to allow the processes more time to clean up on instance termination? Simply adding a lifecycle hook to an ASG isn't enough.
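For reference, a rough sketch of what pairing an ASG termination lifecycle hook with an out-of-band ENI cleanup step could look like; this is not something the CNI does today, and all names and IDs (ASG, hook, instance) are placeholders.

```go
// Rough sketch of pausing instance termination with an ASG lifecycle hook so
// an external cleanup step has time to delete detached ENIs, then letting the
// termination continue. All names and IDs are placeholders.
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	svc := autoscaling.New(session.Must(session.NewSession()))

	// Register a hook that pauses termination for up to two minutes.
	if _, err := svc.PutLifecycleHook(&autoscaling.PutLifecycleHookInput{
		AutoScalingGroupName: aws.String("my-worker-asg"),
		LifecycleHookName:    aws.String("cni-eni-cleanup"),
		LifecycleTransition:  aws.String("autoscaling:EC2_INSTANCE_TERMINATING"),
		HeartbeatTimeout:     aws.Int64(120),
		DefaultResult:        aws.String("CONTINUE"), // proceed even if nothing signals completion
	}); err != nil {
		panic(err)
	}

	// A cleanup agent (not shown) would delete the instance's leaked ENIs and
	// then release the hook so the instance can actually terminate.
	if _, err := svc.CompleteLifecycleAction(&autoscaling.CompleteLifecycleActionInput{
		AutoScalingGroupName:  aws.String("my-worker-asg"),
		LifecycleHookName:     aws.String("cni-eni-cleanup"),
		InstanceId:            aws.String("i-0123456789abcdef0"),
		LifecycleActionResult: aws.String("CONTINUE"),
	}); err != nil {
		panic(err)
	}
}
```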
I've tested 1.6.0-rc5 a bit and I don't see much progress. After terminating a couple of nodes, I saw available ENIs dangling, so I terminated all nodes and now I have:
@krzysztof-bronk Did those ENIs stay around? They should have been cleaned up if they were created by the CNI. Not directly, but within five minutes of another worker node being started.
Hi, has there been any progress on this issue? This is really affecting us; we are even considering changing the CNI vendor we use.
Hi @steven-cherry. The base issue is that the EC2 API requires clients to detach ENIs before they can be deleted. If the node (or the aws-node pod) gets restarted in the roughly 2-3 seconds we have to wait for the detach to complete, there will be an ENI with status "available" left around. The code to do the clean up is here. It will filter out ENIs with the tag key.

I've done some more tests with v1.6.0 on spot instances that get randomly terminated, and the leaked ENIs do get cleaned up eventually. The only sure way to not leak any ENIs is to have this handled outside the node, like in our 2.0 CNI design.
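As an illustration of that cleanup pass (a sketch, not the linked code itself): list ENIs that are "available" and look CNI-created, then delete them. The "aws-K8S-" description prefix matches what is reported later in this thread; since the exact tag key was lost from the comment above, the sketch filters on description only and notes where the tag check would go.

```go
// Sketch of a cleanup pass: find "available" ENIs whose description matches
// the CNI's "aws-K8S-" prefix (reported later in this thread) and delete them.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	svc := ec2.New(session.Must(session.NewSession()))

	out, err := svc.DescribeNetworkInterfaces(&ec2.DescribeNetworkInterfacesInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("status"), Values: []*string{aws.String("available")}},
			{Name: aws.String("description"), Values: []*string{aws.String("aws-K8S-*")}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	for _, eni := range out.NetworkInterfaces {
		// A real implementation would also check the ENI's tags (the exact
		// tag key was lost from the comment above) and how long the ENI has
		// been detached before deleting it.
		if _, err := svc.DeleteNetworkInterface(&ec2.DeleteNetworkInterfaceInput{
			NetworkInterfaceId: eni.NetworkInterfaceId,
		}); err != nil {
			log.Printf("could not delete %s: %v", aws.StringValue(eni.NetworkInterfaceId), err)
		}
	}
}
```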
Thanks @mogren, any ETA regarding version 2.0 for production workloads?
I'll be getting back to this topic soon, so I will have a chance to test this once more.
@mogren
Instance 1 ENIs: 10.250.1.228, 10.250.5.254. The primary interfaces have a Description like "aws-K8S-i-0c841ac56fbadc9b3" indicating the node they belong to. All 4 ENIs are Active.

Terminating instance 1. ASG kicks in a replacement. Waiting 10 minutes. There is now a third ENI attached to the remaining instance, not sure why; there were several pods running on the terminated instance, but not that many. However... the primary interface of the terminated instance is now stuck in the Available state.

Terminating instance 2 (the one with 3 ENIs). ASG kicks in a replacement. Waiting 10 minutes. The cluster now has 2 fresh nodes. The primary interface of the second terminated instance is now stuck in the Available state.

Maybe I'm triggering some special case, but... the cleanup of Available ENIs simply does not happen.
This was removed in #49, but I think that change only fixed the issue with EKS managed SG not being deleted. Stale ENIs are related to this issue aws/amazon-vpc-cni-k8s#608
I'll do some further tests, because sometimes the interfaces do get cleaned up. How does the mechanism work exactly? Is only the instance that had the interfaces attached responsible for cleaning them up, with a race condition between the instance termination and the cleanup code? Or is it that, if there is at least one node in the cluster, that aws-node pod will attempt to delete unused Available interfaces for the whole cluster?
Also noticed that the ENIs that appear to be leaking for us are missing the tags/description, which means they won't be picked up by the cleanup loop. Not on 1.6 yet, but when we are I'll check if that's still the case. EDIT: It is. We run our nodes in ASGs, and on scaling down a test cluster of 6 nodes it leaked all 6 ENIs and left them untagged, so they won't be cleaned up. EDIT 2: It looks like it is the secondary ENIs that are getting leaked, because they aren't being tagged or given the "special description" in the first place (i.e. even while "in use") that allows the cleanup to catch them.
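If the secondary ENIs really are left untagged, the fix is to tag them at creation time so the cleanup loop can match them later. A minimal sketch of such tagging with aws-sdk-go; the tag key, value, and ENI ID shown are illustrative placeholders, not necessarily what the CNI uses.

```go
// Sketch of tagging an ENI right after it is created, while it is still in
// use, so a cleanup pass can identify it later even once it is detached.
// The tag key/value and ENI ID are illustrative placeholders.
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	svc := ec2.New(session.Must(session.NewSession()))

	if _, err := svc.CreateTags(&ec2.CreateTagsInput{
		Resources: []*string{aws.String("eni-0123456789abcdef0")},
		Tags: []*ec2.Tag{
			// Placeholder key; the point is that the tag exists from creation,
			// not only after the ENI has leaked.
			{Key: aws.String("example.k8s.amazonaws.com/created-by-cni"), Value: aws.String("my-cluster")},
		},
	}); err != nil {
		panic(err)
	}
}
```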
We upgraded to 1.6.0 and fixed an oopsie and haven't had any ENIs leaking since.
We have upgraded to 1.6.1 and there is no issue with dangling ENIs anymore, thank you! P.S. You should have "delete on termination" enabled for the primary interface to clean it up on the node's termination as well.
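For anyone wanting to set that flag themselves, here is a small sketch of enabling "delete on termination" on an ENI attachment via the EC2 API with aws-sdk-go; the ENI and attachment IDs are placeholders.

```go
// Sketch of enabling "delete on termination" on an ENI attachment, so the ENI
// is removed together with the instance. IDs are placeholders.
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	svc := ec2.New(session.Must(session.NewSession()))

	if _, err := svc.ModifyNetworkInterfaceAttribute(&ec2.ModifyNetworkInterfaceAttributeInput{
		NetworkInterfaceId: aws.String("eni-0123456789abcdef0"),
		Attachment: &ec2.NetworkInterfaceAttachmentChanges{
			AttachmentId:        aws.String("eni-attach-0123456789abcdef0"),
			DeleteOnTermination: aws.Bool(true), // clean up the ENI when the instance terminates
		},
	}); err != nil {
		panic(err)
	}
}
```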
There is still a small chance that ENIs will leak, but they should be cleaned up pretty quickly if there are any nodes still in the cluster. Also, I have seen that pods creating ALBs might create ENIs in subnets that then don't get cleaned up. If anyone sees ENIs still around in a cluster using CNI v1.6.1 or later, please gather logs and open a new ticket.
@mogren I am having a problem with dangling ENIs using
The basic problem is that I am using Terraform and trying to destroy a node group and a security group that goes with it, but I cannot, because the ENI is left dangling after the node group is deleted, so the deletion of the security group hangs. Note that the dangling ENI has the tag
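Until that is fixed, one workaround for the blocked security group deletion is a one-off cleanup pass before destroying the SG. A sketch, assuming a placeholder security group ID: find ENIs that are still "available" in that group and delete them.

```go
// One-off cleanup sketch for the blocked destroy: delete "available" ENIs
// that still reference the node group's security group, so the security
// group itself can be deleted. The security group ID is a placeholder.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	svc := ec2.New(session.Must(session.NewSession()))

	out, err := svc.DescribeNetworkInterfaces(&ec2.DescribeNetworkInterfacesInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("status"), Values: []*string{aws.String("available")}},
			{Name: aws.String("group-id"), Values: []*string{aws.String("sg-0123456789abcdef0")}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	for _, eni := range out.NetworkInterfaces {
		if _, err := svc.DeleteNetworkInterface(&ec2.DeleteNetworkInterfaceInput{
			NetworkInterfaceId: eni.NetworkInterfaceId,
		}); err != nil {
			log.Printf("could not delete %s: %v", aws.StringValue(eni.NetworkInterfaceId), err)
		}
	}
}
```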
Hi @Nuru, this has been an issue forever when scaling down the pods and then suddenly the whole instance gets deleted. The issue triggering this is that there is no EC2 API call to "delete" an ENI that is attached, so instead they first have to be detached, which takes a few seconds, then deleted. If the instance gets terminated after the ENI has been detached, but before it has been deleted, it will be leaked.

We have tried to mitigate this by, for example, having a 10s termination policy on the aws-node daemonset and never detaching any ENIs while the CNI is shutting down, but none of this helps when the instance goes away.

Is this a managed nodegroup, or do you handle it on your own using Terraform? If so, terminating all the aws-node pods first, before terminating the instances, might at least prevent them from detaching any ENIs in the last few seconds while the other pods are being deleted. Another option would be a setting to never detach any ENIs, since then the ENIs would get deleted when the instance gets deleted. The reason we don't do this by default is that running out of ENIs is also a common problem.
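A minimal sketch of that "terminate the aws-node pods first" suggestion using client-go; it assumes the pods carry the usual k8s-app=aws-node label in kube-system, and the DaemonSet controller will recreate the pods unless the nodes are terminated soon after.

```go
// Sketch of deleting the aws-node pods shortly before the instances are
// terminated, so the CNI cannot start detaching ENIs in its final seconds.
// Assumes the usual "k8s-app=aws-node" label in kube-system; note that the
// DaemonSet controller will recreate the pods unless the nodes go away soon after.
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	pods, err := clientset.CoreV1().Pods("kube-system").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "k8s-app=aws-node"})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range pods.Items {
		if err := clientset.CoreV1().Pods("kube-system").Delete(context.TODO(),
			p.Name, metav1.DeleteOptions{}); err != nil {
			log.Printf("could not delete %s: %v", p.Name, err)
		}
	}
}
```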
@mogren wrote
In my immediate case, I am using the AWS Terraform provider to create an
Prior to the node being shut down, it is cordoned off, meaning it will be marked as "unschedulable" and no new pods should be assigned to the node. You could surely arrange things such that any ENIs that are freed while the node is marked unschedulable are not detached. You do not need to worry about running out of ENIs at that point because there should be no new ENIs getting created. Then the ENIs can be deleted with the instance on termination, or, if the node is marked "schedulable" again without being terminated, a detach/delete loop could be run when the node returns to the schedulable state.

This, of course, requires that the "delete on termination" option be set for the ENIs, such that they are automatically deleted when the instance is deleted. I do not see any downside to that setting always being set, as it still leaves you the option of detaching and deleting the ENI when a pod is deleted but the instance is intended to remain.

Maybe the building blocks were not there earlier, but it looks like the pieces of the solution are now ready to be put together. Am I missing something?
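A minimal sketch of what such a cordon-aware check could look like with client-go; the function and node name are illustrative, and this is a proposal, not current CNI behaviour: skip detaching when the node is marked unschedulable.

```go
// Sketch of a cordon-aware check: before detaching a freed ENI, look at the
// node's Spec.Unschedulable flag and, if the node is cordoned, leave the ENI
// attached so it is deleted together with the instance. The node name and
// function are illustrative; this is a proposal, not current behaviour.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// shouldDetachENIs reports whether it is safe to detach and delete unused
// ENIs on this node. A cordoned node is likely being drained for termination.
func shouldDetachENIs(ctx context.Context, clientset kubernetes.Interface, nodeName string) (bool, error) {
	node, err := clientset.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return !node.Spec.Unschedulable, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	ok, err := shouldDetachENIs(context.TODO(), clientset, "my-worker-node")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("safe to detach ENIs:", ok)
}
```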
@Nuru I do think you are right, having the VPC CNI be aware of the
(Btw, kube-proxy is using host-networking, just like the aws-node pod does, so it is independent of the CNI being up.)
@mogren I would be happy to have you open the feature request, as you would know better how to put the request together (what parts of code should react to what, and how) and see it through, and also be happy to lend my support to your request. I don't need credit or recognition for the feature request, I just want this done as quickly and efficiently as possible, so I would prefer you do it if you have the time. If it won't happen unless I do it, let me know and I will do it.
|
By the way, @mogren
Have you opened a feature request for this feature? That would be even better than my suggestion. |
…destroy (#336) This is a workaround for the known VPC CNI addon's "leaked ENIs" issue: See aws/amazon-vpc-cni-k8s#608 Co-authored-by: Rafael Mendes Pereira <[email protected]>
Hello,
I have encountered an issue with aws-cni 1.5.1(+?), where, even in a single-node test cluster, if you terminate the node so that the ASG kicks in a replacement, the terminated instance's ENI switches back to Available, still holding IPs, and is seemingly never deleted.
Eventually one will exhaust the IP pool and pods will fail to be created.
This is a bit surprising as node recycling is the basis of autoscaling groups.
Is there some cleanup mechanism I am not aware of? Or is it a bug?
regards,
Krzysztof Bronk