You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Depending on the node which it was scheduled, here is an example:
Now, analyzing, I could see what is happening on the node ip-10-38-58-208.ec2.internal
{"level":"info","ts":"2021-06-02T14:56:59.994Z","caller":"awsutils/awsutils.go:724","msg":"Created a new ENI: eni-05ec4b26a3b50051c"} <this when the eni is created, all good here>
...
...
{"level":"debug","ts":"2021-06-02T23:03:03.702Z","caller":"datastore/data_store.go:675","msg":"ENI eni-05ec4b26a3b50051c cannot be deleted because it is required for WARM_IP_TARGET: 5"} < controller starts to try to delete the ENI, all good until here >
...
... After some time, He actually manage to delete the ENI,
{"level":"debug","ts":"2021-06-02T23:05:40.462Z","caller":"datastore/data_store.go:675","msg":"getDeletableENI: found a deletable ENI eni-05ec4b26a3b50051c"}
{"level":"info","ts":"2021-06-02T23:05:40.462Z","caller":"ipamd/ipamd.go:523","msg":"RemoveUnusedENIFromStore eni-05ec4b26a3b50051c: IP address pool stats: free 1 addresses, total: 52, assigned: 46"}
{"level":"debug","ts":"2021-06-02T23:05:40.462Z","caller":"ipamd/ipamd.go:489","msg":"Start freeing ENI eni-05ec4b26a3b50051c"}
{"level":"info","ts":"2021-06-02T23:05:40.462Z","caller":"awsutils/awsutils.go:946","msg":"Trying to free ENI: eni-05ec4b26a3b50051c"}
{"level":"debug","ts":"2021-06-02T23:05:40.577Z","caller":"awsutils/awsutils.go:946","msg":"Found ENI eni-05ec4b26a3b50051c attachment id: eni-attach-0b34b3738a21e3dce "}
{"level":"info","ts":"2021-06-02T23:05:41.089Z","caller":"retry/retry.go:69","msg":"Successfully detached ENI: eni-05ec4b26a3b50051c"}
{"level":"debug","ts":"2021-06-02T23:05:43.090Z","caller":"awsutils/awsutils.go:990","msg":"Trying to delete ENI: eni-05ec4b26a3b50051c"}
{"level":"debug","ts":"2021-06-02T23:05:43.376Z","caller":"retry/retry.go:69","msg":"Not able to delete ENI: InvalidParameterValue: Network interface 'eni-05ec4b26a3b50051c' is currently in use.\n\tstatus code: 400, request id: 4e7d28b4-2d61-4b58-9bfc-fe8ddc1c9077 "}
{"level":"debug","ts":"2021-06-02T23:05:44.199Z","caller":"retry/retry.go:69","msg":"Not able to delete ENI: InvalidParameterValue: Network interface 'eni-05ec4b26a3b50051c' is currently in use.\n\tstatus code: 400, request id: 9b345d05-c090-47e7-8373-9177f929dfcf "}
{"level":"info","ts":"2021-06-02T23:05:45.730Z","caller":"retry/retry.go:69","msg":"Successfully deleted ENI: eni-05ec4b26a3b50051c"}
{"level":"info","ts":"2021-06-02T23:05:45.730Z","caller":"awsutils/awsutils.go:946","msg":"Successfully freed ENI: eni-05ec4b26a3b50051c"} ... after some time, we had the following situation
{"level":"debug","ts":"2021-06-02T23:07:10.019Z","caller":"awsutils/awsutils.go:569","msg":"Found ENI: eni-05ec4b26a3b50051c, MAC 0e:c3:3c:06:0d:55, device 2"}
...
{"level":"error","ts":"2021-06-02T23:07:10.090Z","caller":"ipamd/ipamd.go:934","msg":"Failed to call ec2:DescribeNetworkInterfaces for [eni-0770f69b1b7fd3933 eni-05ec4b26a3b50051c eni-003f996afb640642a]: InvalidNetworkInterfaceID.NotFound: The networkInterface ID 'eni-05ec4b26a3b50051c' does not exist\n\tstatus code: 400, request id: 85313606-a8dd-48a0-ad57-34d808d7e7a0"}
{"level":"debug","ts":"2021-06-02T23:07:10.090Z","caller":"ipamd/ipamd.go:934","msg":"Could not find interface: The networkInterface ID 'eni-05ec4b26a3b50051c' does not exist, ID: eni-05ec4b26a3b50051c"}
... end after some hours we end up having the aws-cni pod (agent) not able to start anymore
{"level":"debug","ts":"2021-06-03T11:20:23.645Z","caller":"awsutils/awsutils.go:388","msg":"Update ENI eni-05ec4b26a3b50051c"}
{"level":"error","ts":"2021-06-03T11:20:23.808Z","caller":"aws-k8s-agent/main.go:28","msg":"Initialization failure: ipamd: can not initialize with AWS SDK interface: refreshSGIDs: unable to update the ENI's SG: InvalidNetworkInterfaceID.NotFound: The networkInterface ID 'eni-05ec4b26a3b50051c' does not exist\n\tstatus code: 400, request id: 9a2ec4d8-0b6a-4bbd-ab57-0995d1db9762"}
Then as expected, Kubelet start to have issues as well
Jun 3 11:24:32 ip-10-38-58-208 kubelet: E0603 11:24:32.026947 4869 pod_workers.go:191] Error syncing pod 5b7ad595-0a55-490d-af50-a83800f8cff4 ("cash-collection-hn-6fd657fc96-wnsrx_default(5b7ad595-0a55-490d-af50-a83800f8cff4)"), skipping: failed to "KillPodSandbox" for "5b7ad595-0a55-490d-af50-a83800f8cff4" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod "cash-collection-hn-6fd657fc96-wnsrx_default" network: del cmd: error received from DelNetwork gRPC call: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused""
Extra
describe pod on aws-node-p62dc
Events:
Type Reason Age From Message
Warning Unhealthy 14m (x1318 over 7h18m) kubelet (combined from similar events): Readiness probe failed: {"level":"info","ts":"2021-06-03T12:52:35.683Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service ":50051" within 1s"}
Warning BackOff 4m23s (x1244 over 7h7m) kubelet Back-off restarting failed container
describe pod on dashboard-ar-web-598cf9b987-jq4pv
Events:
Type Reason Age From Message
Normal SandboxChanged 105s (x1019 over 3h41m) kubelet Pod sandbox changed, it will be killed and re-created.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Deleting the aws-cni pod, did not solve, still trying to get the eni-05ec4b26a3b50051c interface, which as we saw does not exist anymore.
{"level":"error","ts":"2021-06-03T13:18:15.960Z","caller":"aws-k8s-agent/main.go:28","msg":"Initialization failure: ipamd: can not initialize with AWS SDK interface: refreshSGIDs: unable to update the ENI's SG: InvalidNetworkInterfaceID.NotFound: The networkInterface ID 'eni-05ec4b26a3b50051c' does not exist\n\tstatus code: 400, request id: db7e1487-0a22-4873-af78-f9c343b57b20"}
Environment: eks Production
Kubernetes version (use kubectl version): v1.18.9-eks-d1db3c
CNI Version: v1.7.5
OS (e.g: cat /etc/os-release):
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
Kernel (e.g. uname -a):
Linux ip-10-38-58-208.ec2.internal 4.14.209-160.339.amzn2.x86_64 Initial commit of amazon-vpc-cni-k8s #1 SMP Wed Dec 16 22:44:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
The text was updated successfully, but these errors were encountered:
From time to time, we have a problem in our production environment, where some pods stucks in ContainerCreating.
Attach logs
eks_i-0f7c80b4e860e2978_2021-06-03_1124-UTC_0.6.2.tar.gz
What you expected to happen:
Depending on the node which it was scheduled, here is an example:
![Screenshot 2021-06-03 at 14 56 20](https://user-images.githubusercontent.com/5530002/120648422-e829b100-c47b-11eb-8e80-160223b50170.png)
Now, analyzing, I could see what is happening on the node ip-10-38-58-208.ec2.internal
{"level":"info","ts":"2021-06-02T14:56:59.994Z","caller":"awsutils/awsutils.go:724","msg":"Created a new ENI: eni-05ec4b26a3b50051c"} <this when the eni is created, all good here>
...
...
{"level":"debug","ts":"2021-06-02T23:03:03.702Z","caller":"datastore/data_store.go:675","msg":"ENI eni-05ec4b26a3b50051c cannot be deleted because it is required for WARM_IP_TARGET: 5"} < controller starts to try to delete the ENI, all good until here >
...
...
After some time, He actually manage to delete the ENI,
{"level":"debug","ts":"2021-06-02T23:05:40.462Z","caller":"datastore/data_store.go:675","msg":"getDeletableENI: found a deletable ENI eni-05ec4b26a3b50051c"}
{"level":"info","ts":"2021-06-02T23:05:40.462Z","caller":"ipamd/ipamd.go:523","msg":"RemoveUnusedENIFromStore eni-05ec4b26a3b50051c: IP address pool stats: free 1 addresses, total: 52, assigned: 46"}
{"level":"debug","ts":"2021-06-02T23:05:40.462Z","caller":"ipamd/ipamd.go:489","msg":"Start freeing ENI eni-05ec4b26a3b50051c"}
{"level":"info","ts":"2021-06-02T23:05:40.462Z","caller":"awsutils/awsutils.go:946","msg":"Trying to free ENI: eni-05ec4b26a3b50051c"}
{"level":"debug","ts":"2021-06-02T23:05:40.577Z","caller":"awsutils/awsutils.go:946","msg":"Found ENI eni-05ec4b26a3b50051c attachment id: eni-attach-0b34b3738a21e3dce "}
{"level":"info","ts":"2021-06-02T23:05:41.089Z","caller":"retry/retry.go:69","msg":"Successfully detached ENI: eni-05ec4b26a3b50051c"}
{"level":"debug","ts":"2021-06-02T23:05:43.090Z","caller":"awsutils/awsutils.go:990","msg":"Trying to delete ENI: eni-05ec4b26a3b50051c"}
{"level":"debug","ts":"2021-06-02T23:05:43.376Z","caller":"retry/retry.go:69","msg":"Not able to delete ENI: InvalidParameterValue: Network interface 'eni-05ec4b26a3b50051c' is currently in use.\n\tstatus code: 400, request id: 4e7d28b4-2d61-4b58-9bfc-fe8ddc1c9077 "}
{"level":"debug","ts":"2021-06-02T23:05:44.199Z","caller":"retry/retry.go:69","msg":"Not able to delete ENI: InvalidParameterValue: Network interface 'eni-05ec4b26a3b50051c' is currently in use.\n\tstatus code: 400, request id: 9b345d05-c090-47e7-8373-9177f929dfcf "}
{"level":"info","ts":"2021-06-02T23:05:45.730Z","caller":"retry/retry.go:69","msg":"Successfully deleted ENI: eni-05ec4b26a3b50051c"}
{"level":"info","ts":"2021-06-02T23:05:45.730Z","caller":"awsutils/awsutils.go:946","msg":"Successfully freed ENI: eni-05ec4b26a3b50051c"}
... after some time, we had the following situation
{"level":"debug","ts":"2021-06-02T23:07:10.019Z","caller":"awsutils/awsutils.go:569","msg":"Found ENI: eni-05ec4b26a3b50051c, MAC 0e:c3:3c:06:0d:55, device 2"}
...
{"level":"error","ts":"2021-06-02T23:07:10.090Z","caller":"ipamd/ipamd.go:934","msg":"Failed to call ec2:DescribeNetworkInterfaces for [eni-0770f69b1b7fd3933 eni-05ec4b26a3b50051c eni-003f996afb640642a]: InvalidNetworkInterfaceID.NotFound: The networkInterface ID 'eni-05ec4b26a3b50051c' does not exist\n\tstatus code: 400, request id: 85313606-a8dd-48a0-ad57-34d808d7e7a0"}
{"level":"debug","ts":"2021-06-02T23:07:10.090Z","caller":"ipamd/ipamd.go:934","msg":"Could not find interface: The networkInterface ID 'eni-05ec4b26a3b50051c' does not exist, ID: eni-05ec4b26a3b50051c"}
... end after some hours we end up having the aws-cni pod (agent) not able to start anymore
{"level":"debug","ts":"2021-06-03T11:20:23.645Z","caller":"awsutils/awsutils.go:388","msg":"Update ENI eni-05ec4b26a3b50051c"}
{"level":"error","ts":"2021-06-03T11:20:23.808Z","caller":"aws-k8s-agent/main.go:28","msg":"Initialization failure: ipamd: can not initialize with AWS SDK interface: refreshSGIDs: unable to update the ENI's SG: InvalidNetworkInterfaceID.NotFound: The networkInterface ID 'eni-05ec4b26a3b50051c' does not exist\n\tstatus code: 400, request id: 9a2ec4d8-0b6a-4bbd-ab57-0995d1db9762"}
Then as expected, Kubelet start to have issues as well
Jun 3 11:24:32 ip-10-38-58-208 kubelet: E0603 11:24:32.026947 4869 pod_workers.go:191] Error syncing pod 5b7ad595-0a55-490d-af50-a83800f8cff4 ("cash-collection-hn-6fd657fc96-wnsrx_default(5b7ad595-0a55-490d-af50-a83800f8cff4)"), skipping: failed to "KillPodSandbox" for "5b7ad595-0a55-490d-af50-a83800f8cff4" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod "cash-collection-hn-6fd657fc96-wnsrx_default" network: del cmd: error received from DelNetwork gRPC call: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused""
Extra
describe pod on aws-node-p62dc
Events:
Type Reason Age From Message
Warning Unhealthy 14m (x1318 over 7h18m) kubelet (combined from similar events): Readiness probe failed: {"level":"info","ts":"2021-06-03T12:52:35.683Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service ":50051" within 1s"}
Warning BackOff 4m23s (x1244 over 7h7m) kubelet Back-off restarting failed container
describe pod on dashboard-ar-web-598cf9b987-jq4pv
Events:
Type Reason Age From Message
Normal SandboxChanged 105s (x1019 over 3h41m) kubelet Pod sandbox changed, it will be killed and re-created.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Deleting the aws-cni pod, did not solve, still trying to get the eni-05ec4b26a3b50051c interface, which as we saw does not exist anymore.
{"level":"error","ts":"2021-06-03T13:18:15.960Z","caller":"aws-k8s-agent/main.go:28","msg":"Initialization failure: ipamd: can not initialize with AWS SDK interface: refreshSGIDs: unable to update the ENI's SG: InvalidNetworkInterfaceID.NotFound: The networkInterface ID 'eni-05ec4b26a3b50051c' does not exist\n\tstatus code: 400, request id: db7e1487-0a22-4873-af78-f9c343b57b20"}
Environment: eks Production
kubectl version
): v1.18.9-eks-d1db3ccat /etc/os-release
):NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
uname -a
):Linux ip-10-38-58-208.ec2.internal 4.14.209-160.339.amzn2.x86_64 Initial commit of amazon-vpc-cni-k8s #1 SMP Wed Dec 16 22:44:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
The text was updated successfully, but these errors were encountered: