Error from server: error dialing backend: dial tcp 10.10.71.57:10250: i/o timeout #1181
Comments
We need more information to understand the issue you're reporting. Does the node become
I'm closing this issue because we don't have enough information to reach a conclusion.
Can this be reopened? I believe I'm seeing the same problem when building on top of the CIS Level 1 Benchmark AL2 AMI. The AMI seems to build fine, but when deployed, I see issues similar to what was described ("i/o timeout", etc.). This only seems to happen for some pods, some of the time, when attempting to connect to the Kubernetes API service endpoint IP that is local to the cluster. I suspect the problem comes from some sysctl parameter set in the CIS base image, but I can't find which one exactly.
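One way to narrow that suspicion down (a rough sketch, not from this thread; it assumes you can boot both a stock EKS-optimized node and the CIS-hardened build and copy the two dumps to one machine):
$ # on the stock EKS-optimized node
$ sysctl -a 2>/dev/null | sort > /tmp/sysctl-stock.txt
$ # on the CIS-hardened node
$ sysctl -a 2>/dev/null | sort > /tmp/sysctl-cis.txt
$ # then, with both files in one place
$ diff /tmp/sysctl-stock.txt /tmp/sysctl-cis.txt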
@raackley we don't currently ship a CIS variant of the AMI, and this template doesn't explicitly support building on a CIS base. If there's a reasonable change we can make in the template to unblock that use case, we're not opposed, but we don't track CIS-specific issues here.
@cartermckinnon Sure, but is there a specific reason it's known not to work? It seems like it should be fine. Clearly people want this, so can this be a formal request for a CIS-hardened base EKS image, or for support to build your own image from a CIS-hardened AMI?
The issue you're describing sounds different from the one reported here. The OP was describing communication failures between the API server and the kubelet.
This is a longstanding request (#99); it's absolutely on our radar.
If the issue lies in the CIS base AMI as you suspect, there's probably not much we can do in this template to ensure compatibility. If some logic here is causing the breakage, we're totally open to PRs.
I'm also building CIS-hardened AMIs on Amazon Linux, and we were experiencing similar issues when upgrading from EKS 1.23 (on Docker) to EKS 1.25 (on containerd). On our side, the problem was that the CIS-hardened AMI disables IP forwarding, most likely directly in the sysctl configuration. When using containerd, it does not enable IP forwarding, so the containers don't have network access. By looking at the sysctl config, we can see which IP forwarding parameters are configured and in which files; there is also a ticket that confirms this behavior when the sysctl configuration files are loaded. To fix the network connectivity issue for the containers, we wrote a small bash script that re-enables IP forwarding (a sketch of that kind of fix appears after the example output below).
So if you see problems with container connectivity, you can run the following command to confirm that IP forwarding is enabled:
$ sysctl -a 2>/dev/null | grep -E "ip_forward |\.forwarding "
Example output:
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.eni0f6f0f8b0af.forwarding = 1
net.ipv4.conf.eni101ddb65507.forwarding = 1
net.ipv4.conf.eni123f74d1678.forwarding = 1
net.ipv4.conf.eni25a0644a127.forwarding = 1
net.ipv4.conf.eni3c9bdd19a55.forwarding = 1
net.ipv4.conf.eni4cb3ea8d9b7.forwarding = 1
net.ipv4.conf.eni4f5f128fbd7.forwarding = 1
net.ipv4.conf.eni533143c3bfa.forwarding = 1
net.ipv4.conf.eni5a9462606df.forwarding = 1
net.ipv4.conf.eni7d229876a48.forwarding = 1
net.ipv4.conf.enic2b1da1faef.forwarding = 1
net.ipv4.conf.enic611f40a78a.forwarding = 1
net.ipv4.conf.enic68b338523f.forwarding = 1
net.ipv4.conf.enid9a8087c67d.forwarding = 1
net.ipv4.conf.eth0.forwarding = 1
net.ipv4.conf.eth1.forwarding = 1
net.ipv4.conf.lo.forwarding = 1
net.ipv4.ip_forward = 1
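For reference, a minimal sketch of that kind of re-enable script, assuming the hardened image turns the forwarding keys off via sysctl (the drop-in file name 99-eks-ip-forward.conf and the exact parameter list are illustrative, not the commenter's actual script):
#!/usr/bin/env bash
# Re-enable IP forwarding that a CIS-hardened base image may have disabled.
set -euo pipefail

# Write a sysctl drop-in; the 99- prefix makes it load after earlier drop-ins.
cat <<'EOF' | sudo tee /etc/sysctl.d/99-eks-ip-forward.conf
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1
EOF

# Reload all sysctl configuration so the change takes effect immediately.
sudo sysctl --system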
What happened:
For the CIS AMI, I changed /tmp to /home/ec2-user and --bin-dir /bin/ to /usr/local/bin in amazon-eks-ami/scripts/install-worker.sh (line 138 in 343e830):
sudo "${AWSCLI_DIR}/aws/install" --bin-dir /bin/
When using the created image with EKS node groups, I cannot exec into pods or view pod logs:
Error from server: error dialing backend: dial tcp 10.10.71.57:10250: i/o timeout
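A quick way to triage this kind of timeout (a sketch, not from the original report; 10.10.71.57 is the node IP from the error above) is to check whether the kubelet port is reachable at all, since an i/o timeout on 10250 usually points at security groups, NACLs, or a host firewall rather than at the kubelet itself:
$ # from another node or host in the cluster VPC
$ nc -zvw5 10.10.71.57 10250
$ # on the affected node itself, confirm the kubelet is listening
$ sudo ss -tlnp | grep 10250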
What you expected to happen:
Successfully run the created AMI in EKS and be able to view logs and exec into pods.
Environment:
EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): "eks.6"
Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): "1.23"