Error from server: error dialing backend: dial tcp 10.10.71.57:10250: i/o timeout #1181
Comments
We need more information to understand the issue you're reporting. Does the node become
I'm closing this issue because we don't have enough information to reach a conclusion.
Can this be reopened? I believe I'm seeing the same problem when building on top of the CIS Level 1 Benchmark AL2 AMI. The AMI seems to build fine, but when deployed, I see issues similar to what was described ("i/o timeout", etc.). This only seems to happen for some pods, some of the time, when attempting to connect to the Kubernetes API service endpoint IP that is local to the cluster. I suspect the problem comes from some sysctl parameter set in the CIS base image, but I can't find which one exactly.
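One way to narrow that suspicion down (a rough sketch, not from this thread; it assumes you can boot both a stock EKS-optimized node and the CIS-hardened build and copy the two dumps to one machine):
$ # on the stock EKS-optimized node
$ sysctl -a 2>/dev/null | sort > /tmp/sysctl-stock.txt
$ # on the CIS-hardened node
$ sysctl -a 2>/dev/null | sort > /tmp/sysctl-cis.txt
$ # then, with both files in one place
$ diff /tmp/sysctl-stock.txt /tmp/sysctl-cis.txt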
@raackley we don't currently ship a CIS variant of the AMI, and this template doesn't explicitly support building on a CIS base. If there's a reasonable change we can make in the template to unblock that use case, we're not opposed, but we don't track CIS-specific issues here.
@cartermckinnon Sure, but is there a specific reason it's known not to work? It seems like it should be fine. Clearly people want this, so can this be a formal request for a CIS-hardened base EKS image, or for support to build your own image from a CIS-hardened AMI?
The issue you're describing sounds different from the one reported here. The OP was describing communication failures between the API server and the kubelet.
This is a longstanding request (#99); it's absolutely on our radar.
If the issue lies in the CIS base AMI as you suspect, there's probably not much we can do in this template to ensure compatibility. If some logic here is causing the breakage, we're totally open to PRs.
I'm also building CIS-hardened AMIs on Amazon Linux, and we were experiencing similar issues when upgrading from EKS 1.23 (on Docker) to EKS 1.25 (on containerd). On our side, the problem was that the CIS-hardened AMI disables IP forwarding, most likely directly in the sysctl configuration. When using containerd, it does not enable IP forwarding, so the containers don't have network access. By looking at the sysctl config, we can see which IP forwarding parameters are configured and in which files; there is also a ticket that confirms this behavior when the sysctl configuration files are loaded. To fix the network connectivity issue for the containers, we wrote a small bash script that re-enables IP forwarding (a sketch of that kind of fix appears after the example output below).
So if you see problems with container connectivity, you can run the following command to confirm that IP forwarding is enabled:
$ sysctl -a 2>/dev/null | grep -E "ip_forward |\.forwarding "
Example output:
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.eni0f6f0f8b0af.forwarding = 1
net.ipv4.conf.eni101ddb65507.forwarding = 1
net.ipv4.conf.eni123f74d1678.forwarding = 1
net.ipv4.conf.eni25a0644a127.forwarding = 1
net.ipv4.conf.eni3c9bdd19a55.forwarding = 1
net.ipv4.conf.eni4cb3ea8d9b7.forwarding = 1
net.ipv4.conf.eni4f5f128fbd7.forwarding = 1
net.ipv4.conf.eni533143c3bfa.forwarding = 1
net.ipv4.conf.eni5a9462606df.forwarding = 1
net.ipv4.conf.eni7d229876a48.forwarding = 1
net.ipv4.conf.enic2b1da1faef.forwarding = 1
net.ipv4.conf.enic611f40a78a.forwarding = 1
net.ipv4.conf.enic68b338523f.forwarding = 1
net.ipv4.conf.enid9a8087c67d.forwarding = 1
net.ipv4.conf.eth0.forwarding = 1
net.ipv4.conf.eth1.forwarding = 1
net.ipv4.conf.lo.forwarding = 1
net.ipv4.ip_forward = 1
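For reference, a minimal sketch of that kind of re-enable script, assuming the hardened image turns the forwarding keys off via sysctl (the drop-in file name 99-eks-ip-forward.conf and the exact parameter list are illustrative, not the commenter's actual script):
#!/usr/bin/env bash
# Re-enable IP forwarding that a CIS-hardened base image may have disabled.
set -euo pipefail

# Write a sysctl drop-in; the 99- prefix makes it load after earlier drop-ins.
cat <<'EOF' | sudo tee /etc/sysctl.d/99-eks-ip-forward.conf
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1
EOF

# Reload all sysctl configuration so the change takes effect immediately.
sudo sysctl --system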
What happened:
For the CIS AMI, I changed /tmp to /home/ec2-user and --bin-dir /bin/ to /usr/local/bin in amazon-eks-ami/scripts/install-worker.sh (line 138 in 343e830):
sudo "${AWSCLI_DIR}/aws/install" --bin-dir /bin/
When using the created image with EKS node groups, I cannot exec into pods or view pod logs:
Error from server: error dialing backend: dial tcp 10.10.71.57:10250: i/o timeout
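A quick way to triage this kind of timeout (a sketch, not from the original report; 10.10.71.57 is the node IP from the error above) is to check whether the kubelet port is reachable at all, since an i/o timeout on 10250 usually points at security groups, NACLs, or a host firewall rather than at the kubelet itself:
$ # from another node or host in the cluster VPC
$ nc -zvw5 10.10.71.57 10250
$ # on the affected node itself, confirm the kubelet is listening
$ sudo ss -tlnp | grep 10250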
What you expected to happen:
Successfully run the created AMI in EKS and be able to view logs and exec into pods.
Environment:
EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): "eks.6"
Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): "1.23"