
Traffic does not go through routes on Fedora 33 #1614

Closed
rma945 opened this issue Sep 13, 2021 · 9 comments
@rma945

rma945 commented Sep 13, 2021

What happened:
I built a new AMI based on the community Fedora 33 1.2 AMI with the bootstrap scripts from this repository - https://github.com/awslabs/amazon-eks-ami - and everything works fine except aws-vpc-cni. I checked the nodes and found that pods are created successfully, the elastic network interfaces are allocated, and the routing tables are created, but the pods cannot ping or open a TCP connection to any external or local IP. However, when I switch the CNI plugin to Calico, pods can reach any IP. A second worker node, based on the AWS EKS AMI, works fine; the problem occurs only on the Fedora-based worker. I also tried switching the container runtime from Docker to plain containerd, but that did not help - aws-vpc-cni still does not work.

What you expected to happen:
Pods should be able to connect to local and external services.

How to reproduce it (as minimally and precisely as possible):
Take the Fedora 33 1.2 AMI, join it to the EKS cluster, and add the aws-vpc-cni addon.

Attached logs:
eks_i-0cca70aab4bdd2bc1_2021-09-13_0719-UTC_0.6.2.tar.gz

Anything else we need to know?:
Environment:

  • Kubernetes version: 1.21
  • CNI Version: v1.9.0-eksbuild.1
  • OS: Fedora 33 (Cloud Edition)
  • Kernel: 5.8.15-301.fc33.x86_64
@rma945 rma945 added the bug label Sep 13, 2021
@jayanthvn
Contributor

Hi @rma945

Can you capture a tcpdump on one of the nodes to verify whether there is an issue with the iptables rules or ip rules - say, when you ping from pod-a to pod-b?

1. Install tcpdump on the node.
2. Start capturing traffic on eth0 of the node (assuming eth0 is the ENI for the pods):
   tcpdump -i eth0 -w node_a_eth0.pcap
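The capture steps above can be sketched as a tiny helper that builds the tcpdump invocation for each node (a sketch only; `eth0` and the node labels are assumptions - confirm with `ip addr` which ENI actually hosts the pod IPs, and run the printed command as root):

```shell
#!/bin/sh
# Build (but do not run) the tcpdump command for a given interface and
# node label, so the same naming scheme is used on both capture nodes.
build_capture_cmd() {
  # $1 = interface, $2 = node label
  printf 'tcpdump -i %s -w %s_%s.pcap\n' "$1" "$2" "$1"
}

build_capture_cmd eth0 node_a   # -> tcpdump -i eth0 -w node_a_eth0.pcap
build_capture_cmd eth0 node_b   # -> tcpdump -i eth0 -w node_b_eth0.pcap
```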

You can attach the pcap to the issue.

Also, the logs attached above don't seem to download, so can you reattach them?

@rma945
Author

rma945 commented Sep 14, 2021

Here is the PCAP file, and I have also re-uploaded the debug logs:
node_a_eth0.zip

@jayanthvn
Contributor

Can you confirm the source and destination pod IPs, and also whether you used ping or curl?

@rma945
Author

rma945 commented Sep 16, 2021

Yes, no address except 127.0.0.1 can be reached. I have tried ping and curl against different IP addresses (pods, the internal kube API, external services).

@jayanthvn
Contributor

Can you please check if you are hitting this issue - #1600 (comment)
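(For reference - and this is an assumption on my part, not quoted from #1600 - a common way to stop NetworkManager from managing the CNI's veth interfaces is a keyfile drop-in; the `eni*` glob matches the `vethPrefix` shown in the CNI config later in this thread:)

```ini
# /etc/NetworkManager/conf.d/99-eni-unmanaged.conf  (illustrative filename)
[keyfile]
unmanaged-devices=interface-name:eni*
```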

@RomanCherednikovAZ

Yes, I have already tried disabling the NetworkManager routing rules as suggested in that issue, but it does not help. Also, in my case it looks like the routes are added properly, but routing is blocked for some reason:

[root@ip-172-24-67-130 ~]# ip rule list
0:      from all lookup local
512:    from all to 172.24.67.25 lookup main  <- pod cni ip
1024:   from all fwmark 0x80/0x80 lookup main
32766:  from all lookup main
32767:  from all lookup default
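One observation on the listing above: it contains only a "to <pod-ip>" rule. For pods on secondary ENIs, the CNI normally also installs a "from <pod-ip> lookup <eni-table>" rule. A quick illustrative check over `ip rule list` output (the helper is not CNI code, just a grep; the sample rules are copied from this thread):

```shell
#!/bin/sh
# Check whether `ip rule list` output (read from stdin) contains a
# source-based ("from <ip>") rule for the given pod IP.
has_from_rule() {
  grep -qF "from $1 " -
}

rules='0:      from all lookup local
512:    from all to 172.24.67.25 lookup main
1024:   from all fwmark 0x80/0x80 lookup main
32766:  from all lookup main
32767:  from all lookup default'

if printf '%s\n' "$rules" | has_from_rule 172.24.67.25; then
  echo "from-rule present"
else
  echo "from-rule missing"   # this is what the pasted rule set produces
fi
```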

@jayanthvn
Contributor

jayanthvn commented Sep 24, 2021

Thanks for checking @RomanCherednikovAZ. Sorry, I meant in the pcap file attached - can you please confirm the source and destination IPs? If you have some bandwidth, can you also capture a tcpdump on the destination side? If we have tcpdumps from both the source and destination nodes, we can correlate where the drop is happening.

Also, I see the CNI version in your logs is 1.7.10, so can you please confirm the CNI version?

Sep 13 07:13:50 ip-172-24-68-126.definiens.local kubelet[1099]: {"level":"info","ts":"2021-09-13T07:13:50.020Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"CNI Plugin version: v1.7.10 ..."}

I also see this error in the kubelet log -

Sep 13 07:12:04 ip-172-24-68-126.definiens.local kubelet[1099]: I0913 07:12:04.930400    1099 cni.go:204] "Error validating CNI config list" configList="{\n  \"cniVersion\": \"0.3.1\",\n  \"name\": \"aws-cni\",\n  \"plugins\": [\n    {\n      \"name\": \"aws-cni\",\n      \"type\": \"aws-cni\",\n      \"vethPrefix\": \"eni\",\n      \"mtu\": \"9001\",\n      \"pluginLogFile\": \"/var/log/aws-routed-eni/plugin.log\",\n      \"pluginLogLevel\": \"DEBUG\"\n    },\n    {\n      \"type\": \"portmap\",\n      \"capabilities\": {\"portMappings\": true},\n      \"snat\": true\n    }\n  ]\n}" err="[failed to find plugin \"aws-cni\" in path [/opt/cni/bin]]"

@achevuru
Contributor

@RomanCherednikovAZ Any update w.r.t. the above logs? It appears that aws-node failed to copy the CNI binary to /opt/cni/bin. Can you check whether there are any permission issues? You should be able to exec into the aws-node pod and try it manually to see if it succeeds.

https://github.com/aws/amazon-vpc-cni-k8s/blob/master/scripts/entrypoint.sh#L149
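A minimal way to check for the missing-binary/permission problem suggested above (a sketch; the path is the one from the kubelet error earlier in the thread):

```shell
#!/bin/sh
# Report whether the aws-cni plugin binary exists and is executable
# in the given CNI bin directory.
check_cni_bin() {
  # $1 = CNI bin directory (normally /opt/cni/bin)
  if [ -x "$1/aws-cni" ]; then
    echo "aws-cni present and executable"
  else
    echo "aws-cni missing or not executable in $1"
  fi
}

check_cni_bin /opt/cni/bin
```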

@rma945
Author

rma945 commented Oct 15, 2021

Sorry for the confusion - rma945 and RomanCherednikovAZ are the same person. Yes, I checked that the CNI binaries were successfully installed on the node, and the node's network state changed to Ready. So the CNI itself works fine, but it looks like there was some issue with routing on the node. In any case, at this point we have migrated our worker nodes back to the AWS EKS AMI.

@rma945 rma945 closed this as completed Oct 15, 2021

4 participants