External network past SNAT unavailable on pod start #732

Closed
gbarazer opened this issue Mar 29, 2021 · 3 comments · Fixed by #755, #1015 or #1346

@gbarazer

Following our discussion on Slack, we confirmed that any request to an external network made within the first 0-3 seconds after pod start is blocked, and as a result many pods fail (such as pods running helm, curl, or git clone as their first CMD).
During the investigation we confirmed that this behavior occurs only for requests to external networks, which led us to look at the SNAT implementation.
It seems that when the CNI attaches the network, it returns success as soon as the pod gateway is pingable, but at that point the SNAT rules for external access are not installed yet, because they are installed and updated only every 3 seconds via ipset/iptables. This in turn causes requests to the external network to be blocked, and most commands wait forever until the pod fails.
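For context, external SNAT here comes down to a POSTROUTING rule that matches an ipset of pod subnets, so outbound traffic is not masqueraded until the periodic sync has added the relevant entry. A rough sketch of the kind of rule involved (the set name and CIDR are placeholders, not taken from the kube-ovn source):

ipset create pod-subnets hash:net                 # hypothetical set name
ipset add pod-subnets 10.16.0.0/16                # placeholder pod CIDR
iptables -t nat -A POSTROUTING -m set --match-set pod-subnets src -j MASQUERADE
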
This can be reproduced using a pod run command like this one:
kubectl run testcurl --image=centos:8 --restart=Never -- sh -c 'for i in $(seq 1 10); do echo $i ; date; curl -vsI --connect-timeout 0.5 https://www.google.com/; echo ; sleep 0.1; done'

With other CNI implementations, the logs show that the external network is ready from the very first request, whereas with kube-ovn it becomes reachable only a few (1-3) seconds after pod creation.

@oilbeater
Collaborator

We see this issue again in 1.8.2, with lower probability after the patch; maybe there is still some delay before the ipset takes effect. We need to find a new way to fix it.
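
To pin down how much delay remains, one could poll on the node for the moment the SNAT ipset entry appears after the pod is created; a minimal sketch, assuming a hypothetical set name and subnet:

# print a timestamp until the entry shows up (set name and CIDR are placeholders)
while ! ipset test pod-subnets 10.16.0.64/26 2>/dev/null; do date +%T.%N; sleep 0.1; done
echo "ipset entry present"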

@hackeren
Contributor


Does this bug also occur in the following scenario?
A Pod using an EIP checks in its initContainer whether a required service is ready. The initContainer accesses nodelocaldns on the 169.xx range (which is effectively an external network), and at that point the initContainer cannot perform DNS resolution. From lr-router-list we can see that the source-IP route for this Pod has not been added.
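
If it is the same race, the missing route should be observable while the initContainer is still failing. A hypothetical way to check, assuming the default ovn-central deployment in kube-system and the default ovn-cluster logical router (replace <pod-ip> with the Pod's address):

# list static routes on the cluster logical router and look for the pod's source IP
kubectl -n kube-system exec deploy/ovn-central -- ovn-nbctl lr-route-list ovn-cluster | grep <pod-ip>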

@zhangzujian
Member

zhangzujian commented Mar 1, 2022

It seems to be a TCP-related problem. In my testing, it was reproduced only in TCP connections.
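
A variant of the reproducer above that fires an ICMP probe and a TCP connect to the external network side by side might help confirm that only TCP is affected; a sketch, assuming ping is available in the image and ICMP egress is allowed:

kubectl run testproto --image=centos:8 --restart=Never -- sh -c 'for i in $(seq 1 10); do date; ping -c1 -W1 8.8.8.8 | tail -1; curl -sI --connect-timeout 0.5 https://www.google.com/ | head -1; sleep 0.3; done'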

EDIT:

It's still an ipset issue. It should be fixed now.
