External network past SNAT unavailable on pod start #732
Comments
We are seeing this issue again in 1.8.2, with lower probability after the patch. There may still be some delay before the ipset update takes effect. We need to find a new way to fix it.
Does this bug also occur in the following scenario:
It seems to be a TCP-related problem. In my testing, it was reproduced only with TCP connections. EDIT: It's still an
Following our discussion on Slack, we confirmed that any network request to an external network made during the first 0-3 seconds after pod start is blocked, and as a result many pods are failing (such as pods executing helm, curl, or git clone as their first CMD).
During the investigation we confirmed that this behavior occurs only on requests made to an external network, which led us to look at the SNAT implementation.
It seems that when the CNI attaches the network, it returns success if the pod gateway is pingable, but at that time the SNAT rules for external access are not yet installed, because they are installed and updated every 3 seconds by ipset/iptables. As a result, requests to the external network are blocked, and most commands wait indefinitely until the pod fails.
This can be reproduced using a pod run command like this one:
kubectl run testcurl --image=centos:8 --restart=Never -- sh -c 'for i in $(seq 1 10); do echo $i ; date; curl -vsI --connect-timeout 0.5 https://www.google.com/; echo ; sleep 0.1; done'
On other implementations, the logs show that the external network is always ready on the first request, whereas in kube-ovn this happens only a few (1-3) seconds after pod creation.
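Until the race described above is resolved in the CNI, a pod whose first command needs external access can guard that command with a short retry loop. The following is only a minimal sketch of that idea, not part of kube-ovn and not a fix for the underlying issue: `retry_until_ok` is a hypothetical helper name, and the attempt count and delay are assumptions chosen to cover the 0-3 second window reported here.

```shell
#!/bin/sh
# retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of kube-ovn): runs COMMAND until it
# exits 0 or MAX_ATTEMPTS attempts have been made.
# Prints the number of attempts made; returns 0 on success, 1 otherwise.
retry_until_ok() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Discard the probe's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            echo "$attempt"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "$max_attempts"
    return 1
}
```

For example, a pod command could run `retry_until_ok 20 0.5 curl -sI --connect-timeout 0.5 https://www.google.com/` before its real workload, so that the workload only starts once one external request has succeeded. The fractional `sleep` relies on the same non-POSIX behavior as the `sleep 0.1` in the reproduction command above.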