podman network rm: ... cni plugin bridge failed #12459

Closed
edsantiago opened this issue Nov 30, 2021 · 4 comments · Fixed by #12469
Labels: flakes · In Progress · locked - please file new issue/PR · rootless

Comments

@edsantiago
Member

Old flake that I'm only just now categorizing:

Running: podman [options] network rm -f IntTestac189a6ddfab205ea1735da9e2eee448a7ea24c1267e1c1c2e7983cc8f2b546a
time="2021-11-29T15:59:20Z" level=warning msg="Failed to load cached network config: network IntTestac189a6ddfab205ea1735da9e2eee448a7ea24c1267e1c1c2e7983cc8f2b546a not found in CNI cache, falling back to loading network IntTestac189a6ddfab205ea1735da9e2eee448a7ea24c1267e1c1c2e7983cc8f2b546a from disk"
time="2021-11-29T15:59:20Z" level=error msg="Unable to cleanup network for container c6d76e20da51d3a803e401de6fb9a678d80712b49ad9be491248bbef2e4c0e14: \"error tearing down network namespace configuration for container c6d76e20da51d3a803e401de6fb9a678d80712b49ad9be491248bbef2e4c0e14: 1 error occurred:\\n\\t* plugin type=\\\"bridge\\\" failed (delete): cni plugin bridge failed: running [/usr/sbin/iptables -t nat -D POSTROUTING -s 10.89.1.3 -j CNI-39998ddfc7c5a7d9bb7f8f3e -m comment --comment name: \\\"IntTestac189a6ddfab205ea1735da9e2eee448a7ea24c1267e1c1c2e7983cc8f2b546a\\\" id: \\\"c6d76e20da51d3a803e401de6fb9a678d80712b49ad9be491248bbef2e4c0e14\\\" --wait]: exit status 2: iptables v1.8.7 (nf_tables): Chain 'CNI-39998ddfc7c5a7d9bb7f8f3e' does not exist\\nTry `iptables -h' or 'iptables --help' for more information.\\n\\n\\n\""

This is another one of those that flake in multiple different tests, making it very hard to isolate:

Podman run networking [It] podman run check dnsname plugin

It might be coincidence, but in logs where this error happens, I also tend to see this error:

Running: podman [options] run --log-driver k8s-file --rm --net 902b4601ee5ad3180d7defd95b28c0fca045a0761cb355a53b01ab849af977e1 --uidmap 0:1:4096 quay.io/libpod/alpine:latest sh -c echo podman | nc -w 1 nc-server.dns.podman 9480
nc: bad address 'nc-server.dns.podman'
time="2021-11-09T03:17:04-06:00" level=error msg="Unable to cleanup network for container c50a51af56265902babb1c903471eea7ff245ec1088961d39bcb673f83a879ac: \"error getting rootless network namespace: failed to Statfs \\\"/run/user/9228/netns/rootless-netns\\\": no such file or directory\""

...and I also tend to see errors in this other test:

Podman run networking [It] podman cni network works across user ns

edsantiago added the flakes and rootless labels on Nov 30, 2021
@Luap99
Member

Luap99 commented Dec 1, 2021

PR #12348 should have fixed the "error getting rootless network namespace: failed to Statfs /run/user/9228/netns/rootless-netns: no such file or directory" problem. Given that almost all logs are from before that PR, I think it helped, but there still seems to be a small corner case where it can fail.

@Luap99
Member

Luap99 commented Dec 1, 2021

I think I can fix it by ignoring the ENOENT error on teardown. If the namespace file does not exist, there is no reason to tear down.
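
For illustration, a minimal Go sketch of the ignore-ENOENT idea described above (the later comments explain why it was ultimately rejected). `getRootlessNetNSPath`, `teardownNetwork`, and `cleanupContainerNetwork` are hypothetical stand-ins, not podman's actual functions:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// getRootlessNetNSPath is a stand-in for podman's real path logic
// (/run/user/$UID/netns/rootless-netns).
func getRootlessNetNSPath() string {
	return fmt.Sprintf("/run/user/%d/netns/rootless-netns", os.Getuid())
}

// teardownNetwork is a placeholder for the real CNI teardown call.
func teardownNetwork(nsPath string) error {
	fmt.Println("tearing down network in", nsPath)
	return nil
}

// cleanupContainerNetwork skips teardown when the netns file is already
// gone (ENOENT) instead of returning an error.
func cleanupContainerNetwork() error {
	nsPath := getRootlessNetNSPath()
	if _, err := os.Stat(nsPath); err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			// namespace file does not exist: nothing to tear down
			return nil
		}
		return err
	}
	return teardownNetwork(nsPath)
}

func main() {
	if err := cleanupContainerNetwork(); err != nil {
		fmt.Println("cleanup failed:", err)
	}
}
```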

Luap99 added the In Progress label on Dec 1, 2021
@edsantiago
Member Author

There is (a corner case). The reason I took the time to file this is because I saw it on my own PR the other day (Nov 29) (and yes, I was fully rebased). I didn't realize that the earlier instances were different. Thanks for looking into it.

@Luap99
Member

Luap99 commented Dec 1, 2021

Actually no, my assumption is wrong; ignoring ENOENT is a bad idea.

The problem is that the rootless netns operations are locked, so if two containers are stopped at the same time, one has to wait for the other. After the first container is done, it calls cleanup to make sure we are not leaking the netns if no containers are running. That is a simple "is any container running" check, and it decides that cleanup is safe because the second container is already stopped and is just waiting for the rootless netns lock. So cleanup removes the netns, and when the second container tries to use it, it fails. This is a problem because we need to tear down the network for the second container, to free assigned IPs for example, so we cannot ignore this error.

I am not sure if I can find a good solution here.
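
To make the race concrete, here is a standalone Go sketch that simulates the sequence described above: both containers are stopped first, their network cleanups then contend for the netns lock, and a cleanup check based only on container state removes the shared netns before the second container has torn down its network. All names are illustrative; this is not podman code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type container struct{ running bool }

var (
	netnsLock   sync.Mutex // rootless netns operations are serialized
	netnsExists = true
	containers  = []*container{{running: true}, {running: true}}
)

// anyRunning is the problematic check: it looks at container state only.
func anyRunning() bool {
	for _, c := range containers {
		if c.running {
			return true
		}
	}
	return false
}

// teardown frees the container's network and, if no container appears to
// be running, removes the shared rootless netns.
func teardown(idx int) error {
	netnsLock.Lock()
	defer netnsLock.Unlock()

	if !netnsExists {
		// this is the failure the second container hits in the issue
		return errors.New("rootless netns is gone, cannot free assigned IPs")
	}
	fmt.Printf("container %d: network torn down\n", idx)

	if !anyRunning() {
		netnsExists = false
		fmt.Printf("container %d: removed rootless netns\n", idx)
	}
	return nil
}

func main() {
	// containers are stopped first, then each one's network cleanup runs
	for _, c := range containers {
		c.running = false
	}

	var wg sync.WaitGroup
	for i := range containers {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			if err := teardown(idx); err != nil {
				fmt.Printf("container %d: teardown error: %v\n", idx, err)
			}
		}(i)
	}
	wg.Wait()
}
```

Whichever container gets the lock first removes the netns, and the other one always fails its teardown, mirroring the flake.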

Luap99 added a commit to Luap99/libpod that referenced this issue Dec 1, 2021
rootlessNetNS.Cleanup() has an issue with how it detects whether cleanup
is needed: reading the container state is not good enough, because
containers are first stopped and then cleanup is called. So at one point
two containers could be waiting for cleanup, but the second one would fail
because the first one already triggered the cleanup, making the rootless
netns unavailable for the second container and resulting in a teardown
error. Instead of checking the container state we need to check the
netns state.

Secondly, podman unshare --rootless-netns should not do the cleanup.
That causes more issues than it is worth fixing. Users might also want
to use this to set up the namespace in a special way; if unshare cleaned
it up right away, that would not be possible.

[NO NEW TESTS NEEDED]

Fixes containers#12459

Signed-off-by: Paul Holzinger <[email protected]>
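
The commit above says the decision should be based on the netns state rather than the container state. As a rough, hypothetical sketch of that idea (not necessarily what PR #12469 actually does), one could ask whether the rootless netns still contains any non-loopback interface, using the containernetworking `ns` helper:

```go
package main

import (
	"fmt"
	"net"

	"github.com/containernetworking/plugins/pkg/ns"
)

// netnsStillInUse reports whether the namespace at nsPath still has any
// interface configured besides loopback. Illustration only.
func netnsStillInUse(nsPath string) (bool, error) {
	netNS, err := ns.GetNS(nsPath)
	if err != nil {
		return false, err
	}
	defer netNS.Close()

	inUse := false
	err = netNS.Do(func(_ ns.NetNS) error {
		ifaces, err := net.Interfaces()
		if err != nil {
			return err
		}
		for _, iface := range ifaces {
			if iface.Name != "lo" {
				inUse = true
				break
			}
		}
		return nil
	})
	return inUse, err
}

func main() {
	inUse, err := netnsStillInUse("/run/user/1000/netns/rootless-netns")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rootless netns still in use:", inUse)
}
```

Under this approach, cleanup only removes the namespace when nothing is configured inside it anymore, regardless of what the container database says.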
github-actions bot added the locked - please file new issue/PR label on Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators on Sep 21, 2023