podman stop: Unable to clean up network: netavark: remove aardvark entries: check aardvark-dns netns: IO error: Permission denied #22103

edsantiago · 2024-03-20T20:51:53Z

This is one of those nasty ones that hides in logs, making it impossible for me to get full data.

Best I can tell, the first instance was Feb 9, in rawhide rootless. Seen also in f39 root.

$ podman [options] stop --all -t 0
time="2024-03-20T12:25:48-05:00" level=error msg="Unable to clean up network for container SHA: \"1 error occurred:\\n\\t* netavark: remove aardvark entries: check aardvark-dns netns: IO error: Permission denied (os error 13)\\n\\n\""

Incomplete list below. There are maybe 3-4 others; it is way too hard to get a complete list.

fedora-39 : int podman fedora-39 root host sqlite
- 03-04 07:36 in TOP-LEVEL [AfterEach] Podman kube play with auto update annotations for first container only
fedora-39 : int podman fedora-39 rootless host sqlite
- 03-20 13:38 in TOP-LEVEL [AfterEach] Podman kube play with image data
rawhide : int podman rawhide root host sqlite
- 03-05 13:32 in TOP-LEVEL [AfterEach] Podman kube play no security context

x	x	x	x	x	x
int(3)	podman(3)	fedora-39(2)	root(2)	host(3)	sqlite(3)
		rawhide(1)	rootless(1)


Luap99 · 2024-03-21T09:57:26Z

I don't get why it would fail with EACCES even as root.
These are the only two lines that could fail: https://github.com/containers/netavark/blob/cc3f35d2e87defa2e12d0ffeb59a57035e8a5902/src/dns/aardvark.rs#L131-L132

And I really do not see why this would fail with anything other than ENOENT, which is already ignored by the code. I can see how EACCES might happen as rootless in the case where the aardvark PID was already reused by another process we have no privileges on, but as root that can never be the case.
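To make the failure mode concrete, here is a minimal Rust sketch of a check that tolerates only ENOENT when reading the aardvark-dns process's `/proc/<pid>/ns/net` link. The names `Outcome` and `strict` are hypothetical and this is a simplified model of the behavior described in this thread, not netavark's actual code. It feeds the errno values that come up in this issue (ENOENT, ESRCH, EACCES) through the check: only ENOENT is skipped, and everything else is treated as a hard failure that would surface as a network cleanup error.

```rust
use std::io;

/// Hypothetical result of the netns check during teardown (illustration only).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The process is already gone, so there is nothing to check.
    Skip,
    /// Any other error is propagated and the network cleanup fails.
    Fail,
}

/// Simplified model of a check that only expects ENOENT from reading
/// /proc/<pid>/ns/net; every other error kind is treated as fatal.
fn strict(err: &io::Error) -> Outcome {
    if err.kind() == io::ErrorKind::NotFound {
        Outcome::Skip
    } else {
        Outcome::Fail
    }
}

fn main() {
    // Linux errno values: ENOENT = 2, ESRCH = 3, EACCES = 13.
    for (name, errno) in [("ENOENT", 2), ("ESRCH", 3), ("EACCES", 13)] {
        let err = io::Error::from_raw_os_error(errno);
        println!("{}: {:?} ({})", name, strict(&err), err);
    }
}
```

Because the process can be in the middle of exiting when its `/proc` entry is read, the kernel is not limited to returning ENOENT, which is why a check shaped like this can fail even as root.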

Luap99 · 2024-03-21T10:22:35Z

OK, I guess we need to ignore more errors. I am using something like this to reproduce the logic easily:
while :; do sleep 10 & kill -HUP $! && ls -l /proc/$!/ns/net 2>&1 | tee /dev/stderr | grep -E "No such file or directory|net:" || break ; done
I wrongly assumed the only possible error was ENOENT; however, while testing this several times I also saw ESRCH and, importantly, the EACCES reported here.

So at this point I wonder whether we shouldn't simply ignore all errors. This check is only a nice-to-have that makes us aware of an inconsistent aardvark-dns vs. rootless-netns state: #20396.

edsantiago · 2024-04-02T19:24:21Z

ping

fedora-39 : int podman fedora-39 rootless host sqlite
- 03-26 14:31 in TOP-LEVEL [AfterEach] Podman kube play test with reserved Label annotation in yaml
rawhide : int podman rawhide root host sqlite
- 03-26 07:50 in Podman checkpoint podman checkpoint container with established tcp connections
rawhide : int podman rawhide rootless host sqlite
- 03-26 10:03 in TOP-LEVEL [AfterEach] Podman kube play use network mode from config

x	x	x	x	x	x
int(3)	podman(3)	rawhide(2)	rootless(2)	host(3)	sqlite(3)
		fedora-39(1)	root(1)

Right now there is a race condition where we return errors even in cases where they should be ignored. When we send SIGHUP to aardvark on teardown, it might exit when all containers are removed. This means the check afterwards might read the netns path at an awkward time, while the process is being removed from the kernel structures. I assumed the only possible error was ENOENT, but I was wrong: in CI we also see EACCES, and in my reproducer I also saw ESRCH. Given the check is only a nice-to-have, ignore all errors there.

Fixes containers/podman#22103

Signed-off-by: Paul Holzinger <[email protected]>
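A minimal sketch of the approach in the commit message above: treat the netns comparison as best-effort and turn every I/O error into "unknown" instead of a teardown failure. The function name `same_netns_best_effort` is hypothetical, and comparing against the caller's own `/proc/self/ns/net` is an assumption made for illustration; the real check in containers/netavark#956 may compare against a different namespace.

```rust
use std::fs;

/// Hypothetical best-effort check (not netavark's real code): does process
/// `pid` appear to share the caller's network namespace?
/// Any error reading either symlink (ENOENT, ESRCH, EACCES, ...) is ignored
/// and yields `None`, because the process may be exiting concurrently.
fn same_netns_best_effort(pid: u32) -> Option<bool> {
    // Both links resolve to something like "net:[4026531840]".
    let theirs = fs::read_link(format!("/proc/{}/ns/net", pid)).ok()?;
    let ours = fs::read_link("/proc/self/ns/net").ok()?;
    Some(theirs == ours)
}

fn main() {
    // Our own PID: compares /proc/<own pid>/ns/net with /proc/self/ns/net.
    println!("{:?}", same_netns_best_effort(std::process::id()));
    // A PID far above Linux's pid_max, so /proc/<pid> does not exist.
    println!("{:?}", same_netns_best_effort(u32::MAX));
}
```

Returning `Option<bool>` keeps the three cases apart (same namespace, different namespace, could not tell), so a caller can still warn about a real mismatch without ever failing `podman stop` because of a process that is racing to exit.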


Luap99 · 2024-04-03T12:03:54Z

containers/netavark#956

edsantiago · 2024-09-06T13:37:44Z

Looks like the same bug, except ENOENT instead of EACCES:

# podman [options] stop --all -t 0
[cid1]
Error: removing container [cid2] network: netavark: remove aardvark entries: failed to get aardvark pid: IO error: No such file or directory (os error 2)

In f40 root. File a new bug, or reopen this one?

Luap99 · 2024-09-06T14:04:45Z

I saw that earlier. We can reopen this, but on stop it works differently, and I very much fear that there is no way around these races until containers/aardvark-dns#338 is addressed.

edsantiago added the flakes Flakes from Continuous Integration label Mar 20, 2024

Luap99 added the network Networking related issue or feature label Mar 21, 2024

Luap99 self-assigned this Apr 3, 2024

Luap99 added the In Progress This issue is actively being worked by the assignee, please do not work on this at this time. label Apr 3, 2024

Luap99 mentioned this issue Apr 3, 2024

fix aardvark-dns netns check containers/netavark#956

Merged

openshift-merge-bot bot closed this as completed in containers/netavark#956 Apr 10, 2024

Luap99 mentioned this issue May 6, 2024

podman top: read /proc/226420/cmdline: no such process #22619

Closed

stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jul 10, 2024

stale-locking-app bot locked as resolved and limited conversation to collaborators Jul 10, 2024
