pod stats: unknown FS magic on "/run/user/4902/netns/netns-etc-etc" #17903

Closed
edsantiago opened this issue Mar 23, 2023 · 25 comments · Fixed by #18083
Labels: flakes (Flakes from Continuous Integration), kind/bug (Categorizes issue or PR as related to a bug.), locked - please file new issue/PR

Comments

@edsantiago
Member

Seen in CI, f37 rootless sqlite:

podman pod stats on a specific running pod
...
$ podman [options] run --http-proxy=false --pod cec7af2e2f8e0479a91d94bff88cf8482c2907f7644eb7c90a9ea01b2f13ff22 -d quay.io/libpod/alpine:latest top
eed6053f6f0e8a2f883464f994bb30d09304274e1b1e1cdb2eb52a0f49ae3985
$ podman [options] pod stats --no-stream cec7af2e2f8e0479a91d94bff88cf8482c2907f7644eb7c90a9ea01b2f13ff22
Error: unknown FS magic on "/run/user/4902/netns/netns-a2804d97-802b-9e57-2e40-11c1206c102c": 1021994
[AfterEach] Podman pod stats
  /var/tmp/go/src/github.com/containers/podman/test/e2e/pod_stats_test.go:33
$ podman [options] pod rm -fa -t 0
time="2023-03-23T15:11:45-05:00" level=error msg="Unable to clean up network for container 652344ca5642edd24b93f810fcb9ebbb1c0969195a67454bc05b8a67e7ed3185: \"unmounting network namespace for container 652344ca5642edd24b93f810fcb9ebbb1c0969195a67454bc05b8a67e7ed3185: failed to unmount NS: at /run/user/4902/netns/netns-a2804d97-802b-9e57-2e40-11c1206c102c: invalid argument\""
         cec7af2e2f8e0479a91d94bff88cf8482c2907f7644eb7c90a9ea01b2f13ff22
edsantiago added the flakes (Flakes from Continuous Integration) label Mar 23, 2023
@Luap99
Member

Luap99 commented Mar 24, 2023

The only way I can see this happening is if /run/user/4902/netns/netns-a2804d97-802b-9e57-2e40-11c1206c102c was unmounted after the pod was started. The question is why.
My best guess is that we somehow used the same netns for two different containers. Reading
https://github.com/containers/common/blob/main/pkg/netns/netns_linux.go#L103-L104 I see zero duplicate detection, so if the RNG returns the same name twice it will cause issues. That is very unlikely, but in theory it could cause this.
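
For illustration, here is a minimal sketch (my own, not the c/common implementation) of the duplicate check described above: keep drawing random names until one is not already present in the netns run directory. The package and function names are invented for the example.

package netnsname

import (
    "crypto/rand"
    "errors"
    "fmt"
    "os"
    "path/filepath"
)

// newNSName returns a random netns name that does not already exist in runDir.
func newNSName(runDir string) (string, error) {
    for i := 0; i < 10000; i++ {
        b := make([]byte, 16)
        if _, err := rand.Read(b); err != nil {
            return "", err
        }
        name := fmt.Sprintf("netns-%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
        // The crucial part that was missing: reject names that are already in use.
        if _, err := os.Stat(filepath.Join(runDir, name)); errors.Is(err, os.ErrNotExist) {
            return name, nil
        }
    }
    return "", errors.New("could not find a free netns name")
}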

Luap99 self-assigned this Mar 24, 2023
Luap99 added a commit to Luap99/common that referenced this issue Mar 24, 2023
The code picks a random name but never checked whether that name was already
in use. If the RNG generates the same name again we will end up with
containers sharing the same netns, which can be very bad. It is very
unlikely to happen, but it is the only explanation I can find for
containers/podman#17903.

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99
Member

Luap99 commented Mar 24, 2023

I think https://github.com/containers/common/pull/1381 should prevent this, but I cannot be sure what the cause was here.

@edsantiago
Member Author

Could this be related, or should I file a new issue?

  podman run container with systemd PID1
...
$ podman [options] rm -fa -t 0
time="2023-03-27T20:34:09-05:00" level=error msg="Unable to clean up network 
    for container b717c2ad753aa77f65e18a8bd64de0e62f336d954411f361c0a427187bb80155: 
    \"unmounting network namespace for container b717c2ad753aa77f65e18a8bd64de0e62f336d954411f361c0a427187bb80155:
     failed to unmount NS: at /run/user/6098/netns/netns-5db6ed3c-beda-ab7d-9006-4442e676aa85: invalid argument\""

@Luap99
Member

Luap99 commented Mar 28, 2023

It is the same error, so yes, it should be the same problem. However, if we see it more often, then I no longer think my fix is enough.

It could still be that we try to unmount the netns twice in podman. This is also the first time you have reported this bug, and both logs say sqlite, so it could very well be a new sqlite bug.

@edsantiago
Member Author

both logs say sqlite so it can be very well a new sqlite bug.

Big grain of salt: all of these are being found in #17831 which, yes, is hammering on sqlite, but the big big big deal in that PR is disabling the ginkgo triple-flake-retry. It is possible, even likely, that these are all bugs that have always been there but that we never see because of the ginkgo triple-retry.

@edsantiago
Member Author

Oooh, here's an interesting one, f37 rootless boltdb, showing two failures very close together in time. That suggests a concurrency issue: something needs to be locked to prevent these tests from running at the same time.

  podman stats on a running container
...
Error: unknown FS magic on "/run/user/2657/netns/netns-a70ab2b5-2ef4-2369-5a47-29b423e10a55": 1021994

...
  podman stats on all containers
...
Error: unknown FS magic on "/run/user/2657/netns/netns-0c7f9246-ecb5-754a-dd54-3b2723c4609f": 1021994

edsantiago added the kind/bug (Categorizes issue or PR as related to a bug.) label Mar 28, 2023
@edsantiago
Member Author

Here's an interesting new variation on the theme. In the system tests (the usual FS-magic flake is in the integration tests), debian rootless boltdb:

[+1170s] not ok 421 podman network create
...
$ podman network create --subnet 172.23.105.0/24 testnet-8cBnbGlZxb
testnet-8cBnbGlZxb
...
$ podman run --rm --network testnet-8cBnbGlZxb quay.io/libpod/testimage:20221018 ip a
Error: creating rootless network namespace: open /run/user/3152/netns/rootless-netns-c24c98c9183e681a4c80: file exists

From that point on, pretty much all the rest of the tests fail. The word "magic" does not appear anywhere in the test log, but I still think this is related. If it isn't, lmk and I'll file a separate issue.

@Luap99
Member

Luap99 commented Apr 3, 2023

This is definitely caused by my change; "file exists" was not a problem before.
I thought the rootless netns was locked properly, but apparently there is a race somewhere.

@Luap99
Member

Luap99 commented Apr 3, 2023

Mhh, I am not really sure where it went wrong.
Looking at the code it is definitely locked:

lfile := filepath.Join(runDir, "rootless-netns.lock")
lock, err := lockfile.GetLockFile(lfile)
if err != nil {
    return nil, fmt.Errorf("failed to get rootless-netns lockfile: %w", err)
}
lock.Lock()
defer func() {
    // In case of an error (early exit) rootlessNetNS will be nil.
    // Make sure to unlock otherwise we could deadlock.
    if rootlessNetNS == nil {
        lock.Unlock()
    }
}()
rootlessNetNsDir := filepath.Join(runDir, rootlessNetNsName)
err = os.MkdirAll(rootlessNetNsDir, 0700)
if err != nil {
    return nil, fmt.Errorf("could not create rootless-netns directory: %w", err)
}
nsDir, err := netns.GetNSRunDir()
if err != nil {
    return nil, err
}
// create a hash from the static dir
// the cleanup will check if there are running containers
// if you run several libpod instances with different root/runroot directories this check will fail
// we want one netns for each libpod static dir so we use the hash to prevent name collisions
hash := sha256.Sum256([]byte(r.config.Engine.StaticDir))
netnsName := fmt.Sprintf("%s-%x", rootlessNetNsName, hash[:10])
path := filepath.Join(nsDir, netnsName)
ns, err := ns.GetNS(path)
if err != nil {
    if !new {
        // return an error if we could not get the namespace and should not create one
        return nil, fmt.Errorf("getting rootless network namespace: %w", err)
    }
    // create a new namespace
    logrus.Debugf("creating rootless network namespace with name %q", netnsName)
    ns, err = netns.NewNSWithName(netnsName)
    if err != nil {
        return nil, fmt.Errorf("creating rootless network namespace: %w", err)
    }

Somehow ns.GetNS(path) is failing, but netns.NewNSWithName(netnsName) is complaining about "file exists".
The only way this can happen is when the file exists but there is no netns bind-mounted on it. In that case ns.GetNS(path) fails with "unknown FS magic on ...", so it looks very relevant to this issue.

It is not quite clear what is going on but I think something must be unmounting the netns bind mounts.
Or maybe a podman process got killed between the unmount() and remove() call: https://github.com/containers/common/blob/e572bd0cf69426db7e94aaf2afa44742f3c118be/pkg/netns/netns_linux.go#L191-L197
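
For context, here is a minimal sketch (my own illustration, not the actual c/common code) of the kind of check that produces the "unknown FS magic" error: a valid netns path must have a namespace bind-mounted on it. The reported value 1021994 is TMPFS_MAGIC, i.e. only the bare file on the /run tmpfs is left.

package netnscheck

import (
    "fmt"

    "golang.org/x/sys/unix"
)

const (
    nsfsMagic   = 0x6e736673 // NSFS_MAGIC: a namespace is mounted here
    procfsMagic = 0x9fa0     // PROC_SUPER_MAGIC: pre-nsfs kernels
)

// validateNSPath fails unless path has a namespace bind-mounted on it.
func validateNSPath(path string) error {
    var st unix.Statfs_t
    if err := unix.Statfs(path, &st); err != nil {
        return err
    }
    switch st.Type {
    case nsfsMagic, procfsMagic:
        return nil
    default:
        // 0x1021994 is TMPFS_MAGIC: the bind mount is gone and we are
        // looking at the plain file on the /run tmpfs.
        return fmt.Errorf("unknown FS magic on %q: %x", path, st.Type)
    }
}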

@edsantiago
Member Author

Or maybe a podman process got killed

These are system tests. That's not impossible (there are a few explicit background podman jobs), but it seems unlikely.

@Luap99
Member

Luap99 commented Apr 3, 2023

Or maybe a podman process got killed

These are system tests. That's not impossible (there are a few explicit background podman jobs), but it seems unlikely.

Yes, I agree that is very unlikely, but I would also put "something unmounts the netns" in the very-unlikely category.

@Luap99
Member

Luap99 commented Apr 3, 2023

@edsantiago Could you confirm that this only happens rootless?

I think I understand now what may trigger this. In order for the netns to be valid it must be a child of the podman user namespace. This namespace is kept alive by the podman pause process.

I can reproduce this the following way:

$ podman unshare --rootless-netns ip a
$ pkill "podman pause"
$ podman unshare --rootless-netns ip a
Error: creating rootless network namespace: open /run/user/1000/netns/rootless-netns-294b745a758a274d54d9: file exists

and if I patch it to show the error from ns.GetNS() we see that this is the same issue:
Error: getting rootless network namespace: unknown FS magic on "/run/user/1000/netns/rootless-netns-294b745a758a274d54d9": 1021994

This is at least the only thing that can consistently trigger this, so the question then is: what killed the podman pause process? One command which does that is podman system migrate, but I cannot find any reference to it in either the e2e or system tests.

@giuseppe Any idea?

@edsantiago
Member Author

Nice detective work. Unfortunately I can't confirm anything because the log I linked above is the only instance I've found of that failure. But if we look at the sequence of tests:

ok 417 podman pod manages /etc/hosts correctly  <<<<<<<<<<<<<<<<<<<<<<<<<<<
ok 418 podman run with slirp4ns assigns correct addresses to /etc/hosts
ok 419 podman run with slirp4ns adds correct dns address to resolv.conf
ok 420 podman run with slirp4ns assigns correct ip address container
not ok 421 podman network create

...and we look at the source of the 417 test:

run_podman pod rm $pod_name
is "$output" "$pid" "Only ID in output (no extra errors)"
# Clean up
run_podman rmi $(pause_image)

...we see pod and pause-image cleanup, which is consistent with your findings (even if the timing doesn't appear to be). Is it worth poking into pod removal code to see if something is not being cleaned up properly?

@Luap99
Member

Luap99 commented Apr 3, 2023

The pod pause image is different and should not be related to the podman pause process.

The podman pause process is created by the first podman command you run: podman creates the user namespace once, and to keep it alive it needs a running process (podman pause). All following podman processes join this user and mount namespace via setns(). You can find the pause pid in $XDG_RUNTIME_DIR/libpod/tmp/pause.pid
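
As a hypothetical helper (not podman code), this is how one could locate the default pause pid and check whether that process is still alive; the package and function names are made up for the example.

package pausepid

import (
    "os"
    "path/filepath"
    "strconv"
    "strings"
    "syscall"
)

// Alive reads $XDG_RUNTIME_DIR/libpod/tmp/pause.pid and probes the process
// with signal 0, which checks for existence without delivering a signal.
func Alive() (pid int, alive bool, err error) {
    path := filepath.Join(os.Getenv("XDG_RUNTIME_DIR"), "libpod", "tmp", "pause.pid")
    data, err := os.ReadFile(path)
    if err != nil {
        return 0, false, err
    }
    pid, err = strconv.Atoi(strings.TrimSpace(string(data)))
    if err != nil {
        return 0, false, err
    }
    return pid, syscall.Kill(pid, 0) == nil, nil
}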

@Luap99
Member

Luap99 commented Apr 3, 2023

#18024 should at least let us recover so not all tests blow up when it happens. This should restore the previous behavior but it doesn't address the root cause.

@Luap99
Member

Luap99 commented Apr 4, 2023

IMO, we should either clean up that directory or not fail if a file already exists. I think your patch is correct.

OK, let's go with that for now.

However, we still need to figure out whether the pause process really got killed in the CI environment. One possibility is to patch the tests to cat $XDG_RUNTIME_DIR/libpod/tmp/pause.pid before and after each test and compare whether the pid changed.

@edsantiago
Member Author

I've been playing with variations of --this --that in my PR, and just noted this one, f37 rootless:

Execing [...]/podman --storage-opt vfs.imagestore=/tmp/imagecachedir \
    --root /tmp/podman_test986753573/server_root \
     --tmpdir /tmp/podman_test986753573 \
     --runtime crun \
     --cgroup-manager systemd \
     --db-backend sqlite \
     --storage-driver vfs \
     rm -f top_RPXePqrY
  time="2023-04-04T19:59:20-05:00" level=error \
     msg="Unable to clean up network for container <id>: 
      \"unmounting network namespace for container <id>:
      failed to unmount NS: at /run/user/5575/netns/netns-e4831cc7-2e54-668d-84c1-a34e39804cf9:
     invalid argument\""
         top_RPXePqrY

Granted, for rootless I should've included --net-something /private-dir but I was lazy.

Anyhow, this was a push from tonight, and it includes #18024, so I was surprised to see that message. Quitting for the day.

@Luap99
Member

Luap99 commented Apr 5, 2023

I've been playing with variations of --this --that in my PR, and just noted this one [...] Anyhow, this was a push from tonight, and it includes #18024, so I was surprised to see that message.

Note that #18024 made no attempt at fixing this issue; all it fixed was the regression I introduced with the c/common change, so only the error from #17903 (comment) was fixed.

@Luap99
Member

Luap99 commented Apr 5, 2023

OK, --this --that was a good hint. I now understand where it goes wrong; here is a reproducer.

First make sure you have no podman processes or containers running, then kill the pause process to start from a clean system:

$ pkill "podman pause"
$ dir=$(mktemp -d)
$ podman --root $dir/root --runroot $dir/runroot --tmpdir $dir/tmp  run -d --name test alpine top
# Now this process will create a pause.pid file but not under $XDG_RUNTIME_DIR/libpod/tmp/pause.pid,
# instead it uses the --tmpdir value so $dir/tmp/pause.pid
# If you then use a regular podman command without the custom options:
$ podman run -d --name test alpine top
$ pgrep "podman pause"
18786
19198
# The normal command had no clue about the pause process so it created its own, so far no problem with that.
# The problem happens when we now use the command with the options again:
$ podman --root $dir/root --runroot $dir/runroot --tmpdir $dir/tmp  stop test
ERRO[0000] Unable to clean up network for container 6fc14126834157223df6318300affcce613b0a3acc0cdeec5e3e8df55a1335c1: "unmounting network namespace for container 6fc14126834157223df6318300affcce613b0a3acc0cdeec5e3e8df55a1335c1: failed to unmount NS: at /run/user/1000/netns/netns-34ab0c13-c7e0-6c3e-2f98-32d2f67bb4e1: invalid argument"

The bug here is that there is a shortcut in pkg/rootless/rootless_linux.c which always runs before any Go code (including the option parsing). The C code just looks at $XDG_RUNTIME_DIR/libpod/tmp/pause.pid and immediately joins the namespace of that process.
The shortcut is only there to join: if the pid file does not exist, it does nothing and lets podman handle the namespace and pause-process creation.

That is why, to reproduce this, you have to run the --tmpdir command first, before a pause pid exists at $XDG_RUNTIME_DIR/libpod/tmp/pause.pid. If you do it the other way around, even the process with --tmpdir would have joined the namespace via the shortcut, so it would use the same one and not cause issues.
See the possibility for flakes here?

The fix is of course to not take the shortcut when we see --tmpdir, so the podman Go code can handle it. Note that we already special-case other commands:

if (strcmp (argv[argc], "mount") == 0
    || strcmp (argv[argc], "machine") == 0
    || strcmp (argv[argc], "version") == 0
    || strcmp (argv[argc], "context") == 0
    || strcmp (argv[argc], "search") == 0
    || (strcmp (argv[argc], "system") == 0 && argv[argc+1] && strcmp (argv[argc+1], "service") != 0))
  {
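
To make the mismatch concrete, here is a hypothetical Go illustration (not podman code): the Go side derives the pause.pid location from --tmpdir, while the C shortcut always assumes the default location, so with a custom --tmpdir the two can point at different pause processes.

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// goSidePausePidPath mirrors the Go behaviour: honour a custom tmpdir if set.
func goSidePausePidPath(tmpdir string) string {
    if tmpdir != "" {
        return filepath.Join(tmpdir, "pause.pid")
    }
    return defaultPausePidPath()
}

// defaultPausePidPath is the only location the C shortcut knows about.
func defaultPausePidPath() string {
    return filepath.Join(os.Getenv("XDG_RUNTIME_DIR"), "libpod", "tmp", "pause.pid")
}

func main() {
    withTmpdir := goSidePausePidPath("/tmp/dir/tmp")
    shortcut := defaultPausePidPath()
    // With --tmpdir the paths differ, so the shortcut may join the user
    // namespace of a different pause process than the Go code expects.
    fmt.Println(withTmpdir, shortcut, withTmpdir != shortcut)
}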

Luap99 added a commit to Luap99/libpod that referenced this issue Apr 5, 2023
When using --tmpdir for the podman cli we use this as location for the
pause.pid file. However the c shortcut code has no idea about this
option and always assumes XDG_RUNTIME_DIR/libpod/tmp. This can cause us
to join the wrong user namespace which leads to very weird issues as
mounts are missing there.

Fixes containers#17903

Signed-off-by: Paul Holzinger <[email protected]>
@edsantiago
Member Author

Slightly different symptom seen yesterday, f37 rootless:

  podman play kube --no-host
...
$ podman [options] exec <workerCID> cat /etc/hosts
Error: OCI runtime error: crun: fchown std stream 0: Bad file descriptor

That causes the test to fail. Then, in cleanup:

[AfterEach] Podman play kube
$ podman [options] pod rm -fa -t 0
time="2023-04-05T19:00:41-05:00" level=error msg="getting rootless network namespace:
          unknown FS magic on \"/run/user/6459/netns/rootless-netns-0e73abf2193b2cbf501b\": 1021994"
time="2023-04-05T19:00:41-05:00" level=error msg="Unable to clean up network for container <infraCID>:
          \"unmounting network namespace for container <infraCID>:
          failed to unmount NS: at /run/user/6459/netns/netns-b2c8e3d5-a67a-913a-7c99-0e32a0092781: invalid argument\""

I'm assigning that flake to this issue.

@Luap99
Member

Luap99 commented Apr 11, 2023

Still happening:
#18083 (comment)

Luap99 reopened this Apr 11, 2023
@edsantiago
Member Author

Locks are dangerous of course, but isn't there some way that podman system reset tests can be wrapped inside a lock? That command is too dangerous to run in parallel, even with every possible --root --tmpdir --networkdir --everything option.

@Luap99
Member

Luap99 commented Apr 12, 2023

I only see two callers of system reset in the e2e tests. I think we can convert them into system tests to fix this issue, and then make sure nobody ever adds new system migrate or reset tests to e2e.

@Luap99
Member

Luap99 commented Apr 12, 2023

OK, new plan: my goal is to update to ginkgo v2 (#18163) so we can use decorators to mark a single test as serial: https://onsi.github.io/ginkgo/#serial-specs
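
For reference, a small sketch of what that would look like with the ginkgo v2 Serial decorator (spec name and body are placeholders, assuming the standard ginkgo v2 and gomega APIs):

package e2e_test

import (
    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"
)

// Specs marked Serial never run in parallel with any other spec.
var _ = Describe("podman system reset", Serial, func() {
    It("does not race with other tests", func() {
        Expect(true).To(BeTrue()) // placeholder body
    })
})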
