podman-in-podman: Error: timed out waiting for file: internal libpod error #13227
Comments
We hit this a lot in Testing Farm here (Fedora CI, RHEL CI and CentOS Stream CI backend) for our Currently, we run rootless https://gitlab.com/testing-farm/gluetool-modules/-/blob/testing-farm/container/Dockerfile It is based on https://github.com/thrix/podman/blob/main/contrib/podmanimage/stable/Dockerfile.centos8 The privileged podman has:
I am trying to get some reasonable, more handy reproducer for the problem. |
The error "timed out waiting for file" seems to come from Lines 51 to 96 in 38b19c1
|
The reason for the error is that the expected exit file (written by Conmon) isn't materializing where we expect it to be. Impossible to say more about this without context - I'm not even sure if this is an exec session or a container causing the errors. More complete logs or, ideally, a reproducer would help. |
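For readers unfamiliar with the pattern, waiting for an exit file with a deadline can be sketched roughly as below. This is a simplified illustration only, not Podman's actual implementation (which is written in Go); `wait_for_file` is a hypothetical helper name and the one-second polling interval is an assumption.

```shell
# Illustrative only: poll for a file until it appears or a deadline passes.
# wait_for_file is a hypothetical helper, not a Podman command or function.
# Usage: wait_for_file PATH TIMEOUT_SECONDS
wait_for_file() {
    path=$1
    timeout_secs=$2
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout_secs" ]; then
            # Mirrors the shape of the error in this issue; the wording is an assumption.
            echo "Error: timed out waiting for file $path" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

If whatever is supposed to write the file is delayed, or the file is removed before the waiter looks, the waiter can only report a timeout, which is why the error alone says little about the cause.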
No luck reproducing this manually (at least for now) ... I believe the log has to come from inside the first podman. This is the log from there:
you can
Sorry it is a bit dense, trying hard to reproduce it by hand :( Host podman is Host podman is run like this: https://gitlab.com/testing-farm/infrastructure/-/blob/testing-farm/ansible/files/citool-container.sh#L36 |
@mheon @edsantiago so I found a minimal reproducer. But I cannot hit the same problem on my localhost, just on our With these commands I can hit the issue every time (randomly on the last
|
@rhatdan You're the container-in-container expert - any ideas? |
Thank you Miro! I can easily reproduce with podman 4.x on rawhide. Two points:
# for i in {1..100}; do podman exec container cat /testdir/testfile || break; sleep 2; done
Hardened: etc etc
...
Error: timed out waiting for file /var/lib/containers/storage/overlay-containers/cc02377887642a48f7b8565b242168d2fafe2ae63d227cf7a8bb2ed55331cdc0/userdata/b4a5f0d138b48ac800ca27a30f844718c0fb67f9a3f70854ac65e5d8001492ff/exit/cc02377887642a48f7b8565b242168d2fafe2ae63d227cf7a8bb2ed55331cdc0: internal libpod error Without the break, output scrolls by too fast to notice. Note that this is 100% a tty I/O problem: redirecting the |
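Since the failure is easy to miss in scrolling output, a small wrapper that stops at the first failing iteration and reports which one it was may help when reproducing. This is a sketch only: `run_until_failure` is a hypothetical helper, not part of podman, and unlike the loop above it does not sleep between iterations.

```shell
# run_until_failure is a hypothetical helper, not part of podman.
# Usage: run_until_failure COUNT COMMAND [ARGS...]
run_until_failure() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        # Run the command; its own stdout and stderr pass through unchanged.
        if ! "$@"; then
            echo "failed on iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count iterations passed" >&2
    return 0
}
```

It could be invoked with the same command as the loop above, for example `run_until_failure 100 podman exec container cat /testdir/testfile`.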
@edsantiago sorry forgot the good old |
@edsantiago glad you reproduced it, we are a step closer :) |
@containers/podman-maintainers this is really blowing up and causing problems everywhere, e.g. https://status.testing-farm.io/issues/2022-02-08-libpod-error/ How can we escalate this? |
Also running into this irregularly but frequently in our CI. We are not running "podman in podman", but only a single rootless podman layer. Let me know if any further details would be helpful here! Podman compiled from source (as it is not available through apt for ubuntu 20.04):
|
Can't reproduce on my laptop. Are the Echos and the |
Getting access to an environment that does reproduce would be useful. I strongly suspect this is tied to disk IO somehow, perhaps Conmon being starved of it? But just reading a large text file repeatedly shouldn't be enough to do that. |
Well, I can't reproduce it in today's 1minutetip environment. I have no idea what changed between Feb 28 (when I reproduced it) and today. @mheon see my Feb 28 note for my suspicions about stdout, not disk I/O. |
@edsantiago Where are you redirecting? Conmon should have no relation to TTY, so my assumption is that it's the process of writing the logs for the exec session that's causing it to lag and fail to create the exit file in a timely fashion (or somehow hang?). If you're redirecting the output on the host system, I don't see how that could affect Conmon. |
Every exec session run attached will, on exit, do two things: it will signal the associated `podman exec` that it is finished (to allow Podman to collect the exit code and exit), and spawn a cleanup process to clean up the exec session (in case the `podman exec` process died, we still need to clean up). If an exec session is created that exits almost instantly, but generates a large amount of output (e.g. prints thousands of lines), the cleanup process can potentially execute before `podman exec` has a chance to read the exit code, resulting in errors. Handle this by detecting if the cleanup process has already removed the exec session before handling the error from reading the exec exit code. [NO NEW TESTS NEEDED] I have no idea how to test this in CI. Fixes containers#13227 Signed-off-by: Matthew Heon <[email protected]>
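The handling described in the commit message can be sketched as follows. This is not the actual fix (which lives in Podman's Go code); the directory layout, the `exit` file name, and the `collect_exit_code` helper are all assumptions made for illustration.

```shell
# Hypothetical layout: each exec session has a directory, and the exit
# code is written to "$session_dir/exit". Neither name is Podman's own.
# Usage: collect_exit_code SESSION_DIR
collect_exit_code() {
    session_dir=$1
    if [ -f "$session_dir/exit" ]; then
        # Normal case: the exit file is there, so report its contents.
        cat "$session_dir/exit"
        return 0
    fi
    # Reading the exit code failed. Before treating that as an error,
    # check whether the cleanup process already removed the session.
    if [ ! -d "$session_dir" ]; then
        echo "session already cleaned up, nothing to report" >&2
        return 0
    fi
    # The session still exists but has no exit file: a genuine error.
    echo "Error: exit file missing for session $session_dir" >&2
    return 1
}
```

The point of the ordering is that a missing exit file is only an error if the session still exists; if cleanup won the race, the session is gone and there is nothing left to fail on.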
Example where this would be needed:
```
docker rm -f 4fcd7c2012b6
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
time="2022-08-09T17:22:11Z" level=error msg="container \"4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91\" does not exist"
Error: cannot remove container 4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91 as it could not be stopped: timed out waiting for file /tmp/podman-run-1111/libpod/tmp/exits/4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91: internal libpod error
docker rm -f 4fcd7c2012b6
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
ERRO[0000] Joining network namespace for container 4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91: error retrieving network namespace at /tmp/podman-run-1111/netns/netns-600d6718-1a5e-2fed-bf39-ad7048004be2: unknown FS magic on "/tmp/podman-run-1111/netns/netns-600d6718-1a5e-2fed-bf39-ad7048004be2": ef53
4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91
```
See containers/podman#13227
This also makes the function fail fatally.
Signed-off-by: David Galloway <[email protected]>
I've managed to consistently reproduce the issue by writing a lengthy stream of data (think in gigabytes) via
I've noticed that if you don't overwhelm the output, podman seems to deal with lengthy streams just fine. |
Fedora CI is failing with:
Miro Vadkerti pinged me about it last week but has not yet filed an issue because it's hard to reproduce. All I know is, this is podman-in-podman.