podman-in-podman: Error: timed out waiting for file: internal libpod error #13227

Closed
edsantiago opened this issue Feb 14, 2022 · 16 comments · Fixed by #13600
Labels
flakes (Flakes from Continuous Integration) · locked - please file new issue/PR

Comments

@edsantiago
Member

Fedora CI is failing with:

Error: timed out waiting for file /var/lib/containers/storage/overlay-containers/SHA1/userdata/SHA2/exit/SHA1: internal libpod error

Miro Vadkerti pinged me about it last week but has not yet filed an issue because it's hard to reproduce. All I know is, this is podman-in-podman.

  2022-02-08 18:06:06 mvadkert    esm: it happens in a quite a lot of runs in CI, I will try to get a reproducer tmrw
  2022-02-08 18:06:13 mvadkert    esm: our host runs
  2022-02-08 18:07:39 mvadkert    (me looking it up)
  2022-02-08 18:09:02 mvadkert    podman-3.4.4-1.fc35.x86_64
  2022-02-08 18:09:22 esm mvadkert, which version of crun?
  2022-02-08 18:09:25 esm https://github.com/containers/podman/issues/12262
  2022-02-08 18:09:27 mvadkert    and in container we we have podman-3.4.1-3.module_el8.6.0+954+963caf36.x86_64
  2022-02-08 18:09:38 mvadkert    crun-1.4-1.fc35.x86_64
  2022-02-08 18:09:41 esm oh no, this is podman-in-podman??
  2022-02-08 18:09:47 mvadkert    esm: yeah
  2022-02-08 18:09:51 mvadkert    esm: crun-1.2-1.module_el8.6.0+954+963caf36.x86_64
  2022-02-08 18:09:57 mvadkert    fc35 is the main host
  2022-02-08 18:10:06 mvadkert    in it we rune centos8 image
  2022-02-08 18:10:09 mvadkert    with podman
  2022-02-08 18:10:24 mvadkert    image is based on podman stable, but it is build on centos 8 stream 
  2022-02-08 18:10:38 mvadkert    esm: it is rootless podman in privileged podman
  2022-02-08 18:10:44 mvadkert    esm: but the problem seems random
  2022-02-08 18:11:10 mvadkert    esm: or test tools runs podman start and then podman exec to run inside the container commands
  2022-02-08 18:11:17 mvadkert    esm: anwyay, I will try to find some reproducer
  2022-02-08 18:11:24 mvadkert    it is quite common lately :(
  2022-02-08 18:12:09 esm check conmon maybe?
  2022-02-08 18:12:28 mvadkert    aconmon-2.0.30-2.fc35.x86_64
  2022-02-08 18:12:36 ✭ esm doesn't even want to think about what time it is in cz
  2022-02-08 18:12:37 mvadkert    conmon-2.0.30-1.module_el8.6.0+944+d413f95e.x86_64
  2022-02-08 18:12:46 mvadkert    esm: yeah, I am not really here :)
  2022-02-08 18:12:48 esm oh, okay, that looks good
  2022-02-08 18:12:57 mvadkert    esm: will try to find some reproducer and I guess file it :)
  2022-02-08 18:13:10 mvadkert    esm: the erorrs are hidden a bit inside test failures
  2022-02-08 18:13:18 mvadkert    esm: so we did not spot it early enough
@edsantiago edsantiago added the flakes Flakes from Continuous Integration label Feb 14, 2022
@thrix

thrix commented Feb 14, 2022

we hit this a lot in Testing Farm (Fedora CI, RHEL CI and CentOS Stream CI backends) for our rpminspect runs:

Currently, we run rootless podman in privileged podman. The privileged podman container is:

https://gitlab.com/testing-farm/gluetool-modules/-/blob/testing-farm/container/Dockerfile
quay.io/testing-farm/gluetool-modules:testing-farm

It is based on the centos:stream8 image (in a fork of podman):

https://github.com/thrix/podman/blob/main/contrib/podmanimage/stable/Dockerfile.centos8

The privileged podman has:

$ podman run --entrypoint rpm quay.io/testing-farm/gluetool-modules:testing-farm -q podman conmon
podman-3.4.1-3.module_el8.6.0+954+963caf36.x86_64
conmon-2.0.30-1.module_el8.6.0+944+d413f95e.x86_64

I am trying to get a more reasonable, handier reproducer for the problem.

@edsantiago
Member Author

The error "timed out waiting for file" seems to come from util.go:WaitForFile(), here:

podman/libpod/util.go

Lines 51 to 96 in 38b19c1

```go
func WaitForFile(path string, chWait chan error, timeout time.Duration) (bool, error) {
	var inotifyEvents chan fsnotify.Event
	watcher, err := fsnotify.NewWatcher()
	if err == nil {
		if err := watcher.Add(filepath.Dir(path)); err == nil {
			inotifyEvents = watcher.Events
		}
		defer watcher.Close()
		defer watcher.Remove(filepath.Dir(path))
	}
	var timeoutChan <-chan time.Time
	if timeout != 0 {
		timeoutChan = time.After(timeout)
	}
	for {
		select {
		case e := <-chWait:
			return true, e
		case <-inotifyEvents:
			_, err := os.Stat(path)
			if err == nil {
				return false, nil
			}
			if !os.IsNotExist(err) {
				return false, err
			}
		case <-time.After(25 * time.Millisecond):
			// Check periodically for the file existence. It is needed
			// if the inotify watcher could not have been created. It is
			// also useful when using inotify as if for any reasons we missed
			// a notification, we won't hang the process.
			_, err := os.Stat(path)
			if err == nil {
				return false, nil
			}
			if !os.IsNotExist(err) {
				return false, err
			}
		case <-timeoutChan:
			return false, errors.Wrapf(define.ErrInternal, "timed out waiting for file %s", path)
		}
	}
}
```
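For orientation, here is a minimal, hypothetical sketch of how a caller could use this helper to wait for conmon's exit file. The package placement, function name, and path layout are assumptions for illustration, not Podman's actual call site; only the error message on timeout matches what is reported in this issue.

```go
// Hypothetical caller, assumed to live alongside WaitForFile in package libpod.
package libpod

import (
	"fmt"
	"path/filepath"
	"time"
)

// WaitForExecExitFile waits for the exit file of one exec session. The layout
// here mirrors the paths seen in the error messages above:
//   <userdata>/<execID>/exit/<ctrID>
func WaitForExecExitFile(userdataDir, execID, ctrID string, timeout time.Duration) error {
	exitFile := filepath.Join(userdataDir, execID, "exit", ctrID)
	chWait := make(chan error) // would normally be signalled when the attached session ends
	if _, err := WaitForFile(exitFile, chWait, timeout); err != nil {
		// On timeout, WaitForFile returns define.ErrInternal wrapped as
		// "timed out waiting for file ...", the error reported in this issue.
		return fmt.Errorf("waiting for exec exit file: %w", err)
	}
	return nil
}
```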

@mheon
Member

mheon commented Feb 14, 2022

The reason for the error is that the expected exit file (written by Conmon) isn't materializing where we expect it to be. Impossible to say more about this without context - I'm not even sure if this is an exec session or a container causing the errors. More complete logs or, ideally, a reproducer would help.

@thrix

thrix commented Feb 15, 2022

No luck reproducing this manually (at least for now) ... I believe the log has to come from inside the first podman.

This is the log from there:

curl -LO http://artifacts.dev.testing-farm.io/0a5b9787-5896-42fc-a26e-b5a460b9b2c1/work-rpminspectIX9Qw4/log.txt

you can pipe that file through egrep '(podman|libpod)' to see a bit more detail on how we run podman (inside podman):

134:10:48:24             Run command 'podman run --security-opt seccomp=unconfined --name tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect -v /var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect:/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect:Z -itd quay.io/fedoraci/rpminspect:87c9e66'.
138:10:48:24             Run command 'podman exec tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect sh -c export PREVIOUS_TAG=f36-updates TASK_ID=82763548 RPMINSPECT_PROFILE_NAME='' REPOSITORY_URL=https://src.fedoraproject.org/rpms/gtk2hs-buildtools CONFIG_BRANCH=f36 DEFAULT_RELEASE_STRING=fc36 GIT_COMMIT=fc55348c423887d3ff46f89adcf4a4b9b2ab2668 DEBUG=False TMT_PLAN_DATA=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/data; cat /etc/os-release'.
162:10:48:24             Run command 'podman exec tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect sh -c export PREVIOUS_TAG=f36-updates TASK_ID=82763548 RPMINSPECT_PROFILE_NAME='' REPOSITORY_URL=https://src.fedoraproject.org/rpms/gtk2hs-buildtools CONFIG_BRANCH=f36 DEFAULT_RELEASE_STRING=fc36 GIT_COMMIT=fc55348c423887d3ff46f89adcf4a4b9b2ab2668 DEBUG=False TMT_PLAN_DATA=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/data; uname -r'.

.....

26016:10:50:47             Run command 'podman exec tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect sh -c cd '/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/discover/default/tests'; export PREVIOUS_TAG=f36-updates TASK_ID=82763548 RPMINSPECT_PROFILE_NAME='' REPOSITORY_URL=https://src.fedoraproject.org/rpms/gtk2hs-buildtools CONFIG_BRANCH=f36 DEFAULT_RELEASE_STRING=fc36 GIT_COMMIT=fc55348c423887d3ff46f89adcf4a4b9b2ab2668 DEBUG=False TMT_PLAN_DATA=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/data TMT_TREE=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/tree TMT_TEST_DATA=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/execute/data/rpmdeps/data TMT_REBOOT_COUNT=0 REBOOTCOUNT=0 RSTRNT_REBOOTCOUNT=0; set -eo pipefail; /usr/local/bin/rpminspect_runner.sh 82763548 f36-updates rpmdeps'.
29657:10:50:52                 out: Error: timed out waiting for file /var/lib/containers/storage/overlay-containers/720faf9892ed8dfa4f71351601206760ea5df049819f0e9e35d7b994d02f3a80/userdata/40651a01b77a59aa87c8a4e7a3cb99d14102836eaa3951b4e9b82a05b67747db/exit/720faf9892ed8dfa4f71351601206760ea5df049819f0e9e35d7b994d02f3a80: internal libpod error
29666:10:50:52             Run command 'podman exec tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect sh -c cd '/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/discover/default/tests'; export PREVIOUS_TAG=f36-updates TASK_ID=82763548 RPMINSPECT_PROFILE_NAME='' REPOSITORY_URL=https://src.fedoraproject.org/rpms/gtk2hs-buildtools CONFIG_BRANCH=f36 DEFAULT_RELEASE_STRING=fc36 GIT_COMMIT=fc55348c423887d3ff46f89adcf4a4b9b2ab2668 DEBUG=False TMT_PLAN_DATA=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/data TMT_TREE=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/tree TMT_TEST_DATA=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/execute/data/diagnostics/data TMT_REBOOT_COUNT=0 REBOOTCOUNT=0 RSTRNT_REBOOTCOUNT=0; set -eo pipefail; /usr/local/bin/rpminspect_runner.sh 82763548 f36-updates diagnostics'.
29748:10:50:53             Run command 'podman exec tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect sh -c export PREVIOUS_TAG=f36-updates TASK_ID=82763548 RPMINSPECT_PROFILE_NAME='' REPOSITORY_URL=https://src.fedoraproject.org/rpms/gtk2hs-buildtools CONFIG_BRANCH=f36 DEFAULT_RELEASE_STRING=fc36 GIT_COMMIT=fc55348c423887d3ff46f89adcf4a4b9b2ab2668 DEBUG=False TMT_PLAN_DATA=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/data; rm "/usr/local/bin/rstrnt-reboot"'.
29750:10:50:53             Run command 'podman exec tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect sh -c export PREVIOUS_TAG=f36-updates TASK_ID=82763548 RPMINSPECT_PROFILE_NAME='' REPOSITORY_URL=https://src.fedoraproject.org/rpms/gtk2hs-buildtools CONFIG_BRANCH=f36 DEFAULT_RELEASE_STRING=fc36 GIT_COMMIT=fc55348c423887d3ff46f89adcf4a4b9b2ab2668 DEBUG=False TMT_PLAN_DATA=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/data; rm "/usr/local/bin/rhts-reboot"'.
29752:10:50:53             Run command 'podman exec tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect sh -c export PREVIOUS_TAG=f36-updates TASK_ID=82763548 RPMINSPECT_PROFILE_NAME='' REPOSITORY_URL=https://src.fedoraproject.org/rpms/gtk2hs-buildtools CONFIG_BRANCH=f36 DEFAULT_RELEASE_STRING=fc36 GIT_COMMIT=fc55348c423887d3ff46f89adcf4a4b9b2ab2668 DEBUG=False TMT_PLAN_DATA=/var/ARTIFACTS/work-rpminspectIX9Qw4/rpminspect/data; rm "/usr/local/bin/tmt-reboot"'.
29813:10:50:53             Run command 'podman container stop tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect'.
29817:10:51:04             Run command 'podman container rm -f tmt-var-ARTIFACTS-work-rpminspectIX9Qw4-rpminspect'.

Sorry it is a bit dense, trying hard to reproduce it by hand :(

The host podman is podman-3.4.4-1.fc35.x86_64; inside it we run podman-3.4.1-3.module_el8.6.0+954+963caf36.x86_64.

Host podman is run like this: https://gitlab.com/testing-farm/infrastructure/-/blob/testing-farm/ansible/files/citool-container.sh#L36

@thrix

thrix commented Feb 25, 2022

@mheon @edsantiago so I found a minimal reproducer. But I cannot hit the same problem on my localhost, just on our i3.2xlarge AWS workers (I can give you access for debugging).

With these commands I can reliably hit the issue (it shows up randomly during the last command, the podman exec loop):

# rpm -q podman
podman-3.4.4-1.fc35.x86_64

# podman run --rm --privileged -it --entrypoint=bash quay.io/testing-farm/podman

# rpm -q podman
podman-3.4.1-3.module_el8.6.0+954+963caf36.x86_64

# podman run --name container -v $(pwd):/testdir:Z -itd quay.io/fedoraci/rpminspect:87c9e66

# for i in {1..10000}; do echo "Hardened: /usr/sbin/audispd-zos-remote: Overall: PASS." >> testfile; done    

# for i in {1..100}; do podman exec container cat /testdir/testfile; sleep 2; done 
(watch the output; occasionally you will hit:)
Error: timed out waiting for file /var/lib/containers/storage/overlay-containers/60fe53118e2bf33f4f70c109babb905573417c007d91c1c943e0b8bb1572e125/userdata/544dc5fbec571fa02e9d24a9caf42b07b61c0c5e52ade45c3c6ef1c3fcc05755/exit/60fe53118e2bf33f4f70c109babb905573417c007d91c1c943e0b8bb1572e125: internal libpod error

@mheon
Member

mheon commented Feb 25, 2022

@rhatdan You're the container-in-container expert - any ideas?

@edsantiago
Member Author

Thank you Miro! I can easily reproduce with podman 4.x on rawhide. Two points:

  • The prompts in the reproducer above are confusing: they should be hash (#), not dollar
  • To make it much much much easier to see the problem, add || break in the last command:
# for i in {1..100}; do podman exec container cat /testdir/testfile || break; sleep 2; done
Hardened: etc etc
...
Error: timed out waiting for file /var/lib/containers/storage/overlay-containers/cc02377887642a48f7b8565b242168d2fafe2ae63d227cf7a8bb2ed55331cdc0/userdata/b4a5f0d138b48ac800ca27a30f844718c0fb67f9a3f70854ac65e5d8001492ff/exit/cc02377887642a48f7b8565b242168d2fafe2ae63d227cf7a8bb2ed55331cdc0: internal libpod error

Without the break, output scrolls by too fast to notice.

Note that this is 100% a tty I/O problem: redirecting the cat makes the problem go away. This might actually be super-reproducible on my end because I am ssh'ed in over a relatively slow (10Mbps) DSL connection. If you have trouble reproducing it, try some sort of bandwidth-throttling tool.

@thrix

thrix commented Feb 28, 2022

@edsantiago sorry forgot the good old root / non-root PS1 indication, fixed :)

@thrix

thrix commented Feb 28, 2022

@edsantiago glad you reproduced it, we are a step closer :)

@edsantiago
Member Author

@containers/podman-maintainers this is really blowing up and causing problems everywhere, e.g. https://status.testing-farm.io/issues/2022-02-08-libpod-error/

How can we escalate this?

@ahoereth

ahoereth commented Mar 15, 2022

Also running into this, irregularly but frequently, in our CI. We are not running "podman in podman", but only a single rootless podman layer. Let me know if any further details would be helpful here!

Podman is compiled from source (as it is not available through apt for Ubuntu 20.04):

# podman info
host:
  arch: amd64
  buildahVersion: 1.24.1
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 16
  distribution:
    codename: focal
    distribution: ubuntu
    version: "20.04"
  eventLogger: file
  hostname: ...
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.4.0-90-generic
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 53768282112
  memTotal: 134997995520
  networkBackend: cni
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version UNKNOWN
      commit: ea1fe3938eefa14eb707f1d22adff4db670645d6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.1.8
      commit: unknown
      libslirp: 4.3.1-git
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.4.3
  swapFree: 1847644160
  swapTotal: 4289720320
  uptime: 450h 23m 34.02s (Approximately 18.75 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: .../.config/containers/storage.conf
  containerStore:
    number: 9
    paused: 0
    running: 7
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.9.0
        fuse-overlayfs: version 1.5
        FUSE library version 3.9.0
        using FUSE kernel interface version 7.31
  graphRoot: .../.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 63
  runRoot: /run/user/1000/containers
  volumePath: .../.local/share/containers/storage/volumes
version:
  APIVersion: 4.0.2
  Built: 1647337134
  BuiltTime: Tue Mar 15 10:38:54 2022
  GitCommit: 342c8259381b63296e96ad29519bd4b9c7afbf97-dirty
  GoVersion: devel go1.19-81767e23c2 Fri Mar 4 01:18:57 2022 +0000
  OsArch: linux/amd64
  Version: 4.0.2

@mheon
Member

mheon commented Mar 21, 2022

Can't reproduce on my laptop. Are the echoes and the podman execs meant to be going on at the same time, to cause a lot of disk I/O?

@mheon
Member

mheon commented Mar 21, 2022

Getting access to an environment that does reproduce would be useful. I strongly suspect this is tied to disk IO somehow, perhaps Conmon being starved of it? But just reading a large text file repeatedly shouldn't be enough to do that.

@edsantiago
Member Author

Well, I can't reproduce it in today's 1minutetip environment. I have no idea what changed between Feb 28 (when I reproduced it) and today.

@mheon see my Feb 28 note for my suspicions about stdout, not disk I/O.

@mheon
Member

mheon commented Mar 21, 2022

@edsantiago Where are you redirecting? Conmon should have no relation to TTY, so my assumption is that it's the process of writing the logs for the exec session that's causing it to lag and fail to create the exit file in a timely fashion (or somehow hang?). If you're redirecting the output on the host system, I don't see how that could affect Conmon.

mheon added a commit to mheon/libpod that referenced this issue Mar 23, 2022
Every exec session run attached will, on exit, do two things: it
will signal the associated `podman exec` that it is finished (to
allow Podman to collect the exit code and exit), and spawn a
cleanup process to clean up the exec session (in case the `podman
exec` process died, we still need to clean up). If an exec
session is created that exits almost instantly, but generates a
large amount of output (e.g. prints thousands of lines), the
cleanup process can potentially execute before `podman exec` has
a chance to read the exit code, resulting in errors. Handle this
by detecting if the cleanup process has already removed the exec
session before handling the error from reading the exec exit
code.

[NO NEW TESTS NEEDED] I have no idea how to test this in CI.

Fixes containers#13227

Signed-off-by: Matthew Heon <[email protected]>
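To make the commit message concrete, here is a minimal standalone sketch of the same pattern: tolerating a failed exit-code read when a concurrent cleanup has already removed the exec session. All function names and the directory layout below are hypothetical stand-ins for illustration, not libpod's actual API or the exact behavior of the fix.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// sessionExists and readExitFile are hypothetical stand-ins for libpod's
// exec-session bookkeeping; the layout is an assumption for illustration.
func sessionExists(sessionDir string) bool {
	_, err := os.Stat(sessionDir)
	return err == nil
}

func readExitFile(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(data)))
}

// readExecExitCode sketches the race handling described above: if the exit
// file cannot be read but the cleanup process has already removed the exec
// session, the exec already finished and the failure is not an internal error.
func readExecExitCode(sessionDir string) (int, error) {
	code, err := readExitFile(filepath.Join(sessionDir, "exit"))
	if err == nil {
		return code, nil
	}
	if !sessionExists(sessionDir) {
		// Cleanup raced us and already removed the session; treat it as
		// reaped instead of surfacing "timed out waiting for file ...".
		return 0, nil
	}
	return 0, fmt.Errorf("reading exec exit code: %w", err)
}

func main() {
	code, err := readExecExitCode("/tmp/example-exec-session")
	fmt.Println(code, err)
}
```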
djgalloway pushed a commit to ceph/ceph-container that referenced this issue Aug 9, 2022
Example where this would be needed:
```
docker rm -f 4fcd7c2012b6
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
time="2022-08-09T17:22:11Z" level=error msg="container \"4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91\" does not exist"

Error: cannot remove container 4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91 as it could not be stopped: timed out waiting for file /tmp/podman-run-1111/libpod/tmp/exits/4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91: internal libpod error
docker rm -f 4fcd7c2012b6
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
ERRO[0000] Joining network namespace for container 4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91: error retrieving network namespace at /tmp/podman-run-1111/netns/netns-600d6718-1a5e-2fed-bf39-ad7048004be2: unknown FS magic on "/tmp/podman-run-1111/netns/netns-600d6718-1a5e-2fed-bf39-ad7048004be2": ef53
4fcd7c2012b63e6bf9bffd9b7d490d12f1d557ba924e889192ec858821019d91
```

See containers/podman#13227

This also makes the function fail fatally.

Signed-off-by: David Galloway <[email protected]>
@Dragas

Dragas commented Nov 30, 2022

I've managed to consistently reproduce the issue by writing a lengthy stream of data (think gigabytes) via dd if=/dev/random count=4294967296 to stdout. Note that I'm also not running a podman-in-podman setup, but rather an Ubuntu Server VM on a Windows host via VMware.

dragas@dvm:~$ podman info
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: unknown'
  cpus: 2
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  hostname: dvm
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.15.0-53-generic
  linkmode: dynamic
  logDriver: journald
  memFree: 1426100224
  memTotal: 4078845952
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.6.1
  swapFree: 3516313600
  swapTotal: 4080005120
  uptime: 24h 43m 52.59s (Approximately 1.00 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/dragas/.config/containers/storage.conf
  containerStore:
    number: 5
    paused: 0
    running: 3
    stopped: 2
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/dragas/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 44
  runRoot: /run/user/1000/containers
  volumePath: /home/dragas/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.17.3
  OsArch: linux/amd64
  Version: 3.4.4

I've noticed that if you don't overwhelm the output, podman seems to deal with lengthy streams just fine.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 9, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2023