[Bug]: Systemd libpod scope fails after container OOM exit #1138

Closed
m3nu opened this issue Feb 8, 2023 · 15 comments · Fixed by #1139

m3nu commented Feb 8, 2023

Issue Description

Recently, when a rootless container with constrained memory is killed by the kernel due to excess memory usage (OOM), it can't be restarted because the Systemd libpod-xxx.scope unit is left in a failed state. The error shown by podman is:

$ podman start main
Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists

To get the container going again, one needs to reset the libpod scope:

$ systemctl --user reset-failed 'libpod-*.scope'

Or add the same command to the container's service file:

[Service]
Restart=always
...
ExecStart=/usr/bin/podman start main
ExecStartPre=-/bin/systemctl --user reset-failed 'libpod-*.scope'
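
To confirm that a stale failed scope is what blocks the restart, one can list the failed libpod scopes in the user session (a quick diagnostic, not part of the original report):

# Any unit listed here will make the next `podman start` fail
systemctl --user list-units --state=failed 'libpod-*.scope'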

Steps to reproduce the issue


1. Run any rootless container with limited memory. The container was created with:

podman container create \
    --name main \
    --network slirp4netns:enable_ipv6=true \
    --cpus 1.0 \
    --memory 2.0g \
    --memory-swap 2.0g \
    --publish 127.0.0.1:9984:80 \
    docker.io/library/nextcloud:25.0.3-apache
2. Make the container use all of its memory, e.g. using the stress tool:
podman exec -it main bash
apt update && apt install stress
stress -c 2 -i 1 -m 1 --vm-bytes 4G -t 60s
3. Wait for the container to be killed:
[Wed Feb  8 10:19:21 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=user.slice,mems_allowed=0,oom_memcg=/user.slice/user-6984.slice/user@6984.service/user.slice/libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope/container,task_memcg=/user.slice/user-6984.slice/user@6984.service/user.slice/libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope/container,task=stress,pid=809864,uid=2655904
[Wed Feb  8 10:19:21 2023] Memory cgroup out of memory: Killed process 809864 (stress) total-vm:2100844kB, anon-rss:1887848kB, file-rss:0kB, shmem-rss:0kB, UID:2655904 pgtables:3744kB oom_score_adj:0
4. Container restarts fail, both manually and via Systemd:
Feb 08 10:21:47 podhost15.test.pikapods.com systemd[112845]: Starting Podman container-main.service...
Feb 08 10:21:48 podhost15.test.pikapods.com podman[813215]: Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists
Feb 08 10:21:48 podhost15.test.pikapods.com systemd[112845]: container-main.service: Control process exited, code=exited, status=125/n/a
Feb 08 10:21:48 podhost15.test.pikapods.com podman[813241]: main
Feb 08 10:21:48 podhost15.test.pikapods.com systemd[112845]: container-main.service: Failed with result 'exit-code'.
Feb 08 10:21:48 podhost15.test.pikapods.com systemd[112845]: Failed to start Podman container-main.service.
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: container-main.service: Scheduled restart job, restart counter is at 15.
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: Stopped Podman container-main.service.
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: Starting Podman container-main.service...
Feb 08 10:21:58 podhost15.test.pikapods.com podman[813387]: Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: container-main.service: Control process exited, code=exited, status=125/n/a
Feb 08 10:21:58 podhost15.test.pikapods.com podman[813413]: main
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: container-main.service: Failed with result 'exit-code'.
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: Failed to start Podman container-main.service.
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: container-main.service: Scheduled restart job, restart counter is at 16.
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: Stopped Podman container-main.service.
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: Starting Podman container-main.service...
Feb 08 10:22:08 podhost15.test.pikapods.com podman[813533]: Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: container-main.service: Control process exited, code=exited, status=125/n/a
Feb 08 10:22:08 podhost15.test.pikapods.com podman[813560]: main
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: container-main.service: Failed with result 'exit-code'.
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: Failed to start Podman container-main.service.
$ podman start main
Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists

Describe the results you received

A container that was killed due to OOM can't be restarted automatically or manually.

Describe the results you expected

A container should be restarted automatically after an OOM kill if its restart policy is set to always.

podman info output

podman info
host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: 48adb81a22c26f0660f0f37d984baebe7b9ade98'
  cpuUtilization:
    idlePercent: 91.91
    systemPercent: 3.41
    userPercent: 4.68
  cpus: 4
  distribution:
    distribution: '"centos"'
    version: "9"
  eventLogger: file
  hostname: podhost15.test.pikapods.com
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 6984
      size: 1
    - container_id: 1
      host_id: 2655904
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 6984
      size: 1
    - container_id: 1
      host_id: 2655904
      size: 65536
  kernel: 5.14.0-252.el9.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 2248404992
  memTotal: 16504320000
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/6984/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/6984/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.2.0-2.el9.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 2530480128
  swapTotal: 4293914624
  uptime: 13h 56m 0.00s (Approximately 0.54 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /srv/pods/p6984/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /var/lib/containers/storage
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.10-1.el9.x86_64
      Version: |-
        fusermount3 version: 3.10.2
        fuse-overlayfs: version 1.10
        FUSE library version 3.10.2
        using FUSE kernel interface version 7.31
  graphRoot: /srv/pods/p6984/.local/share/containers/storage
  graphRootAllocated: 104102100992
  graphRootUsed: 54225817600
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 65
  runRoot: /run/user/6984/containers
  transientStore: false
  volumePath: /srv/pods/p6984/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.0
  Built: 1675439377
  BuiltTime: Fri Feb  3 15:49:37 2023
  GitCommit: ""
  GoVersion: go1.19.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

The container is running as a non-privileged user (rootless) on a physical machine.

Additional information

This worked about 2-3 months ago, so it could be related to some recent change. Maybe containers/podman#13731.


rhatdan commented Feb 8, 2023

@vrothberg @giuseppe PTAL

@Ember-ruby

Experiencing this too.

m3nu changed the title from "[Bug]:" to "[Bug]: Systemd libpod scope fails after container OOM exit" on Feb 9, 2023

giuseppe commented Feb 9, 2023

I am not able to reproduce on Fedora 37 (systemd-251.10-588.fc37.x86_64), CentOS (systemd-252-3.el9.x86_64), or RHEL (systemd-250-12.el9_1.1.x86_64):

$ podman create --name foo --memory 20M --memory-swap 20M fedora sh -c 'yes | shuf'
56af22839eececd8f5687e37e64ce73da6dd353b0185cc712a69868ca979dde7
$ podman start foo
foo
$ podman start foo
foo
$ podman start foo
foo

If you are using the cpu controller, it means you must have tweaked the default systemd configuration. Could you please share the changes you've made?


m3nu commented Feb 9, 2023

Hey @giuseppe! I appreciate you taking a look and putting together a minimal example. The issue happens only when the container is supervised by Systemd. So, to expand on your example (run on CentOS as an ordinary user with SELinux enforcing):

$ podman create --name foo --memory 20M --memory-swap 20M fedora sh -c 'yes | shuf'
185aaf9908f968d4e2c0d1e7880fe68dd3af185e2aad797cfb8b8384b4a5b234
$ cd ~/.config/systemd/user
$ podman generate systemd --files --restart-policy always --name foo
$ systemctl --user daemon-reload
$ systemctl --user enable --now container-foo

Now to review the error.

First, the service is in a failed state:

$ systemctl --user status container-foo
× container-foo.service - Podman container-foo.service
     Loaded: loaded (/srv/pods/p6953/.config/systemd/user/container-foo.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2023-02-09 20:42:33 UTC; 12s ago
   Duration: 364ms
       Docs: man:podman-generate-systemd(1)
    Process: 2775588 ExecStart=/usr/bin/podman start foo (code=exited, status=125)
    Process: 2775614 ExecStopPost=/usr/bin/podman stop -t 10 foo (code=exited, status=0/SUCCESS)
        CPU: 169ms

Status in podman:

$ podman ps -a
CONTAINER ID  IMAGE                                     COMMAND           CREATED        STATUS                           PORTS                     NAMES
185aaf9908f9  registry.fedoraproject.org/fedora:latest  sh -c yes | shuf  2 minutes ago  Exited (137) About a minute ago                            foo

Stop the service and try to start the container manually => get the described error:

$ systemctl --user stop container-foo
$ podman start foo
Error: OCI runtime error: unable to start container "185aaf9908f968d4e2c0d1e7880fe68dd3af185e2aad797cfb8b8384b4a5b234": crun: sd-bus call: Unit libpod-185aaf9908f968d4e2c0d1e7880fe68dd3af185e2aad797cfb8b8384b4a5b234.scope was already loaded or has a fragment file.: File exists

I didn't make any CPU modifications, and the error also happens without a CPU limit set.


giuseppe commented Feb 10, 2023

I am still not able to reproduce the issue. When I attempt to use the stress command, only the stress command itself is killed, not the entire container. Have you tweaked the cgroup configuration in any way (e.g. set memory.oom.group)? Are you using oomd?
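
For anyone checking the same things, both questions can be answered from a shell (a sketch; the cgroup path follows the layout visible in the oom-kill log above, with <id> standing for the full container ID):

# Is systemd-oomd active on this host?
systemctl status systemd-oomd

# Is group OOM-kill enabled for the container's cgroup?
cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/user.slice/libpod-<id>.scope/container/memory.oom.group

A value of 1 in memory.oom.group would explain the whole container being killed rather than just the stress process.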

I don't think the configuration snippet you've provided is going to work.

[Service]
Restart=always
...
ExecStart=/usr/bin/podman start main
ExecStartPre=-/bin/systemctl --user reset-failed 'libpod-*.scope'

systemd will monitor the podman command, which exits as soon as the container is started.

You need at least to use "start -a" so that the podman command stays around while the container runs, and even then it will race with the cleanup process.

Even better, use a unit generated by podman generate systemd.
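
For reference, a corrected sketch of the hand-written unit from the issue description might look like the following (illustrative only; as noted above it still races with the cleanup process, so the generated units remain the safer option):

[Service]
Restart=always
Type=simple
ExecStartPre=-/bin/systemctl --user reset-failed 'libpod-*.scope'
# -a keeps podman attached, so systemd tracks a process that lives
# as long as the container does
ExecStart=/usr/bin/podman start -a main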


m3nu commented Feb 10, 2023

No, I'm not using oomd and I haven't made any cgroup changes. I don't really know what that is.

The Systemd service is generated by podman; the full version is below. So you are saying podman 4.3.1 generates a wrong Systemd file that is missing the -a flag?

I'll try to reproduce this now on a fresh VM that has no modifications at all. Will report back in roughly 15 minutes.

# container-main.service
# autogenerated by Podman 4.3.1
# Fri Jan 27 19:31:34 UTC 2023

[Unit]
Description=Podman container-main.service
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=/run/user/6953/containers
BindsTo=pod-mainpod.service
After=pod-mainpod.service

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=always
TimeoutStopSec=70
ExecStart=/usr/bin/podman start main
ExecStop=/usr/bin/podman stop  \
        -t 10 main
ExecStopPost=/usr/bin/podman stop  \
        -t 10 main
PIDFile=/run/user/6953/containers/overlay-containers/f09a7167266e2376cff3d5a3a5d9df4b7b302c89ab295ab59fdcc81276c793c7/userdata/conmon.pid
Type=forking
ExecStartPre=-/bin/systemctl --user reset-failed 'libpod-*.scope'

[Install]
WantedBy=default.target

@giuseppe
Member

No, the default service has no issue, because it uses PIDFile and Type=forking: systemd tracks conmon via the PID file rather than the short-lived podman start command.


m3nu commented Feb 10, 2023

I was able to reproduce this on a fresh VM that doesn't have any customizations (CentOS 9 on Hetzner Cloud).

Commands run:

# yum update -y && reboot
# yum install -y podman
# adduser pod2
# loginctl enable-linger pod2
# sudo -s -u pod2
$ export XDG_RUNTIME_DIR="/run/user/$UID"
$ mkdir -p ~/.config/systemd/user && cd ~/.config/systemd/user
$ podman create --name foo --memory 20M --memory-swap 20M fedora sh -c 'yes | shuf'
$ podman generate systemd --files --restart-policy always --name foo
$ systemctl --user daemon-reload
$ systemctl --user enable --now container-foo
$ systemctl --user stop container-foo
$ podman start foo
Error: OCI runtime error: unable to start container "089e5999feb092f23e8c269393aca90fe7e0606b4721838d377c6adf60d00e0e": crun: sd-bus call: Unit libpod-089e5999feb092f23e8c269393aca90fe7e0606b4721838d377c6adf60d00e0e.scope was already loaded or has a fragment file.: File exists

@giuseppe
Member

I am still not able to see the issue; I've tried on a fully updated CentOS Stream 9 VM (although not freshly installed).

$ podman --version
podman version 4.4.0

How did you install it? I'll try with a fresh VM on Monday to see if it makes any difference.


m3nu commented Feb 10, 2023

This is just the image Hetzner provides; they wouldn't make many customizations. I tried twice to make sure it's reproducible.

Our own physical machines are installed from the official ISO.

I'll try on a second cloud provider later.

@giuseppe
Member

Thanks for the update. I am finally able to reproduce it on a freshly installed CentOS Stream 9 VM.


m3nu commented Feb 10, 2023

Awesome! Excited to learn what you think and whether there is a better workaround or a potential fix in Podman.

@giuseppe giuseppe transferred this issue from containers/podman Feb 10, 2023
giuseppe added a commit to giuseppe/crun that referenced this issue Feb 10, 2023
if the unit could not be started, attempt to reset it first, and then
start it again.

Closes: containers#1138

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Member

Let's try to work around this problem in crun: #1139
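
(For the curious: the manual equivalent of what the patch automates can be issued over the systemd D-Bus API. The busctl call below is illustrative only; the PR does the same from C via sd-bus. <id> stands for the full container ID.)

# Reset the failed transient scope before retrying the start;
# equivalent to `systemctl --user reset-failed libpod-<id>.scope`
busctl --user call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
    org.freedesktop.systemd1.Manager ResetFailedUnit s libpod-<id>.scope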


m3nu commented Feb 11, 2023

Good stuff! Will try to compile and test this by Monday at the latest.


m3nu commented Feb 12, 2023

The changes in this branch seem to solve the issue. Thanks for the quick fix and for taking the time to reproduce it, @giuseppe! Tested as follows on CentOS 9:

yum update -y
yum --enablerepo='*' --disablerepo='media-*' --disablerepo='nfv-*' install -y make automake \
    autoconf gettext \
    libtool gcc libcap-devel systemd-devel yajl-devel libgcrypt-devel \
    glibc-static libseccomp-devel git podman
git clone https://github.com/giuseppe/crun/
cd crun && git checkout reset-systemd-unit-on-error
./autogen.sh
./configure --enable-shared --prefix=/
make install
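
(A quick sanity check before retesting, to confirm podman invokes the freshly built binary; --prefix=/ installs crun into /bin, which is a symlink to /usr/bin on CentOS 9, so it replaces the packaged binary.)

# The reported commit should match the checked-out branch
crun --version
command -v crun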

Then I followed the steps from #1138 (comment).

As expected, the container can still be started after an OOM kill: 👍

[   78.811050] Memory cgroup out of memory: Killed process 1564 (shuf) total-vm:29088kB, anon-rss:19512kB, file-rss:1280kB, shmem-rss:0kB, UID:1000 pgtables:92kB oom_score_adj:200
[pod2@centos-2gb-ash-1 user]$ systemctl --user stop container-foo
[pod2@centos-2gb-ash-1 user]$ podman start foo
foo

Is it working for you as well, @Ember-ruby?
