[Bug]: Systemd libpod scope fails after container OOM exit #1138

Closed
m3nu opened this issue Feb 8, 2023 · 15 comments · Fixed by #1139

m3nu commented Feb 8, 2023

Issue Description

Recently, when a rootless container with constrained memory is killed by the kernel due to excess memory usage (OOM), it can't be restarted because the Systemd libpod-xxx.scope unit is left in a failed state. The error shown by podman is:

$ podman start main
Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists

To get the container going again, one needs to reset the libpod scope:

$ systemctl --user reset-failed 'libpod-*.scope'

Or add the same command to the container's service file:

[Service]
Restart=always
...
ExecStart=/usr/bin/podman start main
ExecStartPre=-/bin/systemctl --user reset-failed 'libpod-*.scope'
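
To confirm that a stale failed scope is what blocks the restart, one can list the failed libpod scopes in the user session (a quick diagnostic, not part of the original report):

# Any unit listed here will make the next `podman start` fail
systemctl --user list-units --state=failed 'libpod-*.scope'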

Steps to reproduce the issue


1. Run any rootless container with limited memory. The container was created with:

podman container create \
    --name main \
    --network slirp4netns:enable_ipv6=true \
    --cpus 1.0 \
    --memory 2.0g \
    --memory-swap 2.0g \
    --publish 127.0.0.1:9984:80 \
    docker.io/library/nextcloud:25.0.3-apache
2. Make the container use all of its memory, e.g. using the stress tool:
podman exec -it main bash
apt update && apt install stress
stress -c 2 -i 1 -m 1 --vm-bytes 4G -t 60s
3. Wait for the container to be killed:
[Wed Feb  8 10:19:21 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=user.slice,mems_allowed=0,oom_memcg=/user.slice/user-6984.slice/user@6984.service/user.slice/libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope/container,task_memcg=/user.slice/user-6984.slice/user@6984.service/user.slice/libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope/container,task=stress,pid=809864,uid=2655904
[Wed Feb  8 10:19:21 2023] Memory cgroup out of memory: Killed process 809864 (stress) total-vm:2100844kB, anon-rss:1887848kB, file-rss:0kB, shmem-rss:0kB, UID:2655904 pgtables:3744kB oom_score_adj:0
4. Container restarts fail, both manually and via Systemd:
Feb 08 10:21:47 podhost15.test.pikapods.com systemd[112845]: Starting Podman container-main.service...
Feb 08 10:21:48 podhost15.test.pikapods.com podman[813215]: Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists
Feb 08 10:21:48 podhost15.test.pikapods.com systemd[112845]: container-main.service: Control process exited, code=exited, status=125/n/a
Feb 08 10:21:48 podhost15.test.pikapods.com podman[813241]: main
Feb 08 10:21:48 podhost15.test.pikapods.com systemd[112845]: container-main.service: Failed with result 'exit-code'.
Feb 08 10:21:48 podhost15.test.pikapods.com systemd[112845]: Failed to start Podman container-main.service.
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: container-main.service: Scheduled restart job, restart counter is at 15.
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: Stopped Podman container-main.service.
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: Starting Podman container-main.service...
Feb 08 10:21:58 podhost15.test.pikapods.com podman[813387]: Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: container-main.service: Control process exited, code=exited, status=125/n/a
Feb 08 10:21:58 podhost15.test.pikapods.com podman[813413]: main
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: container-main.service: Failed with result 'exit-code'.
Feb 08 10:21:58 podhost15.test.pikapods.com systemd[112845]: Failed to start Podman container-main.service.
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: container-main.service: Scheduled restart job, restart counter is at 16.
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: Stopped Podman container-main.service.
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: Starting Podman container-main.service...
Feb 08 10:22:08 podhost15.test.pikapods.com podman[813533]: Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: container-main.service: Control process exited, code=exited, status=125/n/a
Feb 08 10:22:08 podhost15.test.pikapods.com podman[813560]: main
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: container-main.service: Failed with result 'exit-code'.
Feb 08 10:22:08 podhost15.test.pikapods.com systemd[112845]: Failed to start Podman container-main.service.
$ podman start main
Error: OCI runtime error: unable to start container "155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355": crun: sd-bus call: Unit libpod-155d489006c7cce4308ec63de210513c41a6df8d431de812d9a99735f5615355.scope was already loaded or has a fragment file.: File exists

Describe the results you received

A container that was killed due to OOM can't be restarted automatically or manually.

Describe the results you expected

A container should be restarted automatically after an OOM kill if its restart policy is set to always.

podman info output

podman info
host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: 48adb81a22c26f0660f0f37d984baebe7b9ade98'
  cpuUtilization:
    idlePercent: 91.91
    systemPercent: 3.41
    userPercent: 4.68
  cpus: 4
  distribution:
    distribution: '"centos"'
    version: "9"
  eventLogger: file
  hostname: podhost15.test.pikapods.com
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 6984
      size: 1
    - container_id: 1
      host_id: 2655904
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 6984
      size: 1
    - container_id: 1
      host_id: 2655904
      size: 65536
  kernel: 5.14.0-252.el9.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 2248404992
  memTotal: 16504320000
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/6984/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/6984/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.2.0-2.el9.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 2530480128
  swapTotal: 4293914624
  uptime: 13h 56m 0.00s (Approximately 0.54 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /srv/pods/p6984/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /var/lib/containers/storage
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.10-1.el9.x86_64
      Version: |-
        fusermount3 version: 3.10.2
        fuse-overlayfs: version 1.10
        FUSE library version 3.10.2
        using FUSE kernel interface version 7.31
  graphRoot: /srv/pods/p6984/.local/share/containers/storage
  graphRootAllocated: 104102100992
  graphRootUsed: 54225817600
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 65
  runRoot: /run/user/6984/containers
  transientStore: false
  volumePath: /srv/pods/p6984/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.0
  Built: 1675439377
  BuiltTime: Fri Feb  3 15:49:37 2023
  GitCommit: ""
  GoVersion: go1.19.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

The container is running as a non-privileged user (rootless) on a physical machine.

Additional information

This worked about 2-3 months ago, so it could be related to some recent change. Maybe containers/podman#13731.


rhatdan commented Feb 8, 2023

@vrothberg @giuseppe PTAL

@Ember-ruby

Experiencing this too.

m3nu changed the title from "[Bug]:" to "[Bug]: Systemd libpod scope fails after container OOM exit" on Feb 9, 2023

giuseppe commented Feb 9, 2023

I am not able to reproduce on Fedora 37 (systemd-251.10-588.fc37.x86_64), CentOS (systemd-252-3.el9.x86_64), or RHEL (systemd-250-12.el9_1.1.x86_64):

$ podman create --name foo --memory 20M --memory-swap 20M fedora sh -c 'yes | shuf'
56af22839eececd8f5687e37e64ce73da6dd353b0185cc712a69868ca979dde7
$ podman start foo
foo
$ podman start foo
foo
$ podman start foo
foo

If you are using the cpu controller, it means you must have tweaked the default systemd configuration. Could you please share the changes you've made?


m3nu commented Feb 9, 2023

Hey @giuseppe! I appreciate you taking a look and putting together a minimal example. The issue happens only when the container is supervised by Systemd. So, to expand on your example (run on CentOS as an ordinary user with SELinux enforcing):

$ podman create --name foo --memory 20M --memory-swap 20M fedora sh -c 'yes | shuf'
185aaf9908f968d4e2c0d1e7880fe68dd3af185e2aad797cfb8b8384b4a5b234
$ cd ~/.config/systemd/user
$ podman generate systemd --files --restart-policy always --name foo
$ systemctl --user daemon-reload
$ systemctl --user enable --now container-foo

Now to review the error.

First, the service is in a failed state:

$ systemctl --user status container-foo
× container-foo.service - Podman container-foo.service
     Loaded: loaded (/srv/pods/p6953/.config/systemd/user/container-foo.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2023-02-09 20:42:33 UTC; 12s ago
   Duration: 364ms
       Docs: man:podman-generate-systemd(1)
    Process: 2775588 ExecStart=/usr/bin/podman start foo (code=exited, status=125)
    Process: 2775614 ExecStopPost=/usr/bin/podman stop -t 10 foo (code=exited, status=0/SUCCESS)
        CPU: 169ms

Status in podman:

$ podman ps -a
CONTAINER ID  IMAGE                                     COMMAND           CREATED        STATUS                           PORTS                     NAMES
185aaf9908f9  registry.fedoraproject.org/fedora:latest  sh -c yes | shuf  2 minutes ago  Exited (137) About a minute ago                            foo

Stop the service and try to start the container manually => get the described error:

$ systemctl --user stop container-foo
$ podman start foo
Error: OCI runtime error: unable to start container "185aaf9908f968d4e2c0d1e7880fe68dd3af185e2aad797cfb8b8384b4a5b234": crun: sd-bus call: Unit libpod-185aaf9908f968d4e2c0d1e7880fe68dd3af185e2aad797cfb8b8384b4a5b234.scope was already loaded or has a fragment file.: File exists

I didn't make any CPU modifications, and the error also happens without a CPU limit set.


giuseppe commented Feb 10, 2023

I am still not able to reproduce the issue. When I attempt to use the stress command, only the stress command itself is killed, not the entire container. Have you tweaked the cgroup configuration in any way (e.g. set memory.oom.group)? Are you using oomd?
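
For anyone checking the same things, both questions can be answered from a shell (a sketch; the cgroup path follows the layout visible in the oom-kill log above, with <id> standing for the full container ID):

# Is systemd-oomd active on this host?
systemctl status systemd-oomd

# Is group OOM-kill enabled for the container's cgroup?
cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/user.slice/libpod-<id>.scope/container/memory.oom.group

A value of 1 in memory.oom.group would explain the whole container being killed rather than just the stress process.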

I don't think the configuration snippet you've provided is going to work.

[Service]
Restart=always
...
ExecStart=/usr/bin/podman start main
ExecStartPre=-/bin/systemctl --user reset-failed 'libpod-*.scope'

systemd will monitor the podman command, which exits as soon as the container is started.

You need at least to use "start -a" so that the podman command stays around while the container runs, and even then it will race with the cleanup process.

Even better, use a unit generated by podman generate systemd.
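
For reference, a corrected sketch of the hand-written unit from the issue description might look like the following (illustrative only; as noted above it still races with the cleanup process, so the generated units remain the safer option):

[Service]
Restart=always
Type=simple
ExecStartPre=-/bin/systemctl --user reset-failed 'libpod-*.scope'
# -a keeps podman attached, so systemd tracks a process that lives
# as long as the container does
ExecStart=/usr/bin/podman start -a main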


m3nu commented Feb 10, 2023

No, I'm not using oomd and I haven't made any cgroup changes. I don't really know what that is.

The Systemd service is generated by podman; the full version is below. So you are saying podman 4.3.1 generates a wrong Systemd file that is missing the -a flag?

I'll try to reproduce this now on a fresh VM that has no modifications at all. Will report back in roughly 15 minutes.

# container-main.service
# autogenerated by Podman 4.3.1
# Fri Jan 27 19:31:34 UTC 2023

[Unit]
Description=Podman container-main.service
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=/run/user/6953/containers
BindsTo=pod-mainpod.service
After=pod-mainpod.service

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=always
TimeoutStopSec=70
ExecStart=/usr/bin/podman start main
ExecStop=/usr/bin/podman stop  \
        -t 10 main
ExecStopPost=/usr/bin/podman stop  \
        -t 10 main
PIDFile=/run/user/6953/containers/overlay-containers/f09a7167266e2376cff3d5a3a5d9df4b7b302c89ab295ab59fdcc81276c793c7/userdata/conmon.pid
Type=forking
ExecStartPre=-/bin/systemctl --user reset-failed 'libpod-*.scope'

[Install]
WantedBy=default.target

@giuseppe
Member

No, the default service has no issue, because it uses PIDFile and Type=forking: systemd tracks conmon via the PID file rather than the short-lived podman start command.


m3nu commented Feb 10, 2023

I was able to reproduce this on a fresh VM that doesn't have any customizations (CentOS 9 on Hetzner Cloud).

Commands run:

# yum update -y && reboot
# yum install -y podman
# adduser pod2
# loginctl enable-linger pod2
# sudo -s -u pod2
$ export XDG_RUNTIME_DIR="/run/user/$UID"
$ mkdir -p ~/.config/systemd/user && cd ~/.config/systemd/user
$ podman create --name foo --memory 20M --memory-swap 20M fedora sh -c 'yes | shuf'
$ podman generate systemd --files --restart-policy always --name foo
$ systemctl --user daemon-reload
$ systemctl --user enable --now container-foo
$ systemctl --user stop container-foo
$ podman start foo
Error: OCI runtime error: unable to start container "089e5999feb092f23e8c269393aca90fe7e0606b4721838d377c6adf60d00e0e": crun: sd-bus call: Unit libpod-089e5999feb092f23e8c269393aca90fe7e0606b4721838d377c6adf60d00e0e.scope was already loaded or has a fragment file.: File exists

@giuseppe
Member

I am still not able to see the issue; I've tried on a fully updated CentOS Stream 9 VM (although not freshly installed).

$ podman --version
podman version 4.4.0

How did you install it? I'll try with a fresh VM on Monday to see if it makes any difference.


m3nu commented Feb 10, 2023

This is just the image Hetzner provides; they wouldn't make many customizations. I tried twice to make sure it's reproducible.

Our own physical machines are installed from the official ISO.

I'll try on a second cloud provider later.

@giuseppe
Member

Thanks for the update. I am finally able to reproduce it on a freshly installed CentOS Stream 9 VM.


m3nu commented Feb 10, 2023

Awesome! Excited to learn what you think and whether there is a better workaround or a potential fix in Podman.

@giuseppe giuseppe transferred this issue from containers/podman Feb 10, 2023
giuseppe added a commit to giuseppe/crun that referenced this issue Feb 10, 2023
if the unit could not be started, attempt to reset it first, and then
start it again.

Closes: containers#1138

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Member

Let's try to work around this problem in crun: #1139
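
(For the curious: the manual equivalent of what the patch automates can be issued over the systemd D-Bus API. The busctl call below is illustrative only; the PR does the same from C via sd-bus. <id> stands for the full container ID.)

# Reset the failed transient scope before retrying the start;
# equivalent to `systemctl --user reset-failed libpod-<id>.scope`
busctl --user call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
    org.freedesktop.systemd1.Manager ResetFailedUnit s libpod-<id>.scope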


m3nu commented Feb 11, 2023

Good stuff! Will try to compile and test this by Monday at the latest.


m3nu commented Feb 12, 2023

The changes in this branch seem to solve the issue. Thanks for the quick fix and for taking the time to reproduce it, @giuseppe! Tested as follows on CentOS 9:

yum update -y
yum --enablerepo='*' --disablerepo='media-*' --disablerepo='nfv-*' install -y make automake \
    autoconf gettext \
    libtool gcc libcap-devel systemd-devel yajl-devel libgcrypt-devel \
    glibc-static libseccomp-devel git podman
git clone https://github.com/giuseppe/crun/
cd crun && git checkout reset-systemd-unit-on-error
./autogen.sh
./configure --enable-shared --prefix=/
make install
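
(A quick sanity check before retesting, to confirm podman invokes the freshly built binary; --prefix=/ installs crun into /bin, which is a symlink to /usr/bin on CentOS 9, so it replaces the packaged binary.)

# The reported commit should match the checked-out branch
crun --version
command -v crun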

Then I followed the steps from #1138 (comment).

As expected, the container can still be started after an OOM kill: 👍

[   78.811050] Memory cgroup out of memory: Killed process 1564 (shuf) total-vm:29088kB, anon-rss:19512kB, file-rss:1280kB, shmem-rss:0kB, UID:1000 pgtables:92kB oom_score_adj:200
[pod2@centos-2gb-ash-1 user]$ systemctl --user stop container-foo
[pod2@centos-2gb-ash-1 user]$ podman start foo
foo

Is it working for you as well, @Ember-ruby?
