podman.service: KillMode=process leaks pause process, breaks subsequent API invocations #7021

martinpitt · 2020-07-20T05:37:21Z

/kind bug
Description

podman.service, in particular the user one, uses KillMode=process. When the unit stops, this leaves behind a process:

admin      20185  1.3  2.4 1281680 50484 ?       Ssl  05:31   0:00 /usr/bin/podman system service

Not only is this a resource leak, it also actively breaks the next API invocation.

Steps to reproduce the issue:

1. Make sure user's podman service is enabled:

   systemctl --user enable --now podman.socket

2. Make an API request to start podman.service:

   curl --unix $XDG_RUNTIME_DIR/podman/podman.sock http://localhost/

3. Log out the user
4. Ensure in loginctl (as root) that the user session is gone. But pgrep -au username still shows a leftover process:

   # pgrep -au admin
   17536 podman

5. Log back in as the user
6. Do another API request, same curl command

Describe the results you received:

In (6), the API request hangs. In journalctl you see a repeated restart/failure:

systemd[19935]: Starting Podman API Service...
systemd[19935]: podman.service: Succeeded.
systemd[19935]: podman.service: Unit process 19981 (podman pause) remains running after unit stopped.
systemd[19935]: Finished Podman API Service.
systemd[19935]: podman.service: Found left-over process 19981 (podman pause) in control group while starting unit. Ignoring.
systemd[19935]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
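To spot this condition without reading the journal by hand, the PID of the leftover process can be pulled out of the "remains running after unit stopped" message. This is only a sketch: `leftover_pids` is a made-up helper name, and the pattern is based solely on the message format in the excerpt above, so it may need adjusting for other systemd versions.

```shell
# Hypothetical helper: print the PIDs that systemd reports as still running
# after a unit stopped. Reads journal text on stdin; the message format is
# assumed to match the log excerpt in this report.
leftover_pids() {
    sed -n 's/.*Unit process \([0-9][0-9]*\) (.*) remains running after unit stopped\..*/\1/p'
}

# Example usage (assumes a systemd user session with journal access):
#   journalctl --user -u podman.service | leftover_pids
```

Any PID printed here can then be cross-checked against the `pgrep -au` output from the reproduction steps.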

Describe the results you expected:

podman.service stops cleanly, and restarts work.

Additional information you deem important (e.g. issue happens only occasionally):

I noticed this when enabling the user-container tests in cockpit-podman in Fedora dist-git gating.
pull request

As a workaround, I drop KillMode from the unit:

# avoid leftover user podman processes between login sessions
mkdir -p /etc/systemd/user/podman.service.d
printf '[Service]\nKillMode=\n' > /etc/systemd/user/podman.service.d/cleanup.conf
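To check whether the drop-in actually changed the unit, the effective setting can be queried with `systemctl --user show` after a `systemctl --user daemon-reload`. A minimal sketch, assuming the property is printed as `KillMode=<value>`; `killmode_is_process` is a hypothetical helper name, not something shipped with podman or systemd.

```shell
# Hypothetical helper: succeed if the text on stdin contains the exact line
# "KillMode=process", i.e. the setting this issue is about is still in effect.
killmode_is_process() {
    grep -qx 'KillMode=process'
}

# Example usage (assumes a systemd user session):
#   systemctl --user daemon-reload
#   if systemctl --user show -p KillMode podman.service | killmode_is_process; then
#       echo "podman.service still uses KillMode=process"
#   fi
```

If the helper still succeeds after the reload, the drop-in is probably not being read, for example because it was written to a directory that the user manager does not search.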

Output of podman version:

Version:      2.1.0-dev
API Version:  1
Go Version:   go1.14.4
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.16.0-dev
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.19-0.5.dev.giteff699e.fc33.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.19-dev, commit: 3c47d3797172bffa8ab02661ac4805b593cfb4ba'
  cpus: 3
  distribution:
    distribution: fedora
    version: "33"
  eventLogger: file
  hostname: localhost
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.8.0-0.rc5.1.fc33.x86_64
  linkmode: dynamic
  memFree: 261386240
  memTotal: 2075725824
  ociRuntime:
    name: crun
    package: crun-0.14-1.fc33.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.14
      commit: ebc56fc9bcce4b3208bb0079636c80545122bf58
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1001/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.4-2.dev.git4c6befe.fc33.x86_64
    version: |-
      slirp4netns version 1.1.4+dev
      commit: 4c6befe05c3137232cf06a5c2879daf4c20be6b1
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
  swapFree: 0
  swapTotal: 0
  uptime: 32m 57.27s
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/admin/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.1.0-6.dev.git50ab2c2.fc33.x86_64
      Version: |-
        fusermount3 version: 3.9.2
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.9.2
        using FUSE kernel interface version 7.31
  graphRoot: /home/admin/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 3
  runRoot: /run/user/1001/containers
  volumePath: /home/admin/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.14.4
  OsArch: linux/amd64
  Version: 2.1.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

podman-2.1.0-0.77.dev.git60127cf.fc33.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):

Local Fedora Rawhide VM (from dist-git gating)


vrothberg · 2020-07-20T09:06:54Z

Thanks for opening the issue, @martinpitt! I'll take a look.

vrothberg · 2020-07-20T09:21:44Z

Opened #7023 along with some other clean ups.

Do not set the killmode to process as it only kills the main process and leaves other processes untouched. Just remove the line and use the default cgroup killmode which will kill all processes in the service's cgroup.

Fixes: containers#7021
Signed-off-by: Valentin Rothberg <[email protected]>

more about flaky cache:
- containers/podman#7021
- containers/podman#7294

using mv does not work across boots, running rm -rf seems too adventurous

openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 20, 2020

vrothberg self-assigned this Jul 20, 2020

vrothberg mentioned this issue Jul 20, 2020

contrib/systemd cleanups #7023

Merged

openshift-merge-robot closed this as completed in #7023 Jul 20, 2020

vrothberg mentioned this issue Aug 12, 2020

Containers started using socket-activated APIv2 die from systemd activation timeout #7294

Closed

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
