
Zombie slirp4netns processes left on the system when using podman API service #9777

Closed
henryhchchc opened this issue Mar 22, 2021 · 17 comments · Fixed by #10851
Labels
kind/bug · locked - please file new issue/PR

Comments

@henryhchchc

henryhchchc commented Mar 22, 2021

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

The podman REST API service leaves zombie slirp4netns processes on the system until the service itself is stopped.

I think the reason is that at Line 468 of libpod/networking_linux.go the slirp4netns process is started but never waited on by podman.
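For illustration, a minimal Go sketch of that failure mode (not podman's actual code): a child is started but never waited on, so once it exits it lingers as a zombie for as long as the parent lives.

```go
package main

import (
	"os/exec"
	"time"
)

func main() {
	// Start a short-lived child, analogous to podman launching slirp4netns.
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	// No cmd.Wait() here: after the child exits, the kernel keeps its exit
	// status in the process table, and `ps` shows it as <defunct> until
	// this parent waits on it or terminates.
	time.Sleep(30 * time.Second)
}
```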

Steps to reproduce the issue:

  1. Run `podman system service -t 0` to start the API service and keep it running

  2. Start and stop N containers via the podman REST API (see the sketch after this list)
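Concretely, the reproduction can be driven like this, assuming rootless podman with the socket at its default location; `podman --remote` goes through the same REST API, and the image and loop count are arbitrary:

```console
$ podman system service -t 0 &
$ for i in $(seq 1 5); do podman --remote run --rm alpine true; done
$ ps -eo pid,ppid,stat,comm | awk '$3 ~ /Z/'   # zombies show state "Z" (<defunct>)
```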

Describe the results you received:

There will be N zombie slirp4netns processes left on the system until the API service is stopped.

Describe the results you expected:

There are no zombie slirp4netns processes left on the system.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      3.0.0-dev
API Version:  3.0.0
Go Version:   go1.15.7
Built:        Wed Feb  3 06:06:33 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.19.2
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.25-1.module_el8.4.0+673+eabfc99d.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: 897f4ebd69b9e9c725621fabf1d7c918ef635a68'
  cpus: 36
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: file
  hostname: <host name>
  idMappings:
    gidmap:
    - container_id: 0
      host_id: <gid>
      size: 1
    - container_id: 1
      host_id: 200000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: <uid>
      size: 1
    - container_id: 1
      host_id: 200000
      size: 65536
  kernel: 4.18.0-277.el8.x86_64
  linkmode: dynamic
  memFree: 21868609536
  memTotal: 269860601856
  ociRuntime:
    name: crun
    package: crun-0.17-1.module_el8.4.0+673+eabfc99d.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/<id>/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.1.8-1.module_el8.4.0+641+6116a774.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 137420599296
  swapTotal: 137438949376
  uptime: 58h 9m 49.4s (Approximately 2.42 days)
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /data/<user>/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.4.0-2.module_el8.4.0+673+eabfc99d.x86_64
      Version: |-
        fusermount3 version: 3.2.1
        fuse-overlayfs: version 1.4
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /data/<user>/ssd/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 39
  runRoot: /run/user/<id>/containers
  volumePath: /data/<user>/ssd/containers/storage/volumes
version:
  APIVersion: 3.0.0
  Built: 1612303593
  BuiltTime: Wed Feb  3 06:06:33 2021
  GitCommit: ""
  GoVersion: go1.15.7
  OsArch: linux/amd64
  Version: 3.0.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.0.0-0.33rc2.module_el8.4.0+673+eabfc99d.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

  • CentOS Stream 8
  • i9 10980XE
@henryhchchc changed the title from "Zombie slirp4netns processed left on the system when using podman API service" to "Zombie slirp4netns processes left on the system when using podman API service" Mar 22, 2021
@openshift-ci-robot added the kind/bug label Mar 22, 2021
@lsm5
Member

lsm5 commented Mar 22, 2021

btw, the podman 3.0.0-rc2 mentioned here is directly from the CentOS Stream repos, not Kubic. @vrothberg @mheon @jnovy do you know the status of the update for that one?

@vrothberg
Member

> btw, the podman 3.0.0-rc2 mentioned here is directly from the CentOS Stream repos, not Kubic. @vrothberg @mheon @jnovy do you know the status of the update for that one?

No idea why Stream hasn't seen v3.0.1.

@vrothberg
Member

@Luap99 do you know what's going on with the slirp processes?

@Luap99
Member

Luap99 commented Mar 23, 2021

This looks pretty clear to me. The slirp process is tied to a pipe file descriptor: one end is sent to conmon, the other to the slirp process. Once conmon exits, slirp will exit as well. However, the parent process (the podman service) is still alive, so Linux expects it to collect the child's exit status with wait.

I am not sure what a good solution is; maybe fork/exec the slirp process?

@giuseppe @AkihiroSuda WDYT?
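A minimal sketch of the lifetime coupling described above, with generic stand-ins (`sleep` for conmon, `cat` for slirp4netns) rather than the actual podman wiring: child B exits when child A closes the last write end of the pipe, but the long-lived parent never reaps B.

```go
package main

import (
	"os"
	"os/exec"
	"time"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}

	// Child A (stand-in for conmon) inherits the write end as fd 3.
	a := exec.Command("sleep", "1")
	a.ExtraFiles = []*os.File{w}
	if err := a.Start(); err != nil {
		panic(err)
	}

	// Child B (stand-in for slirp4netns) blocks reading the pipe and
	// exits on EOF, i.e. once the last writer is gone.
	b := exec.Command("cat")
	b.Stdin = r
	if err := b.Start(); err != nil {
		panic(err)
	}

	// Drop the parent's copies so A holds the only write end.
	w.Close()
	r.Close()

	a.Wait() // A is reaped; its exit closes the write end, so B exits too.
	// B is never waited on: `ps` now shows cat as <defunct> until this
	// long-lived parent exits, which matches the reported behavior.
	time.Sleep(30 * time.Second)
}
```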

@mheon
Member

mheon commented Mar 23, 2021

We could consider having podman-system-service set the subreaper prctl (PR_SET_CHILD_SUBREAPER)? Everything should be a direct child of system service, so the zombies will reparent to us and we can wait on them.
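A minimal sketch of that suggestion, assuming `golang.org/x/sys/unix`; this only illustrates the prctl call, not podman's implementation:

```go
package main

import "golang.org/x/sys/unix"

func main() {
	// PR_SET_CHILD_SUBREAPER: orphaned descendant processes get
	// reparented to this process instead of PID 1, so it receives their
	// SIGCHLD and can wait() on them.
	if err := unix.Prctl(unix.PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0); err != nil {
		panic(err)
	}
	// ... run the API service; a reaper loop is still needed to actually
	// collect the children (see below on why a blanket wait(-1) loop is
	// problematic).
}
```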

@mheon
Member

mheon commented Mar 23, 2021

(Well, everything that would cause this problem, I mean...)

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@mheon
Member

mheon commented Apr 23, 2021

@rhatdan @baude We should probably prioritize this one higher, seems like a significant regression.

@rhatdan
Member

rhatdan commented Apr 23, 2021

I agree this needs to be fixed ASAP. @Luap99 any chance you can look at this?

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented May 24, 2021

@Luap99 were you ever able to look at this?

@rhatdan
Member

rhatdan commented May 24, 2021

@mheon do you have time to look at this?

@mheon
Member

mheon commented May 24, 2021

I will try and find some time this sprint

@mheon mheon self-assigned this May 24, 2021
@Luap99
Member

Luap99 commented May 24, 2021

I tried to use something like unix.Wait4(-1, nil, unix.WNOHANG, nil) to wait for all child processes in an extra goroutine. However, this did not work because it causes a race with cmd.Wait() from the os/exec package, which is used in many places.
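For illustration, a sketch of that race, again assuming `golang.org/x/sys/unix`: the global reaper goroutine can collect the exit status of a child that os/exec is still managing, so the matching `cmd.Wait()` comes up empty.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"

	"golang.org/x/sys/unix"
)

func main() {
	// Naive global reaper: wait4(-1, ...) reaps *any* exited child,
	// including children owned by os/exec elsewhere in the program.
	go func() {
		for {
			unix.Wait4(-1, nil, unix.WNOHANG, nil)
			time.Sleep(time.Millisecond)
		}
	}()

	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	// If the goroutine wins the race, the exit status is already gone and
	// this typically fails with ECHILD ("no child processes").
	if err := cmd.Wait(); err != nil {
		fmt.Println("cmd.Wait() lost the race:", err)
	}
}
```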

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Jun 30, 2021

@Luap99 @mheon @AkihiroSuda @giuseppe @vrothberg We still have this issue. Ideas for a solution?

@Luap99 Luap99 assigned Luap99 and unassigned mheon Jul 2, 2021
@Luap99
Member

Luap99 commented Jul 2, 2021

I take this.

Luap99 added a commit to Luap99/libpod that referenced this issue Jul 2, 2021
Add a new service reaper package. Podman currently does not reap all
child processes: the slirp4netns and rootlesskit processes are not
reaped. This is not a problem for local podman, since the podman process
dies before the other processes and init will then reap them for us.

However, with podman system service it is possible that the podman
process is still alive after slirp has died. In this case podman has to
reap it, or the slirp process will stay a zombie until the service is
stopped.

The service reaper will listen on SIGCHLD in an extra goroutine. Once it
receives this signal it will try to reap all pids that were added with
`AddPID()`. While I would like to just reap all children, this is not
possible because many parts of the code use `os/exec` with `cmd.Wait()`.
If we reap before `cmd.Wait()`, things can break, so reaping everything
is not an option.

[NO TESTS NEEDED]

Fixes containers#9777

Signed-off-by: Paul Holzinger <[email protected]>
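A rough sketch of the approach the commit message describes; `AddPID()` is named in the message, while the types, locking, and channel handling below are assumptions for illustration, not the actual podman code:

```go
package main

import (
	"os"
	"os/signal"
	"sync"

	"golang.org/x/sys/unix"
)

type serviceReaper struct {
	mu   sync.Mutex
	pids map[int]struct{}
}

// AddPID registers a child that nothing else will ever wait on.
func (r *serviceReaper) AddPID(pid int) {
	r.mu.Lock()
	r.pids[pid] = struct{}{}
	r.mu.Unlock()
}

// start listens for SIGCHLD and reaps only the registered pids, leaving
// os/exec-managed children for their own cmd.Wait() calls.
func (r *serviceReaper) start() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, unix.SIGCHLD)
	go func() {
		for range ch {
			r.mu.Lock()
			for pid := range r.pids {
				// WNOHANG: only collect pids that actually exited.
				if wpid, _ := unix.Wait4(pid, nil, unix.WNOHANG, nil); wpid == pid {
					delete(r.pids, pid)
				}
			}
			r.mu.Unlock()
		}
	}()
}

func main() {
	r := &serviceReaper{pids: make(map[int]struct{})}
	r.start()
	// After launching slirp4netns/rootlesskit: r.AddPID(cmd.Process.Pid)
	select {} // stand-in for the long-running API service
}
```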
@github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023