
Zombie processes on 1688.5.3 #2410

Closed
louismunro opened this issue Apr 18, 2018 · 2 comments

louismunro commented Apr 18, 2018

Issue Report

Zombie processes (sshd) increasing until no new process can be created.

Bug

I have a container running the alpine-sshd image (OpenSSH 7.5), used to transfer files that are then written to an EFS volume. Every time a connection to its sshd is opened and then closed, a zombie sshd process is left behind.
I have reproduced the problem using another Alpine-based image running OpenSSH 7.7, as well as a CentOS-based image running OpenSSH 7.4.

Furthermore, this problem does not exist for the same images on previous CoreOS versions, as tested with 1520.8.0 (ami-a89d3ad2 in us-east1).

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1688.5.3
VERSION_ID=1688.5.3
BUILD_ID=2018-04-03-0547
PRETTY_NAME="Container Linux by CoreOS 1688.5.3 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

AWS EC2 (ami-3f061b45 in us-east1)
Kubernetes 1.7.10
nodeInfo:
architecture: amd64
containerRuntimeVersion: docker://17.12.1-ce
kernelVersion: 4.14.32-coreos
kubeProxyVersion: v1.7.10
kubeletVersion: v1.7.10
operatingSystem: linux
osImage: Container Linux by CoreOS 1688.5.3 (Rhyolite)

Expected Behavior

sshd should reap its children and the zombie count should stay at or near 0.

Actual Behavior

Defunct sshd processes accumulate in the container, one for every closed session.

Reproduction Steps

  1. kubectl run testzombies --image sickp/alpine-sshd (or run it in docker directly; feel free to build your own image, I tested that too).
  2. Connect to the sshd daemon (even an incorrect login will trigger this).
  3. Inside the container, check the status of the sshd processes. There will be one zombie for each closed connection (see the sketch below).

Killing the container will reap the zombies created by sshd.
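
Roughly, the reproduction condenses to the following commands (a sketch; it assumes kubectl run labels the pod with run=testzombies, which is its default behaviour, and that /proc is readable inside the container):

$ kubectl run testzombies --image sickp/alpine-sshd
$ POD=$(kubectl get pods -l run=testzombies -o jsonpath='{.items[0].metadata.name}')
$ kubectl port-forward "$POD" 2222:22 &
$ for i in $(seq 1 5); do ssh -p 2222 -o BatchMode=yes -o StrictHostKeyChecking=no root@localhost; done
$ # count processes in state Z inside the container; the count grows by one per closed session
$ kubectl exec "$POD" -- sh -c 'grep -l "^State:.*Z (zombie)" /proc/[0-9]*/status | wc -l'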

Other Information

As mentioned above, this does not happen for the same container image running on CoreOS 1520.8.0.
I can't tell if the issue is with docker or CoreOS, but since the parent sshd process presumably wait()s correctly for its children, I believe the kernel must be broken.

euank commented Apr 18, 2018

This is actually a kinda weird Kubernetes interaction, not a change in docker or the kernel's behaviour (I think).

On both AMIs referenced, the following will produce zombies with that container:

$ docker run --name pause -d gcr.io/google_containers/pause-amd64:3.0
$ docker run --pid=container:pause --rm --publish=2222:22 -it sickp/alpine-sshd:7.5-r2

$ for i in $(seq 1 10); do ssh -p 2222 -o BatchMode=yes -o StrictHostKeyChecking=no root@localhost; done
$ ps aux | grep sshd 
# many defunct processes

What I suspect you're observing here is that the Kubernetes shared-pid feature turns itself on when it detects a Docker version >= 1.13.1, so on the newer AMI Kubernetes launches pods in a different way (similar to the above).
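
To check whether a given container on the node joined another container's PID namespace (a sketch; the container ID is a placeholder):

$ # a value of the form "container:<id>" means the container shares the
$ # pause container's PID namespace; an empty value means it has its own
$ docker inspect --format '{{.HostConfig.PidMode}}' <sshd-container-id>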

This can be most easily worked around in one of the following ways:

  1. Update the pause container to gcr.io/google_containers/pause-amd64:3.1 to get this code change; this can be configured with the --pod-infra-container-image flag (see the sketch after this list).
  2. Update to k8s 1.10, where the above pause container was made the default.
  3. Pass --docker-disable-shared-pid to the kubelet to opt out of the different behaviour.
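
For options 1 and 3 the relevant flag is added to the kubelet's existing command line, roughly as follows (a sketch; how kubelet arguments are managed, e.g. via a systemd drop-in or a wrapper script, depends on your deployment):

# option 1: point the kubelet at the newer pause image, which reaps orphaned children
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.1

# option 3: opt out of the shared PID namespace behaviour entirely
--docker-disable-shared-pid

After restarting the kubelet, existing pods will likely need to be recreated before they pick up the new pause container.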

louismunro (Author) commented

Looks like you are right. Upgrading the pause container fixes this.
I had always assumed (incorrectly) that the pause container reaped defunct processes.
I hadn't realized that this was not the case prior to version 3.1.

Thank you for your help.
