This repository has been archived by the owner on Oct 16, 2020. It is now read-only.
Zombie processes (sshd) increasing until no new process can be created.
Bug
I have a container running the alpine-sshd image (OpenSSH 7.5) that is used to transfer files, which are then written to an EFS volume. Every time a connection to its sshd is opened and then closed, a zombie sshd process is left behind.
I have reproduced the problem with another Alpine-based image running OpenSSH 7.7, as well as a CentOS-based image running OpenSSH 7.4.
Furthermore, this problem does not exist for the same images on previous Container Linux versions, as tested using 1520.8.0 (ami-a89d3ad2 in us-east-1).
Container Linux Version
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1688.5.3
VERSION_ID=1688.5.3
BUILD_ID=2018-04-03-0547
PRETTY_NAME="Container Linux by CoreOS 1688.5.3 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
Environment
AWS EC2 (ami-3f061b45 in us-east-1)
Kubernetes 1.7.10
nodeInfo:
architecture: amd64
containerRuntimeVersion: docker://17.12.1-ce
kernelVersion: 4.14.32-coreos
kubeProxyVersion: v1.7.10
kubeletVersion: v1.7.10
operatingSystem: linux
osImage: Container Linux by CoreOS 1688.5.3 (Rhyolite)
Expected Behavior
sshd should reap its children, and the zombie count should stay at or near 0.
Actual Behavior
sshd defunct processes accumulate in the container for every closed session.
Reproduction Steps
1. kubectl run testzombies --image sickp/alpine-sshd (or run it directly in Docker; feel free to build your own image, I tested that too).
2. Connect to the sshd daemon (even a failed login will trigger this).
3. Inside the container, check the status of the sshd processes. There will be a zombie for each closed connection.
Killing the container will reap the zombies created by sshd.
Other Information
As mentioned above this does not happen for the same container image running in CoreOS 1520.8.0.
I can't tell whether the issue is with Docker or CoreOS, but since the parent sshd process presumably wait()s correctly for its children, I believe the kernel must be broken.
This is actually a kinda weird Kubernetes interaction, not a change in Docker's or the kernel's behaviour (I think).
On both AMIs referenced, the following will produce zombies with that container:
$ docker run --name pause -d gcr.io/google_containers/pause-amd64:3.0
$ docker run --pid=container:pause --rm --publish=2222:22 -it sickp/alpine-sshd:7.5-r2
$ for i in $(seq 1 10); do ssh -p 2222 -o BatchMode=yes -o StrictHostKeyChecking=no root@localhost; done
$ ps aux | grep sshd
# many defunct processes
What I suspect you're observing here is that the Kubernetes shared-PID feature turns itself on when it detects a Docker version >= 1.13.1, so on the newer AMI Kubernetes is launching pods in a different way (similar to the above).
This can be most easily worked around in one of the following ways:
- Update the pause container to gcr.io/google_containers/pause-amd64:3.1 to get this code change; the image can be configured with the --pod-infra-container-image kubelet flag.
- Update to k8s 1.10, where the above pause container was made the default.
- Pass --docker-disable-shared-pid to the kubelet to opt out of the different behaviour.
Looks like you are right. Upgrading the pause container fixes this.
I had always assumed (incorrectly) that the pause container reaped defunct processes.
I hadn't realized that that was not the case prior to version 3.1.