This repository has been archived by the owner on Jan 31, 2025. It is now read-only.

High CPU usage on zombie agetty processes when using with molecule #9

Closed
percygrunwald opened this issue Feb 6, 2019 · 1 comment

Comments

@percygrunwald
Contributor

percygrunwald commented Feb 6, 2019

I noticed this issue with the docker-ubuntu1804-ansible, docker-ubuntu1604-ansible, docker-debian8-ansible and docker-debian9-ansible images. When multiple containers are launched at the same time with molecule create, containers from these four images show very high CPU usage caused by agetty processes:

top - 09:56:06 up 7 min,  0 users,  load average: 18.23, 6.93, 2.59
Tasks:  11 total,   6 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu(s): 41.7 us, 58.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2046748 total,    93972 free,   308028 used,  1644748 buff/cache
KiB Swap:  1048572 total,  1048032 free,      540 used.  1490492 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   66 root      20   0   13020   1792   1656 R  21.9  0.1   0:29.76 agetty
   64 root      20   0   13020   1804   1672 R  21.2  0.1   0:30.83 agetty
   67 root      20   0   13020   1764   1628 R  21.2  0.1   0:31.02 agetty
   65 root      20   0   13020   1812   1684 R  18.9  0.1   0:30.20 agetty
   68 root      20   0   13020   1876   1748 R  18.2  0.1   0:30.11 agetty
   78 root      20   0   36644   3044   2612 R   0.3  0.1   0:00.01 top
    1 root      20   0   37040   5048   3988 S   0.0  0.2   0:00.13 systemd
   23 root      20   0   35268   6764   6480 S   0.0  0.3   0:00.06 systemd-jo+
   31 systemd+  20   0  100320   2504   2296 S   0.0  0.1   0:00.01 systemd-ti+
   37 syslog    20   0  256388   3100   2692 S   0.0  0.2   0:00.01 rsyslogd
   76 root      20   0   13020   1804   1676 S   0.0  0.1   0:00.00 agetty
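
To compare containers quickly, the per-process table above can be totalled by command. The following is a minimal sketch, not part of the original report: the function name cpu_by_command and the SAMPLE text are illustrative, and it assumes the column layout shown in the top output above (a header row starting with PID that contains %CPU and COMMAND).

```python
# Hypothetical helper: sum the %CPU column per command from top's batch output.
# SAMPLE reuses a few rows from the output above; it is not a new measurement.
SAMPLE = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   66 root      20   0   13020   1792   1656 R  21.9  0.1   0:29.76 agetty
   64 root      20   0   13020   1804   1672 R  21.2  0.1   0:30.83 agetty
   78 root      20   0   36644   3044   2612 R   0.3  0.1   0:00.01 top
    1 root      20   0   37040   5048   3988 S   0.0  0.2   0:00.13 systemd
"""


def cpu_by_command(top_output):
    """Return a dict mapping command name to its summed %CPU."""
    totals = {}
    cpu_col = cmd_col = None
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if cpu_col is None:
            # Skip the summary lines until the process table header appears.
            if fields[0] == "PID" and "%CPU" in fields and "COMMAND" in fields:
                cpu_col = fields.index("%CPU")
                cmd_col = fields.index("COMMAND")
            continue
        if len(fields) <= max(cpu_col, cmd_col):
            continue  # Not a complete process row.
        command = fields[cmd_col]
        totals[command] = totals.get(command, 0.0) + float(fields[cpu_col])
    return totals


if __name__ == "__main__":
    for command, cpu in sorted(cpu_by_command(SAMPLE).items()):
        print(f"{command}: {cpu:.1f}")
```

The input could come from running top -b -n 1 inside each container (for example through docker exec), which makes it easy to see whether agetty dominates on a given platform.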

This issue seems to have popped up around the internet in various places:

Environment

macOS 10.14 (18A391)
Docker Desktop for Mac Version 2.0.0.2
Docker Engine 18.09.1

Steps to reproduce

Create an empty role with molecule init and edit the default molecule.yml to include platforms like this:

# molecule/default/molecule.yml
...
platforms:
  - name: debian9
    image: "geerlingguy/docker-debian9-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: true
  - name: debian8
    image: "geerlingguy/docker-debian8-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: true
  - name: ubuntu1804
    image: "geerlingguy/docker-ubuntu1804-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: true
  - name: ubuntu1604
    image: "geerlingguy/docker-ubuntu1604-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: true
  - name: centos7
    image: "geerlingguy/docker-centos7-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: true
  - name: centos6
    image: "geerlingguy/docker-centos6-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: true
...

Run molecule create. Once the containers have been launched, the com.docker.hyperkit process will max out all allocated CPUs. Running top in the Debian-based containers (ubuntu1804, ubuntu1604, debian9, debian8) shows multiple agetty processes consuming CPU.

Note that this issue only seems to happen when the number of platforms is >2 and includes Debian-based images.

Expected behavior

It should be possible to launch all those containers with molecule create without the high CPU usage.

Proposed solution

I tested one of the proposed solutions from this comment:

Here's one workaround for this. Sharing what I've learned.
Adding the following to the docker file

rm -f /lib/systemd/system/systemd*udev* ;
rm -f /lib/systemd/system/getty.target;

causes both the runaway agetty and the spike in systemd-udevd processes to go away. I understand there is no need for udev or getty in containers.

I did also try:

RUN systemctl disable getty.target
RUN systemctl disable systemd-udevd.service

and although the Dockerfile built fine, all the getty and udev services were still running as they were previously. So those two "systemctl disable" appear to be no-ops. Perhaps they get run before /sbin/init (systemd) is invoked.

I added a Dockerfile.j2 to the default scenario:

# molecule/default/Dockerfile.j2

FROM {{ item.image }}

RUN rm -f /lib/systemd/system/systemd*udev* \
  && rm -f /lib/systemd/system/getty.target
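
To make clear which unit files the two rm patterns target, here is a small sketch using Python's fnmatch module. The file names in UNIT_FILES are illustrative assumptions (the actual contents of /lib/systemd/system differ between images), and removed_units is a hypothetical helper, not something the images or molecule provide.

```python
from fnmatch import fnmatchcase

# Illustrative unit file names only; check the real directory in each image.
UNIT_FILES = [
    "systemd-udevd.service",
    "systemd-udevd-control.socket",
    "systemd-udevd-kernel.socket",
    "systemd-udev-trigger.service",
    "systemd-journald.service",
    "getty.target",
    "getty@.service",
]

# The two patterns passed to rm -f in the Dockerfile.j2 above.
PATTERNS = ["systemd*udev*", "getty.target"]


def removed_units(names, patterns):
    """Return the names that match at least one of the glob patterns."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


if __name__ == "__main__":
    for name in removed_units(UNIT_FILES, PATTERNS):
        print(name)
```

With this sample list, every udev-related unit and getty.target match, while getty@.service and systemd-journald.service do not, so the getty template unit itself stays on disk.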

Then I updated molecule.yml to use pre_build_image: false:

# molecule/default/molecule.yml
...
platforms:
  - name: debian9
    image: "geerlingguy/docker-debian9-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: false
  - name: debian8
    image: "geerlingguy/docker-debian8-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: false
  - name: ubuntu1804
    image: "geerlingguy/docker-ubuntu1804-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: false
  - name: ubuntu1604
    image: "geerlingguy/docker-ubuntu1604-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: false
  - name: centos7
    image: "geerlingguy/docker-centos7-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: false
  - name: centos6
    image: "geerlingguy/docker-centos6-ansible:latest"
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    privileged: true
    pre_build_image: false
...

If I run molecule create with the platforms above, it completely resolves the high CPU usage issue.
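
The only difference between the two molecule.yml listings is the pre_build_image flag on each platform. For anyone who wants to try the workaround on an existing scenario, the switch could be scripted roughly as follows. This is a sketch under assumptions: set_pre_build_image is a hypothetical helper, it edits the text with a regular expression rather than parsing YAML, and it expects the flag to be written as a plain true or false at the start of its line.

```python
import re


def set_pre_build_image(molecule_yaml, value):
    """Return the YAML text with every pre_build_image flag set to value."""
    replacement = "true" if value else "false"
    return re.sub(
        r"^([ \t]*pre_build_image:[ \t]*)(?:true|false)\b",
        lambda match: match.group(1) + replacement,
        molecule_yaml,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    # Shortened example input based on the platforms shown above.
    example = (
        "platforms:\n"
        "  - name: debian9\n"
        "    privileged: true\n"
        "    pre_build_image: true\n"
        "  - name: ubuntu1804\n"
        "    privileged: true\n"
        "    pre_build_image: true\n"
    )
    print(set_pre_build_image(example, False))
```

Other keys such as privileged: true are left untouched because the pattern only matches lines that start with pre_build_image.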

I tested running molecule test with these 6 images concurrently on your ansible-role-ntp Galaxy role and everything passed.

This issue most likely won't affect your Travis CI setup for your Ansible roles, since the concurrency there comes from multiple Travis runners each calling molecule on a single platform. However, it may affect anyone using these images with more than two platforms, for example during rapid local development when running molecule converge against multiple distros.

I'm happy to test this change against your other roles and create a PR if it's something that you think is acceptable.

Thanks

@geerlingguy
Owner

Fixed via PR #10
