This repository provides a wrapper which improves the handling of Docker containers run as systemd services.
If a Docker container is started as a systemd service using the "usual" docker run ... instruction, e.g. ExecStart=docker run ..., systemd interacts with the Docker client process instead of the container process. This can impair systemd's ability to monitor process health:
- the client can detach or crash while the container is doing fine, yet systemd would trigger failure handling
- worse, the container crashes and should be taken care of, but the client has stalled: systemd is blind and won't do anything
- when a container is stopped with docker stop ..., attached client processes exit with error code 143, not 0/success, which triggers systemd's failure handling unless it's explicitly configured to ignore this using SuccessExitStatus=143, but that's a workaround
The problem is well explained in this issue description.
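The exit code 143 follows the common convention of reporting 128 plus the signal number when a process is terminated by a signal; by default docker stop first sends SIGTERM, which is signal 15 on Linux. A minimal Python sketch of that arithmetic (illustrative only, not part of the wrapper; the function name is made up for this example):

```python
import signal


def exit_code_for_signal(sig):
    """Return the conventional exit status for a process ended by `sig`."""
    return 128 + int(sig)


# SIGTERM is signal 15 on Linux, hence the value handled by SuccessExitStatus=143.
print(exit_code_for_signal(signal.SIGTERM))
```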
The key thing this wrapper does is move the container process from the cgroups set up by Docker to the service unit's cgroup, which gives systemd the supervision of the actual Docker container process.
It's written in Go and makes it possible to leverage all the cgroup functionality of systemd and systemd-notify.
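To make the idea of moving a process between cgroups more concrete, here is a rough Python sketch. It is not the wrapper's actual Go code. It assumes a cgroup v1 layout mounted under /sys/fs/cgroup, where writing a PID into a cgroup's cgroup.procs file moves the process there; the helper names and the sample cgroup path are hypothetical.

```python
import os


def parse_proc_cgroup(text):
    """Map each hierarchy's controller list to its cgroup path.

    `text` is the content of /proc/<pid>/cgroup, one
    "id:controllers:path" entry per line.
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


def procs_file(controllers, cgroup_path, root="/sys/fs/cgroup"):
    """Build the cgroup.procs path for one hierarchy (cgroup v1 assumption)."""
    # The named hierarchy "name=systemd" is assumed to be mounted as <root>/systemd.
    if controllers.startswith("name="):
        subsystem = controllers[len("name="):]
    else:
        subsystem = controllers
    return os.path.join(root, subsystem, cgroup_path.lstrip("/"), "cgroup.procs")


def move_pid(pid, controllers, cgroup_path, root="/sys/fs/cgroup"):
    """Move `pid` into the given cgroup by writing it to cgroup.procs."""
    with open(procs_file(controllers, cgroup_path, root), "w") as handle:
        handle.write(str(pid))


# Hypothetical /proc/self/cgroup entry of a unit called nginx.service.
unit_cgroups = parse_proc_cgroup("1:name=systemd:/system.slice/nginx.service\n")
for controllers, path in unit_cgroups.items():
    print(procs_file(controllers, path))
```

In this picture, the wrapper would read its own cgroups (those of the service unit) and write the container's PID into the matching cgroup.procs files, which requires root privileges on a real system.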
- the code was written by @ibuildthecloud and his co-contributors in this repository. The motivation is explained in this Docker issue #6791 and this mailing list thread.
- @agend07 and co-contributors fixed outdated dependencies and did a first clean-up
- I removed all outdated and broken elements and created a new compilation docker container which can be found here
Supposing that a Go environment is available, the build instruction is go get github.com/DonTseTse/systemd-docker. The executable can then be found in the Go binary directory (usually something like $GOPATH/bin) and it's called systemd-docker.
It can also be built using a stand-alone Docker image, see here.
Both systemctl (to manage systemd services) and the docker CLI can be used, and everything should stay in sync.
In the systemd unit files, the instruction to launch the Docker container takes the form
ExecStart=/path/to/systemd-docker [<systemd-docker_options>] run <docker-run_parameters>
where
- /path/to/systemd-docker is the absolute path of the systemd-docker executable
- <systemd-docker_options> are the flags to configure systemd-docker
- <docker-run_parameters> are forwarded to docker run. A few restrictions apply, see section Docker run restrictions
The example below shows a typical systemd unit file using systemd-docker (supposed to be in /usr/bin), running an Nginx container:
[Unit]
Description=Nginx
After=docker.service
Requires=docker.service
[Service]
#--- if systemd-notify is used
Type=notify
NotifyAccess=all
#------------------------
ExecStart=/usr/bin/systemd-docker run --rm --name %n nginx
Restart=always
RestartSec=10s
TimeoutStartSec=120
TimeoutStopSec=15
[Install]
WantedBy=multi-user.target
The use of %n is a systemd feature explained here. Supposing that the unit file example given above is stored under the likely path /etc/systemd/system/nginx.service, %n expands to the full unit name and the container is named nginx.service.
For the details about Type=notify, NotifyAccess=all and systemd-notify, see systemd notifications.
For general documentation of all systemd unit file configuration options, see this documentation.
Container names are compulsory to make sure that each systemd service always relates to/acts upon the same container(s).
While it may seem as if the name could be omitted as long as the --rm flag is used to make Docker remove any stopped container, that's misleading: the deletion process triggered by this flag is actually part of the Docker client logic and if the client detaches for whatever reason from the running container, the information is lost (even if another client is re-attached later) and the container will not be deleted upon termination. systemd-docker adds an additional check and looks for the named container when systemd-docker ... run ... is called: if a stopped container exists, it's removed.
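The "look for the named container and remove it if it is stopped" check can be approximated with the regular docker CLI. The Python sketch below is only an illustration of that logic under the assumption that the docker binary is on the PATH; it is not how the wrapper itself is implemented, and the function name and the injectable run parameter are made up for this example.

```python
import subprocess


def remove_if_stopped(name, run=subprocess.run):
    """Remove the container called `name` if it exists but is not running.

    Returns True if a removal was issued, False otherwise.
    `run` defaults to subprocess.run and can be swapped out for testing.
    """
    inspect = run(
        ["docker", "inspect", "--format", "{{.State.Running}}", name],
        capture_output=True,
        text=True,
    )
    if inspect.returncode != 0:
        return False  # no such container (or the Docker daemon is unreachable)
    if inspect.stdout.strip() == "true":
        return False  # container is still running, leave it alone
    run(["docker", "rm", name], check=True)
    return True
```

Calling remove_if_stopped("nginx.service") before docker run --name nginx.service ... would avoid the name conflict caused by a leftover stopped container.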
While it processes unit files, systemd populates a range of variables, among which %n stands for the name of the service unit, derived from its filename. This makes it possible to write a self-configuring ExecStart instruction using the parameters
ExecStart=/path/to/systemd-docker ... run ... --name %n --rm ...
systemd handles environment variables with the instructions Environment=... and EnvironmentFile=.... To inject variables into other instructions, the pattern is ${variable_name}. With the docker run flag -e they can be passed from systemd to the Docker container.
Example: ExecStart=/path/to/systemd-docker ... run -e ABC=${ABC} -e XYZ=${XYZ} ...
systemd-docker has an option to pass on all defined environment variables using the --env flag, explained here.
systemd-notify can be used to schedule and sequence the launch of different services. The systemd documentation explains the configuration options available in unit files:
- Type=notify: "... it is expected that the daemon sends a notification message via sd_notify(3) or an equivalent call when it has finished starting up. systemd will proceed with starting follow-up units after this notification message has been sent."
- NotifyAccess=all: "Controls access to the service status notification socket, as accessible via the sd_notify(3) call. ... If all, all services updates from all members of the service's control group are accepted."
By default systemd-docker will send READY=1 to the systemd notification socket, but it can also be configured to delegate this to the container as explained here.
Please be aware that systemd-notify comes with its own quirks; more info can be found in this mailing list thread. In short, systemd-notify is not reliable because the child often dies before systemd has time to determine which cgroup it is a member of.
By default all application cgroups are moved to systemd. It's also possible to control individually which cgroups are transferred, using one --cgroups flag for each cgroup to transfer. --cgroups name=systemd is the strict minimum to have systemd supervise the container.
Because the cgroups are moved, the docker run flags --cpuset and/or -m are incompatible.
Example: ExecStart=/path/to/systemd-docker ... --cgroups name=systemd --cgroups=cpu ... run ...
The above command will use the name=systemd and cpu cgroups of systemd but then use Docker's cgroups for all the others, like the freezer cgroup.
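The selection rule described above can be pictured as a simple partition of the container's cgroups. The Python sketch below is a hedged illustration, not the wrapper's code; the function name and the sample cgroup list are assumptions chosen to mirror the example.

```python
def split_cgroups(container_cgroups, requested=None):
    """Return (moved, kept): cgroups handed to systemd and those left to Docker.

    `requested` stands for the values of the --cgroups flags; when it is
    empty or None, every cgroup is moved, mirroring the default behaviour.
    """
    if not requested:
        return list(container_cgroups), []
    moved = [name for name in container_cgroups if name in requested]
    kept = [name for name in container_cgroups if name not in requested]
    return moved, kept


moved, kept = split_cgroups(
    ["name=systemd", "cpu", "freezer", "memory"],
    requested=["name=systemd", "cpu"],
)
print("moved to systemd:", moved)
print("left with Docker:", kept)
```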
By default the container's stdout/stderr is written to the system journal. This may be disabled with --logs=false.
Example: ExecStart=/path/to/systemd-docker ... --logs=false ... run ...
The systemd environment variables are automatically passed through to the Docker container if the --env flag is set. It will essentially read all the current environment variables and add the appropriate -e ... flags to the docker run command.
EnvironmentFile=/etc/environment
ExecStart=systemd-docker ... --env ... run ...
In the example above, all environment variables defined in /etc/environment will be passed to the docker run command.
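Conceptually, the --env behaviour amounts to turning the current environment into a list of -e arguments. The Python sketch below shows one way this could look; the exact form of the flags that systemd-docker generates is an assumption here (KEY=VALUE pairs, sorted for reproducibility), and the function name is hypothetical.

```python
import os


def env_to_docker_flags(environ=None):
    """Translate environment variables into `docker run` -e arguments."""
    env = os.environ if environ is None else environ
    flags = []
    for key in sorted(env):  # sorted only to make the output deterministic
        flags += ["-e", f"{key}={env[key]}"]
    return flags


# Example with two variables as they might be defined in an EnvironmentFile.
print(env_to_docker_flags({"XYZ": "2", "ABC": "1"}))
```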
To create a PID file for the container, use the flag --pid-file=</path/to/pid_file>.
Example: ExecStart=/path/to/systemd-docker ... --pid-file=/var/run/%n.pid ... run ...
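For readers unfamiliar with PID files: such a file simply holds a process ID as text so that other tools can find the process. The Python sketch below shows the idea under the assumption that the file contains the PID followed by a newline; which PID systemd-docker records and in which exact format is not stated here, so treat the details as illustrative.

```python
import os


def write_pid_file(path, pid):
    """Write `pid` and a trailing newline to `path`, replacing the file atomically."""
    tmp_path = f"{path}.tmp"
    with open(tmp_path, "w") as handle:
        handle.write(f"{pid}\n")
    # os.replace swaps the file in one step, so readers never see a partial write.
    os.replace(tmp_path, path)


def read_pid_file(path):
    """Return the PID stored in `path` as an integer."""
    with open(path) as handle:
        return int(handle.read().strip())
```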
The systemd-docker flag --notify makes systemd-docker delegate the systemd-notify READY=1 call to the container itself. To allow the container to achieve this, systemd-docker bind mounts the systemd notification socket into the container and sets the NOTIFY_SOCKET environment variable.
Example: ExecStart=/path/to/systemd-docker ... --notify ... run ...
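With --notify, the process inside the container has to send the readiness message itself. The sd_notify protocol is a datagram sent to the Unix socket named by NOTIFY_SOCKET; the Python sketch below shows a minimal sender that a containerised application could use. The function name is made up for this example, and real applications may prefer an existing sd_notify binding.

```python
import os
import socket


def notify_ready(notify_socket=None):
    """Send READY=1 to the systemd notification socket.

    Uses `notify_socket` if given, otherwise the NOTIFY_SOCKET environment
    variable. Returns False when no socket is configured.
    """
    path = notify_socket or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract namespace socket.
        address = b"\0" + path[1:].encode()
    else:
        address = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"READY=1", address)
    return True
```

The application would call notify_ready() once its start-up work is complete, which is the point at which systemd proceeds with follow-up units.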
To disable systemd-docker's "remove stopped container" procedure, the flag --rm=false can be used.
Example: ExecStart=/path/to/systemd-docker ... --rm=false ... run ...
The docker run flags --cpuset and -m can't be used because they are incompatible with the cgroup migration(s) inherent to systemd-docker.
The -d flag provided to docker run has no effect under systemd-docker. To cause the Docker client to detach after the container is running, use the systemd-docker options --logs=false --rm=false. If either --logs or --rm is true, the Docker client instance used by systemd-docker is kept alive until the systemd service is stopped or the container exits.
CentOS 7 is inconsistent in the way it handles some cgroups. It has 3:cpuacct,cpu:/user.slice in /proc/[pid]/cgroup but the corresponding path /sys/fs/cgroup/cpu,cpuacct/ doesn't exist. This causes systemd-docker to fail when it tries to move the PIDs there. To solve this, the name=systemd cgroup must be explicitly mentioned:
/path/to/systemd-docker ... --cgroups name=systemd ... run ...
See ibuildthecloud#15 for details.
See repository history and credits for acknowledgments. The work on this repository was done in 2018 by DonTseTse.
Licensed under the Apache License, Version 2.0