Support systemd in containers with podman-style --systemd flag #2785

Merged
merged 1 commit into from
Feb 5, 2024

Contributor

@sazzy4o sazzy4o commented Feb 1, 2024

Adds support for running systemd in containers to nerdctl via a --systemd flag, modeled on the flag of the same name in podman.

Fixes #2784

Usage:

$ sudo nerdctl run --systemd=always --rm -it registry.hub.docker.com/sazzy4o/build:systemd
systemd v246.15-1.fc33 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Fedora 33 (Container Image)!

Set hostname to <9eff88d04af6>.
Queued start job for default target Graphical Interface.
[  OK  ] Created slice Slice /system/getty.
[  OK  ] Created slice Slice /system/modprobe.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Reached target Local File Systems.
[  OK  ] Reached target Network is Online.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Process Core Dump Socket.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on User Database Manager Socket.
         Starting Rebuild Dynamic Linker Cache...
         Starting Journal Service...
         Starting Create System Users...
[  OK  ] Finished Rebuild Dynamic Linker Cache.
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Finished Create System Users.
[  OK  ] Finished Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Finished Create Volatile Files and Directories.
         Starting Rebuild Journal Catalog...
         Starting Network Name Resolution...
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Finished Update UTMP about System Boot/Shutdown.
[  OK  ] Finished Rebuild Journal Catalog.
         Starting Update is Completed...
[  OK  ] Finished Update is Completed.
[  OK  ] Reached target System Initialization.
[  OK  ] Started dnf makecache --timer.
[  OK  ] Started Daily rotation of log files.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting MariaDB 10.4 database server...
         Starting The PHP FastCGI Process Manager...
         Starting Home Area Manager...
         Starting User Login Management...
         Starting Permit User Sessions...
         Starting D-Bus System Message Bus...
[  OK  ] Started Network Name Resolution.
[  OK  ] Reached target Host and Network Name Lookups.
         Starting The nginx HTTP and reverse proxy server...
[  OK  ] Finished Permit User Sessions.
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Console Getty.
[  OK  ] Reached target Login Prompts.
[  OK  ] Started Home Area Manager.
[  OK  ] Started User Login Management.
[  OK  ] Started The PHP FastCGI Process Manager.
[  OK  ] Started The nginx HTTP and reverse proxy server.
[  OK  ] Started MariaDB 10.4 database server.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.

Fedora 33 (Container Image)
Kernel 6.5.0-14-generic on an x86_64 (console)

9eff88d04af6 login: 

@@ -184,6 +184,7 @@ func setCreateFlags(cmd *cobra.Command) {
cmd.Flags().StringSlice("cap-drop", []string{}, "Drop Linux capabilities")
cmd.RegisterFlagCompletionFunc("cap-drop", capShellComplete)
cmd.Flags().Bool("privileged", false, "Give extended privileges to this container")
cmd.Flags().String("systemd", "false", "Enable systemd integration (default: false)")
Contributor Author

@sazzy4o sazzy4o Feb 1, 2024

This differs from the podman flag, which has a default value of true

@@ -173,7 +175,6 @@ func Create(ctx context.Context, client *containerd.Client, args []string, netMa
return nil, nil, err
}
cOpts = append(cOpts, restartOpts...)
cOpts = append(cOpts, withStop(options.StopSignal, options.StopTimeout, ensuredImage))

systemdPaths := []string{
"/usr/sbin/init",
"/sbin/init",
"/usr/local/sbin/init",
Member

Is there any image that has /usr/local/sbin/init?

Contributor Author

@sazzy4o sazzy4o Feb 1, 2024

I set these to be the same as podman:

true enables systemd mode only when the command executed inside the container is systemd, /usr/sbin/init, /sbin/init or /usr/local/sbin/init.

(But, most images I have seen use /sbin/init)

https://docs.podman.io/en/latest/markdown/podman-run.1.html#systemd-true-false-always
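
The three flag values described in the quoted podman documentation can be sketched as a small decision function. This is an illustration only, not the code from this PR: `systemdEnabled` and `initPaths` are hypothetical names, the path list is copied from the quote above, and treating any command whose base name is `systemd` as systemd is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// initPaths lists the commands that the quoted podman documentation treats
// as systemd when --systemd=true.
var initPaths = []string{"/sbin/init", "/usr/sbin/init", "/usr/local/sbin/init"}

// systemdEnabled is a hypothetical helper: it reports whether systemd mode
// should be turned on for the given flag value and container command.
func systemdEnabled(mode, command string) (bool, error) {
	switch mode {
	case "always":
		// Enabled regardless of the command.
		return true, nil
	case "false":
		return false, nil
	case "true":
		// Assumption: a command whose base name is "systemd" counts as systemd.
		if filepath.Base(command) == "systemd" {
			return true, nil
		}
		for _, p := range initPaths {
			if command == p {
				return true, nil
			}
		}
		return false, nil
	default:
		return false, fmt.Errorf("invalid --systemd value %q (want true, false, or always)", mode)
	}
}

func main() {
	for _, mode := range []string{"false", "true", "always"} {
		on, _ := systemdEnabled(mode, "/bin/sh")
		fmt.Printf("--systemd=%s with /bin/sh: %v\n", mode, on)
	}
}
```

With a non-init command such as `/bin/sh`, only `always` turns systemd mode on; `true` turns it on only when the command is one of the listed init paths.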


systemdPaths := []string{
"/usr/sbin/init",
"/sbin/init",
Member

Wondering if /sbin should have higher priority over /usr/sbin

Contributor Author

Yeah, /sbin/init is more common. I'll move it higher.


// See: https://github.com/containers/podman/issues/15878
if !privilegedWithoutHostDevices {
return nil, nil, errors.New("If --privileged is used with systemd `--security-opt privileged-without-host-devices` must also be used")
Contributor Author

@sazzy4o sazzy4o Feb 1, 2024

In the podman implementation, all /dev/tty* devices are unmounted to prevent crashing the host.

I could not find an easy way to achieve this, so instead I return an error (which also prevents the host from crashing). If that functionality is required, maybe it can be added in a future PR.
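
The guard described in this comment amounts to a precondition check. A minimal sketch, assuming a hypothetical function name and boolean parameters (the flag names come from the error message in the hunk above; the real code in the PR may be structured differently):

```go
package main

import (
	"errors"
	"fmt"
)

// checkSystemdPrivileged is a hypothetical helper mirroring the guard above:
// a privileged systemd container is refused unless host devices are withheld,
// so the host's /dev/tty* devices are not exposed to systemd in the container.
func checkSystemdPrivileged(systemd, privileged, privilegedWithoutHostDevices bool) error {
	if systemd && privileged && !privilegedWithoutHostDevices {
		return errors.New("--privileged with systemd requires --security-opt privileged-without-host-devices")
	}
	return nil
}

func main() {
	// Rejected: privileged systemd container with host devices.
	fmt.Println(checkSystemdPrivileged(true, true, false))
	// Accepted: host devices are withheld.
	fmt.Println(checkSystemdPrivileged(true, true, true))
}
```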

@@ -27,6 +27,7 @@ import (
"path"
"path/filepath"
"runtime"
"slices"
Member

This doesn't work with Go 1.20.
I guess we can drop the support for Go 1.20 though.

Contributor Author

Switched to using an "or" expression to maintain compatibility with Go 1.20.
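
For context, the standard-library `slices` package was added in Go 1.21, which is why the import does not build on Go 1.20. A sketch of the kind of `||` expression that avoids it; the function name is hypothetical and the exact expression in the PR may differ:

```go
package main

import "fmt"

// isInitPath reports whether path is one of the init locations discussed in
// this review. A plain "||" chain avoids importing "slices" (Go 1.21+), so
// the code still builds with Go 1.20.
func isInitPath(path string) bool {
	return path == "/sbin/init" ||
		path == "/usr/sbin/init" ||
		path == "/usr/local/sbin/init"
}

func main() {
	fmt.Println(isInitPath("/sbin/init"))
	fmt.Println(isInitPath("/bin/bash"))
}
```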

@@ -232,6 +232,8 @@ Security flags:
- :whale: `--cap-add=<CAP>`: Add Linux capabilities
- :whale: `--cap-drop=<CAP>`: Drop Linux capabilities
- :whale: `--privileged`: Give extended privileges to this container
- :nerd_face: `--systemd=(true|false|always)`: Enable systemd compatibility (default: false).
Member

How does always differ from true?

Contributor Author

Added more detail about the options to the docs, and added a note to the nerdctl-specific features section of the README:
https://github.com/containerd/nerdctl/pull/2785/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R206

{Type: "tmpfs", Source: "tmpfs", Destination: "/var/lib/journal"},
}),
)
stopSignal = "SIGRTMIN+3"
Contributor Author

SIGTERM causes systemd to re-execute itself instead of shutting down, so SIGRTMIN+3 (halt) is used as the stop signal. This is the same behavior as podman.

See:
https://www.freedesktop.org/software/systemd/man/latest/systemd.html#Signals
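
A sketch of how the stop signal choice might look. `pickStopSignal` is a hypothetical helper, and two things are assumptions rather than facts from this PR: that systemd mode always overrides a user-supplied stop signal, and that SIGTERM is the fallback when none is given.

```go
package main

import "fmt"

// systemdStopSignal is the value assigned to stopSignal in the hunk above.
const systemdStopSignal = "SIGRTMIN+3"

// pickStopSignal is a hypothetical helper that chooses the stop signal name.
// Assumption: in systemd mode the requested signal is ignored, because
// SIGTERM would not shut systemd down.
func pickStopSignal(systemd bool, requested string) string {
	if systemd {
		return systemdStopSignal
	}
	if requested == "" {
		// Assumption: SIGTERM is the default for ordinary containers.
		return "SIGTERM"
	}
	return requested
}

func main() {
	fmt.Println(pickStopSignal(true, ""))
	fmt.Println(pickStopSignal(false, "SIGQUIT"))
}
```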

@AkihiroSuda
Member

Thanks,

  • please add tests
  • please squash commits

@AkihiroSuda AkihiroSuda added this to the v2.0.0 milestone Feb 1, 2024
@AkihiroSuda AkihiroSuda added the enhancement New feature or request label Feb 1, 2024
} else if len(ensured.ImageConfig.Cmd) > 0 {
entrypointPath = ensured.ImageConfig.Cmd[0]
}

Contributor Author

There might be an easier way to determine the entrypoint executable path. If there is, I am open to updating this.
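
The fallback in the hunk above can be sketched in isolation. `imageConfig` and `entrypointPath` are hypothetical stand-ins, and the sketch assumes the branch preceding the `else if` (not shown in the hunk) checks the image's Entrypoint. Command-line overrides are not handled here.

```go
package main

import "fmt"

// imageConfig holds the two image fields that matter here. It is a stand-in
// for the real image config type, not the type used by nerdctl.
type imageConfig struct {
	Entrypoint []string
	Cmd        []string
}

// entrypointPath is a hypothetical helper that returns the executable a
// container would start: the first element of Entrypoint if set, otherwise
// the first element of Cmd, otherwise the empty string.
func entrypointPath(cfg imageConfig) string {
	if len(cfg.Entrypoint) > 0 {
		return cfg.Entrypoint[0]
	}
	if len(cfg.Cmd) > 0 {
		return cfg.Cmd[0]
	}
	return ""
}

func main() {
	fmt.Println(entrypointPath(imageConfig{Cmd: []string{"/sbin/init"}}))
	fmt.Println(entrypointPath(imageConfig{
		Entrypoint: []string{"/usr/bin/tini", "--"},
		Cmd:        []string{"/sbin/init"},
	}))
}
```

The second call shows a limitation of this approach: an image that wraps init in another entrypoint is not recognized as a systemd image, which is a case where `--systemd=always` would be needed.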

@@ -84,5 +84,6 @@ while [ $quit -ne 1 ]; do
done
echo "signal quit"`).AssertOK()
base.Cmd("stop", testContainerName).AssertOK()
base.Cmd("inspect", "--format", "{{json .Config.Labels}}", testContainerName).AssertOutContains("SIGQUIT")
Contributor Author

I saw that this test was not checking the container labels, so I added an assertion.

@@ -41,6 +41,7 @@ var (
DockerAuthImage = mirrorOf("cesanta/docker_auth:1.7")
FluentdImage = mirrorOf("fluent/fluentd:v1.14-1")
KuboImage = mirrorOf("ipfs/kubo:v0.16.0")
SystemdImage = "ghcr.io/containerd/stargz-snapshotter:0.15.1-kind"
Contributor Author

Chose this image since it has a working systemd and is controlled by containerd

(with --systemd flag)

Signed-off-by: Spencer von der Ohe <[email protected]>
@sazzy4o
Contributor Author

sazzy4o commented Feb 4, 2024

@AkihiroSuda I have added some tests and squashed the commits. Could you please have another look?

Member

@AkihiroSuda AkihiroSuda left a comment

Thanks

@AkihiroSuda AkihiroSuda merged commit 9b76bcc into containerd:main Feb 5, 2024
22 checks passed
}

opts = append(opts,
oci.WithoutMounts("/sys/fs/cgroup"),
@jfernandez jfernandez Nov 10, 2024

@sazzy4o I found your change while looking into supporting containers with systemd using k8s + containerd. I did the tmpfs mounts for /run, /tmp, etc., but I was mounting the host's /sys/fs/cgroup as read-only, which didn't work.

Here, you are removing the mount, which caught my attention. Is this so that systemd creates /sys/fs/cgroup when it initializes?

Contributor Author

@jfernandez Yes, this allows systemd to run inside the container and create /sys/fs/cgroup

This was based on the podman --systemd flag
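
The mount handling discussed in this thread can be sketched without the containerd dependency. The `mount` type below is a minimal stand-in for an OCI runtime spec mount entry, `withoutMount` and `systemdMounts` are hypothetical helpers, and the tmpfs destinations are only the ones named in this conversation, not necessarily the full list used by the PR.

```go
package main

import "fmt"

// mount is a minimal stand-in for an OCI runtime spec mount entry.
type mount struct {
	Type, Source, Destination string
}

// withoutMount returns a copy of mounts with every entry for dest removed.
func withoutMount(mounts []mount, dest string) []mount {
	kept := make([]mount, 0, len(mounts))
	for _, m := range mounts {
		if m.Destination != dest {
			kept = append(kept, m)
		}
	}
	return kept
}

// systemdMounts drops the default /sys/fs/cgroup mount and adds tmpfs mounts
// for directories systemd writes to. Assumption: the destination list is
// illustrative and taken from the paths mentioned in this thread.
func systemdMounts(mounts []mount) []mount {
	out := withoutMount(mounts, "/sys/fs/cgroup")
	for _, dest := range []string{"/run", "/tmp", "/var/lib/journal"} {
		out = append(out, mount{Type: "tmpfs", Source: "tmpfs", Destination: dest})
	}
	return out
}

func main() {
	defaults := []mount{
		{Type: "proc", Source: "proc", Destination: "/proc"},
		{Type: "cgroup", Source: "cgroup", Destination: "/sys/fs/cgroup"},
	}
	for _, m := range systemdMounts(defaults) {
		fmt.Printf("%-6s %s\n", m.Type, m.Destination)
	}
}
```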

Successfully merging this pull request may close these issues.

Support running systemd in a container