From f284ad3d1b3b66189c1a02d763f04c66734da3d8 Mon Sep 17 00:00:00 2001
From: Fraser Tweedale
Date: Fri, 10 Sep 2021 20:28:44 +1000
Subject: [PATCH] specify cgroup ownership semantics

cgroups v2 supports secure delegation of cgroups. Accordingly,
control over a cgroup (that is, creation of new child cgroups and
movement of processes and threads among the cgroup subtree exposed to
a container) can be safely delegated to a container. Adjusting the
ownership enables real-world use cases like systemd-based containers
fully isolated in user namespaces.

To encourage adoption of this feature, and secure implementation,
define the semantics of cgroup ownership. Changing/setting the cgroup
ownership is only allowed on cgroups v2, and the specific files whose
ownership can be changed are mentioned.

In terms of current practice, this is already the behaviour of crun
(which also chowns the memory.oom.group file), and there is a pull
request for runc: https://github.com/opencontainers/runc/pull/3057
(the behaviour is enabled by an annotation).

Signed-off-by: Fraser Tweedale
---
 config-linux.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/config-linux.md b/config-linux.md
index 37ea951f7..8cfdf4452 100644
--- a/config-linux.md
+++ b/config-linux.md
@@ -196,6 +196,27 @@ For example, to run a new process in an existing container without updating limi
 
 Runtimes MAY attach the container process to additional cgroup controllers beyond those necessary to fulfill the `resources` settings.
 
+### Cgroup ownership
+
+Runtimes MAY change (or cause to be changed) the owner of the
+container's cgroup to the host uid that maps to uid 0 in the
+container's user namespace, if cgroups v2 is in use.
+
+Runtimes MUST NOT change the ownership of container cgroups when
+cgroups v1 is in use. Cgroup delegation is not secure in cgroups
+v1.
+
+Runtimes MUST only change the ownership of the container's cgroup
+directory and the following files within that directory, and MUST
+NOT change the ownership of any other files:
+
+- `cgroup.procs`
+- `cgroup.subtree_control`
+- `cgroup.threads`
+
+Changing other files may allow the container to elevate its own
+resource limits or perform other unwanted behaviour.
+
 ### Example
 
 ```json