-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cgroup2 mount for rootless case #2818
Fix cgroup2 mount for rootless case #2818
Conversation
Not sure if this is rc94 of post-1.0 material -- WDYT @cyphar @mrunalp @AkihiroSuda ? |
Good fit for rc94 IMO |
df16ed8
to
08d657e
Compare
Checked that the added test case is failing without the fix: $ sudo ROOTLESS_TESTPATH=/cgroups.bats ./tests/rootless.sh
...
not ok 13 runc run (cgroupv2 mount inside container)
# (in test file tests/integration/cgroups.bats, line 296)
# `[ "$(wc -l <<<"$output")" -eq 1 ]' failed
# runc spec --rootless (status=0):
#
# runc run -d --console-socket /tmp/bats-run-3210676/runc.4mlpW2/tty/sock test_cgroups_unified (status=0):
#
# runc exec test_cgroups_unified find /sys/fs/cgroup/ -type d (status=0):
# /sys/fs/cgroup/
# /sys/fs/cgroup/user.slice
# /sys/fs/cgroup/user.slice/user-1000.slice
# /sys/fs/cgroup/user.slice/user-1000.slice/[email protected]
# /sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/vte-spawn-98b64ba6-eca3-41f7-b7e4-2f94a0613230.scope
# /sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/gvfs-goa-volume-monitor.service
# /sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/vte-spawn-a5d79e49-117d-4728-b261-ccba710e136d.scope
... |
08d657e
to
9accc2f
Compare
OK, I have totally missed the fact that rootless spec removes the /sys/fs/cgroup mount and therefore my code doesn't work (it worked locally because I have /sys/fs/cgroup mount in my spec). This needs more work. |
9accc2f
to
c0b2be6
Compare
Failure in centos 7 rootful fs test:
Looks unrelated but worrisome. |
Fedora VM fails to run dnf transaction:
CI restarted |
@cyphar @AkihiroSuda @mrunalp PTAL |
I can't repro the issue with the current master (497cd0c).
|
@kolyshkin PTAL↑ |
Need more time to look into this. Removed rc94 milestone for now. |
Need more time to have another look into this; converting to draft for now. |
@AkihiroSuda I There are other similar cases (for example, in case we use host pidns, the /proc mounts needs to be a bind mount, not the fstype=proc one), and ultimately the question is -- do we want runc to handle this (modify/interpret the mounts accordingly), or do we want the upper layers to provide the correct OCI runtime spec for these cases (the third way, I guess, is to overload I think this was discussed before and @crosbymichael was for literal spec interpretation (i.e. runc should not adopt mounts), and yet in some cases runc already does that (don't have an example from the top of my mind, will look it up later). |
@AkihiroSuda is it crucial that we use upper-level tools to repro? I don't know how to correctly make them add userns into OCI spec (so we can hit the codepath added by 9c81440). I can easily reproduce this with bare runc (see last two commits). Here is a full manual repro (Fedora 33, cgroup v2): [kir@kir-rhat runc]$ mkdir foo
[kir@kir-rhat runc]$ cd foo
[kir@kir-rhat foo]$ mkdir rootfs
[kir@kir-rhat foo]$ eval `../tests/integration/get-images.sh`
[kir@kir-rhat foo]$ tar -C rootfs -xf $BUSYBOX_IMAGE
[kir@kir-rhat foo]$ ../runc spec --rootless
[kir@kir-rhat foo]$ ../runc run xx4
/ # ls -l /sys/fs/cgroup
total 0
-r--r--r-- 1 nobody nobody 0 Apr 19 23:56 cgroup.controllers
-rw-r--r-- 1 nobody nobody 0 Apr 19 23:56 cgroup.max.depth
-rw-r--r-- 1 nobody nobody 0 Apr 19 23:56 cgroup.max.descendants
-rw-rw-r-- 1 nobody nobody 0 Apr 6 22:02 cgroup.procs
-r--r--r-- 1 nobody nobody 0 Apr 19 23:56 cgroup.stat
-rw-r--r-- 1 nobody nobody 0 Apr 15 00:13 cgroup.subtree_control
-rw-r--r-- 1 nobody nobody 0 Apr 19 23:56 cgroup.threads
-rw-r--r-- 1 nobody nobody 0 Apr 19 23:56 cpu.pressure
-r--r--r-- 1 nobody nobody 0 Apr 19 23:56 cpu.stat
-r--r--r-- 1 nobody nobody 0 Apr 19 23:56 cpuset.cpus.effective
-r--r--r-- 1 nobody nobody 0 Apr 19 23:56 cpuset.mems.effective
drwxr-xr-x 2 nobody nobody 0 Mar 5 00:53 init.scope
-rw-r--r-- 1 nobody nobody 0 Apr 19 23:56 io.cost.model
-rw-r--r-- 1 nobody nobody 0 Apr 19 23:56 io.cost.qos
-rw-r--r-- 1 nobody nobody 0 Apr 19 23:56 io.pressure
-r--r--r-- 1 nobody nobody 0 Apr 19 23:56 io.stat
drwxr-xr-x 2 nobody nobody 0 Apr 6 20:35 machine.slice
-r--r--r-- 1 nobody nobody 0 Apr 19 23:56 memory.numa_stat
-rw-r--r-- 1 nobody nobody 0 Apr 19 23:56 memory.pressure
-r--r--r-- 1 nobody nobody 0 Apr 19 23:56 memory.stat
drwxr-xr-x 3 nobody nobody 0 Mar 19 21:03 pod_123.slice
drwxr-xr-x 2 nobody nobody 0 Mar 10 23:20 sdet35
drwxr-xr-x 138 nobody nobody 0 Apr 19 23:51 system.slice
drwxr-xr-x 2 nobody nobody 0 Apr 14 23:31 test
drwxr-xr-x 3 nobody nobody 0 Apr 15 17:45 user.slice
drwxr-xr-x 3 nobody nobody 0 Mar 9 03:04 woo
/ # We see all the cgroups, while we should only see ours. Another repro is via our integration tests -- just modify this test case (from c3ffd2e#diff-df7d610d8608d0748bfcbaab39894c9e189211cddf54df5a955811287d199d5d) so it can be run as rootless (same as in one of this PR commits): diff --git a/tests/integration/cgroups.bats b/tests/integration/cgroups.bats
index c5a927bd..2c49dfef 100644
--- a/tests/integration/cgroups.bats
+++ b/tests/integration/cgroups.bats
@@ -138,7 +138,8 @@ function setup() {
}
@test "runc run (blkio weight)" {
- requires root cgroups_v2
+ requires cgroups_v2
+ [[ "$ROOTLESS" -ne 0 ]] && requires rootless_cgroup
set_cgroups_path
update_config '.linux.resources.blockIO |= {"weight": 750}' and run it in a rootless mode: [kir@kir-rhat runc]$ sudo make localrootlessintegration ROOTLESS_TESTPATH=/cgroups.bats
....
[02] run rootless tests ... (cgroup)
++ cat /sys/fs/cgroup/cgroup.controllers
+ for f in $(cat "$CGROUP_MOUNT/cgroup.controllers")
+ echo +cpuset
+ for f in $(cat "$CGROUP_MOUNT/cgroup.controllers")
+ echo +cpu
+ for f in $(cat "$CGROUP_MOUNT/cgroup.controllers")
+ echo +io
+ for f in $(cat "$CGROUP_MOUNT/cgroup.controllers")
+ echo +memory
+ for f in $(cat "$CGROUP_MOUNT/cgroup.controllers")
+ echo +hugetlb
+ for f in $(cat "$CGROUP_MOUNT/cgroup.controllers")
+ echo +pids
+ set +x
path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/snapd/snap/bin
1..10
ok 1 runc create (no limits + no cgrouppath + no permission) succeeds
ok 2 runc create (rootless + no limits + cgrouppath + no permission) fails with permission error # skip test requires rootless_no_cgroup
ok 3 runc create (rootless + limits + no cgrouppath + no permission) fails with informative error # skip test requires rootless_no_cgroup
ok 4 runc create (limits + cgrouppath + permission on the cgroup dir) succeeds
ok 5 runc exec (limits + cgrouppath + permission on the cgroup dir) succeeds
ok 6 runc exec (cgroup v2 + init process in non-root cgroup) succeeds # skip test requires root
ok 7 runc run (cgroup v1 + unified resources should fail) # skip test requires root
not ok 8 runc run (blkio weight)
# (in test file tests/integration/cgroups.bats, line 155)
# `[ "$output" = 'default 7475' ]' failed
# runc spec --rootless (status=0):
#
# runc run -d --console-socket /tmp/bats-run-1256838/runc.ZUIMwn/tty/sock test_cgroups_unified (status=0):
#
# runc exec test_cgroups_unified sh -c cat /sys/fs/cgroup/io.bfq.weight (status=1):
# cat: can't open '/sys/fs/cgroup/io.bfq.weight': No such file or directory
# runc exec test_cgroups_unified sh -c cat /sys/fs/cgroup/io.weight (status=1):
# cat: can't open '/sys/fs/cgroup/io.weight': No such file or directory This PR fixes the above test case, and also adds a separate test case to make sure we don't see the full cgroup hierarchy. |
Move the initialization of Console* fields as they are unconditional. Signed-off-by: Kir Kolyshkin <[email protected]>
The code is already passing three parameters around from mountToRootfs to mountCgroupV* to mountToRootfs again. I am about to add another parameter, so let's introduce and use struct mountConfig to pass around. Signed-off-by: Kir Kolyshkin <[email protected]>
1. s/cgroupPath/dest/ 2. don't hardcode /sys/fs/cgroup Signed-off-by: Kir Kolyshkin <[email protected]>
In case of rootless, cgroup2 mount is not possible (see [1] for more details), so since commit 9c81440 runc bind-mounts the whole /sys/fs/cgroup into container. Problem is, if cgroupns is enabled, /sys/fs/cgroup inside the container is supposed to show the cgroup files for this cgroup, not the root one. The fix is to pass through and use the cgroup path in case cgroup2 mount failed, cgroupns is enabled, and the path is non-empty. Surely this requires the /sys/fs/cgroup mount in the spec, so modify runc spec --rootless to keep it. Before: $ ./runc run aaa # find /sys/fs/cgroup/ -type d /sys/fs/cgroup /sys/fs/cgroup/user.slice /sys/fs/cgroup/user.slice/user-1000.slice /sys/fs/cgroup/user.slice/user-1000.slice/[email protected] ... # ls -l /sys/fs/cgroup/cgroup.controllers -r--r--r-- 1 nobody nogroup 0 Feb 24 02:22 /sys/fs/cgroup/cgroup.controllers # wc -w /sys/fs/cgroup/cgroup.procs 142 /sys/fs/cgroup/cgroup.procs # cat /sys/fs/cgroup/memory.current cat: can't open '/sys/fs/cgroup/memory.current': No such file or directory After: # find /sys/fs/cgroup/ -type d /sys/fs/cgroup/ # ls -l /sys/fs/cgroup/cgroup.controllers -r--r--r-- 1 root root 0 Feb 24 02:43 /sys/fs/cgroup/cgroup.controllers # wc -w /sys/fs/cgroup/cgroup.procs 2 /sys/fs/cgroup/cgroup.procs # cat /sys/fs/cgroup/memory.current 577536 [1] opencontainers#2158 Signed-off-by: Kir Kolyshkin <[email protected]>
As the rootless cgroup2 mount is now fixed, we can enable this test for rootless as well. Signed-off-by: Kir Kolyshkin <[email protected]>
This checks that in-container view of /sys/fs/cgroup does not contain any extra cgroups (which was the case for rootless before the previous commit). Signed-off-by: Kir Kolyshkin <[email protected]>
c0b2be6
to
0216716
Compare
The upstream patch (itself backported for rc93) was backported to rc92. Original cover letter: From 14faf1c20948688a48edb9b41367ab07ac11ca91 Mon Sep 17 00:00:00 2001 From: Aleksa Sarai <[email protected]> Date: Wed, 28 Apr 2021 15:44:36 +1000 Subject: [PATCH 0/5] rootfs: add mount destination validation This is a backport of the fix for CVE-2021-30465 to the v1.0.0-rc93 release. However, because the patch does not apply cleanly it was necessary to backport the following commits (from [1]): * deb8a8dd7767 ("libct/newInitConfig: nit") * 1e476578b6cd ("libct/rootfs: introduce and use mountConfig") * 3826db196d59 ("libct/rootfs/mountCgroupV2: minor refactor") * ff692f289b60 ("Fix cgroup2 mount for rootless case") And the patch itself was modified to remove hardenings for code which didn't exist in v1.0.0-rc93 (in particular, the mount changes in [2]). [1]: opencontainers/runc#2818 [2]: opencontainers/runc#2798 Aleksa Sarai (1): rootfs: add mount destination validation Kir Kolyshkin (4): libct/newInitConfig: nit libct/rootfs: introduce and use mountConfig libct/rootfs/mountCgroupV2: minor refactor Fix cgroup2 mount for rootless case libcontainer/container_linux.go | 11 +- libcontainer/init_linux.go | 1 + libcontainer/rootfs_linux.go | 291 +++++++++++++++++-------------- libcontainer/specconv/example.go | 18 +- libcontainer/utils/utils.go | 54 ++++++ libcontainer/utils/utils_test.go | 35 ++++ 6 files changed, 263 insertions(+), 147 deletions(-) -- 2.31.1
The upstream patch (itself backported for rc93) was backported to rc92. Original cover letter: From 14faf1c20948688a48edb9b41367ab07ac11ca91 Mon Sep 17 00:00:00 2001 From: Aleksa Sarai <[email protected]> Date: Wed, 28 Apr 2021 15:44:36 +1000 Subject: [PATCH 0/5] rootfs: add mount destination validation This is a backport of the fix for CVE-2021-30465 to the v1.0.0-rc93 release. However, because the patch does not apply cleanly it was necessary to backport the following commits (from [1]): * deb8a8dd7767 ("libct/newInitConfig: nit") * 1e476578b6cd ("libct/rootfs: introduce and use mountConfig") * 3826db196d59 ("libct/rootfs/mountCgroupV2: minor refactor") * ff692f289b60 ("Fix cgroup2 mount for rootless case") And the patch itself was modified to remove hardenings for code which didn't exist in v1.0.0-rc93 (in particular, the mount changes in [2]). [1]: opencontainers/runc#2818 [2]: opencontainers/runc#2798 Aleksa Sarai (1): rootfs: add mount destination validation Kir Kolyshkin (4): libct/newInitConfig: nit libct/rootfs: introduce and use mountConfig libct/rootfs/mountCgroupV2: minor refactor Fix cgroup2 mount for rootless case libcontainer/container_linux.go | 11 +- libcontainer/init_linux.go | 1 + libcontainer/rootfs_linux.go | 291 +++++++++++++++++-------------- libcontainer/specconv/example.go | 18 +- libcontainer/utils/utils.go | 54 ++++++ libcontainer/utils/utils_test.go | 35 ++++ 6 files changed, 263 insertions(+), 147 deletions(-) -- 2.31.1
In case of rootless, cgroup2 mount is not possible (see #2158
for more details), so since commit 9c81440 runc bind-mounts the whole
/sys/fs/cgroup into container.
Problem is, if cgroupns is enabled, /sys/fs/cgroup inside the container
is supposed to show the cgroup files for this cgroup, not the root one.
The fix is to pass through and use the cgroup path in case cgroup2
mount failed, cgroupns is enabled, and the path is non-empty.
Before:
After:
PS found while trying to enable rootless for the test case from #2786
Suggested changelog entry