runc run: fix readonly path error for rootless + host pidns #2897

Merged: 2 commits, Apr 19, 2021

@kolyshkin
Contributor

kolyshkin commented Apr 8, 2021

Currently, runc fails like this when used from rootless podman
with host PID namespace:

$ podman --runtime=runc run --pid=host --rm -it busybox sh
WARN[0000] additional gid=10 is not present in the user namespace, skip setting it
Error: container_linux.go:380: starting container process caused:
process_linux.go:545: container init caused: readonly path /proc/asound:
operation not permitted: OCI permission denied

(Here /proc/asound is the first path from OCI spec's readonlyPaths).

The code uses MS_BIND|MS_REMOUNT flags that have a special meaning in
the kernel ("keep the flags like nodev, nosuid, noexec as is").
For some reason, this "special meaning" trick is not working for the
above use case (rootless + host PID namespace), and I don't know
how to reproduce this without podman.

Instead of relying on the kernel feature, let's just get the current
mount flags using statfs(2) and pass them along explicitly with the
remount. This fixes the issue observed.

Add a repro test case which fails like this (before the fix):

not ok 5 runc run [rootless with host pidns]
# (in test file tests/integration/start_hello.bats, line 78)
#   `[ "$status" -eq 0 ]' failed
# runc spec --rootless (status=0):
# 
# runc run test_hello (status=1):
# time="2021-04-14T17:12:35-07:00" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/test_hello/memory.events: no such file or directory"
# time="2021-04-14T17:12:35-07:00" level=warning msg="freezer not supported: openat2 /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/test_hello/cgroup.freeze: no such file or directory"
# time="2021-04-14T17:12:35-07:00" level=warning msg="lstat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/test_hello: no such file or directory"
# time="2021-04-14T17:12:35-07:00" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused

(the warnings are there because of rootless with no cgroup access)

Originally reported in https://bugzilla.redhat.com/show_bug.cgi?id=1947432

@kolyshkin
Contributor Author

The issue is easy to repro using any recent runc (I tried rc91 up to latest git) and podman >= 3

$ podman --runtime=runc run --pid=host --rm -it busybox sh

@TomSweeneyRedHat

@mheon PTAL

@mrunalp
Contributor

mrunalp commented Apr 8, 2021

@giuseppe ptal

@cyphar
Member

cyphar commented Apr 8, 2021

My guess as to why you can't reproduce this outside of podman is that I fixed this bug on the Docker side some time ago (moby/moby#35205). From memory, there was a particular reason why I opted to solve this in Docker rather than runc -- I believe it's because I felt that it was unclear whether runc should be overriding the flags specified by the caller, but it's been ~4 years so my memory of this bug is quite foggy at this point.

@cyphar
Member

cyphar commented Apr 8, 2021

Though I guess this is being run in a slightly different scenario: it's in the "make everything read-only recursively" step, not in the more general mount setup where we're just applying the configuration we were given. Maybe this does make sense in runc...

(As an aside, MS_REC is being ignored by the kernel here -- you cannot make things recursively read-only in one shot, at least not until very recently with mount_setattr(2).)
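
To make the aside concrete: without mount_setattr(2), a caller that really wanted a recursive read-only remount would have to enumerate the submounts and remount each one. The Go sketch below shows only the enumeration step, by scanning text in the /proc/self/mountinfo format. It is an illustration, not something runc does in this PR; `submounts` is a hypothetical name, and octal escapes such as `\040` in mount points are deliberately not decoded.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// submounts returns the mount points in mountinfo-formatted text that are
// equal to root or located below it. The mount point is the fifth
// whitespace-separated field of each mountinfo line.
// Simplification: escaped characters in mount points are left as-is.
func submounts(mountinfo, root string) []string {
	var out []string
	prefix := strings.TrimSuffix(root, "/") + "/"
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 5 {
			continue // not a valid mountinfo line
		}
		mp := fields[4]
		if mp == root || strings.HasPrefix(mp, prefix) {
			out = append(out, mp)
		}
	}
	return out
}

func main() {
	// Made-up sample data in mountinfo format.
	sample := `22 1 0:21 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
40 22 0:35 / /proc/sys/fs/binfmt_misc rw,relatime shared:20 - autofs systemd-1 rw
23 1 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw`
	for _, mp := range submounts(sample, "/proc") {
		fmt.Println(mp)
	}
}
```

Each returned mount point would then need its own MS_BIND|MS_REMOUNT|MS_RDONLY call.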

@giuseppe
Member

giuseppe commented Apr 8, 2021

> @giuseppe ptal

I think the code is similar to what crun does: https://github.com/containers/crun/blob/master/src/libcrun/linux.c#L470-L497

It is a bit more elaborate as it covers other cases as well, but for MS_RDONLY I think it is equivalent, even if it is attempted only on failure: https://github.com/containers/crun/blob/master/src/libcrun/linux.c#L488-L492

@chuanchang

> The issue is easy to repro using any recent runc (I tried rc91 up to latest git) and podman >= 3

@kolyshkin hi, I ran a test in a RHEL 8.4 VM environment, but I got a different result.
Please see the details below, thanks!

In my VM environment.

$ podman unshare cat /proc/self/uid_map
         0       1000          1
         1     100000      65536

$ rpm -q podman runc kernel
podman-3.0.1-6.module+el8.4.0+10487+af324045.x86_64
runc-1.0.0-70.rc92.module+el8.4.0+10487+af324045.x86_64
kernel-4.18.0-287.el8.dt4.x86_64
$ podman --runtime=runc run --pid=host --rm -it busybox sh

I got a different error, shown below:

[test@kvm-08-guest05 runc]$ git rev-parse HEAD
bb28c44f12bf24ea64590edfb4f23a4b4d2eaae8

[test@kvm-08-guest05 runc]$ podman --runtime=/home/test/go/src/github.com/opencontainers/runc/runc run --pid=host --rm -it quay.io/libpod/busybox sh
Error: time="2021-04-08T08:48:52-04:00" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:60: mounting \"sysfs\" to rootfs at \"/sys\" caused: operation not permitted": OCI permission denied

I also applied your PR #2897, but I still got the error above.

[test@kvm-08-guest05 runc]$ git rev-parse HEAD
3addcd4717d677653b58ae1cc86c952dbaf7581b

[test@kvm-08-guest05 runc]$ git log -1 | grep "rootless + host pidns"
    runc run: fix start for rootless + host pidns

[test@kvm-08-guest05 runc]$ make all
go build -trimpath "-buildmode=pie"  -tags "seccomp" -ldflags "-X main.gitCommit="3addcd4717d677653b58ae1cc86c952dbaf7581b" -X main.version=1.0.0-rc93+dev " -o runc .
go build -trimpath "-buildmode=pie"  -tags "seccomp" -ldflags "-X main.gitCommit="3addcd4717d677653b58ae1cc86c952dbaf7581b" -X main.version=1.0.0-rc93+dev " -o contrib/cmd/recvtty/recvtty ./contrib/cmd/recvtty

[test@kvm-08-guest05 runc]$ podman --runtime=/home/test/go/src/github.com/opencontainers/runc/runc run --pid=host --rm -it quay.io/libpod/busybox sh
Error: time="2021-04-08T08:54:54-04:00" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:60: mounting \"sysfs\" to rootfs at \"/sys\" caused: operation not permitted": OCI permission denied

@mheon
Contributor

mheon commented Apr 8, 2021

Code LGTM

@giuseppe
Member

giuseppe commented Apr 8, 2021

> @kolyshkin hi, I ran a test in a RHEL 8.4 VM environment, but I got a different result.
> Please see the details below, thanks!
>
> [test@kvm-08-guest05 runc]$ podman --runtime=/home/test/go/src/github.com/opencontainers/runc/runc run --pid=host --rm -it quay.io/libpod/busybox sh
> Error: time="2021-04-08T08:48:52-04:00" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:60: mounting "sysfs" to rootfs at "/sys" caused: operation not permitted": OCI permission denied

That could be a regression in the RHEL 8.4 kernel that is not caused by runc (https://bugzilla.redhat.com/show_bug.cgi?id=1903983); please make sure your kernel is up to date.

@chuanchang

> That could be a regression in the RHEL 8.4 kernel that is not caused by runc (https://bugzilla.redhat.com/show_bug.cgi?id=1903983); please make sure your kernel is up to date.

Thank you @giuseppe, I can reproduce this issue after upgrading the kernel to 4.18.0-293,
and this PR works for me, although I got some warnings, thanks!

[test@hpe-dl380pgen8-02-vm-3 runc]$ git rev-parse HEAD
3addcd4717d677653b58ae1cc86c952dbaf7581b

[test@hpe-dl380pgen8-02-vm-3 runc]$ podman --runtime=/home/test/go/src/github.com/opencontainers/runc/runc run --pid=host --rm -it quay.io/libpod/busybox sh
/ # ls
bin   dev   etc   home  proc  root  run   sys   tmp   usr   var
/ # exit
WARN[0000] cannot toggle freezer: cgroups not configured for container
WARN[0000] cannot toggle freezer: cgroups not configured for container
WARN[0000] lstat : no such file or directory

return unix.Mount(path, path, "", unix.MS_BIND|unix.MS_REMOUNT|unix.MS_RDONLY|unix.MS_REC, "")

var s unix.Statfs_t
if err := unix.Statfs(path, &s); err != nil {

@kolyshkin (review comment on the lines above)
Contributor Author

I was thinking about trying mount first and only going the statfs route on EPERM, but that way the fallback code path would rarely be hit (and thus rarely tested), and a single statfs doesn't add much overhead.

@kolyshkin
Contributor Author

I am still working on a test case for that, hope to finish this week.

@kolyshkin
Contributor Author

Added a test case, which fails like this (before the fix):

not ok 5 runc run [rootless with host pidns]
# (in test file tests/integration/start_hello.bats, line 78)
#   `[ "$status" -eq 0 ]' failed
# runc spec --rootless (status=0):
# 
# runc run test_hello (status=1):
# time="2021-04-14T17:12:35-07:00" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/test_hello/memory.events: no such file or directory"
# time="2021-04-14T17:12:35-07:00" level=warning msg="freezer not supported: openat2 /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/test_hello/cgroup.freeze: no such file or directory"
# time="2021-04-14T17:12:35-07:00" level=warning msg="lstat /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/test_hello: no such file or directory"
# time="2021-04-14T17:12:35-07:00" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: readonly path /proc/bus: operation not permitted"

NB: maybe we need to address those rootless warnings // cc @AkihiroSuda

Currently, runc fails like this when used from rootless podman
with host PID namespace:

> $ podman --runtime=runc run --pid=host --rm -it busybox sh
> WARN[0000] additional gid=10 is not present in the user namespace, skip setting it
> Error: container_linux.go:380: starting container process caused:
> process_linux.go:545: container init caused: readonly path /proc/asound:
> operation not permitted: OCI permission denied

(Here /proc/asound is the first path from OCI spec's readonlyPaths).

The code uses MS_BIND|MS_REMOUNT flags that have a special meaning in
the kernel ("keep the flags like nodev, nosuid, noexec as is").
For some reason, this "special meaning" trick is not working for the
above use case (rootless podman + host PID namespace, i.e. no new PID
namespace), and I don't know how to reproduce this without podman.

Instead of relying on the kernel feature, let's just get the current
mount flags using statfs(2) and add those that need to be preserved.

While at it, wrap errors from unix.Mount into os.PathError to make
errors a bit less cryptic.

Signed-off-by: Kir Kolyshkin <[email protected]>

For the fix, see previous commit. Without the fix, this test case fails:

> container_linux.go:380: starting container process caused:
> process_linux.go:545: container init caused: readonly path /proc/bus:
> operation not permitted

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Contributor Author

> NB: maybe we need to address those rootless warnings // cc @AkihiroSuda

Opened #2910 about the OOM kill count warning, as it was recently introduced.

@kolyshkin kolyshkin added this to the 1.0.0-rc94 milestone Apr 15, 2021
@mrunalp mrunalp merged commit 3a20ccb into opencontainers:master Apr 19, 2021