-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runc run: fix readonly path error for rootless + host pidns #2897
Conversation
The issue is easy to repro using any recent runc (I tried rc91 up to latest git) and podman >= 3 $ podman --runtime=runc run --pid=host --rm -it busybox sh |
@mheon PTAL |
@giuseppe ptal |
My guess as to why you can't reproduce this outside of podman is because I fixed this bug on the Docker side some time ago. moby/moby#35205 From memory there was a particular reason why I opted to solve this in Docker rather than runc -- I believe it's because I felt that it was unclear whether runc should be overriding the flags specified by the caller, but it's been ~4 years so my memory of this bug is quite foggy at this point. |
Though I guess this is being run in a slightly different scenario, it's in the "make everything read-only recursively" step not in the more general mount setup where we're just applying the configuration we were given. Maybe this does make sense in runc... (As an aside, |
I think the code is similar to what crun does: https://github.com/containers/crun/blob/master/src/libcrun/linux.c#L470-L497 It is a bit more articulated as it covers other cases as well, but for the |
@kolyshkin hi, I gave a tests on RHEL-8.4 VM environment, but I got different result, In my VM environment.
I got different error like below
I also applied your PR2897, but I still got above error.
|
Code LGTM |
that could be a regression in the RHEL 8.4 kernel, that is not caused by runc (https://bugzilla.redhat.com/show_bug.cgi?id=1903983), please make sure the kernel version is updated. |
Thank you @giuseppe, I can reproduce this issue after upgrading kernel to 4.18.0-293,
|
return unix.Mount(path, path, "", unix.MS_BIND|unix.MS_REMOUNT|unix.MS_RDONLY|unix.MS_REC, "") | ||
|
||
var s unix.Statfs_t | ||
if err := unix.Statfs(path, &s); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about trying mount first and only going statfs route on EPERM, but this way the code path will rarely be hit (and thus tested), and a single statfs doesn't add too much overhead.
I am still working on a test case for that, hope to finish this week. |
Added a test case, which fails like this (before the fix):
NB: maybe we need to address those rootless warnings // cc @AkihiroSuda |
Currently, runc fails like this when used from rootless podman with host PID namespace: > $ podman --runtime=runc run --pid=host --rm -it busybox sh > WARN[0000] additional gid=10 is not present in the user namespace, skip setting it > Error: container_linux.go:380: starting container process caused: > process_linux.go:545: container init caused: readonly path /proc/asound: > operation not permitted: OCI permission denied (Here /proc/asound is the first path from OCI spec's readonlyPaths). The code uses MS_BIND|MS_REMOUNT flags that have a special meaning in the kernel ("keep the flags like nodev, nosuid, noexec as is"). For some reason, this "special meaning" trick is not working for the above use case (rootless podman + no PID namespace), and I don't know how to reproduce this without podman. Instead of relying on the kernel feature, let's just get the current mount flags using fstatfs(2) and add those that needs to be preserved. While at it, wrap errors from unix.Mount into os.PathError to make errors a bit less cryptic. Signed-off-by: Kir Kolyshkin <[email protected]>
For the fix, see previous commit. Without the fix, this test case fails: > container_linux.go:380: starting container process caused: > process_linux.go:545: container init caused: readonly path /proc/bus: > operation not permitted Signed-off-by: Kir Kolyshkin <[email protected]>
Opened #2910 about the OOM kill count warning, as it was recently introduced. |
Currently, runc fails like this when used from rootless podman
with host PID namespace:
(Here /proc/asound is the first path from OCI spec's readonlyPaths).
The code uses
MS_BIND|MS_REMOUNT
flags that have a special meaning inthe kernel ("keep the flags like nodev, nosuid, noexec as is").
For some reason, this "special meaning" trick is not working for the
above use case (rootless + host PID namespace)
, and I don't knowhow to reproduce this without podman.Instead of relying on the kernel feature, let's just get the current
mount flags using
fstatfs(2)
and add these. This fixes the issue observed.Add a repro test case which fails like this (before the fix):
(the warnings are there because of rootless with no cgroup access)
Originally reported in https://bugzilla.redhat.com/show_bug.cgi?id=1947432