--userns-remap=default and --ipc=host: operation not permitted on /dev/mqueue #36674
Comments
The new behaviour is what is meant to happen. Ubuntu has been well-known to break

The patch you posted is quite interesting though, because it'll fix that long-standing issue with SELinux that caused permission errors. @mrunalp @rhatdan are probably interested in that (it'll be a kernel-side fix for opencontainers/runc#1562 which was the internal mqueue mount problem on SELinux -- if it hasn't already been backported to RHEL I would recommend it).
Thanks @cyphar.
To clarify, I can use
Any chance of detecting which case we have at runtime in runc/moby and providing a cleaner error message for users? I can do the patch, but I'm not sure how to detect that.
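One possible reading of "which case" is whether a process sits in a new user namespace while still sharing the host's IPC namespace. A minimal sketch of that check, assuming the `/proc/<pid>/ns/*` symlinks are readable and that PID 1 represents the host; the helper names are hypothetical and this is not runc or moby code:

```python
import os
import re
from typing import Optional

# Link targets look like "ipc:[4026531839]"; the number identifies the namespace.
_NS_LINK = re.compile(r"^[a-z_]+:\[(\d+)\]$")


def parse_ns_link(link: str) -> int:
    """Return the namespace inode number from a /proc/<pid>/ns/* link target."""
    match = _NS_LINK.match(link)
    if match is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return int(match.group(1))


def same_namespace(link_a: str, link_b: str) -> bool:
    """Two link targets name the same namespace when their inodes are equal."""
    return parse_ns_link(link_a) == parse_ns_link(link_b)


def shares_namespace_with(pid: int, kind: str) -> Optional[bool]:
    """Compare our namespace of `kind` ("ipc", "user", ...) with that of `pid`.

    Returns None when the links cannot be read (no /proc, or no permission).
    """
    try:
        mine = os.readlink(f"/proc/self/ns/{kind}")
        theirs = os.readlink(f"/proc/{pid}/ns/{kind}")
    except OSError:
        return None
    return same_namespace(mine, theirs)


def host_ipc_in_new_userns(host_pid: int = 1) -> Optional[bool]:
    """True when we share the IPC namespace of `host_pid` but not its user namespace."""
    ipc = shares_namespace_with(host_pid, "ipc")
    user = shares_namespace_with(host_pid, "user")
    if ipc is None or user is None:
        return None
    return ipc and not user
```

Reading `/proc/1/ns/*` usually needs privileges, so an unprivileged caller would typically get `None` here rather than a definite answer.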
Oh, I might've misread a part of your initial question.
I looked at your output again, and what I would actually expect is that you don't have permission to mount mqueue if you aren't in an IPC namespace where you have privileges (which is what you initially pointed to). So the fact the change allows this now is ... odd. Sorry, I was so mixed up in the world of SELinux permission issues that I completely missed that you didn't

Looking at the kernel patch again with (slightly) fresher eyes, I now understand why this is permitted. Effectively, if you are in an IPC namespace that already has a mount set up (which will be true in the host) then attempting to mount it will give you the already-created mount. I am actually not sure if this is intentional -- because without a permission check I would think that you would allow mounts of a host-side resource inside a container (which is usually very bad). But I've never used

I will send a mail to Eric Biederman and Al Viro to see if they have an opinion on this.
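To make the concern concrete, here is a toy model of the behaviour described in this comment. It is not kernel code: the classes, the `admin_over` set and the `check_permission` switch are all hypothetical stand-ins for "the namespace already has an internal mount" and "the caller has CAP_SYS_ADMIN over the namespace's owning user namespace".

```python
import errno
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class IpcNamespace:
    owner_userns: str                     # user namespace that owns this IPC namespace
    internal_mount: Optional[str] = None  # cached mqueue mount, if one exists


@dataclass
class Task:
    userns: str
    ipc_ns: IpcNamespace
    admin_over: FrozenSet[str]            # user namespaces where the task is privileged


def mount_mqueue(task: Task, check_permission: bool) -> str:
    """Return the mqueue mount for the task's IPC namespace (toy model)."""
    ns = task.ipc_ns
    if check_permission and ns.owner_userns not in task.admin_over:
        # Models the -EPERM that older kernels returned in this situation.
        raise PermissionError(errno.EPERM, "Operation not permitted")
    if ns.internal_mount is None:
        ns.internal_mount = f"mqueue@{ns.owner_userns}"
    # Without the check, an existing (host) mount is simply handed back.
    return ns.internal_mount


host_ns = IpcNamespace(owner_userns="init_userns", internal_mount="mqueue@init_userns")
# New user namespace, but the IPC namespace was not unshared.
task = Task(userns="container_userns", ipc_ns=host_ns,
            admin_over=frozenset({"container_userns"}))
```

In this model `mount_mqueue(task, check_permission=False)` returns the host's mount, while `check_permission=True` raises `PermissionError`, which is the difference between the two kernel behaviours discussed in this thread.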
I've posted a patch to LKML that should fix the issue (I've only compile-tested it, so if you can test it that'd be great -- I'm going to be busy until Sunday). https://lkml.org/lkml/2018/3/23/25
This reverts commit 36735a6.

Aleksa Sarai <[email protected]> writes:

> [REGRESSION v4.16-rc6] [PATCH] mqueue: forbid unprivileged user access to internal mount
>
> Felix reported weird behaviour on 4.16.0-rc6 with regards to mqueue[1],
> which was introduced by 36735a6 ("mqueue: switch to on-demand
> creation of internal mount").
>
> Basically, the reproducer boils down to being able to mount mqueue if
> you create a new user namespace, even if you don't unshare the IPC
> namespace.
>
> Previously this was not possible, and you would get an -EPERM. The mount
> is the *host* mqueue mount, which is being cached and just returned from
> mqueue_mount(). To be honest, I'm not sure if this is safe or not (or if
> it was intentional -- since I'm not familiar with mqueue).
>
> To me it looks like there is a missing permission check. I've included a
> patch below that I've compile-tested, and should block the above case.
> Can someone please tell me if I'm missing something? Is this actually
> safe?
>
> [1]: moby/moby#36674

The issue is a lot deeper than a missing permission check. sb->s_user_ns was is improperly set as well. So in addition to the filesystem being mounted when it should not be mounted, so things are not allow that should be.

We are practically to the release of 4.16 and there is no agreement between Al Viro and myself on what the code should looks like to fix things properly. So revert the code to what it was before so that we can take our time and discuss this properly.

Fixes: 36735a6 ("mqueue: switch to on-demand creation of internal mount")
Reported-by: Felix Abecassis <[email protected]>
Reported-by: Aleksa Sarai <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
The relevant patch has been reverted upstream in torvalds/linux@cfb2f6f. My patch fixed the basic issue, but it turns out that the problem runs much deeper within VFS and so Eric and Al decided to revert the patch and they'll fix it properly in the

This can now be closed I think.
Shouldn't we explicitly prevent this case? Like
Maybe, though really those restrictions are more to reduce user confusion than anything else -- I'm not sure that blocking

Then again, it's very unlikely that you'll be able to enable
Same for

But, it seems in the current status it can't work with
Yes, I think that blocking the other options (

ping @kolyshkin @estesp any thoughts?
Well, there is nothing to say that a shared IPC between two containers is not possible if they have overlapping UIDs.
@rhatdan this would not be affected, this is fine with user namespaces today:
It looks like this issue went stale. Let me close it.
On Ubuntu 16.04 with 4.13.0-37-generic and docker 18.03.0-ce, with --userns-remap=default in my configuration:
Looks like the kernel path is rather simple
https://github.com/torvalds/linux/blob/v4.13/ipc/mqueue.c#L338-L340
then
https://github.com/torvalds/linux/blob/v4.13/fs/super.c#L1023-L1027
At first I thought it was just an oversight, and that this case should be disabled, for instance like this:
But it doesn't seem so simple: the code has changed for 4.16, for instance:
torvalds/linux@36735a6
And with 4.16.0-rc6 (Debian buster), it works:
With 4.15.0, also on Debian buster:
Could a kernel maintainer check whether this is a kernel regression? Or is it intended, and might it be backported to distro kernels? Or are both behaviors valid and something changed in 4.16?
If the 4.15 behavior is the right one, we will have to disable --ipc=host with userns, similarly to --net=host and --pid=host.
@crosbymichael @cyphar?
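For illustration, the kind of up-front check proposed here could look roughly like the sketch below. It is written in Python rather than Go and is not moby's actual validation code; `RunOptions`, `validate` and the error wording are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunOptions:
    userns_remap: bool   # daemon started with --userns-remap
    ipc_mode: str = ""   # "", "host", "container:<id>", ...
    net_mode: str = ""
    pid_mode: str = ""


def validate(opts: RunOptions) -> List[str]:
    """Return one error per host-namespace option that conflicts with userns."""
    errors: List[str] = []
    if not opts.userns_remap:
        return errors
    for flag, mode in (("--net", opts.net_mode),
                       ("--pid", opts.pid_mode),
                       ("--ipc", opts.ipc_mode)):
        # Only sharing with the *host* is rejected; container:<id> is left alone.
        if mode == "host":
            errors.append(
                f"{flag}=host cannot be used when user namespaces are enabled")
    return errors
```

With this sketch, `validate(RunOptions(userns_remap=True, ipc_mode="host"))` reports a single error for `--ipc`, while sharing IPC with another container (`ipc_mode="container:<id>"`) passes.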