Failed to join the user and pid namespaces of an existing runc container #960

Closed
haiyanmeng opened this issue Jul 20, 2016 · 22 comments

Comments

@haiyanmeng
Contributor

I started a runc container, then started a second container that tried to
join the user and pid namespaces of the first container.

The first container started successfully, but the second container failed to
start with the following error:

[hmeng@localhost c2]$ sudo $(which runc) run test1
process_linux.go:245: running exec setns process for init caused "exit status 1"

The strace log of the second container shows that setns failed to join the
user namespace of the first container.

Here is config.json of the first container:
https://github.com/hmeng-19/logs/blob/master/runc_ns/c1/config.json

Here is config.json of the second container:
https://github.com/hmeng-19/logs/blob/master/runc_ns/c2/config.json

Here is the strace log of the second container:
https://github.com/hmeng-19/logs/blob/master/runc_ns/strace.log

The rootfs I used for both containers is the busybox one:

docker export $(docker create busybox) | tar -C rootfs -xvf -

I am running Fedora 23 (Workstation Edition), and here is the kernel info:

[hmeng@localhost c2]$ uname -a
Linux localhost.localdomain 4.6.4-201.fc23.x86_64 #1 SMP Tue Jul 12 11:43:59 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I am testing runc using the latest master branch:

commit 7b06cc02c7e777ac2cf2013910fcd487089a7055
Merge: bd1d3ac f2c4c4a
Author: Mrunal Patel <[email protected]>
Date:   Wed Jul 20 08:35:45 2016 -0700

    Merge pull request #957 from zhaoleidd/fix_exec_test_output

    integration_testing: Fix a output typo
@haiyanmeng
Contributor Author

@mrunalp , PTAL.

@cyphar
Member

cyphar commented Jul 20, 2016

@hmeng-19 Can you please try running your test again with the #950 patchset applied (it adds some debugging information so we can tell where nsenter fails). Also, if you want to use strace with runc, please always use strace -f.

@haiyanmeng
Contributor Author

@cyphar , thanks for pointing that out. I will give it a try.

@mrunalp
Contributor

mrunalp commented Jul 20, 2016

I just tried this out on Fedora 23 and it worked for me with runc master. Looking at your configs to see if anything is amiss.

@mrunalp
Contributor

mrunalp commented Jul 20, 2016

You can see that they are sharing user/pid namespaces.

Container 1:

[root@localhost test]# ocitools generate --tty --output=config.json
[root@localhost test]# ocitools generate --tty --output=config.json --uidmappings=0:0:1024 --gidmappings=0:0:1024
[root@localhost test]# runc run 1234
/ # id
uid=0(root) gid=0(root)
/ # cat /proc/self/uid_map 
         0          0       1024
/ # 
/ # 
/ # ps
PID   USER     COMMAND
    1 root     sh
    7 root     sh
   15 root     ps
/ # ls -l /proc/1/ns
total 0
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 ipc -> ipc:[4026532230]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 mnt -> mnt:[4026532228]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 net -> net:[4026532233]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 pid -> pid:[4026532231]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 user -> user:[4026532226]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 uts -> uts:[4026532229]
/ # 
/ # 
/ # ps
PID   USER     COMMAND
    1 root     sh
   21 root     sh
   27 root     ps
/ # 

Container 2:

[root@localhost enter]#  ocitools generate --tty --output=config.json --uidmappings=0:0:1024 --gidmappings=0:0:1024  --output=config.json --pid /proc/15200/ns/pid  --user /proc/15200/ns/user
[root@localhost enter]# runc run 3434
/ # ps -ef
PID   USER     COMMAND
    1 root     sh
   21 root     sh
   25 root     ps -ef
/ # ls -l /proc/21/ns
total 0
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 ipc -> ipc:[4026532295]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 mnt -> mnt:[4026532293]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 net -> net:[4026532297]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 pid -> pid:[4026532231]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 user -> user:[4026532226]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 uts -> uts:[4026532294]
/ # 
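
The two listings above can be compared mechanically. The following is a minimal sketch, not part of runc or ocitools: it parses `ls -l /proc/<pid>/ns` output and reports which namespace types have the same inode number in both containers. The helper names `parse_ns_listing` and `shared_namespaces` are made up for illustration; the listings are copied from the two sessions above.

```python
import re

# Matches the symlink at the end of an `ls -l /proc/<pid>/ns` line,
# e.g. "... pid -> pid:[4026532231]".
NS_LINK = re.compile(r"(\w+) -> \w+:\[(\d+)\]\s*$")


def parse_ns_listing(listing: str) -> dict:
    """Map namespace type -> inode number (hypothetical helper)."""
    namespaces = {}
    for line in listing.splitlines():
        match = NS_LINK.search(line)
        if match:
            namespaces[match.group(1)] = int(match.group(2))
    return namespaces


def shared_namespaces(listing_a: str, listing_b: str) -> set:
    """Return the namespace types whose inodes match in both listings."""
    a = parse_ns_listing(listing_a)
    b = parse_ns_listing(listing_b)
    return {ns for ns, inode in a.items() if b.get(ns) == inode}


CONTAINER_1 = """\
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 ipc -> ipc:[4026532230]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 mnt -> mnt:[4026532228]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 net -> net:[4026532233]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 pid -> pid:[4026532231]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 user -> user:[4026532226]
lrwxrwxrwx    1 root     root             0 Jul 20 22:52 uts -> uts:[4026532229]
"""

CONTAINER_2 = """\
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 ipc -> ipc:[4026532295]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 mnt -> mnt:[4026532293]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 net -> net:[4026532297]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 pid -> pid:[4026532231]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 user -> user:[4026532226]
lrwxrwxrwx    1 root     root             0 Jul 20 22:56 uts -> uts:[4026532294]
"""

print(sorted(shared_namespaces(CONTAINER_1, CONTAINER_2)))  # -> ['pid', 'user']
```

Only pid and user match, which is what the second config asked for; ipc, mnt, net and uts are fresh namespaces.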

@cyphar
Member

cyphar commented Jul 20, 2016

@hmeng-19 I think the issue is that you're manually setting the /proc/pid/ns/... path in the config. This will change with every run of the first container:

"path": "/proc/5736/ns/pid"

@haiyanmeng
Contributor Author

@cyphar , every time I would use ps to find the pid of the first container, and replace the pid in the config.json file for the second container.
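
The manual edit described here could be scripted. This is a rough sketch under a few assumptions: the config follows the OCI layout where `linux.namespaces` is a list of `{"type": ..., "path": ...}` entries, and `join_namespaces` is a made-up helper name, not something runc or ocitools provides. Note that the OCI types `network` and `mount` correspond to the `/proc/<pid>/ns` entries `net` and `mnt`.

```python
import copy

# OCI namespace type -> file name under /proc/<pid>/ns.
# Types not listed here use the same name in both places.
PROC_NS_NAME = {"network": "net", "mount": "mnt"}


def join_namespaces(config: dict, init_pid: int, ns_types: list) -> dict:
    """Return a copy of an OCI config whose listed namespaces point at init_pid.

    Hypothetical helper: init_pid must be the pid of the first container's
    init process as seen from the host.
    """
    updated = copy.deepcopy(config)
    namespaces = updated.setdefault("linux", {}).setdefault("namespaces", [])
    by_type = {ns["type"]: ns for ns in namespaces}
    for ns_type in ns_types:
        entry = by_type.get(ns_type)
        if entry is None:
            # The config did not list this namespace yet, so add it.
            entry = {"type": ns_type}
            namespaces.append(entry)
        proc_name = PROC_NS_NAME.get(ns_type, ns_type)
        entry["path"] = f"/proc/{init_pid}/ns/{proc_name}"
    return updated


if __name__ == "__main__":
    base = {"linux": {"namespaces": [{"type": "pid"}, {"type": "mount"}]}}
    print(join_namespaces(base, 15200, ["pid", "user"]))
```

The input config is left untouched, so the same base config can be reused each time the first container is restarted with a new pid.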

@wking
Contributor

wking commented Jul 20, 2016

On Wed, Jul 20, 2016 at 04:02:58PM -0700, Aleksa Sarai wrote:

@hmeng-19 I think the issue is that you're manually setting the
/proc/pid/ns/... path in the config.

This will get easier if we can land opencontainers/runtime-tools#54 ;).

@haiyanmeng
Contributor Author

@mrunalp , I tried the commands you provided and still got the same error.

@wking
Contributor

wking commented Jul 20, 2016

On Wed, Jul 20, 2016 at 03:27:14PM -0700, hmeng-19 wrote:

[hmeng@localhost c2]$ sudo $(which runc) run test1
process_linux.go:245: running exec setns process for init caused "exit status 1"

The strace log of the second container shows that setns failed to
join the user namespace of the first container.

Shouldn't runC be joining the user namespace first? The strace has
[1]:

5874 open("/proc/5736/ns/pid", O_RDONLY) = 5
5874 open("/proc/5736/ns/user", O_RDONLY) = 6
5874 setns(5, 0) = 0
5874 close(5) = 0
5874 setns(6, 0) = -1 EINVAL (Invalid argument)

See also [2] about join order. On the other hand, if you're root in
the host namespace, then that probably doesn't matter.

setns(2) has a few possible reasons for EINVAL, including:

  • The caller attempted to join the user namespace in which it is
    already a member.

  • The caller shares filesystem (CLONE_FS) state (in particular, the
    root directory) with other processes and tried to join a new user
    namespace.

  • The caller is multithreaded and tried to join a new user namespace.

[2] Subject: [Linux] Specify namespace-joining order? (can be important for user
    namespaces)
    Date: Mon, 18 Apr 2016 21:53:00 -0700
    Message-ID: [email protected]
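
To illustrate the join-order question raised above, here is a small sketch that orders the namespaces a config wants to join so that the user namespace comes first. It assumes OCI-style `{"type": ..., "path": ...}` entries; `join_order` is a hypothetical helper, and nothing here claims to be how runc's nsenter code orders its setns calls.

```python
def join_order(namespaces: list) -> list:
    """Return the namespaces that have a path, user namespace first.

    Entries without a "path" are new namespaces to create, not to join,
    so they are skipped. sorted() is stable, so the remaining entries
    keep their original relative order.
    """
    joinable = [ns for ns in namespaces if ns.get("path")]
    return sorted(joinable, key=lambda ns: ns["type"] != "user")


if __name__ == "__main__":
    wanted = [
        {"type": "pid", "path": "/proc/5736/ns/pid"},
        {"type": "user", "path": "/proc/5736/ns/user"},
        {"type": "mount"},
    ]
    for ns in join_order(wanted):
        print(ns["type"], ns["path"])
```

With the paths from the strace above, this yields user before pid, the reverse of the order in which the two setns calls were made.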

@cyphar
Member

cyphar commented Jul 20, 2016

@wking Okay, then this is tangentially related to #959. As I said in that PR, there are quite a few nsenter fixes that are still in my rootless containers patchset but will definitely be useful for others. I plan to port all of these fixes to #950 very soon.

@haiyanmeng
Contributor Author

@cyphar , I tested my problem using #950 . It seems that the problem still exists. I am wondering whether my kernel is broken somehow.

@cyphar
Member

cyphar commented Jul 21, 2016

@hmeng-19 I just ported the fix. Please try again with 8a454e5.

@haiyanmeng
Contributor Author

@cyphar , I tried 8a454e5 .
unshare succeeded, but writing the uid map to /proc/self/uid_map failed, so runc run test failed when I tried to start the first container.
Here is the strace log:
https://github.com/hmeng-19/logs/blob/master/runc_ns/log

@haiyanmeng
Contributor Author

@cyphar , in the current master, it looks like the parent process is responsible for updating the uid map and gid map of the child process.
However, in 8a454e5, it seems that the child process tries to update its own uid map and gid map.

@cyphar
Member

cyphar commented Jul 21, 2016

Yeah, sorry. The code doesn't work currently if you're a privileged user (weirdly, it does work if you're setting up a rootless container). Part of that patch will fix your problem, but I'm also trying to fix another bug at the same time. The idea is that you want to unshare the user namespace before anything else, then set up all of the mappings and only then do we clone.
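
For reference, the "mappings" step boils down to writing lines of the form "ID-inside-namespace ID-outside-namespace length" to the uid_map and gid_map files, which is the format `cat /proc/self/uid_map` printed earlier in the thread. A minimal sketch of rendering OCI-style mappings into that format follows; `format_id_map` is a made-up helper, not runc code, and it only builds the text without writing to /proc.

```python
def format_id_map(mappings: list) -> str:
    """Render OCI-style ID mappings as uid_map/gid_map file content.

    Each mapping is assumed to be a dict with the OCI keys
    "containerID", "hostID" and "size".
    """
    lines = []
    for mapping in mappings:
        lines.append(
            f"{mapping['containerID']} {mapping['hostID']} {mapping['size']}\n"
        )
    return "".join(lines)


if __name__ == "__main__":
    # Same mapping as the 0/0/1024 example shown in the thread.
    print(format_id_map([{"containerID": 0, "hostID": 0, "size": 1024}]), end="")
```

Whichever process ends up doing the write (parent or child), the content is the same; what differs between master and 8a454e5 is who writes it and with which privileges.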

@haiyanmeng
Contributor Author

@cyphar , got it. It is okay. Let me know if you want me to retest with another version of #950 .
BTW, I am just curious whether make test succeeds on your machine.

@cyphar
Member

cyphar commented Jul 21, 2016

The only make test failure on my machine (which also happens on master) is due to the memcg changes that we're still in the process of fixing. Which is very odd, because I swear that I used to be able to reproduce the if-youre-root-weird-things-happen-with-this-code case locally :/.

@haiyanmeng
Contributor Author

@cyphar , glad to hear that. Then I have fewer doubts about my machine.
The only make test failure on my machine with the upstream master branch is `TestRunWithKernelMemory`, as mentioned in #946.

@cyphar
Member

cyphar commented Jul 21, 2016

I'm quite worried that I can't seem to reproduce the failures with our Jenkins instance (even though the failures are definitely real). It's probably a kernel version thing. :/

@haiyanmeng
Contributor Author

haiyanmeng commented Jul 21, 2016

@mrunalp , @cyphar , @wking , I found the problem.
I used the pid of the runc process to set nsPath by mistake.
I should use the pid of the sh process instead. What a drama.
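
One way to avoid picking the wrong process from ps is to take the pid from the container's state instead. The sketch below assumes that `runc state <container-id>` prints a JSON object with a `pid` field for the container's init process (as in the OCI runtime state); the sample JSON and the helper names are illustrative only and were not taken from a real run.

```python
import json


def init_pid_from_state(state_json: str) -> int:
    """Extract the init pid from state JSON (field name is an assumption)."""
    state = json.loads(state_json)
    pid = state.get("pid")
    if not isinstance(pid, int) or pid <= 0:
        raise ValueError("state does not contain a running init pid")
    return pid


def ns_path(pid: int, ns_name: str) -> str:
    """Build the /proc path of one namespace of the given process."""
    return f"/proc/{pid}/ns/{ns_name}"


if __name__ == "__main__":
    # Hypothetical output of `runc state 1234`, trimmed to the relevant fields.
    sample_state = '{"id": "1234", "pid": 15200, "status": "running"}'
    pid = init_pid_from_state(sample_state)
    print(ns_path(pid, "pid"))
    print(ns_path(pid, "user"))
```

The resulting paths refer to the container's init process (the sh process here), not to the runc process that launched it.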

@haiyanmeng
Contributor Author

@mrunalp , @cyphar , @wking , thanks for the help.
