Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

main: support rootless mode in userns #1688

Merged
merged 4 commits into from
May 29, 2018

Conversation

AkihiroSuda
Copy link
Member

@AkihiroSuda AkihiroSuda commented Jan 13, 2018

Running rootless containers in userns is useful for mounting
filesystems (e.g. overlay) with mapped euid 0, but without actual root
privilege.

Usage: (Note that unshare --mount requires --map-root-user)

  user$ mkdir lower upper work rootfs
  user$ curl http://dl-cdn.alpinelinux.org/alpine/v3.7/releases/x86_64/alpine-minirootfs-3.7.0-x86_64.tar.gz | tar Cxz ./lower || ( true; echo "mknod errors were ignored" )
  user$ unshare --mount --map-root-user
  mappedroot# runc spec --rootless
  mappedroot# sed -i 's/"readonly": true/"readonly": false/g' config.json
  mappedroot# mount -t overlay -o lowerdir=./lower,upperdir=./upper,workdir=./work overlayfs ./rootfs
  mappedroot# runc run foo

Note: unprivileged overlay is only supported in Ubuntu and a few distros (http://kernel.ubuntu.com/git/ubuntu/ubuntu-artful.git/commit/fs/overlayfs?h=Ubuntu-4.13.0-25.29&id=0a414bdc3d01f3b61ed86cfe3ce8b63a9240eba7)

Signed-off-by: Akihiro Suda [email protected]

@AkihiroSuda
Copy link
Member Author

cc @cyphar

@AkihiroSuda AkihiroSuda force-pushed the unshare-m-r branch 3 times, most recently from ee8195d to 9104981 Compare January 13, 2018 09:01
utils_linux.go Outdated
// especially when running within `unshare -m -r`.
// So we use system.GetParentNSeuid() here.
//
// TODO(AkihiroSuda): how to support nested userns?
Copy link
Member Author

@AkihiroSuda AkihiroSuda Jan 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than checking UID, it might be better to attempt some lightweight operation that requires "real" root privilege.

e.g.

  • chown(tmpFile, nonZeroUID, nonZeroGID)
  • readdir(opendir(getpwuid(0).pw_dir)))

Any thought?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplest test would be to do mknod, but I'm not sure I think this is the best idea in the world to be honest. There are plans in the future to provide the ability for previously disallowed syscalls to be allowed through the whole SECCOMP syscall emulation mechanism that's being worked on, so trying to pin down "we are in a user namespace because X doesn't work" is a bit of a stretch.

Copy link
Member Author

@AkihiroSuda AkihiroSuda Jan 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func isRootless() { return os.Geteuid() != 0 || readFile("/proc/self/setgroups") == "deny") } is fine then?
EDIT: even /proc/self/setgroups == alow, it might still require rootless... any thought?

@frezbo
Copy link

frezbo commented Jan 13, 2018

Does this PR address this issue: #1658

@AkihiroSuda
Copy link
Member Author

@frezbo no

@@ -130,6 +130,33 @@ func RunningInUserNS() bool {
return true
}

// GetParentNSeuid returns the euid within the parent user namespace
func GetParentNSeuid() int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @brauner had a patch like this for Ubuntu? This code should also check /proc/self/setgroups and ensure that it's set to deny.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented above, I think we need to find an alternative check to support nested userns

@AkihiroSuda
Copy link
Member Author

Updated PR

I think this PR is mergeable now.

@frezbo
Copy link

frezbo commented Jan 16, 2018

@AkihiroSuda I can't seem to build your branch, here;s the error:

go build -buildmode=pie  -ldflags "-X main.gitCommit="6f6720a754f7cc69274762586d678b85aecfbeba" -X main.version=1.0.0-rc4+dev " -tags "seccomp" -o runc .
# github.com/AkihiroSuda/runc
./signals.go:137:28: cannot use ws (type "github.com/AkihiroSuda/runc/vendor/golang.org/x/sys/unix".WaitStatus) as type "github.com/opencontainers/runc/vendor/golang.org/x/sys/unix".WaitStatus in argument to utils.ExitStatus
./utils_linux.go:230:9: undefined: system.GetParentNSeuid
./utils_linux.go:239:7: cannot use spec (type *"github.com/AkihiroSuda/runc/vendor/github.com/opencontainers/runtime-spec/specs-go".Spec) as type *"github.com/opencontainers/runc/vendor/github.com/opencontainers/runtime-spec/specs-go".Spec in field value
make: *** [Makefile:32: runc] Error 2
runc (aws:default)(git:unshare-m-r)$ 

@AkihiroSuda
Copy link
Member Author

@frezbo mv AkihiroSuda opencontainers

@AkihiroSuda
Copy link
Member Author

Any thought?
This is the (only) blocker for rootless containerd: containerd/containerd#2006

@AkihiroSuda
Copy link
Member Author

On second thought I came up with a ternary CLI flag --rootless=auto/true/false rather than boolean --force-rootless=true/false. But I'm not sure such a ternary flag is acceptable.

@cyphar cyphar self-assigned this Feb 4, 2018
@cyphar cyphar assigned cyphar and unassigned cyphar Feb 23, 2018
@cyphar
Copy link
Member

cyphar commented Feb 23, 2018

I am currently on vacation unfortunately. I will do a review of this when I get back.

@AkihiroSuda
Copy link
Member Author

rebased

@AkihiroSuda
Copy link
Member Author

On second thought I came up with a ternary CLI flag --rootless=auto/true/false rather than boolean --force-rootless=true/false. But I'm not sure such a ternary flag is acceptable.

Any thought about this?

Although isRootless() auto-detection might not be perfect, I think this PR can be mergeable when we agree on the CLI flag.
The auto-detection logic can be improved later.

cc @ehotinger (genuinetools/img#69 (comment))

@cyphar
Copy link
Member

cyphar commented Mar 23, 2018

I think a tertiary flag is okay. I will need to re-review this.

@AkihiroSuda
Copy link
Member Author

@cyphar updated to use ternary flag

@cyphar
Copy link
Member

cyphar commented May 8, 2018

LGTM, I've played around with this and it works pretty well. As you noted in your comment there isn't really a nice way of detecting whether we are in a user namespace in a non-destructive way (unfortunately) -- so this will have to do for now.

/cc @opencontainers/runc-maintainers

Approved with PullApprove

utils_linux.go Outdated
// So we use system.GetParentNSeuid() here.
//
// TODO(AkihiroSuda): how to support nested userns?
return system.GetParentNSeuid() != 0, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be safer to assume "yes" everytime it is running in an userNS (i.e. doesn't have the full 0-4294967295 range available in /proc/self/uid_map)?

Copy link
Member

@cyphar cyphar May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a good point. I think doing both (and if either is true, we assume we are in rootless mode) would work as well. The main problem with the 4294967295 check is that you can create user namespaces that have a full mapping but still require rootless tricks (and as I said above there's no real way of checking if that is true from userspace).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that because it is still theoretically possible to setup the full range and have root mapped to some other user in the host namespace? If I am not missing something, that sounds quite difficult to achieve as it requires an intermediate user namespace with at least two mappings (root in the intermediate namespace mapped to non root in the host) that still has enough privileges to setup the full range for the final userNS.

In any case, I fully agree that having both checks is safer

Copy link
Member

@cyphar cyphar May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that because it is still theoretically possible to setup the full range and have root mapped to some other user in the host namespace?

That is also possible, but it's actually much simpler than that. Even if you have a 1-to-1 mapping of all users in a new user namespace certain operations will still fail purely because you are not in &init_user_ns (such as mknod(2) or most of the mounting we do) -- because capabilities are scoped to your user namespace and your host UID is irrelevant. So we'd need to apply rootless tricks even in that case.

While this is an edge-case it means that in general we have no way of checking whether we are in a user namespace or the host user namespace. The closest thing we have is ioctl(NS_GET_PARENT) but returns -EPERM even if there is no parent -- I would argue that it should return -ENOENT in that case but @ebiederm might not agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the kernel the inode for the initial user namespace is hardcoded, and it seems like a reliable (and undocumented AFAICS) way to detect if we are running in the init userNS:

$ cat > userns_checker.c << EOF
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main ()
{
  struct stat sb;
  if (stat ("/proc/self/ns/user", &sb) < 0)
    return -1;
  printf ("is main userns? %s\n", 0xEFFFFFFDU == sb.st_ino ? "yes" : "no");
  return 0;
}
EOF

$ gcc -o userns_checker userns_checker.c

$ ./userns_checker 
is main userns? yes
$ unshare -r ./userns_checker 
is main userns? no

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased PR and added the 4294967295 check.
The 0xEFFFFFFD check can be follow-up PR after discussion.

Copy link
Member

@cyphar cyphar May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell this is not guaranteed -- it just happens to be that ns_alloc_inum returns the same thing on each boot. But any kernel change or module that results in proc_alloc_inum being called before &init_user_ns is instantiated is going to result in that inode number changing. I checked again and it is hardcoded as PROC_USER_INIT_INO .

Not to mention that (as Eric said) special inode numbers in general are not something you should depend on (a perfect example is "detecting" chroot(2) by the inode number of / -- something that always breaks every time someone tries it with a new filesystem because they all use different inode numbers for /).

@ebiederm
Copy link

ebiederm commented May 10, 2018 via email

Running rootless containers in userns is useful for mounting
filesystems (e.g. overlay) with mapped euid 0, but without actual root
privilege.

Usage: (Note that `unshare --mount` requires `--map-root-user`)

  user$ mkdir lower upper work rootfs
  user$ curl http://dl-cdn.alpinelinux.org/alpine/v3.7/releases/x86_64/alpine-minirootfs-3.7.0-x86_64.tar.gz | tar Cxz ./lower || ( true; echo "mknod errors were ignored" )
  user$ unshare --mount --map-root-user
  mappedroot# runc spec --rootless
  mappedroot# sed -i 's/"readonly": true/"readonly": false/g' config.json
  mappedroot# mount -t overlay -o lowerdir=./lower,upperdir=./upper,workdir=./work overlayfs ./rootfs
  mappedroot# runc run foo

Signed-off-by: Akihiro Suda <[email protected]>
@cyphar
Copy link
Member

cyphar commented May 10, 2018

@ebiederm How do you feel about making NS_GET_PARENT return -ENOENT if the user is in the namespace and the namespace is the init_ns?

@cyphar
Copy link
Member

cyphar commented May 10, 2018

LGTM on the updated patchset.

Approved with PullApprove

@ebiederm
Copy link

ebiederm commented May 10, 2018 via email

@AkihiroSuda
Copy link
Member Author

Added commit c938157 for allowing setgroups.
(Already reviewed by @cyphar in genuinetools/img#96 (comment) )

@AkihiroSuda
Copy link
Member Author

@dqminh PTAL?

@cyphar
Copy link
Member

cyphar commented May 29, 2018

LGTM again.

Approved with PullApprove

@crosbymichael
Copy link
Member

crosbymichael commented May 29, 2018

LGTM

Approved with PullApprove

@crosbymichael crosbymichael merged commit 0e56164 into opencontainers:master May 29, 2018
AkihiroSuda added a commit to AkihiroSuda/runc that referenced this pull request May 30, 2018
@AkihiroSuda AkihiroSuda mentioned this pull request May 30, 2018
AkihiroSuda added a commit to AkihiroSuda/runc that referenced this pull request May 30, 2018
/*
* We assume we are in the initial user namespace if we have a full
* range - 4294967295 uids starting at uid 0.
*/
if a == 0 && b == 0 && c == 4294967295 {
if len(uidmap) == 1 && uidmap[0].ID == 0 && uidmap[0].ParentID == 0 && uidmap[0].Count == 4294967295 {
return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this broke ARM builds in Moby;

linux_amd64_netgo:/usr/local/go/pkg/linux_amd64_netgo"    -e GOARM=6 "docker-dev:master" hack/make.sh binary
04:21:31 
04:21:32 Removing bundles/
04:21:32 
04:21:32 ---> Making bundle: binary (in bundles/binary)
04:21:32 Building: bundles/binary-daemon/dockerd-18.06.0-ce-dev
04:22:29 # github.com/docker/docker/vendor/github.com/opencontainers/runc/libcontainer/system
04:22:29 vendor/github.com/opencontainers/runc/libcontainer/system/linux.go:119:89: constant 4294967295 overflows int

AkihiroSuda added a commit to AkihiroSuda/go-runc that referenced this pull request Jul 2, 2018
The --rootless flag was introduced in opencontainers/runc#1688.

In most cases runc itself can detect the appropriate value,
but it is considered to be there are some corner cases.

Signed-off-by: Akihiro Suda <[email protected]>
AkihiroSuda added a commit to AkihiroSuda/go-runc that referenced this pull request Jul 3, 2018
The --rootless flag was introduced in opencontainers/runc#1688.

In most cases runc itself can detect the appropriate value,
but it is considered to be there are some corner cases.

Signed-off-by: Akihiro Suda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants