Don't always enable rootless mode in userns #1837

Closed
AkihiroSuda opened this issue Jul 3, 2018 · 5 comments

AkihiroSuda commented Jul 3, 2018

Problem

In #1688 we broke "Docker-in-LXD":

$ lxc launch ubuntu:18.04 foo -c security.nesting=true
$ lxc shell foo
foo# curl -fsSL get.docker.com -o get-docker.sh && sh get-docker.sh
foo# exit
$ lxc file push /usr/local/sbin/runc foo/usr/bin/docker-runc
$ lxc shell foo
foo# cat /proc/self/uid_map
(we are in userns here)
foo# docker run -it --rm busybox
docker: Error response from daemon: OCI runtime create failed: cannot specify gid= mount options for unmapped gid in rootless containers: unknown.

This happens because runc enables the "rootless mode" whenever it runs in a user namespace, but "Docker-in-LXD" does not expect runc to enable the rootless mode.
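As a rough illustration of what "running in a user namespace" means for the check above, the content of /proc/self/uid_map can be inspected: in the initial namespace it is a single identity mapping covering the whole 32-bit ID range. The helper name inUserNS is hypothetical, and this is only a sketch of the idea, not runc's actual detection code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// inUserNS (hypothetical helper) reports whether the given uid_map content
// looks like a non-initial user namespace. The initial namespace maps
// ID 0 to parent ID 0 for 4294967295 IDs in a single entry.
func inUserNS(uidMap string) bool {
	lines := strings.Split(strings.TrimSpace(uidMap), "\n")
	if len(lines) != 1 {
		// Several mapping entries only occur inside a user namespace.
		return true
	}
	fields := strings.Fields(lines[0])
	if len(fields) != 3 {
		// Empty or unparsable content: assume no user namespace.
		return false
	}
	var nums [3]int64
	for i, f := range fields {
		n, err := strconv.ParseInt(f, 10, 64)
		if err != nil {
			return false
		}
		nums[i] = n
	}
	return !(nums[0] == 0 && nums[1] == 0 && nums[2] == 4294967295)
}

func main() {
	// Sample contents; the second mapping is an illustrative LXD-style value.
	fmt.Println(inUserNS("         0          0 4294967295"))
	fmt.Println(inUserNS("         0    1000000 1000000000"))
}
```

In the "Docker-in-LXD" case such a check returns true even though runc is root inside the container with working cgroups, which is why userns detection alone is a poor trigger for the rootless mode.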

What "rootless mode" actually does

  1. Honor $XDG_RUNTIME_DIR
  2. Switch the cgroup manager to libcontainer.RootlessCgroupfs
  3. Disable cgroup-specific features such as runc ps and OOM notification
  4. Disable runc checkpoint and runc restore
  5. Make sure config.json contains userns and id mappings if euid != 0
  6. Make sure config.json does not contain uid= and gid= for mounts
  7. Write "deny" to /proc/$PID/setgroups if single-entry mapping is specified
  8. Disable additional groups, but actually we don't need to do this. ([TODO] rootless: support spec.Process.User.AdditionalGids #1835)

For "Docker-in-LXD", we need none of them, because runc is already executed in a userns and cgroups are also available.

In #1688, we enabled the "rootless mode" in userns so as to support rootless img/buildkit/buildah/containerd/docker/podman, but actually we only need 1, 2, and 3 for these use cases.

Proposal

Step 1: fix Docker-in-LXD regression (PR: #1833 / Closed)

Change isRootless as follows:

func isRootless(context *cli.Context) (bool, error) {
  if context != nil {
  ...
  }
  // Decide from $USER instead of the UID in the current namespace.
  u := os.Getenv("USER")
  return u != "" && u != "root", nil
}

  • When runc is executed in userns via Docker-in-LXD, isRootless() returns false, because LXD would set $USER to "root".
  • When runc is executed in userns via rootless img/buildkit/buildah/containerd/docker/podman, isRootless() returns true, because we don't change environment variables after unsharing the userns and mapping UID=0 to the current user.
  • When runc is executed as a regular user in the initial namespace, isRootless() returns true
  • When runc is executed as the root in the initial namespace, isRootless() returns false
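The four cases above can be sketched as a small table-driven program. The function name rootlessFromUser and the sample user name "alice" are hypothetical; only the $USER comparison itself comes from the proposal.

```go
package main

import "fmt"

// rootlessFromUser mirrors the proposed $USER check: rootless mode is
// enabled only when $USER is set and is not "root".
func rootlessFromUser(user string) bool {
	return user != "" && user != "root"
}

func main() {
	// The $USER values are assumptions about what each environment sets.
	cases := []struct {
		scenario string
		user     string
	}{
		{"userns via Docker-in-LXD", "root"},
		{"userns via rootless img/buildkit/etc.", "alice"},
		{"regular user, initial namespace", "alice"},
		{"root, initial namespace", "root"},
	}
	for _, c := range cases {
		fmt.Printf("%-40s USER=%-6s rootless=%v\n",
			c.scenario, c.user, rootlessFromUser(c.user))
	}
}
```

Note that an unset $USER yields false in this sketch, so a minimal environment without $USER would be treated as non-rootless.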

Corner cases:

  • When runc is executed in userns via Docker-in-rootless-Docker ("rootless dind"), as root in the container, isRootless() returns false, and runc is unlikely to work. The dockerd in the container would need to specify runc --rootless explicitly in this case. (And the user would probably need to launch dockerd with --rootless explicitly.)
  • When runc is executed in "rootless dind", as a non-root user in the container, isRootless() returns true, and runc is likely to work.

Step 2: refactor the rootless mode: (PR: #1862)

  1. Honor $XDG_RUNTIME_DIR

Probably, this needs to be honored when u := os.Getenv("USER"); u != "" && u != "root".
(Note that we shouldn't check the UID in the current namespace, because we still want to honor $XDG_RUNTIME_DIR after unsharing the userns and mapping UID=0 to the current user.)

Or maybe we can always honor this variable, but that potentially breaks compatibility when runc is executed with UID=0, $USER=root, $XDG_RUNTIME_DIR=/run/user/0.
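A minimal sketch of the first option, where $XDG_RUNTIME_DIR is honored only for a non-root $USER. The function name runtimeRoot, the default path /run/runc, and the "runc" subdirectory are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// runtimeRoot (hypothetical helper) picks the state directory.
// $XDG_RUNTIME_DIR is honored only when $USER is set and is not "root",
// so UID=0 with $XDG_RUNTIME_DIR=/run/user/0 keeps the default.
func runtimeRoot(user, xdgRuntimeDir string) string {
	const defaultRoot = "/run/runc" // assumed default
	if user == "" || user == "root" || xdgRuntimeDir == "" {
		return defaultRoot
	}
	return filepath.Join(xdgRuntimeDir, "runc")
}

func main() {
	fmt.Println(runtimeRoot("root", "/run/user/0"))
	fmt.Println(runtimeRoot("alice", "/run/user/1000"))
	fmt.Println(runtimeRoot("alice", ""))
}
```

Because the decision depends on $USER rather than the UID, it still picks the per-user directory after the userns has been unshared and UID=0 mapped to the current user.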

  2. Switch the cgroup manager to libcontainer.RootlessCgroupfs

We should detect cgroup availability explicitly, probably by trying mkdir /sys/fs/cgroup/foo/bar or something similar. I guess the overhead is negligible.
Or just remove libcontainer.RootlessCgroupfs manager and ignore all errors.

We can just safely use libcontainer.RootlessCgroupfs when we are not the root in the initial namespace.

  3. Disable cgroup-specific features such as runc ps and OOM notification

We could implement runc ps without using cgroups. (in another PR in future)

  4. Disable runc checkpoint and runc restore

I'm not familiar with CRIU, but I guess we only need to disable them when runc is executed with a non-zero UID, regardless of whether we are in the initial namespace or in a userns.

  5. Make sure config.json contains userns and id mappings if euid != 0
  6. Make sure config.json does not contain uid= and gid= for mounts
  7. Write "deny" to /proc/$PID/setgroups if single-entry mapping is specified

We need to disable them only when runc is executed as non-zero UID (TODO: check capabilities instead?), regardless of whether we are in the initial namespace or in a userns.
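To make the mount check (uid= and gid= in config.json mounts) concrete, here is a simplified sketch that only reports such options for one mount. The function name idMountOptions and the sample option list are illustrative assumptions; the error message quoted in the Problem section suggests the real check also considers whether the ID is mapped, which is not modeled here.

```go
package main

import (
	"fmt"
	"strings"
)

// idMountOptions (hypothetical helper) returns the uid= and gid= options
// found in a mount's option list.
func idMountOptions(options []string) []string {
	var found []string
	for _, opt := range options {
		if strings.HasPrefix(opt, "uid=") || strings.HasPrefix(opt, "gid=") {
			found = append(found, opt)
		}
	}
	return found
}

func main() {
	// Illustrative devpts-style option list.
	opts := []string{"nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"}
	if bad := idMountOptions(opts); len(bad) > 0 {
		fmt.Printf("mount options that a rootless check would reject: %v\n", bad)
	}
}
```

Gating a check like this on the UID (or on capabilities, per the TODO above) rather than on userns detection is what would let the Docker-in-LXD case pass.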


AkihiroSuda commented Jul 5, 2018

Step 1: fix Docker-in-LXD regression (PR: #1833)
Step 2: refactor the rootless mode

If #1833 is not mergeable, maybe we should skip Step 1 and go to Step 2 directly.

( UPDATE: POC for Step 2 is ready: https://github.com/AkihiroSuda/runc/commits/decompose-rootless )

@danail-branekov
Contributor

@AkihiroSuda
With #1833 updated, your POC branch looks good with CF Garden: our acceptance tests are green.

@AkihiroSuda
Member Author

Closed the "Step 1" PR and opened the "Step 2" PR: #1862

@dhiltonp

I've been working on getting docker on ChromeOS.

I merged #1862 into master (2adb837), and I'm up and running!

Thanks for your work, @cyphar and @AkihiroSuda!
