Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression) #1862

Merged
merged 1 commit into from
Oct 16, 2018

Conversation

AkihiroSuda
Copy link
Member

This PR decomposes libcontainer/configs.Config.Rootless bool into RootlessEUID bool and
RootlessCgroups bool, so as to make "runc-in-userns" to be more compatible with "rootful" runc.

RootlessEUID denotes that runc is being executed as a non-root user (euid != 0) in
the current user namespace. RootlessEUID is almost identical to the former Rootless
except cgroups stuff.

RootlessCgroups denotes that runc is unlikely to have the full access to cgroups.
RootlessCgroups is set to false if runc is executed as the root (euid == 0) in the initial namespace.
Otherwise RootlessCgroups is set to true.
(Hint: if RootlessEUID is true, RootlessCgroups becomes true as well)

When runc is executed as the root (euid == 0) in an user namespace (e.g. by Docker-in-LXD, Podman, Usernetes),
RootlessEUID is set to false but RootlessCgroups is set to true.
So, "runc-in-userns" behaves almost same as "rootful" runc except that cgroups errors are ignored.

This PR does not have any impact on CLI flags and state.json.

Note about CLI:

  • Now runc --rootless=(auto|true|false) CLI flag is only used for setting RootlessCgroups.
  • Now runc spec --rootless is only required when RootlessEUID is set to true.
    For runc-in-userns, runc spec without --rootless should work, when sufficient numbers of
    UID/GID are mapped.

Note about $XDG_RUNTIME_DIR (e.g. /run/user/1000):

  • $XDG_RUNTIME_DIR is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility.
    (/run/runc is used)
  • If runc is executed as the root (euid == 0) in an user namespace, $XDG_RUNTIME_DIR is honored if $USER != "" && $USER != "root".
    This allows unprivileged users to allow execute runc as the root in userns, without mounting writable /run/runc.

Note about state.json:

  • rootless is set to true when RootlessEUID == true && RootlessCgroups == true.

Signed-off-by: Akihiro Suda [email protected]


@AkihiroSuda AkihiroSuda force-pushed the decompose-rootless-pr branch 4 times, most recently from 89d6837 to fceaba2 Compare August 10, 2018 09:31
@AkihiroSuda AkihiroSuda changed the title Disable rootless mode except RootlessCgMgr when executed as the root in userns Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression) Aug 10, 2018
@AkihiroSuda AkihiroSuda force-pushed the decompose-rootless-pr branch 3 times, most recently from 26e5251 to 3c63ff4 Compare August 10, 2018 12:24
@rhatdan
Copy link
Contributor

rhatdan commented Aug 21, 2018

@giuseppe PTAL

// IntelRdt specifies settings for Intel RDT/CAT group that the container is placed into
// to limit the resources (e.g., L3 cache) the container has available
IntelRdt *IntelRdt `json:"intel_rdt,omitempty"`

// RootlessEUID specifies is set when the runc was launched with non-zero EUID.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please drop "specifies" or "is set"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

}
// euid = 0, in a userns.
u := os.Getenv("USER")
return u != "" && u != "root"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be something like?

	u, ok := os.LookupEnv("USER")
	return !ok || u != "root"

Otherwise it breaks current users like Podman (and probably Buildah as well) that don't pass $USER down to the runtime

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@kolyshkin
Copy link
Contributor

this needs a rebase

@AkihiroSuda AkihiroSuda force-pushed the decompose-rootless-pr branch from 0851385 to 3334b2e Compare August 24, 2018 05:19
@AkihiroSuda
Copy link
Member Author

rebased

@cyphar cyphar self-requested a review August 27, 2018 22:52
@cyphar cyphar self-assigned this Aug 27, 2018
@AkihiroSuda
Copy link
Member Author

@giuseppe LGTY?

@giuseppe
Copy link
Member

@AkihiroSuda yes, I've tested it with podman and it works well for me

@thaJeztah
Copy link
Member

/cc @crosbymichael ptal

@giuseppe
Copy link
Member

an error I've seen (not introduced by this patch but that could probably be fixed as part of it), is:

create a rootless user namespace with multiple IDs mapped in, then create a runc container that has:

    "uidMappings": [
      {
        "hostID": 1,
        "containerID": 0,
        "size": 1
      },
      {
        "hostID": 0,
        "containerID": 1,
        "size": 1
      }
    ],
    "gidMappings": [
      {
        "hostID": 1,
        "containerID": 0,
        "size": 1
      },
      {
        "hostID": 0,
        "containerID": 1,
        "size": 1
      }
    ],

if I try to exec (from the first user namespace) into the container, I get:

$ runc exec foo echo hi
exec failed: container_linux.go:337: starting container process caused "process_linux.go:90: adding pid 5652 to cgroups caused \"failed to write 5652 to cgroup.procs: write /sys/fs/cgroup/systemd/user.slice/user-1000.slice/[email protected]/gnome-terminal-server.service/foo/cgroup.procs: permission denied\""

…in userns

This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and
`RootlessCgroups bool`, so as to make "runc-in-userns" to be more compatible with "rootful" runc.

`RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in
the current user namespace. `RootlessEUID` is almost identical to the former `Rootless`
except cgroups stuff.

`RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups.
`RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace.
Otherwise `RootlessCgroups` is set to true.
(Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well)

When runc is executed as the root (euid == 0) in an user namespace (e.g. by Docker-in-LXD, Podman, Usernetes),
`RootlessEUID` is set to false but `RootlessCgroups` is set to true.
So, "runc-in-userns" behaves almost same as "rootful" runc except that cgroups errors are ignored.

This PR does not have any impact on CLI flags and `state.json`.

Note about CLI:
* Now `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`.
* Now `runc spec --rootless` is only required when `RootlessEUID` is set to true.
  For runc-in-userns, `runc spec`  without `--rootless` should work, when sufficient numbers of
  UID/GID are mapped.

Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`):
* `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility.
  (`/run/runc` is used)
* If runc is executed as the root (euid == 0) in an user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`.
  This allows unprivileged users to allow execute runc as the root in userns, without mounting writable `/run/runc`.

Note about `state.json`:
* `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`.

Signed-off-by: Akihiro Suda <[email protected]>
@AkihiroSuda AkihiroSuda force-pushed the decompose-rootless-pr branch from 3334b2e to 06f789c Compare September 7, 2018 06:05
@AkihiroSuda
Copy link
Member Author

@giuseppe should be fixed in the latest revision

@giuseppe
Copy link
Member

@giuseppe should be fixed in the latest revision

yes thanks, I've verified that it works now

debarshiray added a commit to containers/toolbox that referenced this pull request Sep 20, 2018
Assuming a host UID of 1000, the UID mapping inside the user namespace
created by rootless podman for the toolbox container was:
         0       1000          1
         1     100000      65536

... which was the same as seen from the host:
         0       1000          1
         1     100000      65536

Therefore, when running with an UID of 1000 inside the container, it
got mapped to UID 100999 on the host. That means, for example, files
created by the user inside the container end up looking funny from the
host.

This is addressed by creating another user namespace that's a child of
the initial user namespace created by rootless podman. Assuming a host
UID of 1000, the UID mapping inside this child namespace is:
      1000          0          1
         0          1       1000
      1001       1001      64536

... which when seen from the host is:
      1000       1000          1
         0     100000       1000
      1001     101000      64536

This means that UID 1000 inside the child namespace is mapped to the
same UID 1000 on the host via the intermediate namespace created by
rootless podman. UIDs 0 to 999 inside the child namespace are mapped
to UIDs 100000 to 100999 in the host.

This change requires this runc pull request to work:
opencontainers/runc#1862

As suggested by Giuseppe Scrivano.
@AkihiroSuda
Copy link
Member Author

ping @cyphar

@AkihiroSuda
Copy link
Member Author

This PR seems also needed for Docker-on-Chromebook https://www.reddit.com/r/Crostini/comments/9jabhq/docker_now_working/

@debarshiray
Copy link

Without this PR one can't use rootless podman create to map the host UID into a container, which is a deal-breaker for fedora-toolbox.

@caniszczyk
Copy link
Contributor

ping @opencontainers/runc-maintainers

@AkihiroSuda
Copy link
Member Author

This is being discussed at the dev ML: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/jOaNYre5M40

@crosbymichael
Copy link
Member

I'm reviewing and testing this today

@crosbymichael
Copy link
Member

crosbymichael commented Oct 11, 2018

LGTM

Approved with PullApprove

@mrunalp
Copy link
Contributor

mrunalp commented Oct 15, 2018

I'll review this today.

@mrunalp
Copy link
Contributor

mrunalp commented Oct 16, 2018

LGTM

Approved with PullApprove

@mrunalp mrunalp merged commit a00bf01 into opencontainers:master Oct 16, 2018
@AkihiroSuda
Copy link
Member Author

Thanks Michael and Mrunal!

Could you also review #1880 (very easy) and #1869 (complicated) when you have time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants