Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroup namespaces: ignore the mount.Root if we have cgroup namespaces #617

Closed
wants to merge 1 commit into from

Conversation

hallyn
Copy link
Contributor

@hallyn hallyn commented Mar 3, 2016

In a cgroup namespace, you can mount cgroupfs, and your namespace
root (say /docker1) becomes the root of the cgroup filesystem. This
shows up as field 3 in the mountinfo. This is unfortunately
ambiguous with a cgroupfs bind mount, and in this case we cannot use
that root as a prefix for our cgroup, since as far as we are concerned
our cgroup is '/', not '/docker1'.

So if cgroup namespaces are enabled (/proc/$$/ns/cgroup exists), then
assume that we haven't done any silly cgroupfs bind mount trickery,
and ignore the fs root field.

Signed-off-by: Serge Hallyn [email protected]

@mrunalp
Copy link
Contributor

mrunalp commented Mar 3, 2016

LGTM

@mrunalp
Copy link
Contributor

mrunalp commented Mar 3, 2016

@hallyn Are cgroups namespaces expected to be in 4.6? (BTW, thanks for your upstream work in reviving the patchset).

@hallyn
Copy link
Contributor Author

hallyn commented Mar 3, 2016

@mrunalp That's the hope :) They're in linux-next at the moment.

@mrunalp
Copy link
Contributor

mrunalp commented Mar 3, 2016

Awesome :) Looking forward to it!

Sent from my iPhone

On Mar 2, 2016, at 11:05 PM, Serge Hallyn [email protected] wrote:

@mrunalp That's the hope :) They're in linux-next at the moment.


Reply to this email directly or view it on GitHub.

@mlaventure
Copy link
Contributor

Shouldn't this also check that cgroups ns was enabled for that container? Or does this work even in the initial cgroup namespace?

@hallyn
Copy link
Contributor Author

hallyn commented Mar 3, 2016 via email

@hqhq
Copy link
Contributor

hqhq commented Mar 5, 2016

I simply assume that if cgroup namespaces are available, then they will be used

I'm afraid we don't have this assumption now, namespaces are configurable, this assumption probably would break backward compatibility on new kernel, right?

@cyphar
Copy link
Member

cyphar commented Mar 7, 2016

@hqhq I'm fairly namespace joining used to be kernel-version backwards compatible (we don't join namespaces which don't exist and only emit warnings). I'm not sure now (reading over the code, it doesn't look like that's the case anymore).

@hqhq
Copy link
Contributor

hqhq commented Mar 7, 2016

@cyphar I'm not worry about the kernel, but concern we can't use old configs.json on kernel which support cgroup namespace, because old json file has no cgroup ns configed, but runC will just join it and change the cgroup root.
Maybe not a big deal because we are not 1.0 yet, just think we can have a better way to check if we should ignore mount.Root.

If we gonna do a follow up PR to fix that, this PR LGTM.

@mlaventure
Copy link
Contributor

I don't think runC should automatically join namespaces that are not listed in the config. That would be quite un-intuitive and limit some valid use cases.

If I'm not mistaken, atm we only join the namespaces that are actually listed within the config file. Which means that this PR should probably be updated to also check that a cgroup ns was joined before changing the value of m.Root.

@mrunalp
Copy link
Contributor

mrunalp commented Mar 7, 2016

I think we should probably wait for cgroups namespaces to be available before we merge this and other required support to runc.

@cyphar
Copy link
Member

cyphar commented Mar 8, 2016

@hqhq Sorry, when I said "kernel-version backwards compatible" I was referring to runc being compatible with older kernels. I'm not sure that this is the case anymore (reading the code for joining namespaces, it looks like we error out if we can't join any one of them). But yeah, I understand what you meant.

tianon added a commit to tianon/debian-docker that referenced this pull request Mar 9, 2016
@hallyn hallyn force-pushed the 2016-03-02/userns branch 2 times, most recently from fc26d19 to 9285fbd Compare March 26, 2016 07:03
If our cgroups were mounted in a cgroup namespace rooted at /a/b,
then a task in namespaced cgroup / will see / in /proc/self/mountinfo
and /sys/fs/cgroup/freezer/x will actually point to /a/b/x - but
the 'root' field (field 3) in mountinfo will show /a/b.  So as to
not confuse the cgroup calculation, check for nsroot=/a/b in the
last field, which will allow us to disambiguate between a mount
like above, and a bind mount of the /a/b cgroup directory.

Signed-off-by: Serge Hallyn <[email protected]>
@hallyn hallyn force-pushed the 2016-03-02/userns branch from 9285fbd to 76935ff Compare March 29, 2016 03:24
@hallyn
Copy link
Contributor Author

hallyn commented Apr 20, 2016

(closing this until upstream churn settles down)

@cyphar
Copy link
Member

cyphar commented Jun 3, 2016

@hallyn So, cgroup namespaces have been merged (in a very different state to the current proposal here). We can reopen this once #781 and the dependent PRs all get merged.

@hallyn
Copy link
Contributor Author

hallyn commented Jun 3, 2016 via email

@SpComb
Copy link

SpComb commented Jun 16, 2016

It seems that this patch is still required when running docker-in-lxd on Ubuntu xenial's linux 4.4 kernel, with the parent nsroot in /proc/self/mountinfo. Ubuntu's own docker.io packages carry this patch and work, but the upstream docker-engine package fails, note the broken /sys/fs/docker/... cgroup path:

level=error msg="containerd: start container" error="oci runtime error: could not synchronise with container process: stat /sys/fs/docker/13bd9f844f91631c0459d5fabdad8b2e555d7765eefbc7158b4db428b118b008: no such file or directory

This (or some other related issue) causes docker run debian:jessie bash to fail with errors like: docker: Error response from daemon: Container command 'bash' not found or does not exist..

@hallyn
Copy link
Contributor Author

hallyn commented Jun 16, 2016 via email

@SpComb
Copy link

SpComb commented Jun 16, 2016

Cool, do you have a linux-image-generic version number for the package with the fix? I've been testing with 4.4.0-22-generic. I'd be happy to test against the upstream docker-engine package with the fixed kernel.

@hallyn
Copy link
Contributor Author

hallyn commented Jun 16, 2016 via email

@SpComb
Copy link

SpComb commented Jun 17, 2016

I can confirm that with Ubuntu xenial-proposed linux 4.4.0-25-generic including the cgroup namespace updates, /proc/self/mountinfo now shows a path of / within the lxd cgroup namespace, and the unpatched upstream docker-engine 1.11.2 package works correctly within a privileged lxd container.

So assuming it's the same behaviour as on linux 4.6, this should be fine now.

@hallyn
Copy link
Contributor Author

hallyn commented Jun 17, 2016 via email

stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
…ional-case

config: Fix 'optional' -> 'OPTIONAL' for process.terminal
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants