-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cgroup2: how can we support nested containers with domain controllers? #2356
Comments
I think that is the correct solution and what systemd does. Do you think it should be the OCI runtime responsibility to set it up? |
@brauner flagged this issue several years ago. Christian, what does LXC/LXD do in this case? I know we discussed a whole host of possible cgroup trees and ways to resolve them -- but I don't remember what the conclusion was.
This is definitely a philosophical question. In theory I would say no, because it affects other processes on the system in potentially bad ways (first of all, systemd -- but even if the cgroup has been delegated the other processes may be managing it and you'll hit a whole bunch of race conditions). But if there's no way to reasonably support this without doing it, then we don't really have much of a choice... |
a related issue is how to exec into a container that creates a sub-cgroup. I've seen the issue with systemd containers, as systemd automatically creates What cgroup should we join on |
👍 |
For the moment (given runc's architecture) it should be the pid1, which we generally treat as "the source of truth" for the container state. This does mean that the information we currently store in But for the sake of posterity (and I have mentioned this a few times over the years), using pid1 as the source of truth directly does have quite a few disadvantages. Most notably, the container process can try to trick you into (for instance) joining a user namespace it created where it has more privileges than you expected. A nicer solution would be to stash information in a temporary mount namespace, but this is currently non-trivial on modern Linux. I'm mostly just mentioning this because it's something I'd like to change, even though it doesn't really affect this particular issue. |
How is a cgroup associated with a user namespace? |
It looks like this change requires modifying the state file for containers and switching to trusting the pid1 of the container. My comment was more generally about why it can be a little hairy to trust pid1 in that manner, and that we should work on improving that at some point. I wasn't saying that there's a particular issue in this case. |
PR: #2416 |
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#no-internal-process-constraint
http://man7.org/linux/man-pages/man7/cgroups.7.html
The constraint seems blocker toward supporting nested containers like dind and kind.
The situation is same on crun as well.
@giuseppe @kolyshkin @vbatts Thoughts?
A workaround is to specify an entrypoint script that moves the processes in the namespaced-root cgroup to another cgroup.
The text was updated successfully, but these errors were encountered: