*: add support for cgroup namespace #397
Seems fine. Does this need to wait on upstream Linux or runc?
This has been merged into Linux and will be released in
Oh nice. That'll smooth the transition for sure.
On Mon, Apr 25, 2016 at 11:35 AM, Aleksa Sarai [email protected]
👍 We can merge this and add it to runc once released.
On Mon, Apr 25, 2016 at 08:14:24AM -0700, Aleksa Sarai wrote:
We also want docs in config-linux.md#namespaces, and possibly an
3e7752b to bc0d866
@wking I've added examples and explanations in the docs.
@@ -33,6 +33,7 @@ The following parameters can be specified to setup namespaces:
* **`ipc`** processes inside the container will only be able to communicate to other processes inside the same container via system level IPC
* **`uts`** the container will be able to have its own hostname and domain name
* **`user`** the container will be able to remap user and group IDs from the host to local users and groups within the container
* **`cgroup`** the container will have an isolated cgroup hierarchy that it can manage
Reference 4.6 as the release that landed this feature?
Done.
How do we handle incompatibility with new kernel features as of now? Is the spec attempting to provide some means of discovering the features that a given kernel supports?
@vishh I think we should probably leave that to the runtime.
@mrunalp: That would affect client portability, right? Is cross-runtime
On Mon, Apr 25, 2016 at 2:30 PM, Mrunal Patel [email protected]
@vishh We can make it clear in the spec that a runtime implementation must fail if it can't create all of the requested namespaces. Anything outside of that should be left to the runtime to handle (such as figuring out whether or not it can create the namespaces).
bc0d866 to 9926619
* **`ipc`** processes inside the container will only be able to communicate to other processes inside the same container via system level IPC. Support for this was added in Linux 2.6.19.
* **`uts`** the container will be able to have its own hostname and domain name. Support for this was added in Linux 2.6.19.
* **`user`** the container will be able to remap user and group IDs from the host to local users and groups within the container. Support for this was added in Linux 3.8.
* **`cgroup`** the container will have an isolated cgroup hierarchy that it can manage. Support for this was added in Linux 4.6.
Can we modify the text here to say that it sees an isolated/virtualized view of the cgroup hierarchy? The manage portion doesn't quite work well yet ;)
We can use text from here https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d4021f6cd41f03017f831b3d40b0067bed54893d
Heh, it was a bit ambitious for a description. ;)
9926619 to f0e4d33
* **`path`** *(string, optional)* - path to namespace file in the [runtime mount namespace](glossary.md#runtime-namespace)

If a path is specified, that particular file is used to join that type of namespace.
Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.

If the runtime is unable to create or join all of the requested namespaces, it
MUST fail so as not to lull the user into a false sense of security.
I think we can drop 'so as not to lull the user into a false sense of security'. Otherwise looks fine.
Fix'd.
79313a4 to 76205d0
LGTM

* **`path`** *(string, optional)* - path to namespace file in the [runtime mount namespace](glossary.md#runtime-namespace)

If a path is specified, that particular file is used to join that type of namespace.
Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.

If the runtime is unable to create or join ALL of the requested namespaces, it
MUST fail.
For consistency with `runtime.md`, we should probably use “generate an error”. Although @vishh's concern was broader than namespaces, so I think we might want to put this language in step 2 of the lifecycle.
Also, one line per sentence ;).
Also, one line per sentence ;).
Eww. 80 characters is objectively the right wrapping. ;)
But yes, I'll update the lifecycle documentation then.
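The "fail unless ALL requested namespaces can be set up" requirement, combined with the earlier point about preferring a runtime check over kernel-version matching, could look roughly like the sketch below. It assumes the Linux convention that each namespace type the kernel supports appears as an entry under `/proc/self/ns` (for example `cgroup` on 4.6+ kernels). The `checkNamespaces` and `fileExists` helpers are hypothetical. A present entry only shows kernel support; actually creating the namespace can still fail (for example for lack of privileges), so a real runtime would also need to propagate that error.

```go
package main

import (
	"fmt"
	"os"
)

// nsFiles maps namespace type names used in config.json to the entry the
// Linux kernel exposes under /proc/self/ns for that namespace type.
var nsFiles = map[string]string{
	"cgroup":  "cgroup",
	"ipc":     "ipc",
	"mount":   "mnt",
	"network": "net",
	"pid":     "pid",
	"user":    "user",
	"uts":     "uts",
}

// checkNamespaces returns an error unless every requested namespace type is
// known and present on this kernel. exists is injected so the check can be
// exercised without depending on the host kernel.
func checkNamespaces(requested []string, exists func(path string) bool) error {
	for _, t := range requested {
		name, ok := nsFiles[t]
		if !ok {
			return fmt.Errorf("unknown namespace type %q", t)
		}
		if !exists("/proc/self/ns/" + name) {
			return fmt.Errorf("namespace type %q is not supported by this kernel", t)
		}
	}
	return nil
}

// fileExists reports whether path can be stat'ed without following links.
func fileExists(path string) bool {
	_, err := os.Lstat(path)
	return err == nil
}

func main() {
	if err := checkNamespaces([]string{"pid", "cgroup"}, fileExists); err != nil {
		fmt.Println("refusing to create container:", err)
		return
	}
	fmt.Println("all requested namespaces are available")
}
```

The check stops at the first unsupported type instead of silently skipping it, which matches the intent that a runtime must not start a container with only some of the requested isolation.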
76205d0 to 058860b
On Fri, May 27, 2016 at 05:02:18AM -0700, Aleksa Sarai wrote:
Heh, f0e4d33 → a021e63 becomes much more excited about isolation (from
Anyhow, the tip commit (058860b) looks good (and can be cherry-picked
I'm not sure about the penultimate commit; will leave an inline comment.

* **`path`** *(string, optional)* - path to namespace file in the [runtime mount namespace](glossary.md#runtime-namespace)

If a path is specified, that particular file is used to join that type of namespace.
Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.
This is new with your recent reroll, and rolls back #158. What has changed since #158? The motivation for this line was “we may support join-and-tweak if someone gives a convincing use-case, but we're banning it for now”, but you're removing it (in a drive-by change?) while adding a cgroup namespace, and I don't see the connection.
Urgh, looks like it was a dodgy rebase. I'll fix this up.
EDIT: Fixed.
@@ -35,6 +35,7 @@ The lifecycle describes the timeline of events that happen from when a container
1. OCI compliant runtime's `create` command is invoked with a reference to the location of the bundle and a unique identifier.
How these references are passed to the runtime is an implementation detail.
2. The container's runtime environment MUST be created according to the configuration in [`config.json`](config.md).
If the runtime is unable to create the runtimme environment specified in the [`config.json`](config.md) it MUST generate an error.
“runtimme” → “runtime”.
Or maybe just drop the second “runtime” instance and say “… the environment specified…”.
Also, you probably want a comma after `config.json`: “If … `config.json`, it MUST generate an error.”
Fixed. And I fixed up the email madness.
Everything through 5d3c351 looks pretty good to me (just a few
copy-edit suggestions for the tip commit).
And you're using two different emails, in case you wanted to adjust
those before this lands.
5d3c351 to 3ce281c
Everything through 3ce281c looks good to me.
* **`ipc`** processes inside the container will only be able to communicate to other processes inside the same container via system level IPC
* **`uts`** the container will be able to have its own hostname and domain name
* **`user`** the container will be able to remap user and group IDs from the host to local users and groups within the container
* **`pid`** processes inside the container will only be able to see other processes inside the same container. Support for this was added in Linux 2.6.24.
One sentence per line. Otherwise looks good.
Damn, I swore I fixed this.
I don't know if you can do this with lists in Markdown, @mrunalp.
As discussed in the call, my $0.02 on whether to include kernel versions of required features in the spec: I don't think it's strictly necessary for the spec. A separate "cheat sheet" (mapping features to minimum kernel versions) might make sense, but if the information is reasonably readily available elsewhere, then I think it's sufficient for the tools to simply error out when unsupported features are used (relying more on the runtime check than on fuzzy kernel version matching, which Red Hat has a habit of throwing a wrench in 😄).
Yes, we are not recording the history of the Linux kernel in this spec.
@cyphar could you update the PR to address the comments? We kinda need this for 1.0 :)
I wouldn't say that this is a blocker for 1.0. It's adding an option and does not affect the schema or anything, so it can come at any time.
Yeah, not a hard blocker, but it will be nice to have since it is so close to being merged.
So you want me to remove all of the "This is available since Linux X.Y.Z" sentences?
@cyphar Yes. That's what we discussed and agreed to in the OCI call yesterday.
@opencontainers/runtime-spec-maintainers There, I've removed the kernel versions.
4eb65e7 to d514aad
The cgroup namespace is a new kernel feature available in 4.6+ that allows a container to isolate its cgroup hierarchy. This currently only allows for hiding information from /proc/self/cgroup, and mounting cgroupfs as an unprivileged user. In the future, this namespace may allow for subtree management by a container.
Signed-off-by: Aleksa Sarai <[email protected]>

Make it clear that if a runtime cannot set up an environment that *precisely* matches the config.json provided, it must generate an error. This is important because not doing this can cause security issues.
Signed-off-by: Aleksa Sarai <[email protected]>
The cgroup namespace is a new kernel feature available in 4.6+ that
allows a container to isolate its cgroup hierarchy. This currently only
allows you to hide information from /proc/self/cgroup. But I'm currently
working with upstream on expanding it so that you can modify your
hierarchy inside a user namespace (even a rootless container).
Signed-off-by: Aleksa Sarai [email protected]
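To see the effect on `/proc/self/cgroup` described in the commit messages, one could parse the file from inside the container. The sketch below assumes the documented Linux line format `hierarchy-ID:controller-list:cgroup-path`. It uses an illustrative sample instead of reading the real file: inside a freshly created cgroup namespace, paths are typically shown relative to the namespace root, so they often appear as `/`. The `parseProcCgroup` helper and the sample contents are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupEntry is one line of /proc/<pid>/cgroup.
type cgroupEntry struct {
	HierarchyID string
	Controllers string // empty for the cgroup v2 unified hierarchy
	Path        string
}

// parseProcCgroup splits each line into at most three fields so that any
// ':' characters inside the cgroup path are preserved.
func parseProcCgroup(data string) ([]cgroupEntry, error) {
	var entries []cgroupEntry
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		entries = append(entries, cgroupEntry{
			HierarchyID: parts[0],
			Controllers: parts[1],
			Path:        parts[2],
		})
	}
	return entries, sc.Err()
}

func main() {
	// Illustrative sample of what a process inside a new cgroup namespace
	// might see: its own cgroup is reported as the root "/".
	sample := "12:memory:/\n11:cpu,cpuacct:/\n0::/\n"
	entries, err := parseProcCgroup(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("hierarchy %s (%q): %s\n", e.HierarchyID, e.Controllers, e.Path)
	}
}
```

To inspect a live system, the sample string could be replaced with the contents of `/proc/self/cgroup` read via `os.ReadFile`.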