config: Make capabilities, noNewPrivileges, and rlimits Linux-only (again) #835

Closed
56 changes: 36 additions & 20 deletions config.md
@@ -145,33 +145,48 @@ For all platform-specific configuration values, the scope defined below in the [
* **`env`** (array of strings, OPTIONAL) with the same semantics as [IEEE Std 1003.1-2001's `environ`][ieee-1003.1-2001-xbd-c8.1].
* **`args`** (array of strings, REQUIRED) with similar semantics to [IEEE Std 1003.1-2001 `execvp`'s *argv*][ieee-1003.1-2001-xsh-exec].
This specification extends the IEEE standard in that at least one entry is REQUIRED, and that entry is used with the same semantics as `execvp`'s *file*.
* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process(es) inside the container.
Valid values are platform-specific.
For example, valid values for Linux are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
capabilities contains the following properties:
* **`effective`** (array of strings, OPTIONAL) - the `effective` field is an array of effective capabilities that are kept for the process.
* **`bounding`** (array of strings, OPTIONAL) - the `bounding` field is an array of bounding capabilities that are kept for the process.
* **`inheritable`** (array of strings, OPTIONAL) - the `inheritable` field is an array of inheritable capabilities that are kept for the process.
* **`permitted`** (array of strings, OPTIONAL) - the `permitted` field is an array of permitted capabilities that are kept for the process.
* **`ambient`** (array of strings, OPTIONAL) - the `ambient` field is an array of ambient capabilities that are kept for the process.
* **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for a process inside the container.

### <a name="configLinuxAndSolarisProcess" />Linux and Solaris Process

For POSIX-based systems (Linux and Solaris), the `process` object supports the following process-specific properties:

* **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for the process.
Each entry has the following structure:

* **`type`** (string, REQUIRED) - the platform resource being limited, for example on Linux as defined in the [setrlimit(2)][setrlimit.2] man page.
* **`soft`** (uint64, REQUIRED) - the value of the limit enforced for the corresponding resource.
* **`hard`** (uint64, REQUIRED) - the ceiling for the soft limit that could be set by an unprivileged process.
Only a privileged process (e.g. under Linux: one with the CAP_SYS_RESOURCE capability) can raise a hard limit.
* **`type`** (string, REQUIRED) the platform resource being limited.
* Linux: valid values are defined in the [`getrlimit(2)`][getrlimit.2] man page, such as `RLIMIT_MSGQUEUE`.
* Solaris: valid values are defined in the [`getrlimit(3)`][getrlimit.3] man page, such as `RLIMIT_CORE`.

If `rlimits` contains duplicated entries with the same `type`, the runtime MUST error out.
The runtime MUST [generate an error](runtime.md#errors) for any values which cannot be mapped to a relevant kernel interface.
For each entry in `rlimits`, a [`getrlimit(3)`][getrlimit.3] on `type` MUST succeed.
For the following properties, `rlim` refers to the status returned by the `getrlimit(3)` call.

* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the processes in the container from gaining additional privileges.
As an example, the ['no_new_privs'][no-new-privs] article in the kernel documentation has information on how this is achieved using a prctl system call on Linux.
* **`soft`** (uint64, REQUIRED) the value of the limit enforced for the corresponding resource.
`rlim.rlim_cur` MUST match the configured value.
* **`hard`** (uint64, REQUIRED) the ceiling for the soft limit that could be set by an unprivileged process.
`rlim.rlim_max` MUST match the configured value.
Only a privileged process (e.g. one with the `CAP_SYS_RESOURCE` capability) can raise a hard limit.

For Linux-based systems the process structure supports the following process-specific fields.
If `rlimits` contains duplicated entries with the same `type`, the runtime MUST [generate an error](runtime.md#errors).
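
The duplicate-`type` rule above can be sketched in Go as follows. This is a minimal illustration, not the runtime's actual code: `validateRlimits` and `ErrDuplicateRlimit` are hypothetical names, and the `json` tags on `Hard` and `Soft` are assumed, since only the tag for `Type` is visible in this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// POSIXRlimit mirrors the struct renamed in specs-go/config.go.
// Assumption: the json tags for Hard and Soft are not shown in the diff.
type POSIXRlimit struct {
	Type string `json:"type"`
	Hard uint64 `json:"hard"`
	Soft uint64 `json:"soft"`
}

// ErrDuplicateRlimit is a hypothetical sentinel error for this sketch.
var ErrDuplicateRlimit = errors.New("duplicate rlimit type")

// validateRlimits is a hypothetical helper that rejects a list in which
// two entries share the same Type.
func validateRlimits(rlimits []POSIXRlimit) error {
	seen := make(map[string]struct{}, len(rlimits))
	for _, r := range rlimits {
		if _, dup := seen[r.Type]; dup {
			return fmt.Errorf("%w: %s", ErrDuplicateRlimit, r.Type)
		}
		seen[r.Type] = struct{}{}
	}
	return nil
}

func main() {
	distinct := []POSIXRlimit{
		{Type: "RLIMIT_CORE", Hard: 1024, Soft: 1024},
		{Type: "RLIMIT_MSGQUEUE", Hard: 4096, Soft: 2048},
	}
	repeated := append(distinct, POSIXRlimit{Type: "RLIMIT_CORE", Hard: 1, Soft: 1})

	fmt.Println("distinct:", validateRlimits(distinct))
	fmt.Println("repeated:", validateRlimits(repeated))
}
```

The sketch only covers the duplicate check. Verifying that each `type` maps to a kernel interface would additionally need a platform-specific lookup, which is left out here.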

### <a name="configLinuxProcess" />Linux Process

For Linux-based systems, the `process` object supports the following process-specific properties.

* **`apparmorProfile`** (string, OPTIONAL) specifies the name of the AppArmor profile to be applied to processes in the container.
For more information about AppArmor, see [AppArmor documentation][apparmor].
* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process.
Valid values are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
`capabilities` contains the following properties:

* **`effective`** (array of strings, OPTIONAL) the `effective` field is an array of effective capabilities that are kept for the process.
* **`bounding`** (array of strings, OPTIONAL) the `bounding` field is an array of bounding capabilities that are kept for the process.
* **`inheritable`** (array of strings, OPTIONAL) the `inheritable` field is an array of inheritable capabilities that are kept for the process.
* **`permitted`** (array of strings, OPTIONAL) the `permitted` field is an array of permitted capabilities that are kept for the process.
* **`ambient`** (array of strings, OPTIONAL) the `ambient` field is an array of ambient capabilities that are kept for the process.
* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the process from gaining additional privileges.
As an example, the [`no_new_privs`][no-new-privs] article in the kernel documentation has information on how this is achieved using a `prctl` system call on Linux.
* **`oomScoreAdj`** *(int, OPTIONAL)* adjusts the oom-killer score in `[pid]/oom_score_adj` for the container process's `[pid]` in a [proc pseudo-filesystem][procfs].
If `oomScoreAdj` is set, the runtime MUST set `oom_score_adj` to the given value.
If `oomScoreAdj` is not set, the runtime MUST NOT change the value of `oom_score_adj`.
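
The set / not-set rule for `oomScoreAdj` could look roughly like this in Go. It is a sketch under stated assumptions: `applyOOMScoreAdj` and `oomScoreAdjPath` are hypothetical helpers, a nil pointer is used to model "not set", and the demo writes into a temporary directory standing in for the proc pseudo-filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// oomScoreAdjPath builds <procRoot>/<pid>/oom_score_adj.
func oomScoreAdjPath(procRoot string, pid int) string {
	return filepath.Join(procRoot, strconv.Itoa(pid), "oom_score_adj")
}

// applyOOMScoreAdj is a hypothetical helper. A nil adj models an unset
// oomScoreAdj, in which case the file is left untouched.
func applyOOMScoreAdj(procRoot string, pid int, adj *int) error {
	if adj == nil {
		return nil // not set: MUST NOT change oom_score_adj
	}
	return os.WriteFile(oomScoreAdjPath(procRoot, pid), []byte(strconv.Itoa(*adj)), 0o644)
}

func main() {
	// Use a temporary directory instead of the real /proc for the demo.
	root, err := os.MkdirTemp("", "fakeproc")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(root)

	pid := 1234
	if err := os.MkdirAll(filepath.Join(root, strconv.Itoa(pid)), 0o755); err != nil {
		fmt.Println("setup failed:", err)
		return
	}

	adj := -500
	if err := applyOOMScoreAdj(root, pid, &adj); err != nil {
		fmt.Println("apply failed:", err)
		return
	}
	data, err := os.ReadFile(oomScoreAdjPath(root, pid))
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("oom_score_adj now contains %q\n", data)
}
```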
@@ -863,7 +878,8 @@ Here is a full example `config.json` for reference.
[mount.8]: http://man7.org/linux/man-pages/man8/mount.8.html
[mount.8-filesystem-independent]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-INDEPENDENT_MOUNT%20OPTIONS
[mount.8-filesystem-specific]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-SPECIFIC_MOUNT%20OPTIONS
[setrlimit.2]: http://man7.org/linux/man-pages/man2/setrlimit.2.html
[getrlimit.2]: http://man7.org/linux/man-pages/man2/getrlimit.2.html
[getrlimit.3]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getrlimit.html
[stdin.3]: http://man7.org/linux/man-pages/man3/stdin.3.html
[uts-namespace.7]: http://man7.org/linux/man-pages/man7/namespaces.7.html
[zonecfg.1m]: http://docs.oracle.com/cd/E86824_01/html/E54764/zonecfg-1m.html
6 changes: 3 additions & 3 deletions specs-go/config.go
@@ -47,7 +47,7 @@ type Process struct {
// Capabilities are Linux capabilities that are kept for the process.
Capabilities *LinuxCapabilities `json:"capabilities,omitempty" platform:"linux"`
// Rlimits specifies rlimit options to apply to the process.
Rlimits []LinuxRlimit `json:"rlimits,omitempty" platform:"linux"`
Rlimits []POSIXRlimit `json:"rlimits,omitempty" platform:"linux,solaris"`
// NoNewPrivileges controls whether additional privileges could be gained by processes in the container.
NoNewPrivileges bool `json:"noNewPrivileges,omitempty" platform:"linux"`
// ApparmorProfile specifies the apparmor profile for the container.
@@ -214,8 +214,8 @@ type LinuxIDMapping struct {
Size uint32 `json:"size"`
}

// LinuxRlimit type and restrictions
type LinuxRlimit struct {
// POSIXRlimit type and restrictions
type POSIXRlimit struct {
Review comment (Contributor):

Wondering if this should be moved out of the bunch of Linux structures which surround it so that there are POSIX, Linux, Solaris and Windows sections grouped together. Otherwise LGTM (not a maintainer)

Reply (Contributor Author):

> Wondering if this should be moved out of the bunch of Linux structures which surround…

I don't care and am happy to do whatever maintainers want for type ordering.

// Type of the rlimit to set
Type string `json:"type"`
// Hard is the hard limit for the specified type
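
For readers unfamiliar with the `platform` struct tag changed in this diff, the following Go sketch shows one way such a tag could be read with reflection. It is illustrative only: `fieldsForPlatform` is a hypothetical helper, the `Process` struct is trimmed to a few fields, `LinuxCapabilities` and `POSIXRlimit` are reduced to placeholders, and treating an untagged field as applying to every platform is an assumption.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// LinuxCapabilities is an empty placeholder for this sketch.
type LinuxCapabilities struct{}

// POSIXRlimit is trimmed to the one field whose tag appears in the diff.
type POSIXRlimit struct {
	Type string `json:"type"`
}

// Process is a trimmed copy of the struct in specs-go/config.go.
type Process struct {
	Args            []string           `json:"args"`
	Capabilities    *LinuxCapabilities `json:"capabilities,omitempty" platform:"linux"`
	Rlimits         []POSIXRlimit      `json:"rlimits,omitempty" platform:"linux,solaris"`
	NoNewPrivileges bool               `json:"noNewPrivileges,omitempty" platform:"linux"`
}

// fieldsForPlatform is a hypothetical helper. It expects a struct value
// (not a pointer) and returns the names of fields whose platform tag lists
// the given platform. Assumption: untagged fields apply everywhere.
func fieldsForPlatform(v interface{}, platform string) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("platform")
		if !ok {
			names = append(names, f.Name)
			continue
		}
		for _, p := range strings.Split(tag, ",") {
			if strings.TrimSpace(p) == platform {
				names = append(names, f.Name)
				break
			}
		}
	}
	return names
}

func main() {
	for _, p := range []string{"linux", "solaris", "windows"} {
		fmt.Println(p, fieldsForPlatform(Process{}, p))
	}
}
```

With the change in this pull request, `Rlimits` is the only tagged field in the trimmed struct that a Solaris lookup would return.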