Add features.md to formalize the runc features JSON #1130

Merged
merged 1 commit into from
Mar 27, 2023
2 changes: 2 additions & 0 deletions Makefile
@@ -30,6 +30,8 @@ DOC_FILES := \
config.md \
config-linux.md \
config-solaris.md \
features.md \
features-linux.md \
glossary.md

default: docs
211 changes: 211 additions & 0 deletions features-linux.md
@@ -0,0 +1,211 @@
# <a name="linuxFeatures" />Linux Features Document

Contributor @flouthoc, Dec 9, 2021

Is there a missing feature flag for Intel RDT? Some runtimes support restricting bandwidth using it, and runtime-spec already has a definition for it.

Not sure, maybe something like:

"intelrdt": {
  "supported": true
}

Member Author

Added the `.linux.intelRdt.enabled` field.


This document describes the [Linux-specific section](features.md#platform-specific-features) of the [features document](features.md).

## <a name="linuxFeaturesNamespaces" />Namespaces

* **`namespaces`** (array of strings, OPTIONAL) The recognized names of the namespaces, including namespaces that might not be supported by the host operating system.
The runtime MUST recognize the elements in this array as the [`type` of `linux.namespaces` objects in `config.json`](config-linux.md#namespaces).

### Example

```json
"namespaces": [
  "cgroup",
  "ipc",
  "mount",
  "network",
  "pid",
  "user",
  "uts"
]
```
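
As a rough illustration of how a high-level engine might consume the `namespaces` array, the sketch below compares it against the namespace types requested in a `config.json`. The helper name and the sample config are hypothetical, and treating an absent field as "unknown" is an assumption, not something the text above mandates.

```python
# Sketch only: the helper name and the sample data are illustrative.
FEATURES_LINUX = {
    "namespaces": ["cgroup", "ipc", "mount", "network", "pid", "user", "uts"],
}


def unrecognized_namespaces(features_linux, config):
    """Return namespace types requested in config.json that the runtime does not list."""
    recognized = features_linux.get("namespaces")
    if recognized is None:
        # Assumption: an absent OPTIONAL field means "unknown", so nothing is flagged.
        return []
    requested = [ns["type"] for ns in config.get("linux", {}).get("namespaces", [])]
    return [t for t in requested if t not in recognized]


config = {"linux": {"namespaces": [{"type": "pid"}, {"type": "time"}]}}
print(unrecognized_namespaces(FEATURES_LINUX, config))  # ['time']
```

An engine could use a non-empty result to fail early with a clear message instead of passing the config to the runtime.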

## <a name="linuxFeaturesCapabilities" />Capabilities

* **`capabilities`** (array of strings, OPTIONAL) The recognized names of the capabilities, including capabilities that might not be supported by the host operating system.
The runtime MUST recognize the elements in this array in the [`process.capabilities` object of `config.json`](config.md#linux-process).

### Example

```json
"capabilities": [
  "CAP_CHOWN",
  "CAP_DAC_OVERRIDE",
  "CAP_DAC_READ_SEARCH",
  "CAP_FOWNER",
  "CAP_FSETID",
  "CAP_KILL",
  "CAP_SETGID",
  "CAP_SETUID",
  "CAP_SETPCAP",
  "CAP_LINUX_IMMUTABLE",
  "CAP_NET_BIND_SERVICE",
  "CAP_NET_BROADCAST",
  "CAP_NET_ADMIN",
  "CAP_NET_RAW",
  "CAP_IPC_LOCK",
  "CAP_IPC_OWNER",
  "CAP_SYS_MODULE",
  "CAP_SYS_RAWIO",
  "CAP_SYS_CHROOT",
  "CAP_SYS_PTRACE",
  "CAP_SYS_PACCT",
  "CAP_SYS_ADMIN",
  "CAP_SYS_BOOT",
  "CAP_SYS_NICE",
  "CAP_SYS_RESOURCE",
  "CAP_SYS_TIME",
  "CAP_SYS_TTY_CONFIG",
  "CAP_MKNOD",
  "CAP_LEASE",
  "CAP_AUDIT_WRITE",
  "CAP_AUDIT_CONTROL",
  "CAP_SETFCAP",
  "CAP_MAC_OVERRIDE",
  "CAP_MAC_ADMIN",
  "CAP_SYSLOG",
  "CAP_WAKE_ALARM",
  "CAP_BLOCK_SUSPEND",
  "CAP_AUDIT_READ",
  "CAP_PERFMON",
  "CAP_BPF",
  "CAP_CHECKPOINT_RESTORE"
]
```
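
A similar check could be sketched for capabilities. The example below collects every name used across the capability sets of `process.capabilities` and reports those missing from the runtime's list. The function name, the shortened `recognized` list, and `CAP_EXAMPLE_FUTURE` are made up for illustration.

```python
# Sketch only: names below are illustrative, not part of any runtime's API.
CAPABILITY_SETS = ("bounding", "effective", "inheritable", "permitted", "ambient")


def unrecognized_capabilities(recognized, config):
    """Return capability names used in config.json that the runtime does not list."""
    capabilities = config.get("process", {}).get("capabilities", {})
    requested = set()
    for set_name in CAPABILITY_SETS:
        requested.update(capabilities.get(set_name, []))
    return sorted(requested - set(recognized))


recognized = ["CAP_CHOWN", "CAP_KILL", "CAP_NET_RAW"]  # shortened for the sketch
config = {
    "process": {
        "capabilities": {
            "bounding": ["CAP_KILL", "CAP_NET_RAW"],
            # CAP_EXAMPLE_FUTURE is a made-up name standing in for a newer capability.
            "effective": ["CAP_KILL", "CAP_EXAMPLE_FUTURE"],
        }
    }
}
print(unrecognized_capabilities(recognized, config))  # ['CAP_EXAMPLE_FUTURE']
```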

## <a name="linuxFeaturesCgroup" />Cgroup

**`cgroup`** (object, OPTIONAL) represents the runtime's implementation status of cgroup managers.

Contributor

Should there be a field listing which controllers the runtime implements? A given runtime is not required to implement support for every controller. For example, RDMA is not implemented by all runtimes, so an entry for RDMA becomes a no-op.

Again, not sure about just adding it here.

Member Author

It's hard to define the set of controller names.

E.g., "freezer" is a real controller in v1 but not a real controller in v2. And the name of the block I/O controller differs across v1 ("blkio") and v2 ("io").

Contributor

@AkihiroSuda Hmm, I agree, fair point. Just a small doubt: if there is feature info, a container manager should be able to delegate the task to the right runtime.

A practical use case would be:

  • A container manager supports delegating container creation to multiple runtimes. If a spec requests a particular controller, for example RDMA, the manager queries all supported runtimes and delegates the spec to the one that implements RDMA; for every other spec it keeps using the default runtime.

Reviewer

How about using an array to specify which controllers are supported by each cgroup manager instead? This would address the controller names being different. Also, not all controllers might be supported by all cgroup manager implementations.

"cgroup": {
  "v1": ["cpu", "cpuset", "blkio"],
  "v2": ["io", "memory", "devices"],
  "systemd": ["cpu", "memory", "cpuset"],
  "systemdUser": ["cpu", "memory", "cpuset"]
}

In youki we already have the concept of a pseudo controller for devices, unified and freezer, so in my opinion a name in that array need not always map to a real controller in every implementation.

Member Author

Updated PR to add rdma property.

Member Author

> How about using an array to specify which controllers are supported by a cgroup manager instead?

  • v1 is dying and probably won't have new controllers, so I don't think we need to track v1 controller list (except rdma)

  • v2 may have new controllers, but probably we will support such new controllers via the unified struct without defining a new dedicated struct, so we don't need to track v2 controller list, either.

Member

Could we also add a "disabled": "true|false" to indicate whether the runtime supports disabling cgroups? crun does it with `crun --cgroup-manager=disabled ...`. That is useful when running as a systemd service to reuse the existing cgroup.

Irrelevant to the cgroup version of the host operating system.

* **`v1`** (bool, OPTIONAL) represents whether the runtime supports cgroup v1.
* **`v2`** (bool, OPTIONAL) represents whether the runtime supports cgroup v2.
* **`systemd`** (bool, OPTIONAL) represents whether the runtime supports system-wide systemd cgroup manager.
* **`systemdUser`** (bool, OPTIONAL) represents whether the runtime supports user-scoped systemd cgroup manager.
* **`rdma`** (bool, OPTIONAL) represents whether the runtime supports RDMA cgroup controller.

### Example

```json
"cgroup": {
  "v1": true,
  "v2": true,
  "systemd": true,
  "systemdUser": true,
  "rdma": false
}
```
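
The delegation use case raised in the review thread (send RDMA-dependent specs to a runtime that implements the controller, everything else to the default) could look roughly like the sketch below. The runtime names and the `pick_runtime` helper are hypothetical; the only thing taken from this document is the `linux.cgroup.rdma` boolean.

```python
# Sketch only: runtime names and the helper are illustrative.
def pick_runtime(features_by_runtime, needs_rdma, default_runtime):
    """Pick a runtime name; fall back to the default when RDMA is not needed."""
    if not needs_rdma:
        return default_runtime
    for name, features in features_by_runtime.items():
        cgroup = features.get("linux", {}).get("cgroup", {})
        # Only an explicit true counts; a missing field is treated as "not known to support".
        if cgroup.get("rdma") is True:
            return name
    return None  # no runtime reports RDMA support


features_by_runtime = {
    "runtime-a": {"linux": {"cgroup": {"v1": True, "v2": True, "rdma": False}}},
    "runtime-b": {"linux": {"cgroup": {"v2": True, "rdma": True}}},
}
print(pick_runtime(features_by_runtime, needs_rdma=True, default_runtime="runtime-a"))   # runtime-b
print(pick_runtime(features_by_runtime, needs_rdma=False, default_runtime="runtime-a"))  # runtime-a
```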

## <a name="linuxFeaturesSeccomp" />Seccomp

**`seccomp`** (object, OPTIONAL) represents the runtime's implementation status of seccomp.
Irrelevant to the kernel version of the host operating system.

* **`enabled`** (bool, OPTIONAL) represents whether the runtime supports seccomp.
* **`actions`** (array of strings, OPTIONAL) The recognized names of the seccomp actions.
The runtime MUST recognize the elements in this array in the [`syscalls[].action` property of the `linux.seccomp` object in `config.json`](config-linux.md#seccomp).
* **`operators`** (array of strings, OPTIONAL) The recognized names of the seccomp operators.
The runtime MUST recognize the elements in this array in the [`syscalls[].args[].op` property of the `linux.seccomp` object in `config.json`](config-linux.md#seccomp).
* **`archs`** (array of strings, OPTIONAL) The recognized names of the seccomp architectures.
The runtime MUST recognize the elements in this array in the [`architectures` property of the `linux.seccomp` object in `config.json`](config-linux.md#seccomp).
* **`knownFlags`** (array of strings, OPTIONAL) The recognized names of the seccomp flags.
The runtime MUST recognize the elements in this array in the [`flags` property of the `linux.seccomp` object in `config.json`](config-linux.md#seccomp).
* **`supportedFlags`** (array of strings, OPTIONAL) The recognized and supported names of the seccomp flags.
This list may be a subset of `knownFlags` because some flags may not be supported by the current kernel and/or libseccomp.
The runtime MUST recognize and support the elements in this array in the [`flags` property of the `linux.seccomp` object in `config.json`](config-linux.md#seccomp).

### Example

```json
"seccomp": {
  "enabled": true,
  "actions": [
    "SCMP_ACT_ALLOW",
    "SCMP_ACT_ERRNO",
    "SCMP_ACT_KILL",
    "SCMP_ACT_LOG",
    "SCMP_ACT_NOTIFY",
    "SCMP_ACT_TRACE",
    "SCMP_ACT_TRAP"
  ],
  "operators": [
    "SCMP_CMP_EQ",
    "SCMP_CMP_GE",
    "SCMP_CMP_GT",
    "SCMP_CMP_LE",
    "SCMP_CMP_LT",
    "SCMP_CMP_MASKED_EQ",
    "SCMP_CMP_NE"
  ],
  "archs": [
    "SCMP_ARCH_AARCH64",
    "SCMP_ARCH_ARM",
    "SCMP_ARCH_MIPS",
    "SCMP_ARCH_MIPS64",
    "SCMP_ARCH_MIPS64N32",
    "SCMP_ARCH_MIPSEL",
    "SCMP_ARCH_MIPSEL64",
    "SCMP_ARCH_MIPSEL64N32",
    "SCMP_ARCH_PPC",
    "SCMP_ARCH_PPC64",
    "SCMP_ARCH_PPC64LE",
    "SCMP_ARCH_S390",
    "SCMP_ARCH_S390X",
    "SCMP_ARCH_X32",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_X86_64"
  ],
  "knownFlags": [
    "SECCOMP_FILTER_FLAG_LOG"
  ],
  "supportedFlags": [
    "SECCOMP_FILTER_FLAG_LOG"
  ]
}
```

Comment on lines +141 to +158 (the `archs` array)

Member

How is the runtime supposed to query this list from libseccomp? Is it expected to hardcode the list?

Member Author

> hardcode

Yes: https://github.com/opencontainers/runc/blob/v1.1.7/libcontainer/seccomp/config.go#L54-L71
But I guess it is not difficult to add a new function to libseccomp.

You can also keep the list nil.
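
To make the seccomp fields more concrete, here is a minimal sketch of a pre-flight check an engine might run on a `linux.seccomp` object. It only covers the properties named above (`syscalls[].action`, `syscalls[].args[].op`, `architectures`, `flags`). The function name and sample data are hypothetical, and skipping a check when a list is absent is an assumption.

```python
# Sketch only: the helper and the sample profile are illustrative.
def seccomp_problems(seccomp_features, seccomp_config):
    """List values in a linux.seccomp object that the runtime does not report."""
    if seccomp_features.get("enabled") is False:
        return ["runtime reports seccomp as not enabled"]
    problems = []

    def check(kind, values, listed):
        if listed is None:  # field absent (or nil): nothing to compare against
            return
        for value in values:
            if value not in listed:
                problems.append(f"{kind} not listed by runtime: {value}")

    syscalls = seccomp_config.get("syscalls", [])
    check("action", [s["action"] for s in syscalls], seccomp_features.get("actions"))
    check("operator", [a["op"] for s in syscalls for a in s.get("args", [])],
          seccomp_features.get("operators"))
    check("arch", seccomp_config.get("architectures", []), seccomp_features.get("archs"))
    # Flags are compared with supportedFlags, which may be narrower than knownFlags.
    check("flag", seccomp_config.get("flags", []), seccomp_features.get("supportedFlags"))
    return problems


features = {
    "enabled": True,
    "actions": ["SCMP_ACT_ALLOW", "SCMP_ACT_ERRNO"],
    "operators": ["SCMP_CMP_EQ"],
    "archs": ["SCMP_ARCH_X86_64"],
    "knownFlags": ["SECCOMP_FILTER_FLAG_LOG", "SECCOMP_FILTER_FLAG_SPEC_ALLOW"],
    "supportedFlags": ["SECCOMP_FILTER_FLAG_LOG"],
}
profile = {
    "architectures": ["SCMP_ARCH_X86_64"],
    "flags": ["SECCOMP_FILTER_FLAG_SPEC_ALLOW"],
    "syscalls": [{"names": ["getcwd"], "action": "SCMP_ACT_NOTIFY"}],
}
for problem in seccomp_problems(features, profile):
    print(problem)
```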

## <a name="linuxFeaturesApparmor" />AppArmor

**`apparmor`** (object, OPTIONAL) represents the runtime's implementation status of AppArmor.
Irrelevant to the availability of AppArmor on the host operating system.

* **`enabled`** (bool, OPTIONAL) represents whether the runtime supports AppArmor.

### Example

```json
"apparmor": {
  "enabled": true
}
```

Comment on the `"enabled": true` line

Contributor

Not sure, but should this be `supported` instead of `enabled`?

Suggested change: `"enabled": true` → `"supported": true`

Member Author

The JSON is already implemented in runc (experimentally, though), so I'm reluctant to change the field names.

IIRC, I avoided "supported" to avoid confusion with enterprise commercial "support".
An enterprise distributor may provide commercial "support" to a subset of the "enabled" features.

Contributor @flouthoc, Dec 9, 2021

Oh, I see. This is a really small nit and I don't have a strong opinion on this. 😃

To me it just feels that AppArmor is only enabled for a particular spec if the spec defines `apparmorProfile`; by default it's only supported by a runtime and not enabled for every spec.

Again, I am overthinking this and it's not a big thing. 😅

Member

I think `enabled` is okay, though I think that we shouldn't care what runc has developed at the moment -- `runc features` was only merged a day or two ago, so literally nobody is using it at the moment (and we can change it until runc 1.2).

Member

> IIRC, I avoided "supported" to avoid confusion with enterprise commercial "support". An enterprise distributor may provide commercial "support" to a subset of the "enabled" features.

Perhaps "available"? (IMO "enabled" feels like a configuration toggle where most/many of these will really be capabilities that are either usable or they aren't)

Member

Ah, I guess https://github.com/opencontainers/runtime-spec/pull/1130/files#diff-445e6d0dc97b93a0f04d914ad7816114ddc175dc462c7eec7961921af17ff0d0R6-R7 is the context I was missing:

> The features document is irrelevant to the actual availability of the features in the host operating system. Hence, the content of the feature document SHOULD be determined on the compilation time of the runtime, not on the execution time.

I retract my suggestion ("enabled" seems fine for what this describes), but then also wonder which specific problem this functionality is solving. 🙈

Member Author @AkihiroSuda, Dec 16, 2021

> wonder which specific problem this functionality is solving.

Examples of specific problems solved in this PR:

  • A high-level engine can't be aware whether the low-level runtime supports seccomp, so a high-level engine can't avoid "seccomp not supported" error unless the user supplies a custom configuration.
  • A high-level engine can't be aware whether the low-level runtime supports rro mounts, so a high-level engine may accidentally have non-RRO mounts against the user's will. This may result in a severe vulnerability.

Member Author

BTW we can have an `availability` property that corresponds to the actual availability of the feature on the host OS/hardware, in addition to the `enabled` property proposed in this PR, but that will be another PR.

As a maintainer of containerd/Moby and its relevant projects, I don't see the necessity of having availability detection in the low-level runtime, though, because high-level engine implementations are already capable of detecting such feature availability on the host OS/hardware.

Contributor

> BTW we can have an `availability` property that corresponds to the actual availability of the feature on the host OS/hardware, in addition to the `enabled` property proposed in this PR, but that will be another PR.
>
> As a maintainer of containerd/Moby and its relevant projects, I don't see the necessity of having availability detection in the low-level runtime, though, because high-level engine implementations are already capable of detecting such feature availability on the host OS/hardware.

My original doubt was whether to replace `enabled` with `supported` or `availability`; I don't think adding an extra key just for availability would be a good idea. So I agree with @AkihiroSuda that we should not need an extra key for this.

+1 for keeping `enabled: true`
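
One way an engine might act on the `apparmor.enabled` field is sketched below: the profile is only written into `process.apparmorProfile` when the runtime does not report AppArmor as disabled. The helper name and the profile name are hypothetical. Because the field says nothing about the host, a real engine would still need its own host-side detection, and it might prefer to fail instead of silently skipping the profile.

```python
# Sketch only: the helper and the profile name are illustrative.
def set_apparmor_profile(features, config, profile):
    """Set process.apparmorProfile unless the runtime reports AppArmor as not enabled."""
    enabled = features.get("linux", {}).get("apparmor", {}).get("enabled")
    if enabled is False:
        # The runtime was built without AppArmor support; leave the config untouched.
        return False
    # Assumption: a missing field means "unknown", so the profile is still applied.
    config.setdefault("process", {})["apparmorProfile"] = profile
    return True


config = {}
applied = set_apparmor_profile(
    {"linux": {"apparmor": {"enabled": True}}}, config, "example-profile"
)
print(applied, config)
```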

## <a name="linuxFeaturesSelinux" />SELinux

**`selinux`** (object, OPTIONAL) represents the runtime's implementation status of SELinux.
Irrelevant to the availability of SELinux on the host operating system.

* **`enabled`** (bool, OPTIONAL) represents whether the runtime supports SELinux.

### Example

```json
"selinux": {
  "enabled": true
}
```

## <a name="linuxFeaturesIntelRdt" />Intel RDT

**`intelRdt`** (object, OPTIONAL) represents the runtime's implementation status of Intel RDT.
Irrelevant to the availability of Intel RDT on the host operating system.

* **`enabled`** (bool, OPTIONAL) represents whether the runtime supports Intel RDT.

### Example

```json
"intelRdt": {
  "enabled": true
}
```