Fix discrepancy between systemd and fs cgroup managers #2813

Closed
kolyshkin opened this issue Feb 20, 2021 · 4 comments · Fixed by #2814
Comments

@kolyshkin
Contributor

kolyshkin commented Feb 20, 2021

All cgroup managers implement Apply() and Set() methods:

  • Apply is used to create a cgroup (and, in case of systemd, a systemd unit) and/or put a PID into the cgroup (and unit);
  • Set is used to set various cgroup resources and limits;

fs/fs2 cgroup managers implement the functionality as per the description above.
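
For reference, a simplified sketch of the two methods in question (loosely modeled on the Manager interface in libcontainer/cgroups; names and signatures are abridged here, not the exact runc API):

    package cgroups

    // Resources stands in for the container's cgroup limits (memory, cpu, ...);
    // the real type carries many more knobs.
    type Resources struct {
            MemoryLimit int64
            CpuShares   uint64
    }

    // Manager abstracts the fs/fs2 and systemd cgroup drivers.
    type Manager interface {
            // Apply creates the cgroup (and, for the systemd drivers, the unit)
            // and places the given pid into it.
            Apply(pid int) error
            // Set applies resource limits to the cgroup.
            Set(r *Resources) error
    }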

The systemd v1/v2 managers are somewhat odd. They set most cgroup limits (those that can be mapped to systemd unit properties) in Apply(), and then set all cgroup limits again in Set() -- first indirectly via systemd unit properties (same as in Apply), then via cgroupfs (actually backed by the fs manager).

A bit of recent history: before #2287/#2343, the systemd managers used to set unit properties (incl. resources) in Apply, and "raw" cgroup properties in Set. After those PRs, they set properties in both Apply and Set.

To reiterate, systemd managers are peculiar since:

  • they set some properties in Apply() (fs managers do not);
  • they set all properties again in Set().

Since runc calls both Apply() and Set(), this is not really a problem, except for some curious side effects (such as #2812 (comment)). It might be worse for other users of the cgroup managers (this needs to be looked into).

The proposed solution is to not set any resources in Apply (the actual fix is surprisingly tidy -- see PR #2814).
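
To illustrate the direction (hypothetical helper names, not the actual patch): the systemd manager's Apply would only create the unit/cgroup and add the pid, while Set would keep doing both the unit properties and the cgroupfs writes.

    // Continuing the simplified sketch above; startUnit, setUnitProperties and
    // fsSet are illustrative stand-ins for the D-Bus calls and the fs manager.
    type systemdManager struct {
            unitName string
    }

    func (m *systemdManager) startUnit(pid int) error              { return nil }
    func (m *systemdManager) setUnitProperties(r *Resources) error { return nil }
    func (m *systemdManager) fsSet(r *Resources) error             { return nil }

    // Apply: create the unit/cgroup and move the pid into it; no resources here.
    func (m *systemdManager) Apply(pid int) error {
            return m.startUnit(pid)
    }

    // Set: translate resources to unit properties, then write the remaining
    // knobs via cgroupfs, as before.
    func (m *systemdManager) Set(r *Resources) error {
            if err := m.setUnitProperties(r); err != nil {
                    return err
            }
            return m.fsSet(r)
    }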

TODO:

  • look at how crun does it (maybe @giuseppe can shed some light)
  • look at how kubernetes uses cgroup manager's Set and Apply.
@odinuge
Contributor

odinuge commented Feb 22, 2021

+1 on this one!

I have been working with this from the k8s side, so I can help with insight if needed. Anything special you need?

I also think we should strive to adhere to the systemd delegation model a bit better (e.g. we don't need to write to cpu.shares/cpu.weight when systemd will already do that): https://systemd.io/CGROUP_DELEGATION/
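
To make the duplication concrete: on v2 the same value can end up both in the CPUWeight unit property (set by systemd over D-Bus) and in the cpu.weight file (written directly via cgroupfs). A minimal sketch of the kind of shares-to-weight conversion involved (illustrative only; not necessarily the exact formula runc or systemd use):

    package main

    import "fmt"

    // convertSharesToWeight maps a cgroup v1 cpu.shares value ([2..262144])
    // onto the cgroup v2 cpu.weight range ([1..10000]).
    func convertSharesToWeight(shares uint64) uint64 {
            if shares == 0 {
                    return 0 // 0 means "not set"
            }
            return 1 + ((shares-2)*9999)/262142
    }

    func main() {
            // With proper delegation, writing cpu.weight directly just repeats what
            // systemd already did when CPUWeight was set from the same shares value.
            fmt.Println(convertSharesToWeight(1024)) // 39
    }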

I think crun only creates the slice/scope, and has a separate sub-cgroup (for v2 at least) where it enforces limits. I do however think the runc approach is better, since we actually use the systemd API.

@odinuge
Contributor

odinuge commented Feb 22, 2021

look at how kubernetes uses cgroup manager's Set and Apply.

The changes you are talking about here make sense for k8s, since we always call Set after Apply in order to enforce limits. The systemd part, however, is currently broken / has never really made sense, but that should be fixed with kubernetes/kubernetes#98374 (feel free to look at that PR and/or review it if you have any thoughts; it will not be part of 1.21, so it is targeting 1.22).
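
For context, the ordering in question looks roughly like this from the caller's side (reusing the simplified Manager/Resources sketch from the issue description; not actual kubelet code):

    // Apply first to create the cgroup/unit and move the pid into it,
    // then Set to enforce the limits.
    func startInCgroup(m Manager, pid int, r *Resources) error {
            if err := m.Apply(pid); err != nil {
                    return err
            }
            return m.Set(r)
    }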

@kolyshkin
Contributor Author

look at how kubernetes uses cgroup manager's Set and Apply.

  • kubelet does NOT use systemd's Set() method
  • cri-o does not use it either

So we're good

@kolyshkin
Contributor Author

kolyshkin commented Feb 23, 2021

One particular thing this fixes: with #2812 applied, trying to start a container with a very low memory limit on cgroup v1 gives the following errors.

fs driver:

$ sudo ./runc run --bundle ./tst -d xe3
ERRO[0000] container_linux.go:367: starting container process caused: process_linux.go:505: container init caused: process_linux.go:468: setting cgroup config for procHooks process caused: unable to set memory limit to 140424 (current usage: 2879488, peak usage: 3059712)

systemd driver:

$ sudo ./runc --systemd-cgroup run --bundle ./tst -d xe3
ERRO[0000] container_linux.go:367: starting container process caused: process_linux.go:328: container init was OOM-killed (memory limit too low?) caused: process_linux.go:360: getting the final child's pid from pipe caused: EOF

(this is actually how I found this)
