cgroup v1 regression in crun 1.7 and above #1144

Closed
paulnivin opened this issue Feb 17, 2023 · 7 comments · Fixed by #1146

Comments

@paulnivin

Describe the bug

On cgroup v1 only, systems running crun 1.7 and above leak cpuset, freezer, net_cls, perf_event, net_prio, hugetlb, and misc cgroups due to orphaned scopes. The leak eventually makes it impossible to create new containers:

Error: crun: creating cgroup directory `/sys/fs/cgroup/net_cls,net_prio/machine.slice/libpod-9e1d3e4bbaae664b552edaf6cdc5ebf1dd6c04ab89fc15549a8ec97a1e6de342.scope/container`: No such file or directory: OCI runtime attempted to invoke a command that was not found

#1012 is the PR that introduced this regression.

We run our production environment with cgroup v1 and started experiencing container creation failures after upgrading to FCOS stable 37.20221127.3.0, which moved crun from 1.6-2.fc37.x86_64 to 1.7-1.fc37.x86_64.

Reproduction steps

  1. boot any FCOS image running crun 1.7 or above with kernel arg systemd.unified_cgroup_hierarchy=0
  2. login as core
  3. run:
for x in {1..5}; do
    # run a short-lived container, then check how many net_cls cgroups exist
    sudo podman run -it --rm docker.io/amazon/aws-cli:2.0.41 "--version"
    grep net_cls /proc/cgroups
done
  4. repeat runs of the shell script and observe that net_cls cgroups are not cleaned up (other cgroups leak as well; see the sketch after this list for inspecting the orphaned scopes directly)
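
Besides watching /proc/cgroups, the orphaned scopes can be inspected directly. A rough sketch, assuming podman's default machine.slice layout as seen in the error message above (the comounted net_cls,net_prio hierarchy is just one of the affected controllers):

# count leftover libpod scope directories; this stays at 0 after containers
# exit with crun 1.6 and grows with crun 1.7 and above
ls -d /sys/fs/cgroup/net_cls,net_prio/machine.slice/libpod-*.scope 2>/dev/null | wc -l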

Expected behavior

The net_cls num_cgroups count (the third column of /proc/cgroups, shown in the grep output) should not increase permanently.

For example with crun 1.6:

aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1

Actual behavior

The net_cls num_cgroups count increases with every container run and never goes back down.

For example with crun 1.7:

aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	3	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	4	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	5	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	6	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	7	1
giuseppe added a commit to giuseppe/crun that referenced this issue Feb 19, 2023
commit 7ea7617 caused a regression on
cgroup v1: some directories that are created manually are not
cleaned up on container termination, causing a cgroup leak.

Fix it by deleting the entire systemd scope directory instead of
deleting only the final cgroup.

Closes: containers#1144

Signed-off-by: Giuseppe Scrivano <[email protected]>
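
The effect of the fix can be pictured in shell terms. This is only an illustration of the cleanup behaviour described in the commit message, not the actual C change in crun; $scope stands for a libpod-<id>.scope directory, and net_cls,net_prio is just one of the affected controller hierarchies:

# before the fix: only the innermost "container" cgroup was removed,
# leaving the per-controller scope directory behind
rmdir "/sys/fs/cgroup/net_cls,net_prio/machine.slice/$scope/container"
# after the fix: the entire scope directory is removed as well
# (cgroup directories are removed with rmdir, once per controller hierarchy)
rmdir "/sys/fs/cgroup/net_cls,net_prio/machine.slice/$scope/container" \
      "/sys/fs/cgroup/net_cls,net_prio/machine.slice/$scope"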
@giuseppe
Member

thanks for the report and the bisection.

I've opened a PR to address the issue: #1146

@paulnivin
Author

Thanks for the quick fix! Would it be worthwhile to add some crun tests to ensure cgroups don't leak in the future? In general, we spawn a lot of containers in production and this is the class of issue we're most likely to hit, whether it turns out to be a regression in kernel space or user space.
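
One rough shape such a leak check could take, sketched here only as an illustration (it compares the net_cls num_cgroups count from /proc/cgroups around a container run; a real test would have to account for cleanup that happens asynchronously after podman exits):

# num_cgroups for net_cls before and after running a short-lived container
before=$(awk '$1 == "net_cls" {print $3}' /proc/cgroups)
sudo podman run --rm docker.io/amazon/aws-cli:2.0.41 "--version"
after=$(awk '$1 == "net_cls" {print $3}' /proc/cgroups)
[ "$after" -le "$before" ] || echo "net_cls cgroups leaked: $before -> $after"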

@giuseppe
Member

yes, I agree that we should add tests to prevent this kind of issue. One problem is that in CI it will be difficult to test both cgroup v1 and cgroup v2, as well as the two different cgroup managers (cgroupfs and systemd). We will likely need to use VMs to cover the different configurations.

Is there any reason for using cgroup v1 instead of cgroup v2, though? This kind of issue is harder to hit with cgroup v2, since the entire scope is destroyed by systemd instead of each individual controller having to be cleaned up.
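
For anyone reproducing across these configurations, a quick sketch of how to check which cgroup version is active and how to pick podman's cgroup manager (standard interfaces; the image is just the one used in the reproduction above):

stat -fc %T /sys/fs/cgroup     # prints cgroup2fs on cgroup v2, tmpfs on cgroup v1
podman info | grep -i cgroup   # shows the cgroup version and manager podman detected
sudo podman --cgroup-manager=cgroupfs run --rm docker.io/amazon/aws-cli:2.0.41 "--version"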

@rhatdan
Member

rhatdan commented Feb 20, 2023

We would like to see cgroup v1 sunset ASAP. Any chance you can move your efforts to cgroup v2?

@paulnivin
Author

Understood on the difficulty of testing for regressions on both cgroup v1 and cgroup v2, as well as the desire to sunset cgroup v1. We have a very large Kubernetes installation, and we need to ensure our migration to cgroup v2 succeeds without impacting existing cgroup v1 workloads. We don't yet have a timeline for being fully moved off of cgroup v1. Thanks again for fixing this cgroup v1 regression -- it's much appreciated.

@paulnivin
Author

We would like to see cgroup v1 sunset ASAP. Any chance you can move your efforts to cgroup v2?

It's also worth noting that Kubernetes only moved cgroup v2 support to GA with the release of Kubernetes 1.25 in August 2022, and 1.25 has yet to be widely adopted. The latest version of AWS EKS is 1.24; EKS 1.25 is scheduled for a March 2023 release.

@giuseppe
Member

Somewhat related to this issue: #1149

I think cgroup v2 in Kubernetes works fine (unless you need control of real-time processes, which is still missing in cgroup v2), also because there is not much cgroup v2 handling in Kubernetes itself; it mostly relies on other components (container runtimes and OCI runtimes). The move to GA was done quite conservatively.

Most people probably do not care which cgroup version is used; deprecating cgroup v1 (and being annoying about it) is just a way to push them toward the better alternative. Moving to cgroup v2 by default was done in a similar way in Fedora 31: it broke some use cases, but IMO the decision paid off.
