cgroup v1 regression in crun 1.7 and above #1144

Closed
paulnivin opened this issue Feb 17, 2023 · 7 comments · Fixed by #1146

Comments

@paulnivin

Describe the bug

On cgroup v1 only, systems running crun 1.7 and above leak cpuset, freezer, net_cls, perf_event, net_prio, hugetlb, and misc cgroups due to orphaned scopes. The leak eventually makes it impossible to create new containers:

Error: crun: creating cgroup directory `/sys/fs/cgroup/net_cls,net_prio/machine.slice/libpod-9e1d3e4bbaae664b552edaf6cdc5ebf1dd6c04ab89fc15549a8ec97a1e6de342.scope/container`: No such file or directory: OCI runtime attempted to invoke a command that was not found

#1012 is the PR that introduced this regression.

We run our production environment with cgroup v1 and started experiencing container creation failures after upgrading to FCOS stable 37.20221127.3.0, which moved crun from 1.6-2.fc37.x86_64 to 1.7-1.fc37.x86_64.

Reproduction steps

  1. boot any FCOS image running crun 1.7 or above with kernel arg systemd.unified_cgroup_hierarchy=0
  2. login as core
  3. run:
for x in {1..5}; do
    # run a short-lived container, then check how many net_cls cgroups exist
    sudo podman run -it --rm docker.io/amazon/aws-cli:2.0.41 "--version"
    grep net_cls /proc/cgroups
done
  4. repeat runs of the shell script and observe that net_cls cgroups are not cleaned up (other cgroups leak as well; see the sketch after this list for inspecting the orphaned scopes directly)
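
Besides watching /proc/cgroups, the orphaned scopes can be inspected directly. A rough sketch, assuming podman's default machine.slice layout as seen in the error message above (the comounted net_cls,net_prio hierarchy is just one of the affected controllers):

# count leftover libpod scope directories; this stays at 0 after containers
# exit with crun 1.6 and grows with crun 1.7 and above
ls -d /sys/fs/cgroup/net_cls,net_prio/machine.slice/libpod-*.scope 2>/dev/null | wc -l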

Expected behavior

The net_cls num_cgroups count (the third column of /proc/cgroups, shown in the grep output) should not increase permanently.

For example with crun 1.6:

aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	2	1

Actual behavior

The net_cls num_cgroups count increases with every container run and never goes back down.

For example with crun 1.7:

aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	3	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	4	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	5	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	6	1
aws-cli/2.0.41 Python/3.7.3 Linux/6.0.8-300.fc37.x86_64 docker/x86_64.amzn.2
net_cls	5	7	1
giuseppe added a commit to giuseppe/crun that referenced this issue Feb 19, 2023
commit 7ea7617 caused a regression on
cgroup v1: some directories that are created manually are not
cleaned up on container termination, causing a cgroup leak.

Fix it by deleting the entire systemd scope directory instead of
deleting only the final cgroup.

Closes: containers#1144

Signed-off-by: Giuseppe Scrivano <[email protected]>
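
The effect of the fix can be pictured in shell terms. This is only an illustration of the cleanup behaviour described in the commit message, not the actual C change in crun; $scope stands for a libpod-<id>.scope directory, and net_cls,net_prio is just one of the affected controller hierarchies:

# before the fix: only the innermost "container" cgroup was removed,
# leaving the per-controller scope directory behind
rmdir "/sys/fs/cgroup/net_cls,net_prio/machine.slice/$scope/container"
# after the fix: the entire scope directory is removed as well
# (cgroup directories are removed with rmdir, once per controller hierarchy)
rmdir "/sys/fs/cgroup/net_cls,net_prio/machine.slice/$scope/container" \
      "/sys/fs/cgroup/net_cls,net_prio/machine.slice/$scope"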
@giuseppe
Member

thanks for the report and the bisection.

I've opened a PR to address the issue: #1146

@paulnivin
Author

Thanks for the quick fix! Would it be worthwhile to add some crun tests to ensure cgroups don't leak in the future? In general, we spawn a lot of containers in production and this is the class of issue we're most likely to hit, whether it turns out to be a regression in kernel space or user space.
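
One rough shape such a leak check could take, sketched here only as an illustration (it compares the net_cls num_cgroups count from /proc/cgroups around a container run; a real test would have to account for cleanup that happens asynchronously after podman exits):

# num_cgroups for net_cls before and after running a short-lived container
before=$(awk '$1 == "net_cls" {print $3}' /proc/cgroups)
sudo podman run --rm docker.io/amazon/aws-cli:2.0.41 "--version"
after=$(awk '$1 == "net_cls" {print $3}' /proc/cgroups)
[ "$after" -le "$before" ] || echo "net_cls cgroups leaked: $before -> $after"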

@giuseppe
Member

yes, I agree that we should add tests to prevent this kind of issue. One problem is that in CI it will be difficult to test both cgroup v1 and cgroup v2, as well as the two different cgroup managers (cgroupfs and systemd). We will likely need to use VMs to cover the different configurations.

Is there any reason for using cgroup v1 instead of cgroup v2, though? This kind of issue is harder to hit with cgroup v2, since the entire scope is destroyed by systemd instead of each individual controller having to be cleaned up.
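
For anyone reproducing across these configurations, a quick sketch of how to check which cgroup version is active and how to pick podman's cgroup manager (standard interfaces; the image is just the one used in the reproduction above):

stat -fc %T /sys/fs/cgroup     # prints cgroup2fs on cgroup v2, tmpfs on cgroup v1
podman info | grep -i cgroup   # shows the cgroup version and manager podman detected
sudo podman --cgroup-manager=cgroupfs run --rm docker.io/amazon/aws-cli:2.0.41 "--version"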

@rhatdan
Member

rhatdan commented Feb 20, 2023

We would like to see cgroup v1 sunset ASAP. Any chance you can move your efforts to cgroup v2?

@paulnivin
Author

Understood on the difficulty of testing for regressions on both cgroup v1 and cgroup v2, as well as the desire to sunset cgroup v1. We have a very large Kubernetes installation, and we need to ensure our migration to cgroup v2 succeeds without impacting existing cgroup v1 workloads. We don't yet have a timeline for being fully moved off of cgroup v1. Thanks again for fixing this cgroup v1 regression -- it's much appreciated.

@paulnivin
Author

We would like to see cgroup v1 sunset ASAP. Any chance you can move your efforts to cgroup v2?

It's also worth noting that Kubernetes only moved cgroup v2 support to GA with the release of Kubernetes 1.25 in August 2022, and 1.25 has yet to be widely adopted. The latest version of AWS EKS is 1.24; EKS 1.25 is scheduled for a March 2023 release.

@giuseppe
Member

Somewhat related to this issue: #1149

I think cgroup v2 in Kubernetes works fine (unless you need control of real-time processes, which is still missing in cgroup v2), also because there is not much cgroup v2 handling in Kubernetes itself; it mostly relies on other components (container runtimes and OCI runtimes). The move to GA was done quite conservatively.

Most people probably do not care which cgroup version is used; deprecating cgroup v1 (and being annoying about it) is just a way to push them toward the better alternative. Moving to cgroup v2 by default was done in a similar way in Fedora 31: it broke some use cases, but IMO the decision paid off.
