
libpod: drop hack to set conmon cgroup pids.max=1 #13403

Closed
wants to merge 1 commit into from

Conversation

giuseppe (Member) commented Mar 2, 2022

Avoid forcing the pids.max = 1 limit to prevent cleanup processes: the hack is racy, since a cleanup process could already have been triggered by the container exiting, and it doesn't work with rootless when cgroups cannot be used (i.e. cgroupfs on cgroup v1).

Closes: #13382

[NO NEW TESTS NEEDED] it doesn't add any new functionality

Signed-off-by: Giuseppe Scrivano [email protected]

Alternative to #13398

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2022
@@ -300,6 +281,12 @@ func (r *Runtime) removePod(ctx context.Context, p *Pod, removeCtrs, force bool,
}
}

// let's unlock the containers so the cleanup processes can terminate their execution
for _, ctr := range ctrs {
flouthoc (Collaborator) commented on the diff:

Just a small concern here: we are unlocking the containers before removal is complete, so another podman process can modify and lock containers in the pod while the pod is still being removed by a different process.

However, I'm unable to think of a case where this race could impact anybody.

flouthoc (Collaborator) left a review comment:

LGTM. Just a small concern above, but I'm sure it's not going to affect any actual use case.

openshift-ci bot (Contributor) commented Mar 2, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rhatdan (Member) commented Mar 2, 2022

@mheon PTAL

mheon (Member) commented Mar 2, 2022 via email

giuseppe (Member, Author) commented Mar 2, 2022

We now retry removing the cgroup for up to 5 seconds.

}
time.Sleep(time.Millisecond * 100)
}
if err != nil {
Review comment on the diff (Member):

Any use in having at least a log-debug here if attempts >= 50, just stating that we ran out of attempts?

rhatdan (Member) commented Mar 10, 2022

@giuseppe whats up with this one?

mheon (Member) commented Mar 10, 2022

I'm still a little iffy on moving to a timeout from the current cgroup-based system. It seems like something we should do if we can't set resource limits, as opposed to a default.

giuseppe (Member, Author) commented:
If you prefer, there is an alternative version of the fix: #13398

Without this patch, though, there is still a potential race where the cleanup process starts and the cgroup limit is applied only afterward (which can happen because the container process exits). If that happens, the main podman process and the cleanup process will race to lock the container, and they may end up in a deadlock.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 23, 2022
openshift-ci bot (Contributor) commented Mar 23, 2022

@giuseppe: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@giuseppe giuseppe closed this Mar 24, 2022
giuseppe added a commit to giuseppe/libpod that referenced this pull request Apr 29, 2022
It solves a race where a container cleanup process, launched because the container process exited normally, would hang.

It also solves a problem when running rootless on cgroup v1, since it is not possible to force pids.max = 1 on conmon to prevent spawning the cleanup process.

Partially copied from containers#13403

Related to: containers#14057

[NO NEW TESTS NEEDED] it doesn't add any new functionality

Signed-off-by: Giuseppe Scrivano <[email protected]>
mheon pushed a commit to mheon/libpod that referenced this pull request May 3, 2022
It solves a race where a container cleanup process, launched because the container process exited normally, would hang.

It also solves a problem when running rootless on cgroup v1, since it is not possible to force pids.max = 1 on conmon to prevent spawning the cleanup process.

Partially copied from containers#13403

Related to: containers#14057

[NO NEW TESTS NEEDED] it doesn't add any new functionality

Signed-off-by: Giuseppe Scrivano <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Successfully merging this pull request may close these issues.

dbus-launch and conmon/pids.max problem
5 participants