libpod: drop hack to set conmon cgroup pids.max=1 #13403
Conversation
Avoid forcing the pids.max = 1 limit to prevent cleanup processes, which is racy since a cleanup process could be triggered by the container exiting; and it doesn't work with rootless when cgroups cannot be used (i.e., cgroupfs and cgroup v1). Closes: containers#13382 [NO NEW TESTS NEEDED] it doesn't add any new functionality Signed-off-by: Giuseppe Scrivano <[email protected]>
@@ -300,6 +281,12 @@ func (r *Runtime) removePod(ctx context.Context, p *Pod, removeCtrs, force bool,
		}
	}

	// let's unlock the containers so the cleanup processes can terminate their execution
	for _, ctr := range ctrs {
Just a small concern here: we are unlocking containers before the remove is completed, so another podman process has permission to modify and lock containers in a pod while the actual pod is being removed by a different process. However, I'm unable to think of issues where this race could impact anybody.
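For readers following along, here is a minimal, self-contained sketch of the pattern under discussion; the `container` type, its mutex field, and `removePod` are placeholders for illustration, not libpod's actual types or API. The point is simply that the per-container locks are released before pod teardown finishes, so any queued cleanup process can proceed instead of blocking on the lock.

```go
package main

import "sync"

// container stands in for libpod's container type; only the lock matters here.
type container struct {
	lock sync.Mutex
	name string
}

// removePod mimics the step shown in the diff: once the containers have been
// stopped and removed, unlock them so their cleanup processes can terminate
// instead of waiting on the container lock while the pod is torn down.
func removePod(ctrs []*container) {
	for _, ctr := range ctrs {
		ctr.lock.Unlock()
	}
	// ... the pod cgroup and the pod itself are removed after this point ...
}

func main() {
	ctrs := []*container{{name: "a"}, {name: "b"}}
	for _, ctr := range ctrs {
		ctr.lock.Lock() // locks normally taken at the start of pod removal
	}
	removePod(ctrs)
}
```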
LGTM. Just a small concern above, but I'm sure it's not gonna affect any actual use-case.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: flouthoc, giuseppe. The full list of commands accepted by this bot can be found here. The pull request process is described here.
@mheon PTAL
While using the PID limit may be racy, it's the best way we found of removing another race: when removing a pod with running containers, we stop the containers, remove them, then remove the pod cgroup. The problem is that the stopped containers launch their cleanup processes in the pod cgroup, and that interferes with removing the cgroup. I'm against removing this entirely unless we have another fix for that race, because it bit us a lot in CI.
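For context, a rough sketch of the hack being dropped, under assumptions: the cgroup path and the helper name below are made up for illustration, and libpod's real code goes through its cgroups package rather than writing the file directly. Writing 1 to pids.max of conmon's cgroup leaves conmon as the only process allowed there, so an exiting container cannot fork its cleanup process into the pod cgroup during teardown; this is also what cannot work for rootless users on cgroup v1 with cgroupfs, where the user cannot write the cgroup files at all.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// limitConmonPids writes "1" into pids.max of conmon's cgroup. With conmon
// already counted against the limit, the pids controller rejects any further
// fork, so no cleanup process can be spawned into the pod cgroup while the
// pod is being removed.
func limitConmonPids(conmonCgroup string) error {
	return os.WriteFile(filepath.Join(conmonCgroup, "pids.max"), []byte("1"), 0o644)
}

func main() {
	// Example path only; real code derives this from the pod's cgroup parent.
	if err := limitConmonPids("/sys/fs/cgroup/machine.slice/libpod-conmon-example.scope"); err != nil {
		fmt.Fprintln(os.Stderr, "setting pids.max failed:", err)
	}
}
```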
We now retry removing the cgroup for up to 5 seconds.
		}
		time.Sleep(time.Millisecond * 100)
	}
	if err != nil {
Any use in having at least a debug log here if attempts >= 50, just stating that we ran out of attempts?
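As a sketch of what that could look like, assuming the 50 × 100ms constants from the diff and logrus for the debug message (the function name and path below are illustrative, not the PR's actual code):

```go
package main

import (
	"fmt"
	"os"
	"time"

	"github.com/sirupsen/logrus"
)

// removeCgroupWithRetry keeps trying to remove the (empty) pod cgroup while
// the cleanup processes of the just-stopped containers drain out of it,
// giving up after roughly 5 seconds.
func removeCgroupWithRetry(path string) error {
	var err error
	for attempts := 0; attempts < 50; attempts++ {
		if err = os.Remove(path); err == nil {
			return nil
		}
		time.Sleep(time.Millisecond * 100)
	}
	// The suggestion above: make it visible when we ran out of attempts.
	logrus.Debugf("still unable to remove cgroup %s after 50 attempts: %v", path, err)
	return err
}

func main() {
	if err := removeCgroupWithRetry("/sys/fs/cgroup/example.scope"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```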
@giuseppe what's up with this one?
I'm still a little iffy on moving to a timeout from the current cgroup-based system. It seems like something we should do if we can't set resource limits, as opposed to a default.
If you prefer, there is an alternative version of the fix: #13398. Without this patch, though, there is still a potential race where the cleanup process starts and the cgroup limit is applied afterward (that could happen because the container process exits on its own). If it happens, the main podman process and the cleanup process will race for locking the container and it might end up in a deadlock.
@giuseppe: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
It solves a race where a container cleanup process, launched because the container process exited normally, would hang. It also solves a problem when running rootless on cgroup v1, since it is not possible to force pids.max = 1 on conmon to limit spawning the cleanup process. Partially copied from containers#13403 Related to: containers#14057 [NO NEW TESTS NEEDED] it doesn't add any new functionality Signed-off-by: Giuseppe Scrivano <[email protected]>
Avoid forcing the pids.max = 1 limit to prevent cleanup processes, which
is racy since a cleanup process could be triggered by the
container exiting; and it doesn't work with rootless when cgroups
cannot be used (i.e., cgroupfs and cgroup v1).
Closes: #13382
[NO NEW TESTS NEEDED] it doesn't add any new functionality
Signed-off-by: Giuseppe Scrivano [email protected]
Alternative to #13398