
libpod: drop hack to set conmon cgroup pids.max=1 #13403

Closed
wants to merge 1 commit into from

Conversation

giuseppe (Member) commented Mar 2, 2022

Avoid forcing the pids.max = 1 limit to prevent cleanup processes: the hack is racy, since a cleanup process could already have been triggered by the container exiting, and it doesn't work with rootless when cgroups cannot be used (i.e. cgroupfs on cgroup v1).

Closes: #13382

[NO NEW TESTS NEEDED] it doesn't add any new functionality

Signed-off-by: Giuseppe Scrivano [email protected]

Alternative to #13398

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2022
@@ -300,6 +281,12 @@ func (r *Runtime) removePod(ctx context.Context, p *Pod, removeCtrs, force bool,
}
}

// let's unlock the containers so the cleanup processes can terminate their execution
for _, ctr := range ctrs {
flouthoc (Collaborator) commented on the diff:

Just a small concern here: we are unlocking the containers before removal is complete, so another podman process can modify and lock containers in the pod while the pod is still being removed by a different process.

However, I'm unable to think of a case where this race could impact anybody.

flouthoc (Collaborator) left a review comment:

LGTM. Just a small concern above, but I'm sure it's not going to affect any actual use case.

openshift-ci bot (Contributor) commented Mar 2, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rhatdan (Member) commented Mar 2, 2022

@mheon PTAL

mheon (Member) commented Mar 2, 2022 via email

giuseppe (Member, Author) commented Mar 2, 2022

We now retry removing the cgroup for up to 5 seconds.

}
time.Sleep(time.Millisecond * 100)
}
if err != nil {
Review comment on the diff (Member):

Any use in having at least a log-debug here if attempts >= 50, just stating that we ran out of attempts?

rhatdan (Member) commented Mar 10, 2022

@giuseppe whats up with this one?

mheon (Member) commented Mar 10, 2022

I'm still a little iffy on moving to a timeout from the current cgroup-based system. It seems like something we should do if we can't set resource limits, as opposed to a default.

giuseppe (Member, Author) commented:
If you prefer, there is an alternative version of the fix: #13398

Without this patch, though, there is still a potential race where the cleanup process starts and the cgroup limit is applied only afterward (which can happen because the container process exits). If that happens, the main podman process and the cleanup process will race to lock the container, and they may end up in a deadlock.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 23, 2022
openshift-ci bot (Contributor) commented Mar 23, 2022

@giuseppe: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@giuseppe giuseppe closed this Mar 24, 2022
giuseppe added a commit to giuseppe/libpod that referenced this pull request Apr 29, 2022
It solves a race where a container cleanup process, launched because the container process exited normally, would hang.

It also solves a problem when running rootless on cgroup v1, since it is not possible to force pids.max = 1 on conmon to prevent spawning the cleanup process.

Partially copied from containers#13403

Related to: containers#14057

[NO NEW TESTS NEEDED] it doesn't add any new functionality

Signed-off-by: Giuseppe Scrivano <[email protected]>
mheon pushed a commit to mheon/libpod that referenced this pull request May 3, 2022
It solves a race where a container cleanup process, launched because the container process exited normally, would hang.

It also solves a problem when running rootless on cgroup v1, since it is not possible to force pids.max = 1 on conmon to prevent spawning the cleanup process.

Partially copied from containers#13403

Related to: containers#14057

[NO NEW TESTS NEEDED] it doesn't add any new functionality

Signed-off-by: Giuseppe Scrivano <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Successfully merging this pull request may close these issues.

dbus-launch and conmon/pids.max problem
5 participants