libpod: do not lock all containers on pod rm #14976
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: giuseppe. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing …
Just nits, LGTM ... the reproducer does not kick in anymore.
It should also fix #14921.
I launched a long-running test with this patch (based on the #14921 description) and got the following situation:
It's definitely not the same situation, but it also looks like a deadlock.
I think we still need the patch to first lock the pod on container cleanup.
EDIT: could you share the reproducer you are using?
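For context on the "lock the pod first" point above: the deadlock class being discussed is the usual inconsistent-lock-ordering one. A minimal, hypothetical Go sketch (invented names, not podman code) of a consistent pod-then-container ordering:

```go
package main

import "sync"

// Hypothetical stand-ins for the pod and container locks discussed above.
type Pod struct{ mu sync.Mutex }
type Container struct{ mu sync.Mutex }

// cleanupContainer takes the pod lock before the container lock. If every
// code path (pod rm, container cleanup, ...) acquires locks in the same
// pod-then-container order, two processes can never end up waiting on each
// other's lock, which is the deadlock being debugged in this thread.
func cleanupContainer(p *Pod, c *Container) {
	p.mu.Lock()
	defer p.mu.Unlock()

	c.mu.Lock()
	defer c.mu.Unlock()

	// ... actual cleanup work would go here ...
}

func main() {
	cleanupContainer(&Pod{}, &Container{})
}
```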
Force-pushed from 6925454 to 5b2a562.
See #14969
Without your patch, the issue reproduces after several minutes.
Stack trace with this infinite loop:
Thanks! Could you also share the stack trace for the other process?
To be honest, I don't know how to do this. If I attach to the process via GDB or GoLand, there are no Go functions in the stack, just C code (thread mutex locking).
What do you get if you run …?
Output of (gdb) thread apply all bt:

Thread 14 (LWP 2040712 "podman"):
Thread 13 (LWP 2040538 "podman"):
Thread 12 (LWP 2040537 "podman"):
Thread 11 (LWP 2040535 "podman"):
Thread 10 (LWP 2040534 "podman"):
Thread 9 (LWP 2040533 "podman"):
Thread 8 (LWP 2040532 "podman"):
Thread 7 (LWP 2040531 "podman"):
Thread 6 (LWP 2040530 "podman"):
Thread 5 (LWP 2040529 "podman"):
Thread 4 (LWP 2040528 "podman"):
Thread 3 (LWP 2040527 "podman"):
Thread 2 (LWP 2040526 "podman"):
Thread 1 (LWP 2040524 "podman"):
I am trying your reproducer locally and I am at iteration … Could you try with delve (…)? Are there only two podman processes running?
In my case it's …
Yes.
I will try it a little bit later.
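As an aside (not from this thread): when gdb shows only the C-level futex frames, Go-level goroutine stacks can still be dumped from inside the process. The sketch below is purely illustrative, not podman code; it only shows the runtime facility involved:

```go
// Sketch: dump the stack traces of all goroutines in the current process.
// This illustrates that the Go runtime can report goroutine-level stacks
// even when a C debugger shows nothing but futex wait frames.
package main

import (
	"os"
	"runtime"
)

func dumpGoroutines() {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // true: include every goroutine, not just the caller
	os.Stderr.Write(buf[:n])
}

func main() {
	dumpGoroutines()
}
```

For a process that is already stuck, sending SIGQUIT to a Go program normally makes the runtime print a similar all-goroutines dump before aborting, assuming the program does not trap SIGQUIT itself.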
I ran my reproducer for hours and nothing has happened.
Same here, I cannot end up in that state. It might be something unrelated.
Tests are green.
LGTM
@mheon PTAL
libpod/runtime_pod_linux.go (outdated)

}
}
if removalErr != nil {
	return removalErr
}

// We're going to be removing containers.
// If we are Cgroupfs cgroup driver, to avoid races, we need to hit
Should this block be moved up, to ensure we're still preventing cleanup processes?
I think that would require locking all containers at the same time, which would reintroduce the deadlock.
@giuseppe WDYT?
Could we just remove this hack?
Since there is no locking now for removing containers, the cleanup processes should not hang for too long, so it is fine if they run.
I've pushed a new version without this code block; let's see if it causes any issue in the CI.
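To illustrate the shape of the change being discussed, here is a hypothetical sketch with invented types (not the actual libpod code): each container is locked only while it is being removed, so the pod-rm path never holds more than one container lock at a time and cannot deadlock with a cleanup process that acquires container locks in a different order.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical container type; only the locking pattern matters here.
type Container struct {
	mu   sync.Mutex
	name string
}

func (c *Container) remove() error {
	// ... stop the container, remove its storage, etc. ...
	return nil
}

// removePodContainers locks each container only for the duration of its own
// removal. Because no more than one container lock is held at a time,
// lock-ordering deadlocks with concurrent cleanup processes cannot occur;
// the first error is kept and the loop continues over the remaining containers.
func removePodContainers(ctrs []*Container) error {
	var removalErr error
	for _, c := range ctrs {
		c.mu.Lock()
		if err := c.remove(); err != nil && removalErr == nil {
			removalErr = fmt.Errorf("removing container %s: %w", c.name, err)
		}
		c.mu.Unlock()
	}
	return removalErr
}

func main() {
	_ = removePodContainers([]*Container{{name: "a"}, {name: "b"}})
}
```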
Force-pushed from 5b2a562 to 3dde31b.
Force-pushed from 3dde31b to af118f7.
CI is green again.
LGTM
Let's merge.
do not attempt to lock all containers on pod rm since it can cause
deadlocks when other podman cleanup processes are attempting to lock
the same containers in a different order.
[NO NEW TESTS NEEDED]
Closes: #14929
Signed-off-by: Giuseppe Scrivano [email protected]
Does this PR introduce a user-facing change?