deadlock(?) in podman rm(?) #14929
Comments
Maybe related #14921 |
Could be, but when I filed this I looked at the … Super trivial reproducer:
$ ./test/apiv2/test-apiv2 80
This seems to hang three out of four times on my laptop. |
I'll try to take a look today. @edsantiago, |
Also happens locally. An easy reproducer: With
It smells like another exit-code issue. |
Also happens when using |
@mheon WDYT? |
Need to know what that lock is assigned to (…). It could be the same as #14291 in that we seem to have a case where normal |
Stack trace is pointing at podman/libpod/runtime_pod_linux.go, line 223 in b4c09be
|
Yes, the stack trace points there as well:
/home/vrothberg/go/src/github.com/containers/podman/libpod/runtime_pod_linux.go:223 +0x304
I do not yet see why though. removePod unfortunately takes basically every lock in existence (it locks the pod, but also every container in the pod), so I can't really tell what it's waiting on. I sent something there as well but it's clearly racy. My theory is that container cleanup and pod rm run concurrently to hit this race. |
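To make the suspected failure mode concrete, here is a minimal Go sketch (hypothetical names, plain sync.Mutex instead of podman's SHM locks, not podman's actual code): if pod removal takes two container locks in one order while a concurrent cleanup takes the same two locks in the opposite order, each process ends up waiting on a lock the other already holds.

package main

import (
	"sync"
	"time"
)

// Two mutexes stand in for two container locks, e.g. a pod's infra
// container and a regular container. Purely illustrative of an ABBA
// lock-ordering race.
var infraLock, ctrLock sync.Mutex

func podRm() {
	// "pod rm" locks the pod's containers in whatever order it gets them.
	infraLock.Lock()
	time.Sleep(10 * time.Millisecond) // widen the race window
	ctrLock.Lock()
	ctrLock.Unlock()
	infraLock.Unlock()
}

func cleanup() {
	// "container cleanup" locks its own container first, then a
	// dependency (the infra container) -- the opposite order.
	ctrLock.Lock()
	time.Sleep(10 * time.Millisecond)
	infraLock.Lock()
	infraLock.Unlock()
	ctrLock.Unlock()
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); podRm() }()
	go func() { defer wg.Done(); cleanup() }()
	// With this timing both goroutines block forever; the Go runtime
	// aborts with "all goroutines are asleep - deadlock!".
	wg.Wait()
}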
Cleanup should not take the pod lock, so it shouldn't be able to force a lock-ordering deadlock. Maybe cleanup is stuck instead?
|
That could be it, yes. It's always the first (*Container).Lock(), so either another process already has the lock or the *same* process has it. Probably worth checking if this can be called in a batched context. |
There is no pod batching mechanism, so that shouldn't be possible. The Kube teardown code could be taking a container lock, but it would have to be doing something to hold it indefinitely, and I don't know of many operations that can do that. Something is probably freezing in a critical section; the only question is where. Killing all cleanup processes would probably tell us whether it's that or the play kube teardown deadlocking itself.
|
I think that is the root cause. As I understand it:
0 - Pod |
You are looking at cleanup with a restart policy. Restart policy involves a potential start of dependency containers and requires locking said dependencies to ascertain their state - in this case, the infra container of a pod. It is not applicable in this case as no container has a restart policy (or |
Hey, could someone make a fix for this, even if it's a suboptimal fix? The flake is triggering pretty frequently. |
I do not yet know why it's happening. It reproduces easily but I just don't yet understand the source. |
I can take a look once I am done with RC2 |
There is a simple C program that can print mutex owners:
I found it useful when investigating the deadlock. |
From what I can see, the issue happens when … I think one possible way to solve it is to make sure the container cleanup first gets the lock on its pod. There might be other combinations of commands that lead to this same situation, in general whenever we need to lock the container dependencies, as it can race with …
diff --git a/libpod/container_api.go b/libpod/container_api.go
index 742eb6d3e..579fc7f79 100644
--- a/libpod/container_api.go
+++ b/libpod/container_api.go
@@ -667,6 +667,16 @@ func (c *Container) WaitForConditionWithInterval(ctx context.Context, waitTimeou
// It also cleans up the network stack
func (c *Container) Cleanup(ctx context.Context) error {
if !c.batched {
+ // if the container is part of a pod, we need to first lock its pod
+ pod, err := c.runtime.state.Pod(c.config.Pod)
+ if err != nil {
+ return fmt.Errorf("container %s is in pod %s, but pod cannot be retrieved: %w", c.ID(), c.config.Pod, err)
+ }
+ if pod != nil {
+ pod.lock.Lock()
+ defer pod.lock.Unlock()
+ }
+
c.lock.Lock()
defer c.lock.Unlock()
I am still testing this patch, but so far I could not reproduce the issue anymore. Alternatively, we could probably simplify … |
Fixing this in pod removal certainly seems safer. Otherwise we'll be chasing races elsewhere (anything that looks at dependencies, e.g. |
I'm also very curious as to why cleanup is locking dependency containers. I know that can happen with a restart policy set, but I don't see a code path that would need to inspect dependencies otherwise. |
I believe that the problem can also be fixed when … Let's assume … The current deadlock can happen when we cleanup … If we make sure that … |
I already tried this approach in PR #14969 and it has a bug: the Cleanup function might be called from Pod.stopWithTimeout (pod_api.go) and the mutex will be double-locked. UPD: I just pushed a fix for this issue. |
Wrong lock ordering occurs not only between containers, but also between the pod and its containers. For example, there is a case where the pod's lock is taken after a container's lock.
|
Please point to the exact location. I don't think that is possible. |
The only place I know is https://github.com/containers/podman/blob/main/libpod/container_internal.go#L1955 and the comment there documents how the deadlock is resolved. |
Thanks, you are right. Another attempt that I am still testing:
diff --git a/libpod/runtime_pod_linux.go b/libpod/runtime_pod_linux.go
index 75ff24e41..e2cc4e0db 100644
--- a/libpod/runtime_pod_linux.go
+++ b/libpod/runtime_pod_linux.go
@@ -214,34 +214,46 @@ func (r *Runtime) removePod(ctx context.Context, p *Pod, removeCtrs, force bool,
return fmt.Errorf("pod %s contains containers and cannot be removed: %w", p.ID(), define.ErrCtrExists)
}
- // Go through and lock all containers so we can operate on them all at
- // once.
- // First loop also checks that we are ready to go ahead and remove.
- containersLocked := true
+ ctrNamedVolumes := make(map[string]*ContainerNamedVolume)
+
+ // Second loop - all containers are good, so we should be clear to
+ // remove.
+ for _, ctr := range ctrs {
+ // Remove the container.
+ // Do NOT remove named volumes. Instead, we're going to build a
+ // list of them to be removed at the end, once the containers
+ // have been removed by RemovePodContainers.
+ }
+
+ var removalErr error
for _, ctr := range ctrs {
- ctrLock := ctr.lock
- ctrLock.Lock()
- defer func() {
- if containersLocked {
+ err := func() error {
+ ctrLock := ctr.lock
+ ctrLock.Lock()
+ defer func() {
ctrLock.Unlock()
+ }()
+
+ if err := ctr.syncContainer(); err != nil {
+ return err
}
- }()
- // If we're force-removing, no need to check status.
- if force {
- continue
- }
+ for _, vol := range ctr.config.NamedVolumes {
+ ctrNamedVolumes[vol.Name] = vol
+ }
- // Sync all containers
- if err := ctr.syncContainer(); err != nil {
- return err
- }
+ return r.removeContainer(ctx, ctr, force, false, true, timeout)
+ }()
- // Ensure state appropriate for removal
- if err := ctr.checkReadyForRemoval(); err != nil {
- return fmt.Errorf("pod %s has containers that are not ready to be removed: %w", p.ID(), err)
+ if removalErr == nil {
+ removalErr = err
+ } else {
+ logrus.Errorf("Removing container %s from pod %s: %v", ctr.ID(), p.ID(), err)
}
}
+ if removalErr != nil {
+ return removalErr
+ }
// We're going to be removing containers.
// If we are Cgroupfs cgroup driver, to avoid races, we need to hit
@@ -268,30 +280,6 @@ func (r *Runtime) removePod(ctx context.Context, p *Pod, removeCtrs, force bool,
}
}
- var removalErr error
-
- ctrNamedVolumes := make(map[string]*ContainerNamedVolume)
-
- // Second loop - all containers are good, so we should be clear to
- // remove.
- for _, ctr := range ctrs {
- // Remove the container.
- // Do NOT remove named volumes. Instead, we're going to build a
- // list of them to be removed at the end, once the containers
- // have been removed by RemovePodContainers.
- for _, vol := range ctr.config.NamedVolumes {
- ctrNamedVolumes[vol.Name] = vol
- }
-
- if err := r.removeContainer(ctx, ctr, force, false, true, timeout); err != nil {
- if removalErr == nil {
- removalErr = err
- } else {
- logrus.Errorf("Removing container %s from pod %s: %v", ctr.ID(), p.ID(), err)
- }
- }
- }
-
// Clear infra container ID before we remove the infra container.
// There is a potential issue if we don't do that, and removal is
// interrupted between RemoveAllContainers() below and the pod's removal
@@ -326,12 +314,6 @@ func (r *Runtime) removePod(ctx context.Context, p *Pod, removeCtrs, force bool,
}
}
- // let's unlock the containers so if there is any cleanup process, it can terminate its execution
- for _, ctr := range ctrs {
- ctr.lock.Unlock()
- }
- containersLocked = false
-
// Remove pod cgroup, if present
if p.state.CgroupPath != "" {
logrus.Debugf("Removing pod cgroup %s", p.state.CgroupPath) |
@giuseppe I think your patch would work but at the cost of having small races in case a container transitions state (e.g., gets restarted). |
It looks like it's accessing containers without a lock, which isn't safe. We'd need to sync, get anything we need out of the container structs, then unlock and use the local variables we created, then lock again when the time came to remove. |
@mheon, what's your take on #14929 (comment)? |
We have code that does that (but in reverse) to start the pod. Using the existing graph-traversal code to stop the pod seems sensible to me. However, it doesn't fully resolve the races, I think; we can't lock everything at once, otherwise we get the same issues as this, so while we are working on the outermost containers of the graph (those without dependencies), the inner containers can be locked and restarted, etc. That doesn't really matter for a force removal, but it'd be a semantic change for a normal removal: a container can restart during it and prevent removal of the entire pod. We could do sequential locking (one by one, never holding more than a single lock at a time) of all containers at the beginning to verify state and then proceed if all are stopped, and just go ahead and force-remove any running containers that restarted while we were going? |
that is what I've tried to do, what more do we need to lock? |
They can still be locked and unlocked at any time. IMHO, the crux of the matter is the order in which containers are locked. The order in |
To rephrase it as a rule: if two containers are locked at the same time (by the same process), the order of the locking must follow their dependencies. If the order is random - which it is in |
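A minimal Go sketch of that rule (placeholder names, not libpod's actual types or API): impose one global order on the locks - dependency order, or any stable total order such as by ID - and always acquire multiple container locks in that order, so two processes can never hold the same pair of locks in opposite order.

package main

import (
	"fmt"
	"sort"
	"sync"
)

// Placeholder container type; only an ID and a lock matter here.
type lockable struct {
	id   string
	lock sync.Mutex
}

// lockAllInOrder acquires every lock following a single global order
// (here simply by ID; a dependency/topological order works the same
// way) and returns a closure that releases them in reverse order.
func lockAllInOrder(ctrs []*lockable) (unlock func()) {
	sorted := append([]*lockable(nil), ctrs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].id < sorted[j].id })
	for _, c := range sorted {
		c.lock.Lock()
	}
	return func() {
		for i := len(sorted) - 1; i >= 0; i-- {
			sorted[i].lock.Unlock()
		}
	}
}

func main() {
	infra := &lockable{id: "0-infra"}
	ctr := &lockable{id: "1-ctr"}
	unlock := lockAllInOrder([]*lockable{ctr, infra})
	fmt.Println("all locks held, acquired in a deterministic order")
	unlock()
}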
I see what you're saying. My brain initially insisted that it can't be safe, but thinking about it more I can't see any cases where it wouldn't be. |
Sorry for insisting on it so much, but I really believe having a way of sorting AND accessing containers in that order for pods is the right way. If we put it into a method |
We have preexisting graph creation and traversal code in |
But I am cool using @giuseppe's patch as well. Short-lived locks that avoid keeping two containers locked at the same time in |
opened a PR: |
You are right, sorry, it's my inattention. |
do not attempt to lock all containers on pod rm since it can cause deadlocks when other podman cleanup processes are attempting to lock the same containers in a different order. [NO NEW TESTS NEEDED] Closes: containers#14929 Signed-off-by: Giuseppe Scrivano <[email protected]>
Sorry, I really don't know what happened nor how to even report it. While testing #14772 my podman got stuck, and remains stuck. It looks like a deadlock. I don't actually know where it is stuck in the apiv2 tests, but I think it's in a DELETE. I was able to run a ps:
...but rm has hung for at least ten minutes:
$ bin/podman --root /dev/shm/test-apiv2.tmp.ZK2Ovh/server_root --runroot /run/user/14904/containers rm -f -a
lsof pointed me at /dev/shm/libpod_rootless_lock_MYUID, which surprised me because I would expect all the --root/--runroot args to use a different lockfile. The four processes shown by lsof are (edited for clarity):
Couldn't think of anything else to check, so I killed (TERM) all the processes, and have my system back now.