deadlock(?) in podman rm(?) #14929
Comments
Maybe related #14921 |
Could be, but when I filed this I looked at the … Super trivial reproducer:
$ ./test/apiv2/test-apiv2 80
This seems to hang three out of four times on my laptop. |
I'll try to take a look today. @edsantiago, |
Also happens locally. An easy reproducer: With
It smells like another exit-code issue. |
Also happens when using |
@mheon WDYT? |
Need to know what that lock is assigned to (…). It could be the same as #14291 in that we seem to have a case where normal |
Stack trace is pointing at podman/libpod/runtime_pod_linux.go, line 223 in b4c09be
|
Yes, the stack trace points there as well:
/home/vrothberg/go/src/github.com/containers/podman/libpod/runtime_pod_linux.go:223 +0x304
I do not yet see why though. removePod unfortunately takes basically every lock in existence (it locks the pod, but also every container in the pod), so I can't really tell what it's waiting on. I sent something there as well but it's clearly racy. My theory is that container cleanup and pod rm run concurrently to hit this race. |
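To make the suspected failure mode concrete, here is a minimal Go sketch (hypothetical names, plain sync.Mutex instead of podman's SHM locks, not podman's actual code): if pod removal takes two container locks in one order while a concurrent cleanup takes the same two locks in the opposite order, each process ends up waiting on a lock the other already holds.

package main

import (
	"sync"
	"time"
)

// Two mutexes stand in for two container locks, e.g. a pod's infra
// container and a regular container. Purely illustrative of an ABBA
// lock-ordering race.
var infraLock, ctrLock sync.Mutex

func podRm() {
	// "pod rm" locks the pod's containers in whatever order it gets them.
	infraLock.Lock()
	time.Sleep(10 * time.Millisecond) // widen the race window
	ctrLock.Lock()
	ctrLock.Unlock()
	infraLock.Unlock()
}

func cleanup() {
	// "container cleanup" locks its own container first, then a
	// dependency (the infra container) -- the opposite order.
	ctrLock.Lock()
	time.Sleep(10 * time.Millisecond)
	infraLock.Lock()
	infraLock.Unlock()
	ctrLock.Unlock()
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); podRm() }()
	go func() { defer wg.Done(); cleanup() }()
	// With this timing both goroutines block forever; the Go runtime
	// aborts with "all goroutines are asleep - deadlock!".
	wg.Wait()
}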
Cleanup should not take the pod lock, so it shouldn't be able to force a lock-ordering deadlock. Maybe cleanup is stuck instead?
|
That could be it, yes. It's always the first (*Container).Lock(), so either another process already has the lock or the *same* process has it. Probably worth checking if this can be called in a batched context. |
There is no pod batching mechanism, so that shouldn't be possible. The Kube teardown code could be taking a container lock, but it would have to be doing something to hold it indefinitely, and I don't know of many operations that can do that. Something is probably freezing in a critical section; the only question is where. Killing all cleanup processes would probably tell us whether it's that or the play kube teardown deadlocking itself.
|
I think that is the root cause. As I understand it:
0 - Pod |
You are looking at cleanup with a restart policy. Restart policy involves a potential start of dependency containers and requires locking said dependencies to ascertain their state - in this case, the infra container of a pod. It is not applicable in this case as no container has a restart policy (or |
Hey, could someone make a fix for this, even if it's a suboptimal fix? The flake is triggering pretty frequently. |
I do not yet know why it's happening. It reproduces easily but I just don't yet understand the source. |
I can take a look once I am done with RC2 |
There is a simple C program that can print mutex owners:
I found it useful when investigating the deadlock. |
From what I can see, the issue happens when … I think one possible way to solve it is to make sure the container cleanup first gets the lock on its pod. There might be other combinations of commands that lead to this same situation, in general whenever we need to lock the container dependencies, as it can race with …
diff --git a/libpod/container_api.go b/libpod/container_api.go
index 742eb6d3e..579fc7f79 100644
--- a/libpod/container_api.go
+++ b/libpod/container_api.go
@@ -667,6 +667,16 @@ func (c *Container) WaitForConditionWithInterval(ctx context.Context, waitTimeou
// It also cleans up the network stack
func (c *Container) Cleanup(ctx context.Context) error {
if !c.batched {
+ // if the container is part of a pod, we need to first lock its pod
+ pod, err := c.runtime.state.Pod(c.config.Pod)
+ if err != nil {
+ return fmt.Errorf("container %s is in pod %s, but pod cannot be retrieved: %w", c.ID(), c.config.Pod, err)
+ }
+ if pod != nil {
+ pod.lock.Lock()
+ defer pod.lock.Unlock()
+ }
+
c.lock.Lock()
defer c.lock.Unlock()
I am still testing this patch, but so far I could not reproduce the issue anymore. Alternatively, we could probably simplify … |
Fixing this in pod removal certainly seems safer. Otherwise we'll be chasing races elsewhere (anything that looks at dependencies, e.g. |
I'm also very curious as to why cleanup is locking dependency containers. I know that can happen with a restart policy set, but I don't see a code path that would need to inspect dependencies otherwise. |
I believe that the problem can also be fixed when … Let's assume … The current deadlock can happen when we cleanup … If we make sure that … |
I already tried this approach in PR #14969 and it has a bug: the Cleanup function might be called from Pod.stopWithTimeout (pod_api.go) and the mutex will be double-locked. UPD: I just pushed a fix for this issue. |
Wrong lock ordering occurs not only between containers, but also between the pod and its containers. For example, there is a case where the pod's lock is taken after a container's lock.
|
Please point to the exact location. I don't think that is possible. |
The only place I know is https://github.com/containers/podman/blob/main/libpod/container_internal.go#L1955 and the comment there documents how the deadlock is resolved. |
Thanks, you are right. Another attempt that I am still testing:
diff --git a/libpod/runtime_pod_linux.go b/libpod/runtime_pod_linux.go
index 75ff24e41..e2cc4e0db 100644
--- a/libpod/runtime_pod_linux.go
+++ b/libpod/runtime_pod_linux.go
@@ -214,34 +214,46 @@ func (r *Runtime) removePod(ctx context.Context, p *Pod, removeCtrs, force bool,
return fmt.Errorf("pod %s contains containers and cannot be removed: %w", p.ID(), define.ErrCtrExists)
}
- // Go through and lock all containers so we can operate on them all at
- // once.
- // First loop also checks that we are ready to go ahead and remove.
- containersLocked := true
+ ctrNamedVolumes := make(map[string]*ContainerNamedVolume)
+
+ // Second loop - all containers are good, so we should be clear to
+ // remove.
+ for _, ctr := range ctrs {
+ // Remove the container.
+ // Do NOT remove named volumes. Instead, we're going to build a
+ // list of them to be removed at the end, once the containers
+ // have been removed by RemovePodContainers.
+ }
+
+ var removalErr error
for _, ctr := range ctrs {
- ctrLock := ctr.lock
- ctrLock.Lock()
- defer func() {
- if containersLocked {
+ err := func() error {
+ ctrLock := ctr.lock
+ ctrLock.Lock()
+ defer func() {
ctrLock.Unlock()
+ }()
+
+ if err := ctr.syncContainer(); err != nil {
+ return err
}
- }()
- // If we're force-removing, no need to check status.
- if force {
- continue
- }
+ for _, vol := range ctr.config.NamedVolumes {
+ ctrNamedVolumes[vol.Name] = vol
+ }
- // Sync all containers
- if err := ctr.syncContainer(); err != nil {
- return err
- }
+ return r.removeContainer(ctx, ctr, force, false, true, timeout)
+ }()
- // Ensure state appropriate for removal
- if err := ctr.checkReadyForRemoval(); err != nil {
- return fmt.Errorf("pod %s has containers that are not ready to be removed: %w", p.ID(), err)
+ if removalErr == nil {
+ removalErr = err
+ } else {
+ logrus.Errorf("Removing container %s from pod %s: %v", ctr.ID(), p.ID(), err)
}
}
+ if removalErr != nil {
+ return removalErr
+ }
// We're going to be removing containers.
// If we are Cgroupfs cgroup driver, to avoid races, we need to hit
@@ -268,30 +280,6 @@ func (r *Runtime) removePod(ctx context.Context, p *Pod, removeCtrs, force bool,
}
}
- var removalErr error
-
- ctrNamedVolumes := make(map[string]*ContainerNamedVolume)
-
- // Second loop - all containers are good, so we should be clear to
- // remove.
- for _, ctr := range ctrs {
- // Remove the container.
- // Do NOT remove named volumes. Instead, we're going to build a
- // list of them to be removed at the end, once the containers
- // have been removed by RemovePodContainers.
- for _, vol := range ctr.config.NamedVolumes {
- ctrNamedVolumes[vol.Name] = vol
- }
-
- if err := r.removeContainer(ctx, ctr, force, false, true, timeout); err != nil {
- if removalErr == nil {
- removalErr = err
- } else {
- logrus.Errorf("Removing container %s from pod %s: %v", ctr.ID(), p.ID(), err)
- }
- }
- }
-
// Clear infra container ID before we remove the infra container.
// There is a potential issue if we don't do that, and removal is
// interrupted between RemoveAllContainers() below and the pod's removal
@@ -326,12 +314,6 @@ func (r *Runtime) removePod(ctx context.Context, p *Pod, removeCtrs, force bool,
}
}
- // let's unlock the containers so if there is any cleanup process, it can terminate its execution
- for _, ctr := range ctrs {
- ctr.lock.Unlock()
- }
- containersLocked = false
-
// Remove pod cgroup, if present
if p.state.CgroupPath != "" {
logrus.Debugf("Removing pod cgroup %s", p.state.CgroupPath) |
@giuseppe I think your patch would work but at the cost of having small races in case a container transitions state (e.g., gets restarted). |
It looks like it's accessing containers without a lock, which isn't safe. We'd need to sync, get anything we need out of the container structs, then unlock and use the local variables we created, then lock again when the time came to remove. |
@mheon, what's your take on #14929 (comment)? |
We have code that does that (but in reverse) to start the pod. Using the existing graph-traversal code to stop the pod seems sensible to me. However, it doesn't fully resolve the races, I think; we can't lock everything at once, otherwise we get the same issues as this, so while we are working on the outermost containers of the graph (those without dependencies), the inner containers can be locked and restarted, etc. That doesn't really matter for a force removal, but it'd be a semantic change for a normal removal: a container can restart during it and prevent removal of the entire pod. We could do sequential locking (one by one, never holding more than a single lock at a time) of all containers at the beginning to verify state and then proceed if all are stopped, and just go ahead and force-remove any running containers that restarted while we were going? |
that is what I've tried to do, what more do we need to lock? |
They can still be locked and unlocked at any time. IMHO, the crux of the matter is the order in which containers are locked. The order in |
To rephrase it as a rule: if two containers are locked at the same time (by the same process), the order of the locking must follow their dependencies. If the order is random - which it is in |
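A minimal Go sketch of that rule (placeholder names, not libpod's actual types or API): impose one global order on the locks - dependency order, or any stable total order such as by ID - and always acquire multiple container locks in that order, so two processes can never hold the same pair of locks in opposite order.

package main

import (
	"fmt"
	"sort"
	"sync"
)

// Placeholder container type; only an ID and a lock matter here.
type lockable struct {
	id   string
	lock sync.Mutex
}

// lockAllInOrder acquires every lock following a single global order
// (here simply by ID; a dependency/topological order works the same
// way) and returns a closure that releases them in reverse order.
func lockAllInOrder(ctrs []*lockable) (unlock func()) {
	sorted := append([]*lockable(nil), ctrs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].id < sorted[j].id })
	for _, c := range sorted {
		c.lock.Lock()
	}
	return func() {
		for i := len(sorted) - 1; i >= 0; i-- {
			sorted[i].lock.Unlock()
		}
	}
}

func main() {
	infra := &lockable{id: "0-infra"}
	ctr := &lockable{id: "1-ctr"}
	unlock := lockAllInOrder([]*lockable{ctr, infra})
	fmt.Println("all locks held, acquired in a deterministic order")
	unlock()
}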
I see what you're saying. My brain initially insisted that it can't be safe, but thinking about it more I can't see any cases where it wouldn't be. |
Sorry for insisting on it so much, but I really believe having a way of sorting AND accessing containers in that order for pods is the right way. If we put it into a method |
We have preexisting graph creation and traversal code in |
But I am cool using @giuseppe's patch as well. Short-lived locks that avoid keeping two containers locked at the same time in |
opened a PR: |
You are right, sorry, it's my inattention. |
do not attempt to lock all containers on pod rm since it can cause deadlocks when other podman cleanup processes are attempting to lock the same containers in a different order. [NO NEW TESTS NEEDED] Closes: containers#14929 Signed-off-by: Giuseppe Scrivano <[email protected]>
Sorry, I really don't know what happened nor how to even report it. While testing #14772 my podman got stuck, and remains stuck. It looks like a deadlock. I don't actually know where it is stuck in the apiv2 tests, but I think it's in a DELETE. I was able to run a ps:
...but rm has hung for at least ten minutes:
$ bin/podman --root /dev/shm/test-apiv2.tmp.ZK2Ovh/server_root --runroot /run/user/14904/containers rm -f -a
lsof pointed me at /dev/shm/libpod_rootless_lock_MYUID, which surprised me because I would expect all the --root/--runroot args to use a different lockfile. The four processes shown by lsof are (edited for clarity):
Couldn't think of anything else to check, so I killed (TERM) all the processes, and have my system back now.