Deadlocks using the gitlab runner docker executor on the docker-compatible socket API #10090
Comments
Thanks for reaching out and apologies for the long silence, we've been very busy in the past weeks. @mheon, do you have a suspicion on this one? |
No. We need further debug information - full output of |
OK, here we go: this is a log while running exactly one build in the GitLab pipeline. The pipeline eventually finished. However, the containers |
Problem persists with |
Apologies for not getting back to this issue earlier. The archive seems to be empty. I unpacked it, but there's just an empty directory. Do you still have the logs? |
It's a single packed file. This worked for me:
|
Thanks! I'll look into it. |
What I see in the logs below is that the containers in question are not getting deleted. At least, the handlers do not finish the deletion within the two minutes.
|
It also seems that SIGKILLs sent to the container are failing:
Before the deadlock, I see 5 consecutive requests to
But only two of these requests return:
The others do not return. A lot to unpack. |
That corresponds to the observation that a bunch of temporary containers created by the CI are not cleaned up (not even after the two minutes) and have to be removed manually. |
Only shows up 3 times but I expect to see 5. |
It seems like some logs are not displayed. The "shares network namespace, retrieving network info of container" messages suggest that all 5 network listings ran through.
Yet, looking at the logs, the only theory I came up with so far is that the network listings ran into some deadlock. @mheon @Luap99 is there anything in the networking code that would support this theory? Some functions look to be recursive (e.g., getContainerNetworkInfo()). |
There is definitely a lot of runtime locking, but I do not see anything obvious where it could deadlock. (podman/libpod/networking_linux.go, lines 885 to 898 at ed511d2)
According to the comment, syncContainer() should only be called when the container is locked. @mheon Could this be the cause? (podman/libpod/container_internal.go, lines 325 to 329 at ed511d2)
|
That probably won't deadlock, but it is definitely undefined behavior - could cause really weird results if 2 operations were happening to the container at the same time. Since our API handlers are multithreaded, that could well be happening. |
I think that's the best shot we have so far. This code is executed concurrently (see logs) and can yield undefined behavior, which would match the picture I got from staring at the logs. Maybe the deadlock is a symptom of that undefined behavior/state? @Luap99, mind opening a fix for that? |
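For readers following along: the convention referenced above is that any code which re-syncs container state must hold the per-container lock first. The sketch below is a simplified, self-contained illustration (the type and method names are stand-ins, not libpod's actual implementation) of that lock-then-sync pattern and why it matters once API handlers run in parallel.

```go
package main

import (
	"fmt"
	"sync"
)

// container is a simplified stand-in for a container object with mutable state.
type container struct {
	lock  sync.Mutex
	state string // e.g. "created", "running", "stopped"
}

// syncContainer refreshes the in-memory state. By convention it must only be
// called while c.lock is held; otherwise two concurrent API handlers can
// interleave their reads and writes of c.state.
func (c *container) syncContainer(latest string) {
	c.state = latest
}

// inspect follows the correct pattern: take the container lock, then sync.
func (c *container) inspect(latest string) string {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.syncContainer(latest)
	return c.state
}

func main() {
	c := &container{state: "created"}
	var wg sync.WaitGroup
	// Simulate multithreaded API handlers hitting the same container.
	for _, s := range []string{"running", "stopped"} {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			fmt.Println(c.inspect(s))
		}(s)
	}
	wg.Wait()
}
```

Without the lock around the sync, the same two goroutines would race on c.state, which is the kind of undefined behavior discussed above.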
Copied this static binary over
|
A friendly reminder that this issue had no activity for 30 days. |
@mheon @vrothberg @Luap99 Looks like this is still an issue. |
A friendly reminder that this issue had no activity for 30 days. |
@mheon @vrothberg @Luap99 any progress?
Be advised that this bug is a blocker to getting Podman supported by the GitLab Pipeline runner. Is there at least a work-around that could be suggested? |
Can you get us the stack trace? BTW, we have had a release in the last month; has anyone tried this on podman 3.4? |
A friendly reminder that this issue had no activity for 30 days. |
I hope I'll be able to try this today at work. |
I observed such deadlock-like behavior once recently, but couldn't reproduce it. However, I also saw two other issues that might be linked to this while I was using Podman inside a GitLab Runner Custom Executor:
|
A friendly reminder that this issue had no activity for 30 days. |
A friendly reminder that this issue had no activity for 30 days. |
@rhatdan I haven't seen the deadlock again, but I am also now very aware of the two issues I mentioned above that cause the storage to become too small. So if my hypothesis is right that the freeze came from too little storage space for me, I haven't yet had an opportunity to see it again, since I'm now frequently running
Now that I think about this, the GitLab Docker executor @thmo is using probably never calls |
Yes, at least with
Attached is a SIGQUIT dump from the
(NB: I am pretty sure I am not having a disk-full problem.) |
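As background on the attached dump: a Go process prints the stacks of all its goroutines when it receives SIGQUIT, and the same information can be produced in-process via runtime/pprof. A minimal, standalone sketch (not podman code) that demonstrates this:

```go
package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

func main() {
	// Block until SIGQUIT (Ctrl-\ or `kill -QUIT <pid>`), then write all
	// goroutine stacks to stderr — roughly what the attached dump contains.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGQUIT)
	<-sig

	// Verbosity level 2 includes the full stack of every goroutine, which
	// is what reveals handlers blocked while waiting on a lock.
	pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
}
```

In a dump like this, goroutines stuck in a lock acquisition show up with `sync.(*Mutex).Lock` (or the equivalent) at the top of their stacks.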
If this was really an issue with the network endpoints, then I am confident this is fixed in 4.0. However, if I interpret your dump correctly, this part is suspicious: it hangs in container_inspect because it is waiting for the runtime lock.
Because it is already inside inspect, it should hold a container lock. Maybe an ABBA deadlock between a container lock and the runtime lock? |
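To make the ABBA suspicion concrete, here is a minimal, self-contained sketch (not libpod code) of two goroutines taking a container-style lock and a runtime-style lock in opposite orders; each ends up waiting on the lock the other holds:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var containerLock, runtimeLock sync.Mutex

	// Path 1: e.g. an inspect handler — container lock first, then runtime lock.
	go func() {
		containerLock.Lock()
		defer containerLock.Unlock()
		time.Sleep(10 * time.Millisecond) // widen the race window
		runtimeLock.Lock()
		defer runtimeLock.Unlock()
		fmt.Println("path 1 done")
	}()

	// Path 2: e.g. a runtime-level operation — runtime lock first, then container lock.
	go func() {
		runtimeLock.Lock()
		defer runtimeLock.Unlock()
		time.Sleep(10 * time.Millisecond)
		containerLock.Lock()
		defer containerLock.Unlock()
		fmt.Println("path 2 done")
	}()

	// With the sleeps in place, both goroutines block on each other's lock:
	// a classic ABBA deadlock. Neither "done" line is ever printed.
	time.Sleep(time.Second)
	fmt.Println("main exiting; the two goroutines are still deadlocked")
}
```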
Honestly, the runtime in-memory lock doesn't really protect us against anything, given that it's purely a per-process lock and so much of what we want to protect against is multiprocess. I think we could tear it out completely without negative consequence. |
Then tear it out. |
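For readers unfamiliar with the distinction drawn above: an in-memory mutex only serializes goroutines inside a single Podman process, whereas protection against a second process touching the same resources needs something visible outside the process. A rough illustration of that difference, using a plain flock as the cross-process stand-in (this is not how libpod's actual lock implementation works, and the lock file path is made up for the example):

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"syscall"
)

// inProcessLock only protects goroutines within this single process.
var inProcessLock sync.Mutex

// withFileLock takes an exclusive flock on path, which is visible to every
// process on the machine, not just this one.
func withFileLock(path string, fn func()) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()

	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

	fn()
	return nil
}

func main() {
	// Serializes goroutines in this process only; a second podman process
	// would not see or respect this mutex at all.
	inProcessLock.Lock()
	fmt.Println("holding the in-memory lock")
	inProcessLock.Unlock()

	// Serializes against any other process locking the same file.
	err := withFileLock("/tmp/example-runtime.lock", func() {
		fmt.Println("holding the cross-process file lock")
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "flock failed:", err)
	}
}
```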
A friendly reminder that this issue had no activity for 30 days. |
In which release is this going to be? |
Definitely 4.1 |
/kind bug
Description
Trying to use gitlab-ci with the docker executor and the podman 3.x docker-compatible API socket very frequently leads to deadlock-like situations. These manifest in the GitLab CI pipelines hanging for a while, and in addition, `podman ps` and other CLI commands hanging indefinitely. When this happens, only a `systemctl restart podman` helps.

The setup uses podman 3.1.0 from the `container-tools` module stream of CentOS 8. The GitLab runner itself is run as a podman container, using the latest `docker.io/gitlab/gitlab-runner:alpine` image, with `/run/docker.sock` mounted inside the runner. The socket is also mounted into a traefik container on the same machine.

As a side note, intermediate containers created by the CI are stopped, but not cleaned up.
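For reference, a client such as the GitLab runner talks to this socket over plain HTTP tunneled through the unix socket. The sketch below is a minimal, hypothetical Go client (not part of the reported setup) that lists containers through the Docker-compatible endpoint; during the described deadlock such a request simply never returns, which is why a client-side timeout is included here:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

func main() {
	// The Docker-compatible unix socket that podman exposes; this is the
	// path mounted into the runner in the setup above.
	const socketPath = "/run/docker.sock"

	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
			},
		},
		// A timeout makes a hung daemon visible as an error instead of the
		// request blocking forever, as it does during the deadlock.
		Timeout: 30 * time.Second,
	}

	// GET /containers/json is the standard "list containers" endpoint of the
	// Docker Engine API, which podman's compat service implements.
	resp, err := client.Get("http://unix/containers/json")
	if err != nil {
		fmt.Println("request failed or timed out:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```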
Looking at the podman journal, I repeatedly see some messages like this:
Not sure if these are really related or not.
Output of `podman version`:
Output of `podman info --debug`:
Package info (e.g. output of `rpm -q podman` or `apt list podman`):