
Podman lock contention when attempting to restart multiple containers #11940

Closed
gcs278 opened this issue Oct 12, 2021 · 28 comments
Labels
kind/bug · locked - please file new issue/PR · needs-design-doc

Comments

@gcs278

gcs278 commented Oct 12, 2021

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

With a restart policy of always or on-failure, podman seems to really struggle and potentially deadlock when it is restarting multiple containers that are constantly exiting. I first noticed this problem using podman play kube, where a couple of containers were constantly dying and the restart policy was always. I then added a script with just exit 1 as the entrypoint and watched podman commands begin to hang longer.

I started 8 containers that run exit 1 with --restart=always via podman run, and podman commands took around 60 seconds to return. After about a minute, podman seemed to deadlock: podman commands weren't returning and I couldn't stop any of the dying containers. I ran rm -f /dev/shm/libpod_lock and pkill podman to release the deadlock.

This is a big problem for us, as we can't trust podman to restart containers without deadlocking. This seems related to #11589, but I thought it would be better to track it separately since it's a different situation.

Steps to reproduce the issue:

podman run -d --restart=always --entrypoint="" image_name bash -c "exit 1"
podman run -d --restart=always --entrypoint="" image_name bash -c "exit 1"
podman run -d --restart=always --entrypoint="" image_name bash -c "exit 1"
podman run -d --restart=always --entrypoint="" image_name bash -c "exit 1"
podman run -d --restart=always --entrypoint="" image_name bash -c "exit 1"
podman run -d --restart=always --entrypoint="" image_name bash -c "exit 1"
podman run -d --restart=always --entrypoint="" image_name bash -c "exit 1"
podman run -d --restart=always --entrypoint="" image_name bash -c "exit 1"

Then run podman commands like podman ps and see if podman deadlocks.
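
For convenience, the same reproduction in loop form (image_name is a placeholder for any image that provides bash):

for i in $(seq 1 8); do
    podman run -d --restart=always --entrypoint="" image_name bash -c "exit 1"
done
# then time a read-only command while the containers churn
time podman ps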

Describe the results you received:
Podman gets extremely sluggish and then deadlocks

Describe the results you expected:
Podman wouldn't deadlock

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

podman version 3.2.3

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.21.3
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.29-1.module+el8.4.0+11822+6cc1e7d7.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: ae467a0c8001179d4d0adf4ada381108a893d7ec'
  cpus: 10
  distribution:
    distribution: '"rhel"'
    version: "8.2"
  eventLogger: file
  hostname: rhel82
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-193.el8.x86_64
  linkmode: dynamic
  memFree: 136581120
  memTotal: 3884625920
  ociRuntime:
    name: runc
    package: runc-1.0.0-74.rc95.module+el8.4.0+11822+6cc1e7d7.x86_64
    path: /usr/bin/runc
    version: |-
      runc version spec: 1.0.2-dev
      go: go1.15.13
      libseccomp: 2.4.1
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 3990089728
  swapTotal: 4190105600
  uptime: 2159h 52m 24.42s (Approximately 89.96 days)
registries:
  registry:5000:
    Blocked: false
    Insecure: true
    Location: registry:5000
    MirrorByDigestOnly: false
    Mirrors: []
    Prefix: registry:5000
  search: ""
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 16
    paused: 0
    running: 0
    stopped: 16
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 77
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.2.3
  Built: 1627570963
  BuiltTime: Thu Jul 29 11:02:43 2021
  GitCommit: ""
  GoVersion: go1.15.7
  OsArch: linux/amd64
  Version: 3.2.3

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.2.3-0.11.module+el8.4.0+12050+ef972f71.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

@openshift-ci openshift-ci bot added the kind/bug label Oct 12, 2021
@mheon
Member

mheon commented Oct 12, 2021

This doesn't seem like a deadlock - it seems more like Podman is constantly attempting to restart containers, so at least one container has its lock taken at all times, which makes ps take a long time to finish as it waits to acquire locks. After 5 minutes, I haven't been able to replicate a deadlock, though podman ps is taking upwards of a minute to successfully execute. It is absolutely blowing up the load average as well, loading 8 cores to ~80%. I think this is a rather inherent limitation of our daemonless architecture: each restart needs to launch a Podman cleanup process to handle it, which results in a massive process storm. It's why we strongly recommend using systemd-managed containers instead.

Is this a particularly slow system you're testing on? It could explain why things appear to deadlock. I'm fairly convinced there's no actual deadlock here, just a severely taxed system.
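
For reference, a minimal sketch of the systemd-managed alternative mentioned above, using podman generate systemd (the container name "web" and image_name are placeholders); systemd then owns the restart loop instead of Podman spawning cleanup processes:

podman create --name web --entrypoint="" image_name bash -c "exit 1"
podman generate systemd --new --name --restart-policy=always web \
    > /etc/systemd/system/container-web.service
systemctl daemon-reload
systemctl enable --now container-web.service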

@gcs278
Author

gcs278 commented Oct 12, 2021

Thanks for looking into this @mheon. Yeah, it's a dedicated server with 48 cores; the deadlocking is somewhat inconsistent for me. I tried it again and couldn't get it to deadlock, but other times it deadlocks after the first couple of restart cycles on 8 containers. I would let podman commands hang for 5-10 minutes before removing the lock file and killing processes.

I'm using podman play so I don't think there is an option for using systemd with podman play.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@vrothberg
Member

vrothberg commented Mar 22, 2022

FWIW, I think that podman ps is way too expensive. The lock of a single container is acquired and released ~ a dozen times just to query certain data (e.g., state, mappings, root FS, etc.). I think we need to optimize querying that data and put it into a single locked function (rather than N locked ones).

@vrothberg vrothberg self-assigned this Mar 22, 2022
@vrothberg
Member

I'll take a stab at it.

@vrothberg
Member

FWIW, I think that podman ps is way too expensive. The lock of a single container is acquired and released ~ a dozen times just to query certain data (e.g., state, mappings, root FS, etc.). I think we need to optimize querying that data and put it into a single locked function (rather than N locked ones).

Scratch that ... these operations are batched.
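
For context, "batched" means the container lock is taken once and all the fields ps needs are read under that single hold, rather than locking and unlocking per field. A schematic Go sketch of the idea (illustrative only, not the actual libpod API):

package main

import (
	"fmt"
	"sync"
)

// ctrState stands in for a container's state; the real libpod types and
// batching helpers look different.
type ctrState struct {
	sync.Mutex
	status   string
	rootFS   string
	mappings string
}

// snapshot reads everything under one lock acquisition instead of N.
func (c *ctrState) snapshot() (status, rootFS, mappings string) {
	c.Lock()
	defer c.Unlock()
	return c.status, c.rootFS, c.mappings
}

func main() {
	c := &ctrState{status: "running", rootFS: "overlay", mappings: "host"}
	fmt.Println(c.snapshot())
}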

@vrothberg vrothberg removed their assignment Mar 22, 2022
@mheon
Member

mheon commented Mar 25, 2022

I was seeing this earlier this week in a slightly different context (podman ps and podman rm -af), so I took a further look. Current observations support it being contention of the container locks, which is exacerbated by the number of parallel processes we run. I believe our algorithm is CPU cores * 3 + 1, which means that on my system I have 25 threads going for both podman ps and podman rm, each contending for CPU time and each aggressively trying to take locks for the containers they are operating on. In short, we aren't waiting on a single lock for a minute; we're waiting on a hundred locks for a second or two each. I don't really know if we can improve this easily.

One thought I have is to print results as they come, instead of all at once when the command is done. This isn't perfect, but it would make it a lot clearer to the user what is happening (at least, it will be obvious that the commands are not deadlocked).

@mheon
Member

mheon commented Mar 25, 2022

Other possible thought: randomize the order in which we act on containers. podman ps and podman rm were operating on the same set of containers in the same order, with one being a lot slower than the other, so ps was run second but caught up quickly and ended up waiting on locks until rm finished. Random ordering much improves our odds of getting containers that aren't in contention.
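
A sketch of the randomized-ordering idea (illustrative only, not Podman's actual code):

package main

import (
	"fmt"
	"math/rand"
)

func main() {
	// Hypothetical container IDs; workers would claim them in this shuffled
	// order, so two concurrent commands are unlikely to walk the same
	// containers in the same sequence.
	containers := []string{"ctr1", "ctr2", "ctr3", "ctr4"}
	rand.Shuffle(len(containers), func(i, j int) {
		containers[i], containers[j] = containers[j], containers[i]
	})
	fmt.Println(containers)
}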

@mheon
Member

mheon commented Mar 25, 2022

I added a bit of randomization to the ordering, but it wasn't enough - no appreciable increase in performance; there are still too many collisions (25 parallel jobs over 200 test containers means ps and stop, for example, are each working on 1/8 of the total containers at any given time - high odds of collisions, which cause lock contention, which causes ps to slow down...).

@vrothberg
Member

@mheon, that is a great trail you're on.

Maybe we should think in terms of a work pool rather than in terms of workers per caller. Could we have a global shared semaphore to limit the number of parallel batch workers? That would limit lock contention etc. AFAIK the locks are already fair.

@mheon
Member

mheon commented Mar 25, 2022

We do have a semaphore right now, but it's per-process, not global. Making it global is potentially interesting, if we can get a MP-safe shared-memory semaphore.
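
For illustration, the per-process limit described above can be sketched with a weighted semaphore sized at NumCPU*3+1 (schematic only, not Podman's code; the container IDs are made up). A second Podman process gets its own semaphore, so the global degree of parallelism stays unbounded:

package main

import (
	"context"
	"fmt"
	"runtime"
	"sync"

	"golang.org/x/sync/semaphore"
)

func main() {
	maxJobs := int64(runtime.NumCPU()*3 + 1) // the sizing discussed above
	sem := semaphore.NewWeighted(maxJobs)
	ctx := context.Background()

	containers := []string{"ctr1", "ctr2", "ctr3"} // hypothetical IDs
	var wg sync.WaitGroup
	for _, id := range containers {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			if err := sem.Acquire(ctx, 1); err != nil {
				return
			}
			defer sem.Release(1)
			// the real worker would take this container's lock and act on it
			fmt.Println("processing", id)
		}(id)
	}
	wg.Wait()
}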

@mheon
Member

mheon commented Mar 25, 2022

Shared semaphore looks viable. My only concern is making sure that crashes and SIGKILL don't affect us - if, say, podman stop is running and using all available jobs, and then gets a SIGKILL, we want the semaphore to be released back to its maximum value.

@rhatdan
Member

rhatdan commented May 17, 2022

@mheon Any movement on this?

@mheon
Member

mheon commented May 18, 2022

Negative. Might be worth discussing at the cabal if we have time? I don't have a solid feel for how to fix this.

@tyler92
Contributor

tyler92 commented Jul 12, 2022

I have investigated this issue (it reproduces in my case too). A simple program based on the shm_lock code shows the following picture:

LockID = 1 (Pod)              owner PID = 462221
LockID = 2 (infra container)  owner PID = 462221
LockID = 3 (app container)    owner PID = 462207

462207 is the process that is started when a restart occurs - podman container cleanup
462221 is any other process, in my case podman pod rm -f -a

And these processes are deadlocked because they are waiting on each other (a lock-ordering problem).
The simplest way to reproduce it is to run the following script:

#!/bin/bash

set -o errexit

for x in {1..10000}; do
    echo "* $x *"
    podman play kube ./my-pod.yaml
    podman pod rm -f -a
    podman rm -a
done

where my-pod.yaml looks like:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: my-pod
  name: my-pod
spec:
  containers:
  - name: app
    image: debian
    imagePullPolicy: Never
    command:
    - /bin/sleep
    args:
    - 0.001
  hostNetwork: true
  restartPolicy: Always

@tyler92
Contributor

tyler92 commented Jul 12, 2022

So it looks like we should lock a container's pod before locking the container. Is that a good idea?
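
A minimal Go sketch of that ordering rule (illustrative, not libpod code): if every path that needs both locks always takes the pod lock before the container lock, the cleanup process and pod rm can still block each other, but they can no longer hold the locks in opposite orders and deadlock:

package main

import "sync"

type pod struct{ mu sync.Mutex }

type container struct {
	mu  sync.Mutex
	pod *pod
}

// withLocks enforces the order: pod lock first, then container lock.
func withLocks(c *container, fn func()) {
	if c.pod != nil {
		c.pod.mu.Lock()
		defer c.pod.mu.Unlock()
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	fn()
}

func main() {
	p := &pod{}
	c := &container{pod: p}
	withLocks(c, func() {
		// e.g. the work done by "podman container cleanup" or "podman pod rm"
	})
}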

@mheon
Member

mheon commented Jul 12, 2022 via email

@tyler92
Contributor

tyler92 commented Jul 13, 2022

No problem: #14921

@umohnani8 umohnani8 changed the title from "Podman deadlocks when attempting to restart multiple containers" to "Podman lock contention when attempting to restart multiple containers" Jul 21, 2022
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Aug 23, 2022

@mheon Any progress on this?

@mheon
Member

mheon commented Aug 23, 2022

Negative. I don't think we have a good solution yet.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@vrothberg
Member

I'm using podman play so I don't think there is an option for using systemd with podman play.

@gcs278, running kube play under systemd works now. The podman-kube@ systemd template works, but I find Quadlet to be better suited.

FWIW, I had another look at the issue. I couldn't see any deadlocks, and ps performs much better than back in October '21. Podman's daemonless architecture makes it subject to lock contention, which hits pretty hard with --restart=always and failing containers.
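
For reference, both options in brief (all paths are placeholders):

# Option 1: the podman-kube@ systemd template
systemctl enable --now podman-kube@$(systemd-escape /path/to/my-pod.yaml).service

# Option 2: a Quadlet unit; quadlet generates my-pod.service from this file
cat > /etc/containers/systemd/my-pod.kube <<'EOF'
[Kube]
Yaml=/path/to/my-pod.yaml

[Install]
WantedBy=default.target
EOF
systemctl daemon-reload
systemctl start my-pod.service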

@vrothberg
Member

@rhatdan @mheon I feel like we can close this issue at this point. One thing to consider is changing kube play to stop defaulting to --restart=always for containers. I know it's for K8s compatibility, but I find it less appealing for the Podman use cases.

@vrothberg
Member

Cc: @Luap99 @giuseppe

@rhatdan
Member

rhatdan commented Jun 20, 2023

It's funny that we just had a discussion with a BU student where restart always might come in handy. Imagine you have two or more containers in a pod, or multiple pods, that require services from each other. In Compose you can set which containers need to come up before a second container starts.

In podman we start the containers sequentially, and if container A requires container B, then when container A fails the start fails without ever starting container B. If they all started simultaneously, then container A could fail, container B would succeed, and when container A restarted, container B would be running and we would get to a good state. I think in the current design container A keeps restarting and container B never gets a chance. If we fix this with simultaneous start, then restart always will make some sense.

@vrothberg
Member

I will go ahead and close the issue. As mentioned in #11940 (comment), things have improved considerably since the initial report in Oct '21. Feel free to drop a comment or reopen if you think otherwise.

@github-actions github-actions bot added the locked - please file new issue/PR label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023