
Generated systemd user units do not stop on shutdown before timeout expires #16683

Closed
jackwilsdon opened this issue Nov 30, 2022 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@jackwilsdon

jackwilsdon commented Nov 30, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

systemd user units created with podman generate systemd --new do not stop on shutdown before the stop timeout expires.

Steps to reproduce the issue:

  1. Create a container to generate the unit from:

    podman run --name hello-world -p 8000:8000 crccheck/hello-world
    
  2. Generate the unit and place it in the user mode load path (e.g. $HOME/.config/systemd/user):

    podman generate systemd --new hello-world > ~/.config/systemd/user/hello-world.service
    
    hello-world.service

    # container-18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23.service
    # autogenerated by Podman 4.3.1
    # Wed Nov 30 01:40:03 GMT 2022
    
    [Unit]
    Description=Podman container-18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23.service
    Documentation=man:podman-generate-systemd(1)
    Wants=network-online.target
    After=network-online.target
    RequiresMountsFor=%t/containers
    
    [Service]
    Environment=PODMAN_SYSTEMD_UNIT=%n
    Restart=on-failure
    TimeoutStopSec=70
    ExecStartPre=/bin/rm \
    	-f %t/%n.ctr-id
    ExecStart=/usr/bin/podman run \
    	--cidfile=%t/%n.ctr-id \
    	--cgroups=no-conmon \
    	--rm \
    	--sdnotify=conmon \
    	-d \
    	--replace \
    	--name hello-world \
    	-p 8000:8000 crccheck/hello-world
    ExecStop=/bin/sleep infinity
    Type=notify
    NotifyAccess=all
    
    [Install]
    WantedBy=default.target
    

  3. Start the unit:

    systemctl --user start hello-world.service
    
  4. Reboot:

    systemctl reboot
    
  5. Note that systemd now waits for hello-world.service to stop, and the service is only killed once the stop timeout expires.

Describe the results you received:
Generated units do not stop on shutdown before timeout expires.

Describe the results you expected:
Generated units stop on shutdown before timeout expires.

Additional information you deem important (e.g. issue happens only occasionally):
This doesn't happen when stopping the unit normally (systemctl --user stop hello-world.service); it only happens while the system is shutting down.
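Since the hang only appears during shutdown, the previous boot's journal is the easiest place to confirm it after the fact. A possible way to check (assuming the unit name used above; "-b -1" selects the prior boot):

```shell
# Read the *previous* boot's user journal for the generated unit,
# to see where the stop was stuck before the timeout kill.
journalctl --user -b -1 -u hello-world.service --no-pager

# The system journal usually also records the blocking stop job
# ("A stop job is running for ...") during shutdown:
journalctl -b -1 --no-pager | grep -i 'stop job'
```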

I modified the unit so that stopping it hangs (ExecStop=/bin/sleep infinity, as shown above) instead of running podman stop, then ran the stop command myself and got the following output:

Output

$ podman --log-level trace stop --ignore -t 10 --cidfile=/run/user/1000/hello-world.service.ctr-id
time="2022-11-30T01:41:30Z" level=info msg="podman filtering at log level trace"
time="2022-11-30T01:41:30Z" level=debug msg="Called stop.PersistentPreRunE(podman --log-level trace stop --ignore -t 10 --cidfile=/run/user/1000/hello-world.service.ctr-id)"
time="2022-11-30T01:41:30Z" level=trace msg="Reading configuration file \"/usr/share/containers/containers.conf\""
time="2022-11-30T01:41:30Z" level=debug msg="Merged system config \"/usr/share/containers/containers.conf\""
time="2022-11-30T01:41:30Z" level=trace msg="&{Containers:{Devices:[] Volumes:[] ApparmorProfile:containers-default-0.50.1 Annotations:[] BaseHostsFile: CgroupNS:private Cgroups:enabled DefaultCapabilities:[CHOWN DAC_OVERRIDE FOWNER FSETID KILL NET_BIND_SERVICE SETFCAP SETGID SETPCAP SETUID SYS_CHROOT] DefaultSysctls:[net.ipv4.ping_group_range=0 0] DefaultUlimits:[] DefaultMountsFile: DNSServers:[] DNSOptions:[] DNSSearches:[] EnableKeyring:true EnableLabeling:true Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm] EnvHost:false HostContainersInternalIP: HTTPProxy:true Init:false InitPath: IPCNS:shareable LogDriver:journald LogSizeMax:-1 LogTag: NetNS:private NoHosts:false PidsLimit:2048 PidNS:private PrepareVolumeOnCreate:false SeccompProfile: ShmSize:65536k TZ: Umask:0022 UTSNS:private UserNS: UserNSSize:65536} Engine:{CgroupCheck:false CgroupManager:systemd ConmonEnvVars:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] ConmonPath:[/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] ConmonRsPath:[/usr/libexec/podman/conmonrs /usr/local/libexec/podman/conmonrs /usr/local/lib/podman/conmonrs /usr/bin/conmonrs /usr/sbin/conmonrs /usr/local/bin/conmonrs /usr/local/sbin/conmonrs /run/current-system/sw/bin/conmonrs] CompatAPIEnforceDockerHub:true DetachKeys:ctrl-p,ctrl-q EnablePortReservation:true Env:[] EventsLogFilePath: EventsLogFileMaxSize:1000000 EventsLogger:journald graphRoot:/var/home/jack/.local/share/containers/storage HelperBinariesDir:[/usr/local/libexec/podman /usr/local/lib/podman /usr/libexec/podman /usr/lib/podman] HooksDir:[/usr/share/containers/oci/hooks.d] ImageBuildFormat:oci ImageDefaultTransport:docker:// ImageParallelCopies:0 ImageDefaultFormat: ImageVolumeMode:bind InfraCommand: InfraImage: InitPath:/usr/libexec/podman/catatonit LockType:shm 
MachineEnabled:false MultiImageArchive:false Namespace: NetworkCmdPath: NetworkCmdOptions:[] NoPivotRoot:false NumLocks:2048 OCIRuntime:crun OCIRuntimes:map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] krun:[/usr/bin/krun /usr/local/bin/krun] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc] runj:[/usr/local/bin/runj] runsc:[/usr/bin/runsc /usr/sbin/runsc /usr/local/bin/runsc /usr/local/sbin/runsc /bin/runsc /sbin/runsc /run/current-system/sw/bin/runsc]] PodExitPolicy:continue PullPolicy:missing Remote:false RemoteURI: RemoteIdentity: ActiveService: ServiceDestinations:map[] SSHConfig:/var/home/jack/.ssh/config RuntimePath:[] RuntimeSupportsJSON:[crun runc kata runsc krun] RuntimeSupportsNoCgroups:[crun krun] RuntimeSupportsKVM:[kata kata-runtime kata-qemu kata-fc krun] SetOptions:{StorageConfigRunRootSet:false StorageConfigGraphRootSet:false StorageConfigGraphDriverNameSet:false StaticDirSet:false VolumePathSet:false TmpDirSet:false} SignaturePolicyPath:/etc/containers/policy.json SDNotify:false StateType:3 ServiceTimeout:5 StaticDir:/var/home/jack/.local/share/containers/storage/libpod StopTimeout:10 ExitCommandDelay:300 ImageCopyTmpDir:/var/tmp TmpDir:/run/user/1000/libpod/tmp VolumePath:/var/home/jack/.local/share/containers/storage/volumes VolumePluginTimeout:5 VolumePlugins:map[] ChownCopiedFiles:true CompressionFormat:} Machine:{CPUs:1 DiskSize:100 Image:testing Memory:2048 User:core Volumes:[$HOME:$HOME]} Network:{NetworkBackend: CNIPluginDirs:[/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin] DefaultNetwork:podman DefaultSubnet:10.88.0.0/16 
DefaultSubnetPools:[{Base:10.89.0.0/16 Size:24} {Base:10.90.0.0/15 Size:24} {Base:10.92.0.0/14 Size:24} {Base:10.96.0.0/11 Size:24} {Base:10.128.0.0/9 Size:24}] NetworkConfigDir: DNSBindPort:0} Secrets:{Driver:file Opts:map[]} ConfigMaps:{Driver: Opts:map[]}}"
time="2022-11-30T01:41:30Z" level=debug msg="Using conmon: \"/usr/bin/conmon\""
time="2022-11-30T01:41:30Z" level=debug msg="Initializing boltdb state at /var/home/jack/.local/share/containers/storage/libpod/bolt_state.db"
time="2022-11-30T01:41:30Z" level=debug msg="systemd-logind: Could not activate remote peer: activation request failed: a concurrent deactivation request is already in progress."
time="2022-11-30T01:41:30Z" level=debug msg="Using graph driver overlay"
time="2022-11-30T01:41:30Z" level=debug msg="Using graph root /var/home/jack/.local/share/containers/storage"
time="2022-11-30T01:41:30Z" level=debug msg="Using run root /run/user/1000/containers"
time="2022-11-30T01:41:30Z" level=debug msg="Using static dir /var/home/jack/.local/share/containers/storage/libpod"
time="2022-11-30T01:41:30Z" level=debug msg="Using tmp dir /run/user/1000/libpod/tmp"
time="2022-11-30T01:41:30Z" level=debug msg="Using volume path /var/home/jack/.local/share/containers/storage/volumes"
time="2022-11-30T01:41:30Z" level=debug msg="Set libpod namespace to \"\""
time="2022-11-30T01:41:30Z" level=debug msg="Not configuring container store"
time="2022-11-30T01:41:30Z" level=debug msg="Initializing event backend journald"
time="2022-11-30T01:41:30Z" level=trace msg="found runtime \"/usr/bin/crun\""
time="2022-11-30T01:41:30Z" level=debug msg="Configured OCI runtime runc initialization failed: no valid executable found for OCI runtime runc: invalid argument"
time="2022-11-30T01:41:30Z" level=debug msg="Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument"
time="2022-11-30T01:41:30Z" level=debug msg="Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument"
time="2022-11-30T01:41:30Z" level=debug msg="Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument"
time="2022-11-30T01:41:30Z" level=debug msg="Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument"
time="2022-11-30T01:41:30Z" level=debug msg="Using OCI runtime \"/usr/bin/crun\""
time="2022-11-30T01:41:30Z" level=info msg="Setting parallel job count to 49"
-- hangs here forever --

The interesting log line seems to be systemd-logind: Could not activate remote peer: activation request failed: a concurrent deactivation request is already in progress..

Here's a list of running podman-related processes (ps -aef | grep podman) from before running the above command:

jack        2497    2342  0 01:39 ?        00:00:00 /usr/bin/conmon --api-version 1 -c 18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23 -u 18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23 -r /usr/bin/crun -b /var/home/jack/.local/share/containers/storage/overlay-containers/18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23/userdata -p /run/user/1000/containers/overlay-containers/18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23/userdata/pidfile -n hello-world --exit-dir /run/user/1000/libpod/tmp/exits --full-attach -s -l journald --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/user/1000/containers/overlay-containers/18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23/userdata/oci-log --conmon-pidfile /run/user/1000/containers/overlay-containers/18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/home/jack/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --network-config-dir --exit-command-arg  --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /var/home/jack/.local/share/containers/storage/volumes --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23
jack        4116    2497  1 01:41 ?        00:00:00 /usr/bin/podman --root /var/home/jack/.local/share/containers/storage --runroot /run/user/1000/containers --log-level warning --cgroup-manager systemd --tmpdir /run/user/1000/libpod/tmp --network-config-dir  --network-backend netavark --volumepath /var/home/jack/.local/share/containers/storage/volumes --runtime crun --storage-driver overlay --events-backend journald container cleanup --rm 18cfab4f2c165de0a0a5143bef4deb2d769d2eccb731420bfcbf6b023de84f23

Output of podman version:

Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.2
Built:        Fri Nov 11 15:01:27 2022

Output of podman info:

host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-1.fc37.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: '
  cpuUtilization:
    idlePercent: 97.55
    systemPercent: 0.68
    userPercent: 1.77
  cpus: 16
  distribution:
    distribution: fedora
    variant: silverblue
    version: "37"
  eventLogger: journald
  hostname: frmwrk
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.0.9-300.fc37.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 62027968512
  memTotal: 67124658176
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.7-1.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.7
      commit: 40d996ea8a827981895ce22886a9bac367f87264
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 0h 13m 38.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/jack/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 2
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/jack/.local/share/containers/storage
  graphRootAllocated: 1998678130688
  graphRootUsed: 218403266560
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/user/1000/containers
  volumePath: /var/home/jack/.local/share/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1668178887
  BuiltTime: Fri Nov 11 15:01:27 2022
  GitCommit: ""
  GoVersion: go1.19.2
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

podman-4.3.1-1.fc37.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Fedora Silverblue 37

@jackwilsdon jackwilsdon changed the title Generated user units do not stop on shutdown before timeout expires Generated systemd user units do not stop on shutdown before timeout expires Nov 30, 2022
@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 30, 2022
@vrothberg
Member

Thanks for reaching out. I haven't seen this happen yet.

@Luap99 do you have a suspicion? It looks like a deadlock, although my gut feeling is that we may need to tweak the dependencies. Maybe Silverblue has a different systemd config such that other services get shut down/unmounted in parallel?

@Luap99
Member

Luap99 commented Nov 30, 2022

I've never seen this either; maybe something changed in systemd?
@jackwilsdon Do you have lingering enabled for the user?

Also, could you try removing --rm from the ExecStart= line, then restart the unit and test again?

@jackwilsdon
Author

Lingering isn't enabled:

$ loginctl user-status | grep '^\s\+Linger'
	  Linger: no
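For reference, lingering can also be inspected and toggled with loginctl. This is only a sketch (lingering was not enabled in this report, and toggling it requires a running systemd-logind):

```shell
# Query the Linger property for the current user.
loginctl show-user "$USER" --property=Linger

# To enable or disable lingering (keeps the user's systemd instance
# running after logout) -- shown purely for reference:
# loginctl enable-linger "$USER"
# loginctl disable-linger "$USER"
```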

Removing --rm (and rebooting a few times) doesn't seem to have fixed it - still seeing the same timeout at shutdown.

@jackwilsdon
Author

jackwilsdon commented Dec 10, 2022

Looks like this is a duplicate of #14531, which has been fixed by #16785 (I've tested locally and my machine shuts down properly on 205cb50).

@vrothberg
Member

That is really great news. Thanks a lot for checking and sharing, @jackwilsdon!

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 7, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 7, 2023