
On system (re)boot, podman locks while starting a pod from an autogenerated systemd file on F31 #5377

Closed
groovyman opened this issue Mar 3, 2020 · 2 comments
Labels: kind/bug, locked - please file new issue/PR


groovyman commented Mar 3, 2020

/kind bug

Description
I created two containers that run under two separate user accounts. After a successful podman build and podman run, both containers are running. Both containers are currently built on Fedora 31; the host also runs the F31 Server edition. The hardware (testbed) is a cheap J4205 server with 4 RAID disks, and the OS is configured to install updates by itself and reboot if needed. The containers:

  1. simple-app: runs under a user account that can be accessed from outside via ssh -Y user@chasmash -p 9222 ./startShellJavaSwingapp.sh, so I can access this desktop application from any desktop. On a LAN this application runs fast enough.

  2. web-application, more complex: this container runs a Perl/PHP FastCGI Apache web application that needs access to a PostgreSQL database running on the host. The Podman Dockerfile also allows generating an image that includes its own database, for development.

I can build and run both apps on the server. From time to time the server needs a reboot, so I have to integrate the Podman start/stop with systemd. Calling:

$ podman generate systemd --name kiki_p05 > container-kivi.service
$ cp container-kivi.service ~/.config/systemd/user/container-kivi.service
$ systemctl --user daemon-reload
$ systemctl --user enable container-kivi.service
$ systemctl --user start  container-kivi.service

does exactly what I want for both applications. Great job!

But when a reboot takes place, something strange happens. It seems that podman start blocks for some reason, because when I log in, podman tells me (kiki_p05 is the desired container):

[techlxoffice@chasmash ~]$ podman ps -a
CONTAINER ID  IMAGE                          COMMAND               CREATED       STATUS                   PORTS                        NAMES
016f2e02a570  localhost/kivi_prod_05:latest  /lib/systemd/syst...  32 hours ago  Created                  192.168.178.39:9190->80/tcp  kiki_p05
0925fa89f576  localhost/kivi_prod_04:latest  /lib/systemd/syst...  38 hours ago  Exited (0) 36 hours ago  192.168.178.39:9190->80/tcp  kiki_p04
583200deccf4  localhost/kivi_prod:latest     /lib/systemd/syst...  3 weeks ago   Created                  192.168.178.39:9190->80/tcp  funny_zhukovsky
[techlxoffice@chasmash ~]$ podman start kiki_p05

When you start kiki_p05 as shown above, the command simply blocks and does not return. Pressing Ctrl-C breaks out of the command. From time to time, when you issue the podman command again, it succeeds.

Well, systemctl gives no more information, because in the end it runs podman itself; it returns after some time while the spawned podman is still running (and maybe blocking other calls):

[techlxoffice@chasmash ~]$ systemctl --user start container-kivi.service
Job for container-kivi.service failed because a timeout was exceeded.
See "systemctl --user status container-kivi.service" and "journalctl --user -xe" for details.

[techlxoffice@chasmash ~]$ systemctl --user status container-kivi.service
● container-kivi.service - Podman container-kiki_p05.service
   Loaded: loaded (/mnt/raidSpace/Homes/techlxoffice/.config/systemd/user/container-kivi.service; enabled; vendor preset: enabled)
   Active: activating (start) since Tue 2020-03-03 08:18:10 CET; 10s ago
     Docs: man:podman-generate-systemd(1)
Cntrl PID: 1656 (podman)
    Tasks: 35 (limit: 18752)
   Memory: 84.8M
      CPU: 150ms
   CGroup: /user.slice/user-1001.slice/user@1001.service/container-kivi.service
           ├─1611 /usr/bin/podman start kiki_p05
           ├─1625 /usr/bin/slirp4netns --disable-host-loopback --mtu 65520 -c -e 3 -r 4 --netns-type=path /run/user/1001/netns/cni-88cf2e5f-cf3d-b489-9844-2cfb42ead18>
           ├─1628 containers-rootlessport
           ├─1644 /usr/bin/fuse-overlayfs -o lowerdir=/mnt/raidSpace/Homes/techlxoffice/.local/share/containers/storage/overlay/l/VPX562TR4OBJCDWG3H2HMRVR6N:/mnt/raid>
           └─1656 /usr/bin/podman start kiki_p05

Mär 03 08:18:10 chasmash systemd[1542]: Starting Podman container-kiki_p05.service...
Mär 03 08:19:40 chasmash systemd[1542]: container-kivi.service: Found left-over process 1644 (fuse-overlayfs) in control group while starting unit. Ignoring.
Mär 03 08:19:40 chasmash systemd[1542]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mär 03 08:19:40 chasmash systemd[1542]: container-kivi.service: Found left-over process 1656 (podman) in control group while starting unit. Ignoring.
Mär 03 08:19:40 chasmash systemd[1542]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mär 03 08:19:40 chasmash systemd[1542]: Starting Podman container-kiki_p05.service...

Anyway, I have also seen this behaviour before, and the simple application with the Java Swing application inside has the same problem. It cannot be started after a system reboot.

An idea:
Maybe you expect a logged-in user for each account. When podman is called by systemctl --user, there is no login on the console or via sshd. I found out that when I log in to the account of the simple application from the root account by calling su - simple, podman fails to run. A real login over ssh or a terminal does something that systemctl --user maybe forgets to do (just an idea).
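A possible explanation for the login dependency described above: systemd only keeps a user's service manager (and the runtime directory under /run/user/&lt;uid&gt;) alive while that user has an open session, unless "lingering" is enabled for the account. A minimal sketch of checking and enabling it, assuming the techlxoffice account from the logs above and root privileges:

```shell
# Show whether lingering is enabled for the account.
# "Linger=yes" means the user's systemd instance is started at boot,
# even without an interactive login.
loginctl show-user techlxoffice --property=Linger

# Enable lingering so that `systemctl --user` units for this account
# are started at boot and /run/user/<uid> stays available.
loginctl enable-linger techlxoffice
```

With lingering enabled, the user unit would be started at boot by the user's own systemd instance rather than only after a real login.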

Steps to reproduce the issue:

  1. create a technical user on a server

  2. as this technical user, do a podman build and a podman run, and create a systemd file with podman generate systemd --name kiki_p05. Copy the output into a file under ~/.config/systemd/user/, then reload and enable the new service

  3. perform a reboot and your container will no longer be running
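The steps above can be sketched as a shell session; the user, image, and container names are taken from the report and are otherwise illustrative:

```shell
# 1. Create the technical user (as root).
useradd -m techlxoffice

# 2. As that user: build the image, create the container, and
#    generate + install a systemd user unit for it.
podman build -t kivi_prod_05 .
podman run -d --name kiki_p05 localhost/kivi_prod_05:latest
mkdir -p ~/.config/systemd/user
podman generate systemd --name kiki_p05 \
  > ~/.config/systemd/user/container-kivi.service
systemctl --user daemon-reload
systemctl --user enable container-kivi.service

# 3. Reboot the host; afterwards the container sits in "Created"
#    instead of running.
sudo reboot
```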

Describe the results you received:
a container that is not running and has problems getting started

Describe the results you expected:
after a host reboot, a running container

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

[techlxoffice@chasmash ~]$ podman version
Version:            1.8.0
RemoteAPI Version:  1
Go Version:         go1.13.6
OS/Arch:            linux/amd64

Output of podman info --debug:

[techlxoffice@chasmash ~]$ podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.13.6
  podman version: 1.8.0
host:
  BuildahVersion: 1.13.1
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.10-2.fc31.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.10, commit: 6b526d9888abb86b9e7de7dfdeec0da98ad32ee0'
  Distribution:
    distribution: fedora
    version: "31"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  MemFree: 15543816192
  MemTotal: 16427073536
  OCIRuntime:
    name: crun
    package: crun-0.12.2.1-1.fc31.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.12.2.1
      commit: cd7cea7114db5f6aa35fbb69fa307c19c2728a31
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 8300523520
  SwapTotal: 8300523520
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: chasmash
  kernel: 5.5.7-200.fc31.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.0-20.1.dev.gitbbd6f25.fc31.x86_64
    Version: |-
      slirp4netns version 0.4.0-beta.3+dev
      commit: bbd6f25c70d5db2a1cd3bfb0416a8db99a75ed7e
  uptime: 42m 3.15s
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - quay.io
store:
  ConfigFile: /mnt/raidSpace/Homes/techlxoffice/.config/containers/storage.conf
  ContainerStore:
    number: 3
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.5-2.fc31.x86_64
      Version: |-
        fusermount3 version: 3.6.2
        fuse-overlayfs: version 0.7.5
        FUSE library version 3.6.2
        using FUSE kernel interface version 7.29
  GraphRoot: /mnt/raidSpace/Homes/techlxoffice/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 171
  RunRoot: /run/user/1001/containers
  VolumePath: /mnt/raidSpace/Homes/techlxoffice/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

[techlxoffice@chasmash ~]$ rpm -q podman
podman-1.8.0-2.fc31.x86_64

The host is running the Fedora 31 Server edition (with auto-update).


groovyman commented Mar 3, 2020

A minor update (see also #5094): the services are not launched if I extend the systemd service file with dependencies on user@1001.service.

# container-kiki_p05.service
# autogenerated by Podman 1.8.0
# Sun Mar  1 23:30:21 CET 2020

[Unit]
Description=Podman container-kiki_p05.service
Documentation=man:podman-generate-systemd(1)
Requires=postgresql.service user@1001.service
After=network.target postgresql.target user@1001.service

[Service]
Restart=on-failure
ExecStart=/usr/bin/podman start kiki_p05
ExecStop=/usr/bin/podman stop -t 10 kiki_p05
PIDFile=/run/user/1001/containers/overlay-containers/016f2e02a570cf36a5822dea07bb3f14d5ab685fde636594085d58c834ec07ff/userdata/conmon.pid
KillMode=none
Type=forking

[Install]
WantedBy=multi-user.target
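A side note on the generated unit above (a sketch, not a verified fix): `multi-user.target` exists only in the system instance of systemd. In a `systemctl --user` session the usual install target is `default.target`, so a user unit under ~/.config/systemd/user/ would normally carry an [Install] section like:

```ini
[Install]
; Enable against the user instance's default target;
; multi-user.target is a system-instance target.
WantedBy=default.target
```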

@vrothberg (Member)

Closing, as we already discuss roughly the same issue in #5094.

The github-actions bot locked the issue as resolved and limited the conversation to collaborators on Sep 23, 2023.