rootless podman fails to run after system reboot #7976

Closed
balamuruganravi opened this issue Oct 9, 2020 · 32 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@balamuruganravi

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

We start our pod with systemctl --user start pod-appd.service. After a system reboot, none of the podman commands work; they all hang.

Steps to reproduce the issue:

  1. mkdir -p ~/.config/systemd/user

  2. podman generate systemd --files --restart-policy=always --name appd

  3. cp *.service /home/podman/.config/systemd/user

  4. systemctl --user daemon-reload

  5. systemctl --user enable pod-appd.service

  6. systemctl --user start pod-appd.service

  7. loginctl enable-linger podman

Describe the results you received:
After the system reboot, none of the podman commands work other than podman version.

Describe the results you expected:
The pod & containers should have started and all podman commands should work

Additional information you deem important (e.g. issue happens only occasionally):
Initially, on that instance, systemctl --user failed with Failed to connect to bus: No such file or directory, so I added
export XDG_RUNTIME_DIR=/run/user/$(id -u) to the /home/podman/.bashrc file.
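
A guarded form of that export could look like the sketch below. It assumes the intended variable is `XDG_RUNTIME_DIR` (the name that systemd and podman read), and the helper name `ensure_runtime_dir` is hypothetical. It only sets the variable when it is unset and the directory exists, so a missing user session shows up as a warning instead of being hidden.

```shell
# Hypothetical helper for ~/.bashrc; the function name is illustrative only.
ensure_runtime_dir() {
    # Leave an already-set value alone (for example one provided at login).
    if [ -n "${XDG_RUNTIME_DIR:-}" ]; then
        return 0
    fi
    candidate="/run/user/$(id -u)"
    # Export only when the per-user runtime directory actually exists.
    if [ -d "$candidate" ]; then
        export XDG_RUNTIME_DIR="$candidate"
        return 0
    fi
    return 1
}

ensure_runtime_dir || echo "warning: /run/user/$(id -u) not found; is lingering enabled?" >&2
```

Exporting the variable does not by itself start the user manager; if the directory is missing after a reboot, the export only hides the symptom.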

Output of podman version:

Version:            1.6.4
RemoteAPI Version:  1
Go Version:         go1.13.4
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.13.4
  podman version: 1.6.4
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module+el8.2.0+6369+1f4293b4.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: e33ff1d39b97fdec3963b8ae6621e2a235c1ac17'
  Distribution:
    distribution: '"rhel"'
    version: "8.2"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 600000
      size: 1
    - container_id: 1
      host_id: 666666
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 600000
      size: 1
    - container_id: 1
      host_id: 666666
      size: 65536
  MemFree: 141611008
  MemTotal: 8189214720
  OCIRuntime:
    name: runc
    package: runc-1.0.0-64.rc10.module+el8.2.0+6369+1f4293b4.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 2012213248
  SwapTotal: 2097147904
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: uklvadsb0411
  kernel: 4.18.0-193.14.3.el8_2.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.2-3.git21fdece.module+el8.2.0+6369+1f4293b4.x86_64
    Version: |-
      slirp4netns version 0.4.2+dev
      commit: 21fdece2737dc24ffa3f01a341b8a6854f8b13b4
  uptime: 356h 18m 10.28s (Approximately 14.83 days)
registries:
  blocked:
  - all
  insecure: null
  search:
  - artifactory.global.com
  - prod.artifactory.global.com
store:
  ConfigFile: /home/podman/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.2-5.module+el8.2.0+6369+1f4293b4.x86_64
      Version: |-
        fuse-overlayfs: version 0.7.2
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  GraphRoot: /home/podman/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 0
  RunRoot: /run/user/600000
  VolumePath: /home/podman/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.6.4-15.module+el8.2.0+7290+954fb593.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes, it works in podman 1.9.3.

Additional environment details (AWS, VirtualBox, physical, etc.):
VirtualBox

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 9, 2020
@balamuruganravi
Author

Hi Team,
Could someone please help me resolve the above issue?

@zhangguanzhang
Collaborator

@mheon PTAL

@mheon
Member

mheon commented Oct 12, 2020

The Failed to connect to bus bit makes me think that the systemd user session is not up - have you run loginctl enable-linger for the user(s) in question?
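
One way to confirm the lingering state before rebooting, as a minimal sketch: it relies on `loginctl show-user` printing a `Linger=yes` line for the `Linger` property. The function name `linger_enabled` is hypothetical, and the user name `podman` is the one used in this report.

```shell
# Return success when lingering is enabled for the given user.
linger_enabled() {
    # loginctl prints "Linger=yes" or "Linger=no" for this property;
    # any failure (unknown user, no loginctl) counts as "not enabled".
    loginctl show-user "$1" --property=Linger 2>/dev/null | grep -qx 'Linger=yes'
}

user="podman"   # user name taken from the report
if linger_enabled "$user"; then
    echo "lingering is enabled for $user"
else
    echo "lingering is not enabled (or could not be queried) for $user"
fi
```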

@balamuruganravi
Author

Yes. After enabling the systemd user session, I ran loginctl enable-linger podman before the system reboot. After the system came back up, I was unable to run any podman commands.

@mheon
Member

mheon commented Oct 12, 2020

@giuseppe PTAL - Any ideas? User has lingering enabled, but at the same time we don't seem to have a systemd user session active.

@giuseppe
Member

What do you see with systemctl --user status dbus-broker.service?

I wonder if the podman service is trying to start before dbus is running. In this case, could you try adding a dependency on dbus.socket in the podman .service file?

@balamuruganravi
Author

podman@container-host[~] $ systemctl --user status dbus-broker.service
Failed to connect to bus: No such file or directory

May I know where the podman .service file is located?

@giuseppe
Member

I meant the .service file for starting the container, the one copied with cp *.service /home/podman/.config/systemd/user

@balamuruganravi
Author

Sorry, I am not sure where to add the dbus.socket value. Could you please explain in detail? Here is my current pod systemd unit file:
cat /home/podman/.config/systemd/user/pod-appd.service

# pod-appd.service
# autogenerated by Podman 1.6.4
# Thu Oct  8 16:56:00 BST 2020

[Unit]
Description=Podman pod-appd.service
Documentation=man:podman-generate-systemd(1)
Requires=container-elasticsearch.service container-grafana.service container-grafana-reporter.service container-heartbeat.service container-kibana.service container-logstash.service
Before=container-elasticsearch.service container-grafana.service container-grafana-reporter.service container-heartbeat.service container-kibana.service container-logstash.service

[Service]
Restart=always
ExecStart=/usr/bin/podman start d8f028f6ab76-infra
ExecStop=/usr/bin/podman stop -t 10 d8f028f6ab76-infra
KillMode=none
Type=forking
PIDFile=/run/user/600000/overlay-containers/e374c71dfb99ae6154f4f7a287dbc4b2b231a207fb3f44ef996451179820832a/userdata/conmon.pid

[Install]
WantedBy=multi-user.target

@giuseppe
Member

in the Requires= list
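
As a sketch of that suggestion, the dependency could also be added through a drop-in instead of editing the generated unit. The function name `write_dbus_dropin` and the file name `10-dbus.conf` are hypothetical. `After=` is included as an assumption about the intent, because `Requires=` alone does not order the units. The thread later reports that this dependency did not resolve the hang on podman 1.6.4.

```shell
# Write a drop-in for pod-appd.service under the given user unit directory.
write_dbus_dropin() {
    unit_dir="$1"
    dropin_dir="$unit_dir/pod-appd.service.d"
    mkdir -p "$dropin_dir" || return 1
    # Requires= pulls the socket in; After= makes the pod unit start after it.
    cat > "$dropin_dir/10-dbus.conf" <<'EOF'
[Unit]
Requires=dbus.socket
After=dbus.socket
EOF
}
```

It would be called as `write_dbus_dropin "$HOME/.config/systemd/user"`, followed by `systemctl --user daemon-reload`.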

@balamuruganravi
Author

Thanks @giuseppe. I added dbus.socket to the pod-appd.service file, then logged out and back in as the podman user. I still get the same result: I cannot run any podman commands; they hang.

podman@container-host[~] $ systemctl --user status dbus-broker.service
Failed to connect to bus: No such file or directory

cat /home/podman/.config/systemd/user/pod-appd.service

# pod-appd.service
# autogenerated by Podman 1.6.4
# Thu Oct  8 16:56:00 BST 2020

[Unit]
Description=Podman pod-appd.service
Documentation=man:podman-generate-systemd(1)
Requires=dbus.socket container-elasticsearch.service container-grafana.service container-grafana-reporter.service container-heartbeat.service container-kibana.service container-logstash.service
Before=container-elasticsearch.service container-grafana.service container-grafana-reporter.service container-heartbeat.service container-kibana.service container-logstash.service

[Service]
Restart=always
ExecStart=/usr/bin/podman start d8f028f6ab76-infra
ExecStop=/usr/bin/podman stop -t 10 d8f028f6ab76-infra
KillMode=none
Type=forking
PIDFile=/run/user/600000/overlay-containers/e374c71dfb99ae6154f4f7a287dbc4b2b231a207fb3f44ef996451179820832a/userdata/conmon.pid

[Install]
WantedBy=multi-user.target
podman@container-host[600000] $ pwd
/run/user/600000
podman@container-host[600000] $ ls
bus  libpod  systemd

@giuseppe
Member

Is there any useful information in the user journal (journalctl --user) or, alternatively, in the system log?

@balamuruganravi
Author

journalctl --user

Oct 08 17:02:57 container-host systemd[5599]: Started podman-89313.scope.
Oct 08 17:03:02 container-host systemd[5599]: Started podman-89333.scope.
Oct 08 17:04:02 container-host systemd[5599]: Started podman-89417.scope.
Oct 08 17:04:10 container-host systemd[5599]: Started podman-89440.scope.
Oct 08 17:37:15 container-host systemd[5599]: Started podman-93015.scope.
Oct 08 17:37:23 container-host systemd[5599]: Started podman-93033.scope.
Oct 08 17:37:40 container-host systemd[5599]: Stopped target Default.
Oct 08 17:37:40 container-host systemd[5599]: Stopping Podman container-logstash.service...
Oct 08 17:37:40 container-host systemd[5599]: Stopping Podman container-grafana-reporter.service...
Oct 08 17:37:40 container-host systemd[5599]: Stopping D-Bus User Message Bus...
Oct 08 17:37:40 container-host systemd[5599]: Stopping Podman container-grafana.service...
Oct 08 17:37:40 container-host systemd[5599]: Stopping Podman container-heartbeat.service...
Oct 08 17:37:40 container-host systemd[5599]: Stopping Podman container-elasticsearch.service...
Oct 08 17:37:40 container-host systemd[5599]: Stopping Podman container-kibana.service...
Oct 08 17:37:40 container-host systemd[5599]: Removed slice user.slice.
Oct 08 17:37:40 container-host systemd[5599]: Stopped D-Bus User Message Bus.
Oct 08 17:37:40 container-host systemd[5599]: container-logstash.service: Control process exited, code=exited status=1
Oct 08 17:37:40 container-host podman[93088]: cannot open /proc/5653/ns/mnt: No such file or directory
Oct 08 17:37:40 container-host systemd[5599]: container-logstash.service: Failed with result 'exit-code'.
Oct 08 17:37:40 container-host systemd[5599]: Stopped Podman container-logstash.service.
Oct 08 17:37:40 container-host podman[93091]: cannot open /proc/5653/ns/mnt: No such file or directory
Oct 08 17:37:40 container-host systemd[5599]: container-grafana.service: Control process exited, code=exited status=1
Oct 08 17:37:40 container-host systemd[5599]: container-grafana.service: Failed with result 'exit-code'.
Oct 08 17:37:40 container-host systemd[5599]: Stopped Podman container-grafana.service.
Oct 08 17:39:10 container-host systemd[5599]: container-grafana-reporter.service: Stopping timed out. Terminating.
Oct 08 17:39:10 container-host systemd[5599]: container-grafana-reporter.service: Failed with result 'timeout'.
Oct 08 17:39:10 container-host systemd[5599]: Stopped Podman container-grafana-reporter.service.
Oct 08 17:39:10 container-host systemd[5599]: container-heartbeat.service: Stopping timed out. Terminating.
Oct 08 17:39:10 container-host systemd[5599]: container-heartbeat.service: Failed with result 'timeout'.
Oct 08 17:39:10 container-host systemd[5599]: Stopped Podman container-heartbeat.service.
Oct 08 17:39:10 container-host systemd[5599]: container-elasticsearch.service: Stopping timed out. Terminating.
Oct 08 17:39:10 container-host systemd[5599]: container-elasticsearch.service: Failed with result 'timeout'.
Oct 08 17:39:10 container-host systemd[5599]: Stopped Podman container-elasticsearch.service.
Oct 08 17:39:10 container-host systemd[5599]: container-kibana.service: Stopping timed out. Terminating.
Oct 08 17:39:10 container-host systemd[5599]: container-kibana.service: Failed with result 'timeout'.
Oct 08 17:39:10 container-host systemd[5599]: Stopped Podman container-kibana.service.
Oct 08 17:39:10 container-host systemd[5599]: Stopping Podman pod-appd.service...
Oct 08 17:39:40 container-host systemd[5603]: pam_unix(systemd-user:session): session closed for user podman
-- Reboot --
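
To pull the failing units out of a journal excerpt like the one above, a small filter can help. This is a sketch only: the function name `failed_units` is hypothetical, and it assumes the `<unit>.service: Failed with result '<result>'.` message format seen in this log.

```shell
# Read journal text on stdin and print "<unit> <result>" for each failed unit.
failed_units() {
    sed -n "s/.* \([^ ]*\.service\): Failed with result '\([^']*\)'\..*/\1 \2/p"
}
```

On a host with a persistent journal it could be fed with `journalctl --user -b -1 | failed_units` to inspect the previous boot.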

@balamuruganravi
Author

Hi @giuseppe, did you get a chance to look at the log?

@balamuruganravi
Author

Hi @giuseppe, I can see the messages below in the journalctl --user log. Are they causing the issue? Could you please verify? I am still stuck on this issue and unable to proceed: after the system reboot, the containers are not auto-started and all podman commands hang.

Oct 21 16:08:28 uklvadsb0355 systemd[5585]: pod-appd.service: Found left-over process 159323 (fuse-overlayfs) in con>
Oct 21 16:08:28 uklvadsb0355 systemd[5585]: This usually indicates unclean termination of a previous run, or service im>
Oct 21 16:08:28 uklvadsb0355 systemd[5585]: pod-appd.service: Found left-over process 159332 (conmon) in control gro>
Oct 21 16:08:28 uklvadsb0355 systemd[5585]: This usually indicates unclean termination of a previous run, or service im>

@balamuruganravi
Author

Hi all, any update on this issue? Can someone please take a look?

@giuseppe
Member

Sorry, it looks like the service is called dbus.service on RHEL 8.2 instead of dbus-broker.service, as it is on Fedora with a newer systemd.

Could you check the status of the dbus service (systemctl --user status dbus.service) and use dbus.service for the Requires= list?

@balamuruganravi
Author

Hi @giuseppe, for systemctl --user status dbus.service I get Failed to connect to bus: No such file or directory. The same command worked before the system reboot.

@balamuruganravi
Author

balamuruganravi commented Oct 28, 2020

Hi @giuseppe, I just reproduced the issue on another instance. Surprisingly, the podman commands no longer hang, but they fail with a different error. Please check the steps below and let me know if anything is wrong.
Step 1: ssh into the instance and switch to the rootless user (podman): sudo su - podman
Step 2: podman@container-host[~] $ systemctl --user status dbus.service
Failed to connect to bus: No such file or directory
Step 3: Added export XDG_RUNTIME_DIR=/run/user/$(id -u) to the .bashrc file
cat /home/podman/.bashrc

# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
       . /etc/bashrc
fi
# User specific environment
PATH="$HOME/.local/bin:$HOME/bin:$PATH"
export PATH
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
podman ps &>/dev/null || podman system migrate &>/dev/null
export XDG_RUNTIME_DIR=/run/user/$(id -u)

Step 4: podman@container-host[~] $ loginctl enable-linger podman
Step 5: Now exit, log back in as the podman user, and verify: podman@container-host[~]$ systemctl --user status dbus.service

● dbus.service - D-Bus User Message Bus
   Loaded: loaded (/usr/lib/systemd/user/dbus.service; static; vendor preset: enabled)
   Active: active (running) since Wed 2020-10-28 13:36:11 GMT; 13min ago
     Docs: man:dbus-daemon(1)
 Main PID: 2444 (dbus-daemon)
   CGroup: /user.slice/user-600000.slice/user@600000.service/dbus.service
           └─2444 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

Step 6: podman@container-host[~] $ mkdir -p ~/.config/systemd/user
Step 7: Created the appd pod and added containers to it
Step 8: podman@container-host[~]$ podman generate systemd --files --restart-policy=always --name appd
Step 9: podman@container-host[~]$ cp *.service /home/podman/.config/systemd/user
Step 10: Added dbus.service to the Requires= list in the pod-appd.service file

cat /home/podman/.config/systemd/user/pod-appd.service
# pod-appd.service
# autogenerated by Podman 1.6.4
# Wed Oct 28 13:21:57 GMT 2020

[Unit]
Description=Podman pod-appd.service
Documentation=man:podman-generate-systemd(1)
Requires=dbus.service container-elasticsearch.service container-grafana.service container-kibana.service
Before=container-elasticsearch.service container-grafana.service container-kibana.service

[Service]
Restart=always
ExecStart=/usr/bin/podman start 287ed6d61390-infra
ExecStop=/usr/bin/podman stop -t 10 287ed6d61390-infra
KillMode=none
Type=forking
PIDFile=/run/user/600000/overlay-containers/39f0551bec3ca536971204ababf057e984afc9fc7df3e552a036ebf942f952c4/userdata/conmon.pid

[Install]
WantedBy=multi-user.target

Step 11: podman@container-host[~]$ systemctl --user daemon-reload
Step 12: podman@container-host[~]$ systemctl --user enable pod-appd.service
Step 13: podman@container-host[~] $ systemctl --user start pod-appd.service
Step 14: podman@container-host[~] $ podman ps (this listed all the containers)
Step 15: After the system reboot, I got the error below

ERRO[0000] error joining network namespace for container 39f0551bec3ca536971204ababf057e984afc9fc7df3e552a036ebf942f952c4: error retrieving network namespace at /run/user/600000/netns/cni-c6e2dde6-ee45-5da5-86ae-b14d5ef3cc49: failed to Statfs "/run/user/600000/netns/cni-c6e2dde6-ee45-5da5-86ae-b14d5ef3cc49": no such file or directory
ERRO[0000] unable to get container info: "container 39f0551bec3ca536971204ababf057e984afc9fc7df3e552a036ebf942f952c4 is not valid: container has already been removed"

podman info --debug

podman@uklvadsb0454[DEV][~] $ podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.13.4
  podman version: 1.6.4
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module+el8.2.0+6369+1f4293b4.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: e33ff1d39b97fdec3963b8ae6621e2a235c1ac17'
  Distribution:
    distribution: '"rhel"'
    version: "8.2"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 600000
      size: 1
    - container_id: 1
      host_id: 666666
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 600000
      size: 1
    - container_id: 1
      host_id: 666666
      size: 65536
  MemFree: 6345838592
  MemTotal: 8189214720
  OCIRuntime:
    name: runc
    package: runc-1.0.0-64.rc10.module+el8.2.0+6369+1f4293b4.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 2097147904
  SwapTotal: 2097147904
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: uklvadsb0454
  kernel: 4.18.0-193.14.3.el8_2.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.2-3.git21fdece.module+el8.2.0+6369+1f4293b4.x86_64
    Version: |-
      slirp4netns version 0.4.2+dev
      commit: 21fdece2737dc24ffa3f01a341b8a6854f8b13b4
  uptime: 1h 2m 31.91s (Approximately 0.04 days)
registries:
  blocked:
  - all
  insecure: null
  search:
  - artifactory.global.com
  - prod.artifactory.global.com
store:
  ConfigFile: /home/podman/.config/containers/storage.conf
  ContainerStore:
    number: 4
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.2-5.module+el8.2.0+6369+1f4293b4.x86_64
      Version: |-
        fuse-overlayfs: version 0.7.2
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  GraphRoot: /home/podman/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 4
  RunRoot: /run/user/600000
  VolumePath: /home/podman/.local/share/containers/storage/volumes

@giuseppe
Member

ERRO[0000] error joining network namespace for container 39f0551bec3ca536971204ababf057e984afc9fc7df3e552a036ebf942f952c4: error retrieving network namespace at /run/user/600000/netns/cni-c6e2dde6-ee45-5da5-86ae-b14d5ef3cc49: failed to Statfs "/run/user/600000/netns/cni-c6e2dde6-ee45-5da5-86ae-b14d5ef3cc49": no such file or directory
ERRO[0000] unable to get container info: "container 39f0551bec3ca536971204ababf057e984afc9fc7df3e552a036ebf942f952c4 is not valid: container has already been removed"

I've never seen this issue before. Does it also happen if you run the /usr/bin/podman start 287ed6d61390-infra command manually?

@balamuruganravi
Author

Yes, I get the same error even if I run the /usr/bin/podman start 287ed6d61390-infra command manually.

@balamuruganravi
Author

Hi @giuseppe, did you get a chance to reproduce this issue?
If I remove dbus.service from the Requires= list in /home/podman/.config/systemd/user/pod-appd.service and reboot the instance, I am unable to run any podman commands; they hang completely. Only the podman version command works.

appuser@container-host[~] $ sudo su - podman
Last login: Thu Oct 29 10:03:53 GMT 2020 on pts/1
^Cpodman@container-host[~] $ systemctl --user status dbus.service
Failed to connect to bus: No such file or directory
podman@container-host[~] $ podman ps

@balamuruganravi
Author

Hi, any update on this issue?

@balamuruganravi
Author

@giuseppe PTAL

@giuseppe
Member

giuseppe commented Nov 6, 2020

Both podman and systemd were updated in RHEL 8.3. Could you give it a try?

I've just tried following your exact steps on RHEL 8.3 and the container comes up after a reboot

@balamuruganravi
Author

Is podman 1.6.4 available for RHEL 8.3?

Is there any workaround on RHEL 8.2 for the current issue? Also, could you please help me understand why I am hitting this issue, given that the same steps work fine with podman 1.9.3?

@giuseppe
Member

giuseppe commented Nov 6, 2020

podman 2.0.5 is available on RHEL 8.3.

@balamuruganravi
Author

Is it safe to assume that users on RHEL 8.3 can only install podman 2.0.5, while users on RHEL 8.2 can install podman 1.6.4 or 1.9.3?

@giuseppe
Member

giuseppe commented Nov 6, 2020

Is there any reason to stick with an older version when the updated version works fine?

@balamuruganravi
Author

Actually, in my organization the installations are managed by a different team. If you can confirm it's an issue with podman 1.6.4, I can ask our team to upgrade all podman installations to 1.9.3 on RHEL 8.2.

@giuseppe
Member

giuseppe commented Nov 6, 2020

You might be hitting #5423, which is fixed in v1.9.3.
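
A quick way to check a host against that version threshold, as a sketch: the helper name `version_at_least` is hypothetical and it relies on GNU `sort -V` for version ordering. The hard-coded `1.6.4` is the version from this report; on a live RPM-based host the value could instead come from `rpm -q --queryformat '%{VERSION}' podman`.

```shell
# Succeed when version $1 is greater than or equal to version $2.
version_at_least() {
    # sort -V orders version strings; if the smaller one is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="1.6.4"   # version from the report, used here as sample input
if version_at_least "$installed" "1.9.3"; then
    echo "podman $installed is at or above v1.9.3"
else
    echo "podman $installed predates v1.9.3"
fi
```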

@rhatdan
Member

rhatdan commented Nov 6, 2020

This should be handled as a bugzilla as well, not an issue in the upstream repository.

@rhatdan rhatdan closed this as completed Nov 6, 2020
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023