Podman machine in "currently starting" state even after reboot of laptop #16945

Closed
jamesmortensen opened this issue Dec 27, 2022 · 9 comments · Fixed by #18328
Labels: kind/bug, locked - please file new issue/PR, machine

Comments

@jamesmortensen

BUG REPORT

/kind bug

Description

Podman machine got stuck in the "currently starting" state:

$ podman machine list
NAME                     VM TYPE     CREATED      LAST UP             CPUS        MEMORY      DISK SIZE
podman-machine-default*  qemu        5 weeks ago  Currently starting  1           2.147GB     107.4GB

Even after rebooting the computer, it is still in this state. Further investigation shows that the JSON config file has the "Starting" property set to true:

~/.config/containers/podman/machine/qemu/podman-machine-default.json

...
"CPUs": 1,
 "DiskSize": 100,
 "Memory": 2048,
 "IdentityPath": "/Users/james/.ssh/podman-machine-default",
 "Port": 54945,
 "RemoteUsername": "core",
 "Starting": true,
 "Created": "2022-11-17T18:16:58.007925+05:30",
 "LastUp": "2022-12-14T22:26:34.45365+05:30"
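
The stuck state comes down to a single boolean in the machine's JSON config. As a minimal sketch, a read-only check for that flag could look like the following. The function name `machine_starting_flag` is hypothetical and not part of podman; only the `"Starting"` key is taken from the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not part of podman): print the value of the
# "Starting" key from a podman machine JSON config, or nothing if the
# key is absent. The file is only read, never modified.
machine_starting_flag() {
    cfg="$1"
    # -E enables extended regular expressions on both GNU and BSD (macOS) sed.
    sed -nE 's/.*"Starting"[[:space:]]*:[[:space:]]*(true|false).*/\1/p' "$cfg" | head -n 1
}
```

Calling it with the config file named above as its only argument would print `true` for the excerpt shown here.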

Steps to reproduce the issue:

  1. Not sure how the system ended up in this state.

Describe the results you received:

If the machine is not running, it shouldn't say "Currently starting" if it isn't actually starting.

Describe the results you expected:

If the system does end up in this state, it should be easier to get it out of this state without needing to dig into the profile directory. (Or alternatively, perhaps we can add this to the troubleshooting guide! :) )

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

$ podman --version
podman version 4.3.0

Output of podman info:

podman info
Error: failed to connect: dial tcp [::1]:54945: connect: connection refused

NOTE: The machine is not running, so maybe that's why info shows this message.

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

$ brew info podman
==> podman: stable 4.3.1 (bottled), HEAD
Tool for managing OCI containers and pods
https://podman.io/
/opt/homebrew/Cellar/podman/4.3.0 (185 files, 47.6MB) *
  Poured from bottle on 2022-10-22 at 10:26:15
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/podman.rb
License: Apache-2.0 and GPL-3.0-or-later
==> Dependencies
Build: go-md2man ✔, [email protected] ✘
Required: qemu ✔
==> Options
--HEAD
	Install HEAD version
==> Caveats
Bash completion has been installed to:
  /opt/homebrew/etc/bash_completion.d

To restart podman after an upgrade:
  brew services restart podman
Or, if you don't want/need a background service you can just run:
  /opt/homebrew/opt/podman/bin/podman system service --time=0
==> Analytics
install: 18,166 (30 days), 72,899 (90 days), 243,516 (365 days)
install-on-request: 16,942 (30 days), 68,176 (90 days), 237,078 (365 days)
build-error: 8 (30 days)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

The troubleshooting guide didn't seem to have any information about what to do if podman remote is stuck in the "starting" state.

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Mac M1
  • macOS 13.0.1
@openshift-ci openshift-ci bot added the kind/bug label Dec 27, 2022
@jamesmortensen
Author

Note that, after editing the JSON file and changing "Starting": true to false, we see this output:

$ podman machine list
NAME                     VM TYPE     CREATED      LAST UP      CPUS        MEMORY      DISK SIZE
podman-machine-default*  qemu        5 weeks ago  12 days ago  1           2.147GB     107.4GB

Moreover, the VM can now be started with podman machine start.
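
The manual edit described above can be scripted. This is a sketch only, assuming the config stores the flag as a literal `"Starting": true` pair as in the excerpt from the report. `reset_starting_flag` is a hypothetical name, and the function keeps a `.bak` copy because it rewrites the file. It avoids `sed -i`, whose syntax differs between GNU and BSD (macOS) sed.

```shell
#!/bin/sh
# Hypothetical helper (not part of podman): set "Starting" back to false
# in a podman machine JSON config, leaving a backup next to it.
reset_starting_flag() {
    cfg="$1"
    # Keep a backup so the original config can be restored by hand.
    cp "$cfg" "$cfg.bak" || return 1
    # Read from the backup and write the edited copy over the original.
    sed -E 's/("Starting"[[:space:]]*:[[:space:]]*)true/\1false/' "$cfg.bak" > "$cfg"
}
```

It is probably safest to run this only while the VM is known not to be running, since the flag exists to signal a start in progress.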

podman info

$ podman info
host:
  arch: arm64
  buildahVersion: 1.28.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-1.fc37.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: '
  cpuUtilization:
    idlePercent: 35.3
    systemPercent: 54.53
    userPercent: 10.17
  cpus: 1
  distribution:
    distribution: fedora
    variant: coreos
    version: "37"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 502
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.0.9-300.fc37.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 1671671808
  memTotal: 2049859584
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.7-1.fc37.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.7
      commit: 40d996ea8a827981895ce22886a9bac367f87264
      rundir: /run/user/502/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/502/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.aarch64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 0m 14.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 0
    stopped: 8
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 106825756672
  graphRootUsed: 11202043904
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 30
  runRoot: /run/user/502/containers
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1668178831
  BuiltTime: Fri Nov 11 20:30:31 2022
  GitCommit: ""
  GoVersion: go1.19.2
  Os: linux
  OsArch: linux/arm64
  Version: 4.3.1

Not sure if there is a "fix" to permanently prevent the system from getting stuck in this state, especially without concrete steps to replicate it, but I do think it would be good to add this to the troubleshooting guide. Hope this helps!

@vrothberg
Member

Thanks for providing the details, @jamesmortensen !

@ashley-cui PTAL

@Luap99 Luap99 added the machine label Jan 3, 2023
@github-actions

github-actions bot commented Feb 3, 2023

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Feb 3, 2023

@ashley-cui Any ideas?

@ashley-cui
Member

I'd assume this is a really niche race condition somewhere, perhaps if podman was suddenly interrupted while it was starting a machine?

@ctml91

ctml91 commented Feb 5, 2023

Not sure if there is a "fix" to permanently prevent the system from getting stuck in this state, especially without concrete steps to replicate it, but I do think it would be good to add this to the troubleshooting guide. Hope this helps!

Thanks @jamesmortensen, I stumbled into this and your solution helped. In my case this occurred after an unclean shutdown: my Mac hit a kernel panic while I was doing dev work in a VS Code dev container, which would have been interacting with podman at the time. The machine was in this state after booting back up.

$ podman machine list
NAME                    VM TYPE     CREATED      LAST UP             CPUS        MEMORY      DISK SIZE
podman-machine-default  qemu        2 hours ago  Currently starting  1           2.147GB     107.4GB
$ podman machine stop
Machine "podman-machine-default" stopped successfully
$ podman machine list
NAME                    VM TYPE     CREATED      LAST UP             CPUS        MEMORY      DISK SIZE
podman-machine-default  qemu        2 hours ago  Currently starting  1           2.147GB     107.4GB
$ podman machine list
NAME                    VM TYPE     CREATED      LAST UP             CPUS        MEMORY      DISK SIZE
podman-machine-default  qemu        2 hours ago  Currently starting  1           2.147GB     107.4GB

$ grep Starting ~/.config/containers/podman/machine/qemu/podman-machine-default.json
 "Starting": true,

$ sed -i 's/"Starting": true/"Starting": false/g' ~/.config/containers/podman/machine/qemu/podman-machine-default.json

$ podman machine list
NAME                    VM TYPE     CREATED      LAST UP         CPUS        MEMORY      DISK SIZE
podman-machine-default  qemu        2 hours ago  30 minutes ago  1           2.147GB     107.4GB

$ podman machine start
Starting machine "podman-machine-default"
Waiting for VM ...
Mounting volume... /Users:/Users
Mounting volume... /private:/private
Mounting volume... /var/folders:/var/folders
API forwarding listening on: /var/run/docker.sock
Docker API clients default to this address. You do not need to set DOCKER_HOST.

Machine "podman-machine-default" started successfully

Edit: it turns out the shutdown borked my podman machine and I can no longer connect to it. Creating a new machine works, but losing my container volumes is a PITA.

$ podman system connection list
Name        URI         Identity    Default
$ podman ps
Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: Get "http://d/v4.4.0/libpod/_ping": dial unix ///var/folders/5k/865_x9vd2_3f3k7bw36k87vc0000gp/T/podman-run--1/podman/podman.sock: connect: no such file or directory

@github-actions

github-actions bot commented Mar 8, 2023

A friendly reminder that this issue had no activity for 30 days.

@jamesmortensen
Author

Could we just make sure that podman machine stop changes the "Starting" boolean to false? It seems that, when this rare issue does occur, our default instinct is to try to stop and restart. That's what both @ctml91 and I attempted to do.

Not sure what kind of side effects would occur from that. I do think that if podman knows the pid of the underlying QEMU process, it could check whether that process is still running when we run podman machine stop. If the QEMU process isn't running, then I think it is safe to assume that the machine has already stopped and that we can change the state. What do you think?
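
The check proposed above can be sketched in shell. This is an illustration under stated assumptions, not podman's implementation: the helper name `clear_starting_if_vm_dead` is hypothetical, and the caller must supply the QEMU pid, because this thread does not say where podman records it.

```shell
#!/bin/sh
# Hypothetical helper (not podman code): clear the "Starting" flag only
# when the given QEMU pid is no longer alive.
#   $1 = machine JSON config
#   $2 = QEMU pid, supplied by the caller (an empty pid is treated as dead)
clear_starting_if_vm_dead() {
    cfg="$1"
    pid="$2"
    # kill -0 delivers no signal; it succeeds only if the process exists
    # and the current user is allowed to signal it.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid is still running; leaving $cfg unchanged" >&2
        return 1
    fi
    # Process is gone: back up the config, then flip the flag.
    cp "$cfg" "$cfg.bak" || return 1
    sed -E 's/("Starting"[[:space:]]*:[[:space:]]*)true/\1false/' "$cfg.bak" > "$cfg"
}
```

One caveat with this approach is pid reuse: after a reboot, an unrelated process may hold the recorded pid, so a real implementation would likely also need to confirm that the process is actually QEMU.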

@adrian-moisa

adrian-moisa commented May 3, 2023

I had the exact same issue on a Mac M1 with podman version 4.5.0. Apparently podman crashes each time I start the machine. According to ticket #14652, we need to upgrade macOS to 12.4. I'm currently on 12.0; the currently available version is 13.3.1.

@github-actions github-actions bot added the locked - please file new issue/PR label Aug 25, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 25, 2023
7 participants