podman-machine-stop should block until the machine is in the target state #12815

Closed
justin-f-perez opened this issue Jan 11, 2022 · 13 comments · Fixed by #12835
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@justin-f-perez

justin-f-perez commented Jan 11, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

The interfaces for podman machine start and podman machine stop are inconsistent. The former blocks until the machine is in the target state, while the latter does not. (NOTE: this inconsistency is why I labeled this a bug and not a feature request.)

podman machine stop behavior causes many annoying problems which are illustrated below. Instead, it should block until the target state is reached (just like other podman machine commands do.)

Significance

Scripting

I never remember this inconsistency when I start scripting on a new project, and I've been burned by it multiple times. This was especially confusing the first time I encountered it; this cost (me personally) multiple hours of debug time.

User Experience

It's also very inconvenient for sending short snippets/advice to colleagues, e.g., this is bad advice because it results in an error:

Hey colleague, did you try restarting podman machine? podman machine stop && podman machine start

Instead, I need to send "colleague" an entire script that correctly blocks, or explain to them the inconsistencies in podman's behavior and how to check whether the machine is actually stopped.

Onboarding

Requiring users to know these implementation details just to "turn it off and on again" is harmful to podman adoption. I'm not going to recommend that my organization adopt podman if I can't tell a junior developer how to perform a simple operation (restart podman machine) without a Zoom call.

Documentation

The differences in start/stop blocking behavior aren't documented in any man pages. I checked:

  • man podman-machine
  • man podman-machine-stop
  • man podman-machine-start

I also checked the corresponding CLI help:

podman machine --help
podman machine start --help
podman machine stop --help

Backward compatibility

I can't think of any use case that would depend on podman not blocking. I suspect most users who script against the podman CLI are using their own hacky waiting scripts. Thus, if podman machine stop were just a no-op when the machine is already stopped, and blocked until the target state is reached when the machine is running, I think this behavior change would be backward compatible.

If blocking becomes the new default behavior, users can still get stop to not block the current process by backgrounding the command:

podman machine stop &

Furthermore, if podman machine stop blocks, users can check whether the command actually completed by checking the status of the backgrounded job. In the current implementation, the fact that the background job exited is meaningless.

If this behavior can't be implemented as default, I would be just as happy with a flag, e.g.: podman machine stop --wait, podman machine stop --block, or similar. I'll probably still forget that flag, but it's a much easier fix to add a flag to the command than it is to replace podman machine stop everywhere with my own hacks.
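The backgrounding idea described above can be sketched as follows. This is a minimal illustration, not part of podman: `run_in_background` is a hypothetical helper name, and the exit status it returns is only meaningful for `podman machine stop` if that command actually blocks until the machine is stopped.

```shell
#!/bin/bash
# Hypothetical helper: start the given command in the background, leave room
# for other work, then collect the command's exit status with `wait`.
run_in_background() {
    "$@" &
    local pid=$!
    # ... other work could happen here while the command runs ...
    wait "$pid"   # returns the exit status of the backgrounded command
}
```

Usage would look like `run_in_background podman machine stop && echo "stop finished"`. With the non-blocking behavior reported in this issue, the job can exit successfully while the VM is still shutting down.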

Steps to reproduce the issue

stop-and-list

This test case demonstrates that podman machine stop does not block until it reaches the target state.

podman machine stop && podman machine list && sleep 1 && podman machine list && sleep 1 && podman machine list

start-and-list

This test case demonstrates that podman machine start blocks until it reaches the target state.

podman machine start && podman machine list

stop-list-run-list-start

Naturally, you might think you can restart with a simple conjunction of stop and start, but you can't. You might also think "Currently running" means a podman machine is ready to receive work, but it doesn't.

❯ /bin/bash -xc 'podman machine start; podman machine stop; podman machine list; podman run --rm -it "ubuntu:focal" /bin/bash; podman machine list; podman machine start'
+ podman machine start
INFO[0000] waiting for clients...
INFO[0000] listening tcp://127.0.0.1:7777
INFO[0000] new connection from  to /var/folders/39/4mpw12m10xsg92p83p_6lp3rd4kkvh/T/podman/qemu_podman-machine-default.sock
Waiting for VM ...
Machine "podman-machine-default" started successfully
+ podman machine stop
+ podman machine list
NAME                     VM TYPE     CREATED     LAST UP            CPUS        MEMORY      DISK SIZE
podman-machine-default*  qemu        5 days ago  Currently running  3           4.295GB     21.47GB
+ podman run --rm -it ubuntu:focal /bin/bash
Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman. failed to create sshClient: Connection to bastion host (ssh://core@localhost:60594/run/user/1000/podman/podman.sock) failed.: dial tcp [::1]:60594: connect: connection refused
+ podman machine list
NAME                     VM TYPE     CREATED     LAST UP            CPUS        MEMORY      DISK SIZE
podman-machine-default*  qemu        5 days ago  Currently running  3           4.295GB     21.47GB
+ podman machine start
Error: cannot start VM podman-machine-default: VM already running

podman machine list tells us the machine is 'Currently running'. That's misleading (it's shutting down).

podman machine start also tells us 'VM already running'. That's misleading (it's shutting down).
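Because podman machine start fails with 'VM already running' while the machine is still shutting down, one possible workaround is to retry the start until it succeeds. The sketch below is an assumption-laden illustration: `retry_until_success` is a hypothetical helper (not a podman feature), and the one-second polling interval is arbitrary.

```shell
#!/bin/bash
# retry_until_success TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until it
# succeeds, or gives up after roughly TIMEOUT_SECONDS of waiting.
retry_until_success() {
    local remaining="$1"
    shift
    until "$@"; do
        if [ "$remaining" -le 0 ]; then
            return 1   # command never succeeded within the timeout
        fi
        remaining=$((remaining - 1))
        sleep 1
    done
    return 0
}
```

A restart could then be written as `podman machine stop; retry_until_success 60 podman machine start`. Each failed attempt still prints podman's error message, so the output is noisier than a stop that blocks.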

Actual results

podman machine stop does not block until the machine is stopped.

Expected results

I expect:

  • podman-machine-stop to block until it reaches the target state (i.e., if the machine is already stopped, return immediately; if the machine is not stopped, block until it stops).

As a consequence:

  • podman machine stop && podman machine start to restart the podman VM (without "VM already running" errors)
  • podman machine stop && podman machine list to indicate that the machine is stopped (and for the machine to actually be stopped)

Additional information

This is a typical example of how I deal with this problem currently, on a host that runs only a single podman machine.

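#!/bin/bash
set -u

# Succeeds while `podman machine list` still reports a running machine.
# Only valid on a host that runs a single podman machine.
podman_is_running() {
    podman machine list | grep -q 'Currently running'
}

# Seconds to wait before giving up. The default of 30 is an arbitrary
# assumption; without a default, `set -u` aborts when the variable is unset.
countdown="${PODMAN_STOP_TIMEOUT:-30}"
podman machine stop

printf "Waiting for podman machine to stop"
while podman_is_running && [ "$countdown" -gt 0 ]; do
    countdown=$((countdown - 1))
    sleep 1
    printf '.'
done

if podman_is_running; then
    echo "Timed out waiting for podman machine to stop. Kill podman?"
    exit 1
else
    echo "OK"
fi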

Output of podman version:

Client:
Version:      3.4.4
API Version:  3.4.4
Go Version:   go1.17.3
Built:        Wed Dec  8 12:41:11 2021
OS/Arch:      darwin/arm64

Server:
Version:      3.4.4
API Version:  3.4.4
Go Version:   go1.16.8
Built:        Wed Dec  8 15:48:10 2021
OS/Arch:      linux/arm64

Output of podman info --debug:

host:
  arch: arm64
  buildahVersion: 1.23.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-2.fc35.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 3
  distribution:
    distribution: fedora
    variant: coreos
    version: "35"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.15.12-200.fc35.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 3650338816
  memTotal: 4086677504
  ociRuntime:
    name: crun
    package: crun-1.4-1.fc35.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.4
      commit: 3daded072ef008ef0840e8eccb0b52a7efbd165d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.aarch64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 11m 58.49s
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 15
  runRoot: /run/user/1000/containers
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 1639000090
  BuiltTime: Wed Dec  8 21:48:10 2021
  GitCommit: ""
  GoVersion: go1.16.8
  OsArch: linux/arm64
  Version: 3.4.4

Package info

❯ HOMEBREW_NO_ANALYTICS=true brew info podman

podman: stable 3.4.4 (bottled), HEAD
Tool for managing OCI containers and pods
https://podman.io/
/opt/homebrew/Cellar/podman/3.4.4 (170 files, 41.0MB) *
  Poured from bottle on 2022-01-05 at 08:17:11
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/podman.rb
License: Apache-2.0
==> Dependencies
Build: go ✘, go-md2man ✘
Required: qemu ✔
==> Options
--HEAD
	Install HEAD version
==> Caveats
zsh completions have been installed to:
  /opt/homebrew/share/zsh/site-functions


❯ brew ls podman

/opt/homebrew/Cellar/podman/3.4.4/INSTALL_RECEIPT.json
/opt/homebrew/Cellar/podman/3.4.4/LICENSE
/opt/homebrew/Cellar/podman/3.4.4/bin/podman
/opt/homebrew/Cellar/podman/3.4.4/bin/podman-remote
/opt/homebrew/Cellar/podman/3.4.4/.brew/podman.rb
/opt/homebrew/Cellar/podman/3.4.4/libexec/gvproxy
/opt/homebrew/Cellar/podman/3.4.4/etc/bash_completion.d/podman
/opt/homebrew/Cellar/podman/3.4.4/README.md
/opt/homebrew/Cellar/podman/3.4.4/share/man/man1/podman-machine-init.1
#  <more man pages here>
/opt/homebrew/Cellar/podman/3.4.4/share/man/man1/podman-system-connection-default.1
/opt/homebrew/Cellar/podman/3.4.4/share/zsh/site-functions/_podman
/opt/homebrew/Cellar/podman/3.4.4/share/fish/vendor_completions.d/podman.fish

Tested with latest?

yes (using latest from homebrew)

Checked Podman Troubleshooting Guide?

yes, podman machine stop (lack of) blocking is not mentioned anywhere.

Additional environment details (AWS, VirtualBox, physical, etc.):

local machine
HW: 2020 M1 MacBook Pro
arch: arm64
OS: Big Sur (11.6.2)
shell: zsh
podman installed by: brew

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 11, 2022
@rhatdan
Member

rhatdan commented Jan 11, 2022

@baude PTAL
@justin-f-perez Interested in opening a PR to fix?

@justin-f-perez
Author

justin-f-perez commented Jan 11, 2022

@rhatdan After some digging: Does the implementation of podman machine stop on the main branch block until completion? (If no one else answers this question before things slow down at work, I'll try to get podman set up from the main branch to answer it for myself, but this could be several weeks away.)

If yes: this issue could be closed or left open for a small documentation PR (I could submit a small PR against the troubleshooting guide describing the difference in behavior pre-v4.0 and the 'workaround snippet' from the problem description above).

If no: I could attempt a PR when I have more free time (extremely slammed at work right now), but the extent of my Go experience is the 'digging' I did below.

digging around

I compared stop.go at v3.4.4 tag against the main branch; I see that there are changes that would print a message on successful stop. I can't imagine this would be useful unless this change also causes the command to block? I also found corresponding changes in pkg/machine/qemu/machine.go

diff --git a/cmd/podman/machine/stop.go b/cmd/podman/machine/stop.go
index 76ba85601..17969298b 100644
--- a/cmd/podman/machine/stop.go
+++ b/cmd/podman/machine/stop.go
@@ -1,11 +1,13 @@
-// +build amd64,!windows arm64,!windows
+//go:build amd64 || arm64
+// +build amd64 arm64
 
 package machine
 
 import (
+	"fmt"
+
 	"github.com/containers/podman/v3/cmd/podman/registry"
 	"github.com/containers/podman/v3/pkg/machine"
-	"github.com/containers/podman/v3/pkg/machine/qemu"
 	"github.com/spf13/cobra"
 )
 
@@ -31,20 +33,21 @@ func init() {
 // TODO  Name shouldn't be required, need to create a default vm
 func stop(cmd *cobra.Command, args []string) error {
 	var (
-		err    error
-		vm     machine.VM
-		vmType string
+		err error
+		vm  machine.VM
 	)
 	vmName := defaultMachineName
 	if len(args) > 0 && len(args[0]) > 0 {
 		vmName = args[0]
 	}
-	switch vmType {
-	default:
-		vm, err = qemu.LoadVMByName(vmName)
-	}
+	provider := getSystemDefaultProvider()
+	vm, err = provider.LoadVMByName(vmName)
 	if err != nil {
 		return err
 	}
-	return vm.Stop(vmName, machine.StopOptions{})
+	if err := vm.Stop(vmName, machine.StopOptions{}); err != nil {
+		return err
+	}
+	fmt.Printf("Machine %q stopped successfully\n", vmName)
+	return nil
 }

@afbjorklund
Contributor

afbjorklund commented Jan 12, 2022

I can't imagine this would be useful unless this change also causes the command to block?

I think that was the original request:

But not the implementation, afaik.

@rhatdan
Member

rhatdan commented Jan 12, 2022

@flouthoc @afbjorklund @baude Do we know if qemu is fully down at that point, or are we just removing Podman's ability to communicate with it?

@baude
Member

baude commented Jan 12, 2022

not blocked on main ... i'm looking into it.

@baude baude self-assigned this Jan 12, 2022
@afbjorklund
Contributor

afbjorklund commented Jan 12, 2022

I don't think it does a controlled shutdown (like: ACPI) at least, because "stop" completes in like a millisecond

QMP

system_powerdown (Command)
Requests that a guest perform a powerdown operation.

Since
0.14

Notes
A guest may or may not respond to this command. This command returning does not indicate that a guest has accepted the request or that it has shut down. Many guests will respond to this command by prompting the user in some way.

baude added a commit to baude/podman that referenced this issue Jan 12, 2022
if users run podman machine stop && podman machine ls, the status of the
machine in the subsequent ls command would running.  now we wait for
everything to complete for stop so that scripting is more accurate.

Fixes: containers#12815

[NO NEW TESTS NEEDED]

Signed-off-by: Brent Baude <[email protected]>
@baude
Member

baude commented Jan 13, 2022

@justin-f-perez mind pulling main and trying it out?

@justin-f-perez
Author

thanks everyone!

@baude apologies I couldn't get to this sooner. I tried looking for instructions on building the podman client from source on macos, but all I could find for macos were the homebrew installation instructions. For future reference, is there a good way to update a homebrew installation to the tip of the containers/podman main branch? or instructions for building the client from source on macOS? thanks again!

@afbjorklund
Contributor

afbjorklund commented Jan 19, 2022

@justin-f-perez: brew install --HEAD podman

      --HEAD                       If formula defines it, install the HEAD
                                   version, aka. main, trunk, unstable, master.

@ssbarnea
Collaborator

I am reopening this bug because I have proof that on 4.1.1, podman machine stop returns before shutting down the VM. I later saw gtar fail because the qcow2 image was modified while it was trying to archive it.

/usr/local/bin/gtar: ../../../.local/share/containers/podman/machine/qemu/podman-machine-default_fedora-coreos-36.20220618.2.0-qemu.x86_64.qcow2: file changed as we read it
Warning: Tar failed with error: The process '/usr/local/bin/gtar' failed with exit code 1

What is the proper way to stop the machine, so we can archive it?

If anyone is wondering, I am trying to make GHA actions/cache speed up the process of installing podman on GHA runners, as it now takes 13 minutes, 4x longer than our effective test suite.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Aug 23, 2022

Looks like @ssbarnea fixed this.

@rhatdan rhatdan closed this as completed Aug 23, 2022
@ssbarnea
Collaborator

I will find out soon, looking at https://github.com/ansible/vscode-ansible/runs/7952766944?check_suite_focus=true -- I assume that saving the cache worked, as it reported archiving >2GB.

Now I only need to see that cache restore succeeds and that podman machine initialization is much faster.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 18, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 18, 2023