Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make podman machine stop wait for qemu to exit #14666

Merged
merged 1 commit into from
Jun 29, 2022

Conversation

shanesmith
Copy link
Contributor

@shanesmith shanesmith commented Jun 20, 2022

After podman machine stop successfully exits the qemu process may still be running, which can cause issues, especially if an attempt is made to start the machine again.

In my trials I found the qemu process normally takes about 1s to exit after podman machine stop, which means podman machine stop && podman machine start can create problems, for example. I've also found that the qemu process could hang from time to time after an attempted stop. I'm able to reproduce this most reliably by stopping the machine while a container is running, it then takes about 90s for the qemu process to exit. This last example should be an issue on its own, but all this to say podman machine stop should only exit after qemu does.

In this PR we use the -pidfile flag when running qemu and store its path in the MachineVM config. The podman machine stop command now waits for the qemu command to exit before itself exiting. Two commits are included for a bit of separation.

A couple notes:

  • The MachineVM update requires a migration. I've simply followed what was already done, but do wonder if there might be a better way.

  • The podman machine stop does not have a timeout, it'll wait for qemu to exit indefinitely. Perhaps it should?

  • The user could interrupt (ie: CTRL-C) a podman machine stop that's taking a while, but the qemu process will still be alive.

  • We could potentially send a kill signal to the qemu process after a timeout and/or a user interrupt.

  • The pidfile can be used elsewhere in the future. For example, it could inform the state of a VM, or at least prevent a podman machine start.

Fixes #14148

Does this PR introduce a user-facing change?

The `podman machine stop` command on macOS now waits for the machine to be completely stopped.

@Luap99
Copy link
Member

Luap99 commented Jun 20, 2022

I don't think you should use migration for that. Migration is used when the json unmarshal would break due breaking changes to the VM format. I don't think this is necessary here. Why not just add the new field to the existing structure?

@shanesmith
Copy link
Contributor Author

Are you suggesting leaving the proxy pidfile path in PidFilePath and just creating a new VMPidFilePath? Definitely possible and would indeed avoid migration, but I thought the naming would be a little confusing in the future.

@Luap99
Copy link
Member

Luap99 commented Jun 20, 2022

Are you suggesting leaving the proxy pidfile path in PidFilePath and just creating a new VMPidFilePath? Definitely possible and would indeed avoid migration, but I thought the naming would be a little confusing in the future.

Yes something like this, the naming might be confusing but this can be explained in the comment.
The complex format migration logic is hard to review/test making maintenance much harder in the long run.

@shanesmith
Copy link
Contributor Author

Sure, that makes sense. =)

One last possibility, just to throw it out there, would be to keep PidFiles (or at least a new name for the proxy one), but not do the migration and essentially use proxyPidFile = vm.PidFiles.Proxy || vm.PidFilePath. That way we get better naming for the future while still supporting older VM configurations.

@benoitf
Copy link
Contributor

benoitf commented Jun 22, 2022

I'm also facing this issue as well (trying to start the machine and it says it's still running while stop command says 'stops successfully'

@shanesmith
Copy link
Contributor Author

@Luap99 Updated the PR to remove the need for migrating the config. Notes from the new commit:

  • A new MachineVM struct is not needed since the PidFiles field is no
    longer used, instead PidFilePath is kept for the proxy and a new
    VMPidFilePath is added for the VM

  • Migration is no longer needed since we're reusing the existing
    MachineVM struct

  • Machines created before this PR won't have the VM PID file configured,
    stopping these VMs will revert back to waiting on the state to change
    away from Running, plus an added 2s sleep to give time for the VM to
    exit and to avoid potential issues

  • Machines created after this PR will have a VM PID file configured and
    stopping the machine will wait indefinitely for the VM to exit

@shanesmith shanesmith force-pushed the machine-pidfile branch 2 times, most recently from 70840f2 to 05aec10 Compare June 22, 2022 14:54
@vrothberg
Copy link
Member

@ashley-cui @baude PTAL

@baude
Copy link
Member

baude commented Jun 27, 2022

LGTM

@baude
Copy link
Member

baude commented Jun 27, 2022

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 27, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: baude, shanesmith

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 27, 2022
Copy link
Member

@ashley-cui ashley-cui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you squash your commits? Other than that, LGTM, thanks @shanesmith !

"time"

"golang.org/x/sys/unix"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blank line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@shanesmith
Copy link
Contributor Author

Could you squash your commits? Other than that, LGTM, thanks @shanesmith !

Squashed! =)

- New `VMPidFilePath` field in MachineVM config holds the path for the
  qemu PID file

- qemu is now started with the `-pidfile` argument set to `VMPidFilePath`

- Machines created before this won't have the VM PID file configured,
  stopping these VMs will revert back to waiting on the state to change
  away from `Running`, plus an added 2s sleep to give time for the VM to
  exit and to avoid potential issues

- Machines created after this will have a VM PID file configured and
  stopping the machine will wait indefinitely for the VM to exit

[NO NEW TESTS NEEDED]

Signed-off-by: Shane Smith <[email protected]>
@ssbarnea
Copy link
Collaborator

I wonder if this patch also fixed the bug described on #12815 - basically I am still trying to find a way to prevent gtar from failing after I stop the machine:

/usr/local/bin/gtar: ../../../.local/share/containers/podman/machine/qemu/podman-machine-default_fedora-coreos-36.20220618.2.0-qemu.x86_64.qcow2: file changed as we read it

@rhatdan
Copy link
Member

rhatdan commented Jun 28, 2022

/lgtm
/hold
Thanks @shanesmith

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 28, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 28, 2022
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 29, 2022
@openshift-ci openshift-ci bot merged commit d7121b0 into containers:main Jun 29, 2022
@shanesmith
Copy link
Contributor Author

I wonder if this patch also fixed the bug described on #12815

@ssbarnea I would assume it should =)

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

podman machine stop doesn't stop the machine despite returning success
8 participants