Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC/WIP: reboot with shutdown/start not reboot #193

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

jandryuk
Copy link
Contributor

This is only partially complete. It's also on top of my Xen 4.14/4.16 work, so I included 384380a to avoid conflicts and let the other two patches apply.

Today we issue xl reboot and use a signal to synchronize between xenmgr and xl. This is needed since xenmgr issues the viptables commands which are needed for xl to talk QMP to stubdom's qemu for a successfully boot. Actually with vchan-socket-proxy providing QMP proxying, and not using argo, that particular issue is gone. But stubom still needs argo rules to connect back to helpers.

The idea here is our VMs can use on_reboot='destroy' in their config files to shutdown when rebooted. With a patch to libxl, xenmgr can notice this and start the VM again for a reboot (and leave halted for a shutdown). xl needs patching to extend the xenstore /state/$uuid/state tracking for a reboot, but then the signalling patch can be removed.

VM initiated/internal reboot works.

The external xec-vm -n VM reboot does not work. While conceptually simple, my haskell is not strong enough. Vm/Actions.hs:rebootVm is under/returns an Rpc () monad, but startVm/restartVm return XM () monads and I don't know how to mash those together.

Posting RFC to see if there is interest in pursuing this idea further. Help on the monad mismatch would be appreciated.

Set TAPDISK3_CRYPTO_KEYDIR in the environment when launching xl.  It
will remain set through the block script and into tap-ctl where it is
used.

The other approach was to have the block-tap script use xec to call back
into xenmgr.  That could work.  The VM case is easy.  The stubdom case
is more complicated since you'd have to map from the stubdom's domid
back to the VM to get the variables.  Since xenmgr has the info, just
pass it down.  This also takes care of the stubdom case since the same
xl process is exec-ing the hotplug scripts.

Help-from: Chris Rogers <[email protected]>
Signed-off-by: Jason Andryuk <[email protected]>
Avoid the need for the reboot signal dance by having xl clean up a
domain on shutdown.  XenMgr can then restart with just xl start.

Signed-off-by: Jason Andryuk <[email protected]>
Expand Xl.start to start the domain from state Rebooted in addition to
Shutdown.  This allows the restartVm path, triggered by a
domain-initiated reboot, to re-start the Vm.  Otherwise state Rebooted
is ignored and Xl.start does nothing.

This refactors the bulk of the code into a _start helper to be called
from the two cases.

Signed-off-by: Jason Andryuk <[email protected]>
@jandryuk
Copy link
Contributor Author

I have a xec-vm -n VM reboot implementation now which I'll push up soon.

jandryuk added 4 commits May 16, 2022 11:14
Replace the use of `xl reboot` by `xl shutdown` and `xl start` to
implement the vm reboot DBus command.  xl shutdown needs
on_reboot="destroy" in the xl.cfg file.  This allows us to remove the
xenmgr <-> xl signal synchronization.

Xl.shutdown & RpcAgent shutdowns both wait for the VM to shutdown,
except the `xl shutdown -c` hypercall version which seems to terminate
right quick.  The waitForState call is therefore not really implementing
a timeout out.

The corner case is if the VM has ACPI & no PV drivers, but is configured
not to react to the power button.  In that case we can't do much.  If
the VM then internally reboots, the xenmgr React code will reboot the VM
and the waitForState will eventually timeout.

writeXlConfig is removed as rebootVm -> restartVm -> _startVm
startVmInternal -> bootVm calls writeXlConfig.

Signed-off-by: Jason Andryuk <[email protected]>
There is no meaningful difference between restartVm and startVm, so
remove restartVm and the unused is_reboot boolean.

Signed-off-by: Jason Andryuk <[email protected]>
This code is no longer called, so remove it.

Signed-off-by: Jason Andryuk <[email protected]>
Sometimes the shutdown command does not work.  It seems like the longer
the VM is up, the more likely it is to not shutdown.  This is based on
Windows 10 without PV drivers.

The button press is seen by the VM to some extent because a DPMS off
(black) screen will turn back on, but shutdown is not initiated.  A
second press will trigger it - that was the intent to the xl trigger
power line added in commit 1cfb6aa "xenmgr: Add xl trigger power to
HVM shutdown".

Re-work the code so a background thread is started that will push the
power buttons* 3 times each with a 1 second delay.  This will hopefully
let the VM recognize the button press without going on for too long.

*HVMs have two power and two sleep buttons.  One is Xen emulating
buttons for HVMs, and the second is from QEMU's acpi-pm-features.patch
and ACPI changes.  xl trigger power is pushing the Xen one and the
xenstore hvm-shutdown write is triggering the QEMU one.

With forkIO, the `xl shutdown -F -ww` runs before the `xl trigger power`
commands.  xl shutdown first tries PV shutdown and then (-F) fallback to
ACPI.  So this would push the Xen ACPI button before the QEMU one in
pushPowerButton.

Signed-off-by: Jason Andryuk <[email protected]>
@jandryuk
Copy link
Contributor Author

Force pushed to remove use of system_ which is added by #194

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant