Drivers: hv: vmbus: hibernation: do not hang forever in vmbus_bus_resume()

After we Stop and later Start a VM that uses Accelerated Networking (NIC
SR-IOV), currently the VF vmbus device's Instance GUID can change, so after
vmbus_bus_resume() -> vmbus_request_offers(), vmbus_onoffer() can not find
the original vmbus channel of the VF, and hence we can't complete()
vmbus_connection.ready_for_resume_event in check_ready_for_resume_event(),
and the VM hangs in vmbus_bus_resume() forever.

Fix the issue by adding a timeout, so the resume can still succeed and
the saved state is not lost. According to my test, the user can then
disable Accelerated Networking and SSH into the VM for further recovery.
Also prevent the VM in question from suspending again.

The host will be fixed so in future the Instance GUID will stay the same
across hibernation.

Fixes: d8bd2d4 ("Drivers: hv: vmbus: Resume after fixing up old primary channels")
Signed-off-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
dcui authored and liuw committed Sep 9, 2020
1 parent b46b4a8 commit 19873ee
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions drivers/hv/vmbus_drv.c
@@ -2387,7 +2387,10 @@ static int vmbus_bus_suspend(struct device *dev)
 	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
 		wait_for_completion(&vmbus_connection.ready_for_suspend_event);

-	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0);
+	if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
+		pr_err("Can not suspend due to a previous failed resuming\n");
+		return -EBUSY;
+	}

 	mutex_lock(&vmbus_connection.channel_mutex);

@@ -2463,7 +2466,9 @@ static int vmbus_bus_resume(struct device *dev)

 	vmbus_request_offers();

-	wait_for_completion(&vmbus_connection.ready_for_resume_event);
+	if (wait_for_completion_timeout(
+		&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
+		pr_err("Some vmbus device is missing after suspending?\n");

 	/* Reset the event for the next suspend. */
 	reinit_completion(&vmbus_connection.ready_for_suspend_event);
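
The two hunks above work as a pair: resume waits only for a bounded time, and a later suspend is refused while some channel is still unmatched. The following is a minimal userspace sketch of that control flow, not kernel code. The names model_resume(), model_suspend(), wait_for_flag_timeout() and now_ms() are hypothetical, a polled flag stands in for the kernel completion, and the plain counter only mimics the role of nr_chan_fixup_on_resume.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int nr_chan_fixup_on_resume;          /* channels still awaiting a matching offer */
static volatile bool ready_for_resume_event; /* set once every channel is fixed up */

/* Monotonic clock in milliseconds. */
static long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Poll the flag until it is set or timeout_ms elapses; true means "completed". */
static bool wait_for_flag_timeout(const volatile bool *flag, long timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000L }; /* 1 ms */
	long deadline;

	assert(timeout_ms >= 0);
	deadline = now_ms() + timeout_ms;
	while (!*flag && now_ms() < deadline)
		nanosleep(&tick, NULL);
	return *flag;
}

/* Model of the resume path: wait for all channels, but give up after a timeout. */
static int model_resume(int channels, int offers_matched, long timeout_ms)
{
	nr_chan_fixup_on_resume = channels;
	ready_for_resume_event = (channels == 0); /* nothing to wait for */

	/* Each matching offer fixes up one channel; the last one signals completion. */
	for (int i = 0; i < offers_matched && nr_chan_fixup_on_resume > 0; i++)
		if (--nr_chan_fixup_on_resume == 0)
			ready_for_resume_event = true;

	if (!wait_for_flag_timeout(&ready_for_resume_event, timeout_ms))
		fprintf(stderr, "some device is missing after suspending?\n");
	return 0; /* resume proceeds either way, so the saved state is kept */
}

/* Model of the suspend path: refuse while a previous resume left channels unmatched. */
static int model_suspend(void)
{
	if (nr_chan_fixup_on_resume != 0) {
		fprintf(stderr, "cannot suspend: previous resume was incomplete\n");
		return -EBUSY;
	}
	return 0;
}
```

In this model, model_resume(2, 1, 50) returns 0 after the 50 ms timeout with one channel still unmatched, and a following model_suspend() returns -EBUSY. When every offer matches, the wait returns immediately and suspend is allowed again.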