
Changed behavior: chained shutdown of netvm client VMs when upstream VM exits? #7266

Closed
brendanhoar opened this issue Feb 12, 2022 · 7 comments
Labels
C: kernel
P: default (Priority: default. Default priority for new issues, to be replaced given sufficient information.)
R: duplicate (Resolution: Another issue exists that is very similar to or subsumes this one.)
T: bug (Type: bug report. A problem or defect resulting in unintended behavior in something that exists.)

Comments

@brendanhoar

brendanhoar commented Feb 12, 2022


Qubes OS release

R4.0 updated with current-testing and kernel-latest (dom0 and VMs)

Brief summary

When an upstream network-providing VM exits, client VMs that depend upon that network-providing VM exit as well.

This forced shutdown behavior is new. I had noticed it on a test R4.1rc4 + updates machine, but ran across it today on my daily driver, which is R4.0.

Steps to reproduce

Start personal VM. This autostarts four other VMs: sys-net, sys-mirage-vpn-to-net, sys-vpn, sys-mirage-vms-to-vpn. The mirage VPNs are low-memory usage firewalls.

Edited:
Exiting an upstream VM (e.g. executing shutdown -h now in sys-vpn, or pause/kill on the mirage VMs) causes the whole chain of downstream Linux VMs to shut down as well. If a mirage VM is reached, it does not shut down.
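The steps above can be sketched as a dom0 shell session (a sketch only: the VM names are taken from this report, qvm-start/qvm-shutdown/qvm-ls are the standard Qubes dom0 tools, and the script is guarded so it exits cleanly on a machine that is not a Qubes dom0):

```shell
#!/bin/sh
# Reproduction sketch; assumption: run in dom0 on an affected (5.16.5-1) VM kernel.
if ! command -v qvm-start >/dev/null 2>&1; then
    echo 'not dom0; skipping'
    exit 0
fi
qvm-start personal            # autostarts sys-net ... sys-mirage-vms-to-vpn
qvm-shutdown --wait sys-vpn   # exit an upstream network-providing VM
sleep 10                      # give the chained shutdown time to propagate
qvm-ls --running              # downstream Linux qubes are unexpectedly gone
```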

Expected behavior

No other VMs should be forced to exit.

VMs should not automatically shut down unless:

  1. the user explicitly shuts the VM down;
  2. the VM is a disposable VM and the invoking app exits;
  3. the user has configured idle shutdown timeouts.

Actual behavior

VMs in use are unexpectedly shut down, in a chained or tree fashion, potentially causing data loss, and certainly causing annoyance.

In particular, I've been in the habit of manually shutting down the entire networking "conduit" and then restarting it after template updates; in Qubes, that "conduit" is often 3-4 VMs. As I noted before, this "also shut down the networking clients" behavior is new.

Brendan

@brendanhoar brendanhoar added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. labels Feb 12, 2022
@Minimalist73

I experienced this too on 4.1.
My NetVM was stuck for some reason, so I killed it with the Qube Manager, and all the qubes attached to it instantly shut down as well. It didn't behave this way before, and it's annoying.

@marmarek
Member

It's more likely a kernel panic, specifically #7257. There is no intentional feature like this. Please check /var/log/xen/console/guest-*.log of the relevant qubes to confirm.
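A quick way to confirm this (a sketch, assuming the default dom0 log location mentioned above) is to grep the guest console logs for the panic string; the same match is demonstrated here against one sample console-log line so the snippet runs anywhere:

```shell
#!/bin/sh
# In dom0, the real check would be (assumption: default Qubes log path):
#   grep -l 'Kernel panic' /var/log/xen/console/guest-*.log
# Demonstration of the same match against a sample console-log line:
sample='[   32.795580] Kernel panic - not syncing: Fatal exception'
printf '%s\n' "$sample" | grep -o 'Kernel panic - not syncing'
```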

@Minimalist73

@marmarek Here's what I get when the qube shuts down:

[2022-02-13 00:22:49] [   32.794981] #PF: supervisor read access in kernel mode
[2022-02-13 00:22:49] [   32.794989] #PF: error_code(0x0000) - not-present page
[2022-02-13 00:22:49] [   32.794998] PGD 0 P4D 0 
[2022-02-13 00:22:49] [   32.795003] Oops: 0000 [#1] PREEMPT SMP PTI
[2022-02-13 00:22:49] [   32.795011] CPU: 3 PID: 64 Comm: xenwatch Not tainted 5.16.5-1.fc32.qubes.x86_64 #1
[2022-02-13 00:22:49] [   32.795024] RIP: 0010:free_netdev+0xa3/0x1a0
[2022-02-13 00:22:49] [   32.795037] Code: ff 48 89 df e8 1e de 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 bd d4 6c ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
[2022-02-13 00:22:49] [   32.795062] RSP: 0018:ffffc90000b3fd60 EFLAGS: 00010286
[2022-02-13 00:22:49] [   32.795070] RAX: 0000000000000000 RBX: ffff8880ed769000 RCX: 0000000000000000
[2022-02-13 00:22:49] [   32.795082] RDX: 0000000000000001 RSI: ffffc90000b3fc90 RDI: 00000000ffffffff
[2022-02-13 00:22:49] [   32.795093] RBP: fffffffffffffea0 R08: 0000000000000001 R09: 0000000000000000
[2022-02-13 00:22:49] [   32.795104] R10: 0000000000000000 R11: 0000000000000003 R12: ffff8880ed769050
[2022-02-13 00:22:49] [   32.795115] R13: ffff888006c75f88 R14: ffff888003fc2b80 R15: ffff88800855a880
[2022-02-13 00:22:49] [   32.795126] FS:  0000000000000000(0000) GS:ffff8880f5d80000(0000) knlGS:0000000000000000
[2022-02-13 00:22:49] [   32.795138] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-02-13 00:22:49] [   32.795147] CR2: 0000000000000000 CR3: 00000000a9d98004 CR4: 00000000003706e0
[2022-02-13 00:22:49] [   32.795159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2022-02-13 00:22:49] [   32.795170] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2022-02-13 00:22:49] [   32.795181] Call Trace:
[2022-02-13 00:22:49] [   32.795186]  <TASK>
[2022-02-13 00:22:49] [   32.795193]  xennet_remove+0x65/0x80 [xen_netfront]
[2022-02-13 00:22:49] [   32.795204]  xenbus_dev_remove+0x6d/0xf0
[2022-02-13 00:22:49] [   32.795213]  __device_release_driver+0x17a/0x240
[2022-02-13 00:22:49] [   32.795223]  device_release_driver+0x24/0x30
[2022-02-13 00:22:49] [   32.795232]  bus_remove_device+0xd8/0x140
[2022-02-13 00:22:49] [   32.795239]  device_del+0x18b/0x410
[2022-02-13 00:22:49] [   32.795246]  ? _raw_spin_unlock+0x16/0x30
[2022-02-13 00:22:49] [   32.795254]  ? klist_iter_exit+0x14/0x20
[2022-02-13 00:22:49] [   32.795262]  device_unregister+0x13/0x60
[2022-02-13 00:22:49] [   32.795268]  xenbus_dev_changed+0x18e/0x1f0
[2022-02-13 00:22:49] [   32.795276]  xenwatch_thread+0xc0/0x1a0
[2022-02-13 00:22:49] [   32.795284]  ? do_wait_intr_irq+0xa0/0xa0
[2022-02-13 00:22:49] [   32.795291]  ? read_reply+0x160/0x160
[2022-02-13 00:22:49] [   32.795298]  kthread+0x158/0x180
[2022-02-13 00:22:49] [   32.795306]  ? set_kthread_struct+0x40/0x40
[2022-02-13 00:22:49] [   32.795313]  ret_from_fork+0x22/0x30
[2022-02-13 00:22:49] [   32.795322]  </TASK>
[2022-02-13 00:22:49] [   32.795326] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfkill ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack xenfs nft_counter nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink binfmt_misc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront snd_pcm snd_timer snd soundcore pcspkr xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn parport_pc ppdev lp parport drm fuse sunrpc bpf_preload ip_tables overlay xen_blkfront
[2022-02-13 00:22:49] [   32.795412] CR2: 0000000000000000
[2022-02-13 00:22:49] [   32.795420] ---[ end trace b594eee2680b0682 ]---
[2022-02-13 00:22:49] [   32.795428] RIP: 0010:free_netdev+0xa3/0x1a0
[2022-02-13 00:22:49] [   32.795437] Code: ff 48 89 df e8 1e de 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 bd d4 6c ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
[2022-02-13 00:22:49] [   32.795462] RSP: 0018:ffffc90000b3fd60 EFLAGS: 00010286
[2022-02-13 00:22:49] [   32.795470] RAX: 0000000000000000 RBX: ffff8880ed769000 RCX: 0000000000000000
[2022-02-13 00:22:49] [   32.795482] RDX: 0000000000000001 RSI: ffffc90000b3fc90 RDI: 00000000ffffffff
[2022-02-13 00:22:49] [   32.795493] RBP: fffffffffffffea0 R08: 0000000000000001 R09: 0000000000000000
[2022-02-13 00:22:49] [   32.795504] R10: 0000000000000000 R11: 0000000000000003 R12: ffff8880ed769050
[2022-02-13 00:22:49] [   32.795515] R13: ffff888006c75f88 R14: ffff888003fc2b80 R15: ffff88800855a880
[2022-02-13 00:22:49] [   32.795526] FS:  0000000000000000(0000) GS:ffff8880f5d80000(0000) knlGS:0000000000000000
[2022-02-13 00:22:49] [   32.795537] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2022-02-13 00:22:49] [   32.795547] CR2: 0000000000000000 CR3: 00000000a9d98004 CR4: 00000000003706e0
[2022-02-13 00:22:49] [   32.795558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2022-02-13 00:22:49] [   32.795569] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2022-02-13 00:22:49] [   32.795580] Kernel panic - not syncing: Fatal exception
[2022-02-13 00:22:49] [   32.795621] Kernel Offset: disabled

@brendanhoar
Author

Ah, a kernel panic; that makes more sense. I suspect that if I go back and retest, I'll find the mirage VMs end up being, ahem, firewalls in the shutdown chain. If so, I'll update the OP.

Brendan

@andrewdavidwong andrewdavidwong added C: kernel needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels Feb 13, 2022
@andrewdavidwong andrewdavidwong added this to the Release 4.0 updates milestone Feb 13, 2022
@unman
Member

unman commented Feb 13, 2022 via email

@brendanhoar
Author

brendanhoar commented Feb 13, 2022

Thinkpad W520:
Confirmed as repeatable under R4.0 with VM kernel 5.16.5-1.fc25.
Confirmed as non-repeatable under R4.0 with VM kernel 5.15.14-1.fc25.

GPD Pocket 3:
From memory, I saw the issue several times on this R4.1 system. Its VMs were also running 5.16.5-1.

Agreed that this is likely a duplicate of: #7257

B
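For anyone hitting this in the meantime, a possible interim workaround sketch based on the confirmation above (assumptions: run in dom0, the earlier VM kernel package is still installed, and the exact kernel name must match a directory under /var/lib/qubes/vm-kernels/; the version string below is illustrative):

```shell
#!/bin/sh
# Workaround sketch; guarded so it exits cleanly outside a Qubes dom0.
if ! command -v qvm-prefs >/dev/null 2>&1; then
    echo 'not dom0; skipping'
    exit 0
fi
ls /var/lib/qubes/vm-kernels/              # list the installed VM kernels
qvm-prefs personal kernel                  # show the qube's current VM kernel
qvm-prefs personal kernel 5.15.14-1.fc32   # pin a known-good kernel (name illustrative)
```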

@andrewdavidwong
Member

This appears to be a duplicate of an existing issue. If so, please comment on the appropriate existing issue instead. If anyone believes this is not really a duplicate, please leave a comment briefly explaining why. We'll be happy to take another look and, if appropriate, reopen this issue. Thank you.

@andrewdavidwong andrewdavidwong added R: duplicate Resolution: Another issue exists that is very similar to or subsumes this one. and removed needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels Feb 13, 2022