VM fails to resume (start) after memhotplug + managedsave + start sequence on latest devel branch #28
@sathnaga I wasn't able to hit this issue with the following setup:

Host: Power8 CPU
Guest: Ubuntu 17.04

The guest is restarted back at the same point it was before firing virsh managedsave, including the hotplugged LMBs. Can you send the logs from the VM (/var/log/libvirt/qemu/<vm_name>.log)? They can help with debugging if libvirt is somehow misbehaving in your case.
@danielhb I tried with the latest levels again and was able to hit the issue.

Host:
Guest:
------- Comment From [email protected] 2018-02-08 10:04:39 EDT------- Host Pegas1.1 running P9 DD2.0
Guest: Pegas 1.1 kernel 4.14.0-37.el7a.ppc64le Guest XML: <domain type='kvm'> This is the output right after the VM is started: localhost login: root $ Domain dhb-memhotplug state saved by libvirt $ [root@localhost ~]# Tried about 10 times in the hopes that the problem might be intermittent. Still no luck - the VM is restarted back even after doing a hotplug. I'll try to simulate with bigger amounts of memory being hotplugged to see if there is a difference. I'll also try with a Host OS guest as the one used in the bug. |
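For reference, the reproduction sequence under discussion boils down to the following virsh commands. The memory-device XML is illustrative (not taken from the report); the domain name comes from the console output above:

```
# hotplug a DIMM into the running guest (mem-dimm.xml is a
# hypothetical memory device definition)
$ cat mem-dimm.xml
<memory model='dimm'>
  <target>
    <size unit='GiB'>1</size>
    <node>0</node>
  </target>
</memory>
$ virsh attach-device dhb-memhotplug mem-dimm.xml --live

# save the guest state to disk and stop the guest
$ virsh managedsave dhb-memhotplug
Domain dhb-memhotplug state saved by libvirt

# restart from the saved state; with the bug present, the guest
# hangs here instead of resuming
$ virsh start dhb-memhotplug
```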
------- Comment From [email protected] 2018-02-08 14:37:40 EDT------- I was able to reproduce the error in Satheesh's env and also in another Power8E environment I have access (also running Host OS). I am not too familiar with how virsh managedsave works - at first I thought it was a savevm call from QEMU, but savevm takes way longer than managedsave to execute. I'll see how can I reproduce the exact behavior of virsh managedsave using QEMU only. |
------- Comment From [email protected] 2018-02-13 12:44:21 EDT------- The problem is in the QEMU side. QEMU isn't setting the new HTAB size properly after the guest kernel resized it. This does not affect the guest immediately, but any loadvm operation (like the situation described here or even in a migration) will trigger the bug, making the guest unresponsive. This impacts all guests that uses HPT (i.e. everyone but P9 Radix). I've sent a fix proposal to the QEMU mailing list. |
Newer kernels have an htab resize capability when adding or removing memory. In these situations, the guest kernel might reallocate its htab to a more suitable size based on the resulting memory. However, we're not setting the new value back into the machine state when a KVM guest resizes its htab. At first this doesn't seem harmful, but when migrating or saving the guest state (via virsh managedsave, for instance) this mismatch between the htab size of QEMU and the kernel makes the guest hang when trying to load its state.

Inside h_resize_hpt_commit, the hypercall that commits the hash page resize changes, let's set spapr->htab_shift to the new value if we're sure that kvmppc_resize_hpt_commit was successful. While we're here, add a "not RADIX" sanity check, as is already done in the related hypercall h_resize_hpt_prepare.

Fixes: open-power-host-os#28
Reported-by: Satheesh Rajendran <[email protected]>
Signed-off-by: Daniel Henrique Barboza <[email protected]>
Signed-off-by: David Gibson <[email protected]>
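Based on the commit message above, the change in QEMU's hw/ppc/spapr_hcall.c looks roughly like the following. This is a sketch reconstructed from the description, not the verbatim upstream diff; identifiers follow QEMU 2.11-era internals and the software (TCG) resize path is elided:

```c
static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
                                        sPAPRMachineState *spapr,
                                        target_ulong opcode,
                                        target_ulong *args)
{
    target_ulong flags = args[0];
    target_ulong shift = args[1];
    int rc;

    if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
        return H_AUTHORITY;
    }

    /* "not RADIX" sanity check, mirroring h_resize_hpt_prepare:
     * a radix guest has no HPT, so htab_shift is 0 and there is
     * nothing to commit. */
    if (!spapr->htab_shift) {
        return H_NOT_AVAILABLE;
    }

    rc = kvmppc_resize_hpt_commit(cpu, flags, shift);
    if (rc != -ENOSYS) {
        rc = resize_hpt_convert_hypercall_code(rc);
        if (rc == H_SUCCESS) {
            /* KVM committed the resize: record the new HPT size in
             * the machine state so that migration and managedsave
             * serialize an HTAB of the size the guest actually uses. */
            spapr->htab_shift = shift;
        }
        return rc;
    }

    /* -ENOSYS: KVM does not handle the resize itself; the software
     * (TCG) implementation would run here, elided in this sketch. */
    return H_HARDWARE;
}
```

The key line is the spapr->htab_shift = shift assignment: without it, QEMU's idea of the HPT size goes stale after the guest kernel resizes the HPT, and the state written out by managedsave or migration no longer matches what the guest expects on resume.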
------- Comment From [email protected] 2018-02-16 11:25:39 EDT------- commit 9478956
I'll move this to FAT. You can test it right now by using QEMU upstream - it is not available on Host OS yet. And let me emphasize my comment #10: this affects all HPT guests running 4.14+ kernels. I think this is a good candidate to be backported to Pegas. ------- Comment From [email protected] 2018-02-21 00:22:36 EDT------- |
Tested and found working in the latest builds. 4.16.0-2.dev.gitb24758c.el7.centos.ppc64le
|
Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=161042
kernel: 4.14.0-1.rc4.dev.gitb27fc5c.el7.centos.ppc64le + proposed patch for open-power-host-os/linux#24
qemu-kvm-2.10.0-3.dev.gitbf0fd83.el7.centos.ppc64le
libvirt-3.6.0-3.dev.gitdd9401b.el7.centos.ppc64le
Guest used: CentOS 7.4