Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to QEMU 9.0 including IGVM v6 patch series + direct VMSA #16

Open
wants to merge 8,456 commits into
base: master
Choose a base branch
from

Conversation

roy-hopkins
Copy link

Overview

This branch is based on QEMU master commit a5dd9ee with the IGVM patch series v6 applied. Also, the top commit provides a modification to the handling of IGVM VMSA directives to directly set the VMSA in KVM via the KVM_SEV_SNP_LAUNCH_UPDATE ioctl.

Compatibility

This version of QEMU works with the 6.11 kernel with SVSM kernel patches applied: https://github.com/roy-hopkins/linux/tree/svsm_vmpl_vmsa_restinj. See PR [TODO].

Reason for direct VMSA update

The patch that updates VMSA handling can be dropped from this branch and SVSM will still work correctly. However, due to the way the VMSA is handled for each vCPU in KVM this will result in the launch measurement never matching the pre-calculated launch measurement of the IGVM file. The previous SVSM kernel included a sev_feature flag that indicated use of an SVSM which then changed this behaviour to get the measurement to match but this cannot be supported anymore.

New Command line

The QEMU command line to launch a guest with SVSM has changed since the previous version. In particular init-flags is not supported anymore (which is how we used to indicate an SVSM was present) and IGVM now has its own object:

...
-machine q35,confidential-guest-support=sev0,memory-backend=mem0,igvm-cfg=igvm0 \
-object memory-backend-memfd,size=8G,id=mem0,share=true,prealloc=false,reserve=false \
-object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 \
-object igvm-cfg,id=igvm0,file=coconut-qemu.igvm \
...

ribalda and others added 30 commits September 11, 2024 09:46
Signed-off-by: Ricardo Ribalda <[email protected]>
Message-Id: <[email protected]>
Acked-by: Igor Mammedov <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
This allows the vhost_net device which has multiple virtqueues to batch
the setup of all its host notifiers. This significantly reduces the
vhost_net device starting and stoping time, e.g. the time spend
on enabling notifiers reduce from 630ms to 75ms and the time spend on
disabling notifiers reduce from 441ms to 45ms for a VM with 192 vCPUs
and 15 vhost-user-net devices (64vq per device) in our case.

Signed-off-by: zuoboqun <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Now virtio_address_space_lookup only lookup common/isr/device/notify
MR and exclude their subregions.

When VHOST_USER_PROTOCOL_F_HOST_NOTIFIER enable, the notify MR has
host-notifier subregions and we need use host-notifier MR to
notify the hardware accelerator directly instead of eventfd notify.

Further more, maybe common/isr/device MR also has subregions in
the future, so need memory_region_find for each MR incluing
their subregions.

Add lookup subregion of VirtIOPCIRegion MR instead of only lookup container MR.

Fixes: a93c8d8 ("virtio-pci: Replace modern_as with direct access to modern_bar")
Co-developed-by: Zuo Boqun <[email protected]>
Signed-off-by: Gao Shiyuan <[email protected]>
Signed-off-by: Zuo Boqun <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
When using the mailbox command get scan media results, the scan media
restart physical address field in the ouput palyload is not 64-byte
aligned.

This patch removed the error source of the restart physical address.

The Scan Media Restart Physical Address is the location from which the
host should restart the Scan Media operation. [5:0] bits are reserved.
Refer to CXL spec r3.1 Table 8-146

Fixes: 89b5cfc ("hw/cxl: Add get scan media results cmd support")
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/linux-cxl/[email protected]/
Signed-off-by: peng guo <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Currently, the guest may write to the device configuration space,
whereas the virtio sound device specification in chapter 5.14.4
clearly states that the fields in the device configuration space
are driver-read-only.

Remove the set_config function from the virtio_snd class.

This also prevents a heap buffer overflow. See QEMU issue #2296.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2296
Signed-off-by: Volker Rümelin <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
If the config directory in sysfs does not exist at all, we are dealing
with a system that does not support THPs. Simply use 1 MiB block size
then, instead of warning "Could not detect THP size, falling back to
..." and falling back to the default THP size.

Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Gavin Shan <[email protected]>
Cc: Juraj Marcin <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
This patch implements the periodic and the swsmi ICH9 chipset timers. They are
especially useful when prototyping UEFI firmware (e.g. with EDK2's OVMF)
using QEMU.

For backwards compatibility, the compat properties "x-smi-swsmi-timer",
and "x-smi-periodic-timer" are introduced.

Additionally, writes to the SMI_STS register are enabled for the
corresponding two bits using a write mask to make future work easier.

Signed-off-by: Dominic Prinz <[email protected]>
Message-Id: <1d90ea69e01ab71a0f2ced116801dc78e04f4448.1725991505.git.git@dprinz.de>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
…into staging

* Split --enable-sanitizers to --enable-{asan, ubsan}
* Build MSYS2 job using multiple CPUs
* Fix "make distclean" wrt contrib/plugins/
* Convert more Avocado tests to plain standalone functional tests
* Fix bug that breaks "make check-functional" when tesseract is missing
* Use builtin hashlib of Python in the functional tests
* Update the FreeBSD CI jobs to 14.1

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmbhY4YRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbU/aw/9HXl9H8BUDn8lnoEmxuuQSk8F19n/l5pt
# en3L8pMBt4dGFe/9KaGes2GFfid+cp2zlx+qQhA4HW35ntMJorF/qinOH/JGDtoM
# 3O6RGZrQPn60zD9P2EbFVCrVYysVYCEu0U3Uglj6tf33bE0L7SJsQxqcbIciyIj5
# aq3Te0yMM2lqzCdMqNpWHGn3VMZRvbRaGBPDU4RLP8V2Bpz1iiRE+6HCH9Kg7HzS
# OmleeXtvcyInG+54onjfTcn4/XA27pl1UU04KFv5PrRPB3M2FspHn7oOT2yyQ+ls
# 79mqIcd8PvycCT+3ch9p8KhVtbVBgZGmeemALLvk5FxysaWnl4KtSqmQNdqSvvpV
# waDDKlLaSnjEHDUse3bCJX0m4d7/vTBY5fOYxqZ4z5dl63csDtgPY4/VF4XR08sP
# tR1mW+2qEH9eygsxuKcBjx/j7Etpy+jL9pX2ii1V3ElhjjYuEnpEiURa+TaqPjpZ
# jmPtBEszzUdPbrD707tDkW3/ezT7VAnASQeYneJXB/JQG6K6Z//05iX6oCzCbRm3
# ceW/fem3UaeGYpzbMdoZToTuNlXEyS7NDcr39xJjH4LyRTPJAX4zeqUEdzces9g/
# u4Dw6rJ0Yhj4rscKxRvGl3/BH6CTI+8IAsbju2B/CnVLTqaABB0q/MDB90aB44xX
# bAVsl4P03Uk=
# =5TR0
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 11 Sep 2024 10:31:50 BST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Thomas Huth <[email protected]>" [full]
# gpg:                 aka "Thomas Huth <[email protected]>" [full]
# gpg:                 aka "Thomas Huth <[email protected]>" [full]
# gpg:                 aka "Thomas Huth <[email protected]>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2024-09-11' of https://gitlab.com/thuth/qemu: (24 commits)
  Update FreeBSD CI jobs FreeBSD 14.1
  tests/functional/qemu_test: Use Python hashlib instead of external programs
  tests/functional: Fix bad usage of has_cmd
  tests/functional: Convert the multiprocess avocado test into a standalone test
  tests/functional: Convert the or1k-sim Avocado test
  tests/functional: Convert the m68k MCF5208EVB Avocado test
  tests/functional: Convert the Alpha Clipper Avocado test
  tests/functional: Convert Aarch64 Raspi4 avocado tests
  tests/functional: Convert Aarch64 Raspi3 avocado tests
  tests/functional: Convert ARM Raspi2 avocado tests
  tests/functional: Convert mips32eb 4Kc Malta avocado tests
  tests/functional: Convert nanomips Malta avocado tests
  tests/functional: Convert mips32el Malta YAMON avocado test
  tests/functional: Convert mips64el 5KEc Malta avocado tests
  tests/functional: Convert mips64el I6400 Malta avocado tests
  tests/functional: Convert mips64el Fuloong2e avocado test (2/2)
  tests/functional: Convert the m68k Q800 Avocado test into a functional test
  tests/functional: Add the LinuxKernelTest for testing the Linux boot process
  MAINTAINERS: Remove myself from the Meson section
  MAINTAINERS: Remove myself as reviewer
  ...

Signed-off-by: Peter Maydell <[email protected]>
Add support for, and migrate, a single-entry fp
instruction queue for sparc32.

Signed-off-by: Carl Hauser <[email protected]>
[rth: Split from a larger patch;
      adjust representation with union;
      add migration state]
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Acked-by: Mark Cave-Ayland <[email protected]>
Tested-by: Carl Hauser <[email protected]>
Implement a single instruction floating point queue,
populated while delivering an fp exception.

Signed-off-by: Carl Hauser <[email protected]>
[rth: Split from a larger patch]
Signed-off-by: Richard Henderson <[email protected]>
Acked-by: Mark Cave-Ayland <[email protected]>
Tested-by: Carl Hauser <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Acked-by: Mark Cave-Ayland <[email protected]>
Tested-by: Carl Hauser <[email protected]>
Invalid encoding of addr should raise TT_ILL_INSN, so
check before supervisor, which might raise TT_PRIV_INSN.
Clear QNE after execution.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2340
Signed-off-by: Richard Henderson <[email protected]>
Acked-by: Mark Cave-Ayland <[email protected]>
Tested-by: Carl Hauser <[email protected]>
Model fp_exception state, in which only fp stores are allowed
until such time as the FQ has been flushed.

Signed-off-by: Richard Henderson <[email protected]>
Acked-by: Mark Cave-Ayland <[email protected]>
Tested-by: Carl Hauser <[email protected]>
With edk2-stable202408 LoongArch UEFI bios, CSR PGD register is set only
if its value is equal to zero for boot cpu, it causes reboot issue. Since
CSR PGD register is changed with linux kernel, UEFI BIOS cannot use it.

Add workaround to clear CSR registers relative with TLB in function
loongarch_cpu_reset_hold(), so that VM can reboot with edk2-stable202408
UEFI bios.

Signed-off-by: Bibo Mao <[email protected]>
Reviewed-by: Song Gao <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Song Gao <[email protected]>
For virtio VGA deivce libvirt will select VIRTIO_VGA firstly rather than
VIRTIO_GPU, VIRTIO_VGA device supports frame buffer however it requires
legacy VGA compatible support. Frame buffer area 0xa0000 -- 0xc0000
conflicts with low memory area 0 -- 0x10000000.

Here remove default support for VIRTIO_VGA device, VIRTIO_GPU is prefered
on LoongArch system. For frame buffer video card support, standard VGA can
be used.

Signed-off-by: Bibo Mao <[email protected]>
Reviewed-by: Song Gao <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Song Gao <[email protected]>
KVM provides interface KVM_REG_LOONGARCH_VCPU_RESET to reset vCPU,
it can be used to clear internal state about kvm kernel. vCPU reset
function is added here for kvm mode.

Signed-off-by: Bibo Mao <[email protected]>
Reviewed-by: Song Gao <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Song Gao <[email protected]>
Add the support needed for creating prstatus elf notes. This allows
us to use QMP dump-guest-memory.

Now ELF notes of LoongArch only supports general elf notes, LSX and
LASX is not supported, since it is mainly used to dump guest memory.

Signed-off-by: Bibo Mao <[email protected]>
Reviewed-by: Song Gao <[email protected]>
Tested-by: Song Gao <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Song Gao <[email protected]>
In order to support additional channels of communication using
`-serial`, add several serial ports, up to the standard 4 generally
supported by the 8250 driver.

Fixed: https://lore.kernel.org/all/[email protected]/

Signed-off-by: Jason A. Donenfeld <[email protected]>
Tested-by: Bibo Mao <[email protected]>
[gaosong: ACPI uart need't reverse order]
Signed-off-by: Song Gao <[email protected]>
Message-Id: <[email protected]>
If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function.

This is the same procedure that's done in b91b6b5 ("hw/microblaze:
pass random seed to fdt"), e4b4f0b ("hw/riscv: virt: pass random seed
to fdt"), c6fe3e6 ("hw/openrisc: virt: pass random seed to fdt"),
67f7e42 ("hw/i386: pass RNG seed via setup_data entry"), c287941
("hw/rx: pass random seed to fdt"), 5e19cc6 ("hw/mips: boston: pass
random seed to fdt"), 6b23a67 ("hw/nios2: virt: pass random seed to fdt")
c4b0753 ("hw/ppc: pass random seed to fdt"), and 5242876
("hw/arm/virt: dt: add rng-seed property").

These earlier commits later were amended to rerandomize the RNG seed on
snapshot load, but the LoongArch code somehow already does that, despite
not having this patch here, presumably due to some lucky copy and
pasting.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Reviewed-by: Song Gao <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Song Gao <[email protected]>
Serial port console redirection table can be used for default serial
port selection, like chosen stdout-path selection with FDT method.

With acpi SPCR table added, early debug console can be parsed from
SPCR table with simple kernel parameter earlycon rather than
earlycon=uart,mmio,0x1fe001e0

Signed-off-by: Bibo Mao <[email protected]>
Reviewed-by: Song Gao <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Song Gao <[email protected]>
…st/qemu into staging

virtio,pc,pci: features, fixes, cleanups

i286 acpi speedup by precomputing _PRT by Ricardo Ribalda
vhost_net speedup by using MR transactions by Zuo Boqun
ich9 gained support for periodic and swsmi timer by Dominic Prinz

Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <[email protected]>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmbhoCUPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRptpUH/iR5AmJFpvAItqlPOvJiYDEch46C73tyrSws
# Kk/1EbGSL7mFFD5sfdSSV4Rw8CQBsmM/Dt5VDkJKsWnOLjkBQ2CYH0MYHktnrKcJ
# LlSk32HnY5p1DsXnJhgm5M7St8T3mV/oFdJCJAFgCmpx5uT8IRLrKETN8+30OaiY
# xo35xAKOAS296+xsWeVubKkMq7H4y2tdZLE/22gb8rlA8d96BJIeVLQ3y3IjeUPR
# 24q6c7zpObzGhYNZ/PzAKOn+YcVsV/lLAzKRZJTzTUPyG24BcjJTyyr/zNSYAgfk
# lLXzIZID3GThBmrCAiDZ1z6sfo3MRg2wNS/FBXtK6fPIuFxed+8=
# =ySRy
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 11 Sep 2024 14:50:29 BST
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [full]
# gpg:                 aka "Michael S. Tsirkin <[email protected]>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
  hw/acpi/ich9: Add periodic and swsmi timer
  virtio-mem: don't warn about THP sizes on a kernel without THP support
  hw/audio/virtio-sound: fix heap buffer overflow
  hw/cxl: fix physical address field in get scan media results output
  virtio-pci: Add lookup subregion of VirtIOPCIRegion MR
  vhost_net: configure all host notifiers in a single MR transaction
  tests/acpi: pc: update golden masters for DSDT
  hw/i386/acpi-build: Return a pre-computed _PRT table
  tests/acpi: pc: allow DSDT acpi table changes
  intel_iommu: Make PASID-cache and PIOTLB type invalid in legacy mode
  intel_iommu: Fix invalidation descriptor type field
  virtio: rename virtio_split_packed_update_used_idx
  hw/pci/pci-hmp-cmds: Avoid displaying bogus size in 'info pci'
  pci: don't skip function 0 occupancy verification for devfn auto assign
  hw/isa/vt82c686.c: Embed i8259 irq in device state instead of allocating
  hw: Move declaration of IRQState to header and add init function
  virtio: Always reset vhost devices
  virtio: Allow .get_vhost() without vhost_started

Signed-off-by: Peter Maydell <[email protected]>
…cross-i686-tci

The cross-i686-tci CI job is persistently flaky with various tests
hitting timeouts.  One theory for why this is happening is that we're
running too many tests in parallel and so sometimes a test gets
starved of CPU and isn't able to complete within the timeout.

(The environment this CI job runs in seems to cause us to default
to a parallelism of 9 in the main CI.)

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Message-id: [email protected]
…to staging

target/sparc: Implement single entry FP Queue

# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmbifAAdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+XAwgAlj//8JuNoRB/2hi0
# gU3Ifjrs+r+AZrcsG7pTOmYTZa6cYqJX4XsYoNq1S4FHky239vNKPQOQEadkmLGv
# wKH0fBjzvydOKRfrhEK2VLlhMyhGyuv59psfCCUB5HZEiueSHFFAvfjUtKNpjzRT
# KE2fwL6iKK3IXeKC6ynq0bkC/OymnLUYSgSslA6C1x1sReNz5Y6ZsGUEZRwODY4f
# q6s6JS2aBn1L9nJTzwXH/J5Ue8iix53d6EZ42QHqqwzRvAWHtfFqoMLc9P6Dg8P7
# FmiwHAErwr7Pj5cqcnl2C0zTp3LXg5xXpTJysi8CFJvCsObNRh9gL15W3xy9qBFX
# 2WfqWQ==
# =kxM7
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 12 Sep 2024 06:28:32 BST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Richard Henderson <[email protected]>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* tag 'pull-sparc-20240911' of https://gitlab.com/rth7680/qemu:
  target/sparc: Add gen_trap_if_nofpu_fpexception
  target/sparc: Implement STDFQ
  target/sparc: Add FSR_QNE to tb_flags
  target/sparc: Populate sparc32 FQ when raising fp exception
  target/sparc: Add FQ and FSR.QNE

Signed-off-by: Peter Maydell <[email protected]>
…into staging

pull-loongarch-20240912

# -----BEGIN PGP SIGNATURE-----
#
# iLMEAAEKAB0WIQS4/x2g0v3LLaCcbCxAov/yOSY+3wUCZuLmLgAKCRBAov/yOSY+
# 38JNA/9UdorT4a7H+H5PhNeEu2EHDgMPb7+gxyYKw03mOG2MB3KFzkK0LRQShaPt
# ADJmIqAFlc9SJLkbo6ELMDl+ZnUU9OdC/P6YU5iBG71zx1PonMwuyJTWhlBwxWcG
# +OB8aDBUALoe/Gb4za152I84cR08g58TgLnXNfEkCM8lnPfAug==
# =Plwu
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 12 Sep 2024 14:01:34 BST
# gpg:                using RSA key B8FF1DA0D2FDCB2DA09C6C2C40A2FFF239263EDF
# gpg: Good signature from "Song Gao <[email protected]>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B8FF 1DA0 D2FD CB2D A09C  6C2C 40A2 FFF2 3926 3EDF

* tag 'pull-loongarch-20240912' of https://gitlab.com/gaosong/qemu:
  hw/loongarch: Add acpi SPCR table support
  hw/loongarch: virt: pass random seed to fdt
  hw/loongarch: virt: support up to 4 serial ports
  target/loongarch: Support QMP dump-guest-memory
  target/loongarch/kvm: Add vCPU reset function
  hw/loongarch: Remove default enable with VIRTIO_VGA device
  target/loongarch: Add compatible support about VM reboot

Signed-off-by: Peter Maydell <[email protected]>
Convert the TYPE_CCW_DEVICE to three-phase reset. This is a
device class which is subclassed, so it needs to be three-phase
before we can convert the subclass.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Nina Schoetterl-Glausch <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Acked-by: Thomas Huth <[email protected]>
Message-id: [email protected]
Convert the virtio-ccw code to three-phase reset.  This allows us to
remove a call to device_class_set_parent_reset(), replacing it with
the three-phase equivalent resettable_class_set_parent_phases().
Removing all the device_class_set_parent_reset() uses will allow us
to remove some of the glue code that interworks between three-phase
and legacy reset.

This is a simple conversion, with no behavioural changes.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Nina Schoetterl-Glausch <[email protected]>
Acked-by: Thomas Huth <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: [email protected]
Convert the s390 CPU to the Resettable interface.  This is slightly
more involved than the other CPU types were (see commits
9130cad..d66e64d) because S390 has its own set of
different kinds of reset with different behaviours that it needs to
trigger.

We handle this by adding these reset types to the Resettable
ResetType enum.  Now instead of having an underlying implementation
of reset that is s390-specific and which might be called either
directly or via the DeviceClass::reset method, we can implement only
the Resettable hold phase method, and have the places that need to
trigger an s390-specific reset type do so by calling
resettable_reset().

The other option would have been to smuggle in the s390 reset
type via, for instance, a field in the CPU state that we set
in s390_do_cpu_initial_reset() etc and then examined in the
reset method, but doing it this way seems cleaner.

The motivation for this change is that this is the last caller
of the legacy device_class_set_parent_reset() function, and
removing that will let us clean up some glue code that we added
for the transition to three-phase reset.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Nina Schoetterl-Glausch <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Acked-by: Thomas Huth <[email protected]>
Message-id: [email protected]
There are no callers of device_class_set_parent_reset() left in the tree,
as they've all been converted to use three-phase reset and the
corresponding resettable_class_set_parent_phases() function.
Remove device_class_set_parent_reset().

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: [email protected]
The Alpha and HPPA CPU class structs include a 'parent_reset'
field which is never used; delete them.

(These targets don't seem to implement reset at all; if they did they
should do it using the three-phase reset mechanism, which uses a
'ResettablePhases parent_phases' field instead of the old
'DeviceReset parent_reset' field.)

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
Define a device_class_set_legacy_reset() function which
sets the DeviceClass::reset field. This serves two purposes:
 * it makes it clearer to the person writing code that
   DeviceClass::reset is now legacy and they should look for
   the new alternative (which is Resettable)
 * it makes it easier to rename the reset field (which in turn
   makes it easier to find places that call it)

The Coccinelle script can be used to automatically convert code that
was doing an open-coded assignment to DeviceClass::reset to call
device_class_set_legacy_reset() instead.

Signed-off-by: Peter Maydell <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-id: [email protected]
stsquad and others added 28 commits September 19, 2024 15:58
This is useful information when debugging memory issues so lets
improve by:

  - include the ptr address for u8 fills (like the others)
  - indicate the number of operations for reads and writes
  - explicitly note when we are flushing
  - move the fill printf to after the reset

Reviewed-by: Pierrick Bouvier <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <[email protected]>
While the compilers will generally happily synthesise a 64 bit value
for you on 32 bit systems it doesn't exercise anything on QEMU. It
also makes it hard to accurately compare the accesses to test_data
when instrumenting.

Reviewed-by: Pierrick Bouvier <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <[email protected]>
The multiarch system tests output serial data which should be
redirected to the "output" chardev rather than echoed to the console.

Comment the use of EXTFLAGS variable while we are at it.

Acked-by: Ilya Leoshkevich <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <[email protected]>
At first I thought I could compile the user-mode test for system mode
however we already have a fairly comprehensive test case for system
mode in "memory" so lets use that.

As tracking every access will quickly build up with "print-access" we
add a new mode to track groups of reads and writes to regions. Because
the test_data is 16k aligned we can be sure all accesses to it are
ones we can count.

First we extend the test to report where the test_data region is. Then
we expand the pdot() function to track the total number of reads and
writes to the region. We have to add some addition pdot() calls to
take into account multiple reads/writes in the test loops.

Finally we add a python script to integrate the data from the plugin
and the output of the test and validate they both agree on the total
counts. As some boot codes clear the bss we also add a flag to add a
regions worth of writes to the expected total.

Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Pierrick Bouvier <[email protected]>
Message-Id: <[email protected]>
When we shut down a guest we disable the timers. However this can
cause deadlock if the guest has queued some async work that is trying
to advance system time and spins forever trying to wind time forward.
Pay attention to the return code and bail early if we can't wind time
forward.

Reported-by: Elisha Hollander <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Pierrick Bouvier <[email protected]>
Message-Id: <[email protected]>
SimPoint is a widely used tool to find the ideal microarchitecture
simulation points so Valgrind[2] and Pin[3] support generating basic
block vectors for use with them. Let's add a corresponding plugin to
QEMU too.

Note that this plugin has a different goal with tests/plugin/bb.c.

This plugin creates a vector for each constant interval instead of
counting the execution of basic blocks for the entire run and able to
describe the change of execution behavior. Its output is also
syntactically simple and better suited for parsing, while the output of
tests/plugin/bb.c is more human-readable.

[1] https://cseweb.ucsd.edu/~calder/simpoint/
[2] https://valgrind.org/docs/manual/bbv-manual.html
[3] https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-dynamic-binary-instrumentation-tool.html

Signed-off-by: Yotaro Nada <[email protected]>
Signed-off-by: Akihiko Odaki <[email protected]>
Reviewed-by: Pierrick Bouvier <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Rowan Hart <[email protected]>
Reviewed-by: Pierrick Bouvier <[email protected]>
Message-Id: <[email protected]>
[AJB: tweaked cpu_memory_rw_debug call]
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Rowan Hart <[email protected]>
Reviewed-by: Pierrick Bouvier <[email protected]>
Tested-by: Pierrick Bouvier <[email protected]>
Message-Id: <[email protected]>
[AJB: tweak fmt string for vaddr]
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <[email protected]>
Although we asks for instructions per second we work in quanta and
that cannot be 0. Fail to load the plugin instead and report the
minimum IPS we can handle.

Reported-by: Elisha Hollander <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Reviewed-by: Pierrick Bouvier <[email protected]>
Message-Id: <[email protected]>
…quad/qemu into staging

TCG plugin memory instrumentation updates

  - deprecate plugins on 32 bit hosts
  - deprecate plugins with TCI
  - extend memory API to save value
  - add check-tcg tests to exercise new memory API
  - fix timer deadlock with non-changing timer
  - add basic block vector plugin to contrib
  - add cflow plugin to contrib
  - extend syscall plugin to dump write memory
  - validate ips plugin arguments meet minimum slice value

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmbsPCUACgkQ+9DbCVqe
# KkTm1gf9Hs5Zfdng0E+7sr5Dpa5F+cJOXU9QJhoTWJ4XC16CygWByqMXbyeX/kvm
# HXJEm6OnkADJhikIUCoBko8uK4/96iWSrDL0sEdzASX4SM/tXu684KeL+j9G/Ql8
# iqxm6tIjaJqmbSZRMp0l5jD+ZBltRMCzBNdK1suJR2ppQgqfKj3qMLVLtq2hhqPH
# qPgwKm44hk9BEpHYqXaivzSWN5GKCgvp5ECcFXCBhDcM+8W7Dl3Mv6X0pWOpYcKZ
# d2a5KUt+Xp7WB2jkOgJYr0zKCOQCiCjGSfm/30qRDOUnwiLRWbfamRI9jUDNUtfy
# RYR+GaspurGCwSkwICdlvj+vFp/16Q==
# =5wfo
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 19 Sep 2024 15:58:45 BST
# gpg:                using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <[email protected]>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8  DF35 FBD0 DB09 5A9E 2A44

* tag 'pull-tcg-plugin-memory-190924-1' of https://gitlab.com/stsquad/qemu:
  contrib/plugins: avoid hanging program
  plugins: add option to dump write argument to syscall plugin
  plugins: add plugin API to read guest memory
  contrib/plugins: Add a plugin to generate basic block vectors
  util/timer: avoid deadlock when shutting down
  tests/tcg: add a system test to check memory instrumentation
  tests/tcg: ensure s390x-softmmu output redirected
  tests/tcg: only read/write 64 bit words on 64 bit systems
  tests/tcg: clean up output of memory system test
  tests/tcg/multiarch: add test for plugin memory access
  tests/tcg/plugins/mem: add option to print memory accesses
  tests/tcg: allow to check output of plugins
  tests/tcg: add mechanism to run specific tests with plugins
  plugins: extend API to get latest memory value accessed
  plugins: save value during memory accesses
  contrib/plugins: control flow plugin
  deprecation: don't enable TCG plugins by default with TCI
  deprecation: don't enable TCG plugins by default on 32 bit hosts

Signed-off-by: Peter Maydell <[email protected]>
The IGVM library allows Independent Guest Virtual Machine files to be
parsed and processed. IGVM files are used to configure guest memory
layout, initial processor state and other configuration pertaining to
secure virtual machines.

This adds the --enable-igvm configure option, enabled by default, which
attempts to locate and link against the IGVM library via pkgconfig and
sets CONFIG_IGVM if found.

The library is added to the system_ss target in backends/meson.build
where the IGVM parsing will be performed by the ConfidentialGuestSupport
object.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
In preparation for supporting the processing of IGVM files to configure
guests, this adds a set of functions to ConfidentialGuestSupport
allowing configuration of secure virtual machines that can be
implemented for each supported isolation platform type such as Intel TDX
or AMD SEV-SNP. These functions will be called by IGVM processing code
in subsequent patches.

This commit provides a default implementation of the functions that
either perform no action or generate an error when they are called.
Targets that support ConfidentalGuestSupport should override these
implementations.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Adds an IGVM loader to QEMU which processes a given IGVM file and
applies the directives within the file to the current guest
configuration.

The IGVM loader can be used to configure both confidential and
non-confidential guests. For confidential guests, the
ConfidentialGuestSupport object for the system is used to encrypt
memory, apply the initial CPU state and perform other confidential guest
operations.

The loader is configured via a new IgvmCfg QOM object which allows the
user to provide a path to the IGVM file to process.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
An IGVM file contains configuration of guest state that should be
applied during configuration of the guest, before the guest is started.

This patch allows the user to add an igvm-cfg object to an X86 machine
configuration that allows an IGVM file to be configured that will be
applied to the guest before it is started.

If an IGVM configuration is provided then the IGVM file is processed at
the end of the board initialization, before the state transition to
PHASE_MACHINE_INITIALIZED.

Signed-off-by: Roy Hopkins <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
…h IGVM

When using an IGVM file the configuration of the system firmware is
defined by IGVM directives contained in the file. In this case the user
should not configure any pflash devices.

This commit skips initialization of the ROM mode when pflash0 is not set
then checks to ensure no pflash devices have been configured when using
IGVM, exiting with an error message if this is not the case.

Signed-off-by: Roy Hopkins <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
The class function and implementations for updating launch data return
a code in case of error. In some cases an error message is generated and
in other cases, just the error return value is used.

This small refactor adds an 'Error **errp' parameter to all functions
which consistently set an error condition if a non-zero value is
returned.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
…ache()

The x86 segment registers are identified by the X86Seg enumeration which
includes LDTR and TR as well as the normal segment registers. The
function 'cpu_x86_load_seg_cache()' uses the enum to determine which
segment to set. However, specifying R_LDTR or R_TR results in an
out-of-bounds access of the segment array.

Possibly by coincidence, the function does correctly set LDTR or TR in
this case as the structures for these registers immediately follow the
array which is accessed out of bounds.

This patch adds correct handling for R_LDTR and R_TR in the function.

Signed-off-by: Roy Hopkins <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
When an SEV guest is started, the reset vector and state are
extracted from metadata that is contained in the firmware volume.

In preparation for using IGVM to setup the initial CPU state,
the code has been refactored to populate vmcb_save_area for each
CPU which is then applied during guest startup and CPU reset.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Stefano Garzarella <[email protected]>
The ConfidentialGuestSupport object defines a number of virtual
functions that are called during processing of IGVM directives to query
or configure initial guest state. In order to support processing of IGVM
files, these functions need to be implemented by relevant isolation
hardware support code such as SEV.

This commit implements the required functions for SEV-ES and adds
support for processing IGVM files for configuring the guest.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Stefano Garzarella <[email protected]>
IGVM support has been implemented for Confidential Guests that support
AMD SEV and AMD SEV-ES. Add some documentation that gives some
background on the IGVM format and how to use it to configure a
confidential guest.

Signed-off-by: Roy Hopkins <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Create an enum entry within FirmwareDevice for 'igvm' to describe that
an IGVM file can be used to map firmware into memory as an alternative
to pre-existing firmware devices.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
For confidential guests a policy can be provided that defines the
security level, debug status, expected launch measurement and other
parameters that define the configuration of the confidential platform.

This commit adds a new function named set_guest_policy() that can be
implemented by each confidential platform, such as AMD SEV to set the
policy. This will allow configuration of the policy from a
multi-platform resource such as an IGVM file without the IGVM processor
requiring specific implementation details for each platform.

Signed-off-by: Roy Hopkins <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
The initialization sections in IGVM files contain configuration that
should be applied to the guest platform before it is started. This
includes guest policy and other information that can affect the security
level and the startup measurement of a guest.

This commit introduces handling of the initialization sections during
processing of the IGVM file.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Adds a handler for the guest policy initialization IGVM section and
builds an SEV policy based on this information and the ID block
directive if present. The policy is applied using by calling
'set_guest_policy()' on the ConfidentialGuestSupport object.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Stefano Garzarella <[email protected]>
The new cgs_set_guest_policy() function is provided to receive the guest
policy flags, SNP ID block and SNP ID authentication from guest
configuration such as an IGVM file and apply it to the platform prior to
launching the guest.

The policy is used to populate values for the existing 'policy',
'id_block' and 'id_auth' parameters. When provided, the guest policy is
applied and the ID block configuration is used to verify the launch
measurement and signatures. The guest is only successfully started if
the expected launch measurements match the actual measurements and the
signatures are valid.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Stefano Garzarella <[email protected]>
IGVM files can contain an initial VMSA that should be applied to each
vcpu as part of the initial guest state. The sev_features flags are
provided as part of the VMSA structure. However, KVM only allows
sev_features to be set during initialization and not as the guest is
being prepared for launch.

This patch queries KVM for the supported set of sev_features flags and
processes the IGVM file during kvm_init to determine any sev_features
flags set in the IGVM file. These are then provided in the call to
KVM_SEV_INIT2 to ensure the guest state matches that specified in the
IGVM file.

This does cause the IGVM file to be processed twice. Firstly to extract
the sev_features then secondly to actually configure the guest. However,
the first pass is largely ignored meaning the overhead is minimal.

Signed-off-by: Roy Hopkins <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Acked-by: Stefano Garzarella <[email protected]>
Previously the VMSA could not be set directly. Instead the current CPU
state was automatically populated into a VMSA within kvm as part of
KVM_SEV_SNP_LAUNCH_FINISH. This meant that it was hard to ensure the
VMSA provided by IGVM matched the resulting one in kvm.

KVM has been updated to allow the VMSA to be provided via
KVM_SEV_SNP_LAUNCH_UPDATE. In this case, kvm does not perform any
specific synchronisation during FINISH and the VMSA is guaranteed to
match that provided by QEMU.

Signed-off-by: Roy Hopkins <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.