Update to linux-6.11 with SVSM support #7

Open
wants to merge 10,000 commits into base: master

roy-hopkins

This version of Linux works as both a host and guest kernel for COCONUT-SVSM. It is based on the upstream 6.11 kernel plus the following patch series:

Deliver pending events to VMPL0

KVM: X86: Run higher VMPL if events are pending

Add SEV-SNP restricted injection hypervisor support (Melody Wang/AMD):

KVM: SVM: Enable restricted injection for an SEV-SNP guest
KVM: SVM: Inject MCEs when restricted injection is active
KVM: SVM: Inject NMIs when restricted injection is active
KVM: SVM: Inject #HV when restricted injection is active
KVM: SVM: Add support for the SEV-SNP #HV doorbell page NAE event
x86/sev: Define the #HV doorbell page structure

Direct setting of VMSA for SEV-SNP

KVM: SEV: Allow direct setting of VMSA for SEV-SNP guests

Extend SEV-SNP SVSM support with a kvm_vcpu per VMPL

KVM: x86: Scan for IOAPIC changes at lowest VMPL
KVM: Add functions to send request to all/masked CPUs at a particular VMPL
KVM: x86: Add target VMPL to IRQs and send to APIC for VMPL
KVM: x86: Add x86 field to find the default VMPL that IRQs should target
KVM: SVM: Update SEV VMPL handling to use multiple struct kvm_vcpus
KVM: Create a child struct kvm_vcpu for each VMPL
KVM: Move kvm_vcpu fields into common structure

KVM: SEV-SNP support for running an SVSM (Tom Lendacky/AMD)

KVM: SVM: Support initialization of an SVSM
KVM: SVM: Support launching an SVSM with Restricted Injection set
KVM: SVM: Prevent injection when restricted injection is active
KVM: SVM: Maintain per-VMPL SEV features in kvm_sev_info
KVM: SVM: Invoke a specified VMPL level VMSA for the vCPU
KVM: SEV: Allow for VMPL level specification in AP create
KVM: SVM: Implement GET_AP_APIC_IDS NAE event

This adds experimental support for handling multiple VMPLs within KVM, including independent APICs for each VMPL and delivery using restricted injection.

By default, no interrupt sources are routed to VMPL0. If you want to see a sample VMPL0 interrupt in action, you can use my SVSM branch, vmpl0_interrupts (hopkins/svsm/tree/vmpl0_interrupts). It configures the APIC at VMPL0 to deliver periodic timer interrupts, which are handled either through interrupt vector 0x20 or through the common_isr_handler, depending on whether restricted injection is enabled via the VMSA in the IGVM file.

You need the corresponding QEMU that supports IGVM and direct setting of the VMSA to work with this kernel. This can be found in this PR: coconut-svsm/qemu#16.

mlankhorst and others added 30 commits September 4, 2024 12:24
Suspend fbdev sooner, and disable user access before suspending to
prevent some races. I've noticed this when comparing xe suspend to
i915's.

Matches the following commits from i915:
24b412b ("drm/i915: Disable intel HPD poll after DRM poll init/enable")
1ef28d8 ("drm/i915: Suspend the framebuffer console earlier during system suspend")
bd738d8 ("drm/i915: Prevent modesets during driver init/shutdown")

Thanks to Imre for pointing me to those commits.

Driver shutdown is currently missing, but I have some idea how to
implement it next.

Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: Imre Deak <[email protected]>
Reviewed-by: Uma Shankar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Maarten Lankhorst,,, <[email protected]>
(cherry picked from commit 492be2a)
Signed-off-by: Rodrigo Vivi <[email protected]>
Enable/Disable user access only during system suspend/resume.
This should not happen during runtime s/r

v2: rebased

Reviewed-by: Arun R Murthy <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Signed-off-by: Vinod Govindapillai <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit a64e7e5)
Signed-off-by: Rodrigo Vivi <[email protected]>
Fix circular locking dependency on runtime suspend.

<4> [74.952215] ======================================================
<4> [74.952217] WARNING: possible circular locking dependency detected
<4> [74.952219] 6.10.0-rc7-xe #1 Not tainted
<4> [74.952221] ------------------------------------------------------
<4> [74.952223] kworker/7:1/82 is trying to acquire lock:
<4> [74.952226] ffff888120548488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_modeset_lock_all+0x40/0x1e0 [drm]
<4> [74.952260]
but task is already holding lock:
<4> [74.952262] ffffffffa0ae59c0 (xe_pm_runtime_lockdep_map){+.+.}-{0:0}, at: xe_pm_runtime_suspend+0x2f/0x340 [xe]
<4> [74.952322]
which lock already depends on the new lock.

The commit 'b1d90a86 ("drm/xe: Use the encoder suspend helper also used
by the i915 driver")' didn't do anything wrong. It actually fixed a
critical bug, because the encoder_suspend was never getting actually
called because it was returning if (has_display(xe)) instead of
if (!has_display(xe)). However, this ended up introducing the encoder
suspend calls in the runtime routines as well, causing the circular
locking dependency.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2304
Fixes: b1d90a8 ("drm/xe: Use the encoder suspend helper also used by the i915 driver")
Cc: Imre Deak <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit 8da1944)
Signed-off-by: Rodrigo Vivi <[email protected]>
…kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "Two netfs fixes for this merge window:

   - Ensure that fscache_cookie_lru_time is deleted when the fscache
     module is removed to prevent UAF

   - Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()

     Before it used truncate_inode_pages_partial() which causes
     copy_file_range() to fail on cifs"

* tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF
  mm: Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()
Pull smb server fixes from Steve French:

 - Fix crash in session setup

 - Fix locking bug

 - Improve access bounds checking

* tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: Unlock on in ksmbd_tcp_set_interfaces()
  ksmbd: unset the binding mark of a reused connection
  smb: Annotate struct xattr_smb_acl with __counted_by()
…rnel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - followup fix for direct io and fsync under some conditions, reported
   by QEMU users

 - fix a potential leak when disabling quotas while some extent tracking
   work can still happen

 - in zoned mode handle unexpected change of zone write pointer in
   RAID1-like block groups, turn the zones to read-only

* tag 'for-6.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix race between direct IO write and fsync when using same fd
  btrfs: zoned: handle broken write pointer on zones
  btrfs: qgroup: don't use extent changeset when not needed
If the length of the name string is 1 and the value of name[0] is a NUL
byte, an out-of-bounds access occurs in btf_name_valid_section() and the
return value is true, so the invalid name passes the check.

To solve this, check whether the first position is a NUL byte and
whether the first character is printable.
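
As a rough illustration of the check described above, the sketch below rejects an empty name before looking at any later byte. It is not the kernel's btf_name_valid_section(); the function name and the use of isprint() from the C library are assumptions made for this example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the check described in the commit message.
 * A name is accepted only if it is non-empty and every byte up to the
 * terminating NUL is printable.
 */
bool name_valid_section_sketch(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    /* Reject a missing or empty name, or a non-printable first byte. */
    if (p == NULL || *p == '\0' || !isprint(*p))
        return false;

    /* Walk the remaining bytes and stop at the terminator. */
    for (p++; *p != '\0'; p++) {
        if (!isprint(*p))
            return false;
    }
    return true;
}
```

Checking the first byte up front means the loop never has to step past a terminator that sits at position 0.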

Suggested-by: Eduard Zingerman <[email protected]>
Fixes: bd70a8f ("bpf: Allow all printable characters in BTF DATASEC names")
Signed-off-by: Jeongjun Park <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
…/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - hp-wmi-sensors: Check if WMI event data exists before accessing it

 - ltc2991: fix register bits defines

* tag 'hwmon-for-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (hp-wmi-sensors) Check if WMI event data exists
  hwmon: ltc2991: fix register bits defines
….org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Namhyung Kim:
 "A number of small fixes for the late cycle:

   - Two more build fixes on 32-bit archs

   - Fixed a segfault during perf test

   - Fixed spinlock/rwlock accounting bug in perf lock contention"

* tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf daemon: Fix the build on more 32-bit architectures
  perf python: include "util/sample.h"
  perf lock contention: Fix spinlock and rwlock accounting
  perf test pmu: Set uninitialized PMU alias to null
Add selftest for cases where btf_name_valid_section() does not properly
check for certain types of names.

Suggested-by: Eduard Zingerman <[email protected]>
Signed-off-by: Jeongjun Park <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
…id_section'

Jeongjun Park says:

====================
bpf: fix incorrect name check pass logic in btf_name_valid_section

This patch fixes an issue where btf_name_valid_section() would not properly
check names under certain conditions, leading to an out-of-bounds access.
A selftest was added to verify the fix.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Pull bcachefs fixes from Kent Overstreet:

 - Fix a typo in the rebalance accounting changes

 - BCH_SB_MEMBER_INVALID: small on disk format feature which will be
   needed for full erasure coding support; this is only the minimum so
   that 6.11 can handle future versions without barfing.

* tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs:
  bcachefs: BCH_SB_MEMBER_INVALID
  bcachefs: fix rebalance accounting
Bareudp devices update their stats concurrently.
Therefore they need proper atomic increments.
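
For readers outside the kernel, the idea of an atomic statistics increment can be sketched with C11 atomics. This is only an analogy in userspace C: the struct, the field names and the helpers are hypothetical and are not what the bareudp driver uses.

```c
#include <stdatomic.h>

/* Hypothetical per-device counters; not the layout used by the driver. */
struct tunnel_stats_sketch {
    atomic_ulong rx_dropped;
    atomic_ulong tx_errors;
};

/* Atomically add one to a counter; concurrent callers do not lose updates. */
void stats_inc(atomic_ulong *counter)
{
    atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Read the current value of a counter as a single atomic load. */
unsigned long stats_read(atomic_ulong *counter)
{
    return atomic_load_explicit(counter, memory_order_relaxed);
}
```

Relaxed ordering is chosen here because a statistics counter only needs each increment to be indivisible, not ordered against other memory accesses.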

Fixes: 571912c ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/04b7b9d0b480158eb3ab4366ec80aa2ab7e41fcb.1725031794.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
We observed a null-ptr-deref in fou_gro_receive() while shutting down
a host.  [0]

The NULL pointer is sk->sk_user_data, and the offset 8 is of protocol
in struct fou.

When fou_release() is called due to netns dismantle or explicit tunnel
teardown, udp_tunnel_sock_release() sets NULL to sk->sk_user_data.
Then, the tunnel socket is destroyed after a single RCU grace period.

So, in-flight udp4_gro_receive() could find the socket and execute the
FOU GRO handler, where sk->sk_user_data could be NULL.

Let's use rcu_dereference_sk_user_data() in fou_from_sock() and add NULL
checks in FOU GRO handlers.

[0]:
BUG: kernel NULL pointer dereference, address: 0000000000000008
 PF: supervisor read access in kernel mode
 PF: error_code(0x0000) - not-present page
PGD 80000001032f4067 P4D 80000001032f4067 PUD 103240067 PMD 0
SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.216-204.855.amzn2.x86_64 #1
Hardware name: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017
RIP: 0010:fou_gro_receive (net/ipv4/fou.c:233) [fou]
Code: 41 5f c3 cc cc cc cc e8 e7 2e 69 f4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 49 89 f8 41 54 48 89 f7 48 89 d6 49 8b 80 88 02 00 00 <0f> b6 48 08 0f b7 42 4a 66 25 fd fd 80 cc 02 66 89 42 4a 0f b6 42
RSP: 0018:ffffa330c0003d08 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff93d9e3a6b900 RCX: 0000000000000010
RDX: ffff93d9e3a6b900 RSI: ffff93d9e3a6b900 RDI: ffff93dac2e24d08
RBP: ffff93d9e3a6b900 R08: ffff93dacbce6400 R09: 0000000000000002
R10: 0000000000000000 R11: ffffffffb5f369b0 R12: ffff93dacbce6400
R13: ffff93dac2e24d08 R14: 0000000000000000 R15: ffffffffb4edd1c0
FS:  0000000000000000(0000) GS:ffff93daee800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000102140001 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259)
 ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420)
 ? no_context (arch/x86/mm/fault.c:752)
 ? exc_page_fault (arch/x86/include/asm/irqflags.h:49 arch/x86/include/asm/irqflags.h:89 arch/x86/mm/fault.c:1435 arch/x86/mm/fault.c:1483)
 ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:571)
 ? fou_gro_receive (net/ipv4/fou.c:233) [fou]
 udp_gro_receive (include/linux/netdevice.h:2552 net/ipv4/udp_offload.c:559)
 udp4_gro_receive (net/ipv4/udp_offload.c:604)
 inet_gro_receive (net/ipv4/af_inet.c:1549 (discriminator 7))
 dev_gro_receive (net/core/dev.c:6035 (discriminator 4))
 napi_gro_receive (net/core/dev.c:6170)
 ena_clean_rx_irq (drivers/amazon/net/ena/ena_netdev.c:1558) [ena]
 ena_io_poll (drivers/amazon/net/ena/ena_netdev.c:1742) [ena]
 napi_poll (net/core/dev.c:6847)
 net_rx_action (net/core/dev.c:6917)
 __do_softirq (arch/x86/include/asm/jump_label.h:25 include/linux/jump_label.h:200 include/trace/events/irq.h:142 kernel/softirq.c:299)
 asm_call_irq_on_stack (arch/x86/entry/entry_64.S:809)
</IRQ>
 do_softirq_own_stack (arch/x86/include/asm/irq_stack.h:27 arch/x86/include/asm/irq_stack.h:77 arch/x86/kernel/irq_64.c:77)
 irq_exit_rcu (kernel/softirq.c:393 kernel/softirq.c:423 kernel/softirq.c:435)
 common_interrupt (arch/x86/kernel/irq.c:239)
 asm_common_interrupt (arch/x86/include/asm/idtentry.h:626)
RIP: 0010:acpi_idle_do_entry (arch/x86/include/asm/irqflags.h:49 arch/x86/include/asm/irqflags.h:89 drivers/acpi/processor_idle.c:114 drivers/acpi/processor_idle.c:575)
Code: 8b 15 d1 3c c4 02 ed c3 cc cc cc cc 65 48 8b 04 25 40 ef 01 00 48 8b 00 a8 08 75 eb 0f 1f 44 00 00 0f 00 2d d5 09 55 00 fb f4 <fa> c3 cc cc cc cc e9 be fc ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
RSP: 0018:ffffffffb5603e58 EFLAGS: 00000246
RAX: 0000000000004000 RBX: ffff93dac0929c00 RCX: ffff93daee833900
RDX: ffff93daee800000 RSI: ffff93daee87dc00 RDI: ffff93daee87dc64
RBP: 0000000000000001 R08: ffffffffb5e7b6c0 R09: 0000000000000044
R10: ffff93daee831b04 R11: 00000000000001cd R12: 0000000000000001
R13: ffffffffb5e7b740 R14: 0000000000000001 R15: 0000000000000000
 ? sched_clock_cpu (kernel/sched/clock.c:371)
 acpi_idle_enter (drivers/acpi/processor_idle.c:712 (discriminator 3))
 cpuidle_enter_state (drivers/cpuidle/cpuidle.c:237)
 cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
 cpuidle_idle_call (kernel/sched/idle.c:158 kernel/sched/idle.c:239)
 do_idle (kernel/sched/idle.c:302)
 cpu_startup_entry (kernel/sched/idle.c:395 (discriminator 1))
 start_kernel (init/main.c:1048)
 secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:310)
Modules linked in: udp_diag tcp_diag inet_diag nft_nat ipip tunnel4 dummy fou ip_tunnel nft_masq nft_chain_nat nf_nat wireguard nft_ct curve25519_x86_64 libcurve25519_generic nf_conntrack libchacha20poly1305 nf_defrag_ipv6 nf_defrag_ipv4 nft_objref chacha_x86_64 nft_counter nf_tables nfnetlink poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper mousedev psmouse button ena ptp pps_core crc32c_intel
CR2: 0000000000000008

Fixes: d92283e ("fou: change to use UDP socket GRO")
Reported-by: Alphonse Kurian <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
generic_ocp_write() requires the "size" parameter to be 4-byte aligned.
Therefore, writing the bp fails if mac->bp_num is odd. Align the size
to 4 to fix it. This may write an extra bp, but
rtl8152_is_fw_mac_ok() makes sure the value is 0 for any bp whose
index is greater than mac->bp_num, so the firmware is not affected.

Besides, I check the return value of generic_ocp_write() to make sure
everything is correct.
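
The size adjustment can be sketched as a round-up to a multiple of 4. The helper names below are hypothetical, and the 16-bit width of each bp entry is an assumption made for the example rather than something stated in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 4. */
size_t align4(size_t size)
{
    return (size + 3u) & ~(size_t)3u;
}

/*
 * Hypothetical helper: byte length of a table of bp_num entries, padded
 * so that a writer insisting on 4-byte multiples accepts it.
 * Assumption: each entry is 16 bits wide.
 */
size_t bp_write_size(unsigned int bp_num)
{
    return align4((size_t)bp_num * sizeof(uint16_t));
}
```

With an odd entry count the padded length covers one extra entry, which matches the "may write an extra bp" remark above.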

Fixes: e5c266a ("r8152: set bp in bulk")
Signed-off-by: Hayes Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
When userspace wants to take over a fdb entry by setting it as
EXTERN_LEARNED, we set both flags BR_FDB_ADDED_BY_EXT_LEARN and
BR_FDB_ADDED_BY_USER in br_fdb_external_learn_add().

If the bridge updates the entry later because its port changed, we clear
the BR_FDB_ADDED_BY_EXT_LEARN flag, but leave the BR_FDB_ADDED_BY_USER
flag set.

If userspace then wants to take over the entry again,
br_fdb_external_learn_add() sees that BR_FDB_ADDED_BY_USER and skips
setting the BR_FDB_ADDED_BY_EXT_LEARN flags, thus silently ignores the
update.

Fix this by always allowing to set BR_FDB_ADDED_BY_EXT_LEARN regardless
if this was a user fdb entry or not.
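
The before and after behaviour can be modelled with two small functions. The flag names echo the commit message, but the bit values, the struct and both functions are invented for this sketch and do not reproduce br_fdb_external_learn_add().

```c
#include <stdbool.h>

/* Illustrative flag bits; the values are made up for this sketch. */
enum {
    FDB_ADDED_BY_USER      = 1u << 0,
    FDB_ADDED_BY_EXT_LEARN = 1u << 1,
};

struct fdb_entry_sketch {
    unsigned int flags;
};

/* Old behaviour as described: an entry already marked USER is skipped. */
bool ext_learn_add_old(struct fdb_entry_sketch *e)
{
    if (e->flags & FDB_ADDED_BY_USER)
        return false;   /* the update is silently ignored */
    e->flags |= FDB_ADDED_BY_USER | FDB_ADDED_BY_EXT_LEARN;
    return true;
}

/* Fixed behaviour: EXT_LEARN is set regardless of the USER flag. */
bool ext_learn_add_fixed(struct fdb_entry_sketch *e)
{
    unsigned int wanted = FDB_ADDED_BY_USER | FDB_ADDED_BY_EXT_LEARN;
    bool modified = (e->flags & wanted) != wanted;

    e->flags |= wanted;
    return modified;    /* true if any flag actually changed */
}
```

An entry that the bridge has reverted to USER-only is left untouched by the first function and picked up again by the second.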

Fixes: 710ae72 ("net: bridge: Mark FDB entries that were added by user as such")
Signed-off-by: Jonas Gorski <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
axienet_dma_err_handler can race with axienet_stop in the following
manner:

CPU 1                       CPU 2
======================      ==================
axienet_stop()
    napi_disable()
    axienet_dma_stop()
                            axienet_dma_err_handler()
                                napi_disable()
                                axienet_dma_stop()
                                axienet_dma_start()
                                napi_enable()
    cancel_work_sync()
    free_irq()

Fix this by setting a flag in axienet_stop telling
axienet_dma_err_handler not to bother doing anything. I chose not to use
disable_work_sync to allow for easier backporting.
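
A single-threaded model of the flag approach is sketched below. The struct and function names are hypothetical, and the sketch deliberately leaves out the locking or memory-ordering that real concurrent code would need; it only shows the early return in the error handler.

```c
#include <stdbool.h>

/* Hypothetical device state for the sketch. */
struct dma_dev_sketch {
    bool stopping;      /* set by the stop path before teardown */
    bool napi_enabled;
    bool dma_running;
};

/* Stop path: raise the flag first, then tear things down. */
void dev_stop(struct dma_dev_sketch *d)
{
    d->stopping = true;
    d->napi_enabled = false;
    d->dma_running = false;
}

/* Returns true if the handler restarted DMA, false if it backed off. */
bool dma_err_handler(struct dma_dev_sketch *d)
{
    if (d->stopping)
        return false;   /* device is going away, do nothing */

    /* Stop, then restart, as in the sequence shown above. */
    d->napi_enabled = false;
    d->dma_running = false;
    d->dma_running = true;
    d->napi_enabled = true;
    return true;
}
```

Once the stop path has run, a late error handler leaves NAPI and DMA disabled instead of re-enabling them.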

Signed-off-by: Sean Anderson <[email protected]>
Fixes: 8a3b7a2 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
…/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.11

Hopefully final fixes for v6.11 and this time only fixes to ath11k
driver. We need to revert hibernation support due to reported
regressions and we have a fix for kernel crash introduced in
v6.11-rc1.

* tag 'wireless-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  MAINTAINERS: wifi: cw1200: add net-cw1200.h
  Revert "wifi: ath11k: support hibernation"
  Revert "wifi: ath11k: restore country code during resume"
  wifi: ath11k: fix NULL pointer dereference in ath11k_mac_get_eirp_power()
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
…t/tnguy/net-queue

Tony Nguyen says:

====================
ice: fix synchronization between .ndo_bpf() and reset

Larysa Zaremba says:

PF reset can be triggered asynchronously, by tx_timeout or by a user. With some
unfortunate timings both ice_vsi_rebuild() and .ndo_bpf will try to access and
modify XDP rings at the same time, causing system crash.

The first patch factors out rtnl-locked code from VSI rebuild code to avoid
deadlock. The following changes lock rebuild and .ndo_bpf() critical sections
with an internal mutex as well and provide complementary fixes.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: do not bring the VSI up, if it was down before the XDP setup
  ice: remove ICE_CFG_BUSY locking from AF_XDP code
  ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  ice: check for XDP rings instead of bpf program when unconfiguring
  ice: protect XDP configuration with a mutex
  ice: move netif_queue_set_napi to rtnl-protected sections
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
commit 9636be8 ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when
CPUs go online/offline") introduces a new cpuhp state for hyperv
initialization.

cpuhp_setup_state() returns the state number if state is
CPUHP_AP_ONLINE_DYN or CPUHP_BP_PREPARE_DYN and 0 for all other states.
For the hyperv case, since a new cpuhp state was introduced it would
return 0. However, in hv_machine_shutdown(), the cpuhp_remove_state() call
is conditioned upon "hyperv_init_cpuhp > 0". This will never be true and
so hv_cpu_die() won't be called on all CPUs. This means the VP assist page
won't be reset. When the kexec kernel tries to setup the VP assist page
again, the hypervisor corrupts the memory region of the old VP assist page
causing a panic in case the kexec kernel is using that memory elsewhere.
This was originally fixed in commit dfe94d4 ("x86/hyperv: Fix kexec
panic/hang issues").

Get rid of hyperv_init_cpuhp entirely since we are no longer using a
dynamic cpuhp state and use CPUHP_AP_HYPERV_ONLINE directly with
cpuhp_remove_state().
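
The failure mode can be modelled in a few lines. The functions and state numbers below are made up for illustration and only mimic the return convention described above; they are not the kernel's cpuhp API.

```c
#include <stdbool.h>

/* Made-up state numbers for the sketch. */
enum { STATE_STATIC_EXAMPLE = 42, STATE_DYNAMIC_REQUEST = 100 };

/*
 * Model of the return convention described in the commit message:
 * a dynamic request returns the allocated state number (greater than 0),
 * while a statically numbered state returns 0 on success.
 */
int setup_state_model(int state)
{
    if (state == STATE_DYNAMIC_REQUEST)
        return 150;     /* pretend this slot was allocated */
    return 0;
}

/* The guard from the shutdown path: true only for dynamic states. */
bool old_guard_runs_teardown(int setup_ret)
{
    return setup_ret > 0;
}
```

With a static state the guard is never satisfied, so the teardown is skipped, which is why the fix drops the saved return value and uses the state constant directly.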

Cc: [email protected]
Fixes: 9636be8 ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline")
Signed-off-by: Anirudh Rayabharam (Microsoft) <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
Message-ID: <[email protected]>
rm .*.cmd when make clean

Signed-off-by: zhang jiao <[email protected]>
Reviewed-by: Saurabh Sengar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
Message-ID: <[email protected]>
We were allowing any users to create a high priority group without any
permission checks. As a result, this was allowing possible denial of
service.

We now only allow the DRM master or users with the CAP_SYS_NICE
capability to set higher priorities than PANTHOR_GROUP_PRIORITY_MEDIUM.

As the sole user of that uAPI lives in Mesa and hardcodes a value of
MEDIUM [1], this should be safe to do.

Additionally, as those checks are performed at the ioctl level,
panthor_group_create now only checks for priority level validity.
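
The permission rule can be sketched as a small predicate. The enum ordering, the caller struct and the return codes are assumptions for this example; the real driver uses its own types and kernel error codes.

```c
#include <stdbool.h>

/* Priority levels for the sketch; the ordering is an assumption. */
enum group_priority_sketch { PRIO_LOW, PRIO_MEDIUM, PRIO_HIGH, PRIO_COUNT };

/* Hypothetical description of the caller's privileges. */
struct caller_sketch {
    bool is_drm_master;
    bool has_cap_sys_nice;
};

/* Returns 0 if allowed, -1 for an invalid level, -2 if not permitted. */
int check_group_priority(const struct caller_sketch *c, int prio)
{
    if (prio < 0 || prio >= PRIO_COUNT)
        return -1;

    /* Anything above MEDIUM needs DRM master or CAP_SYS_NICE. */
    if (prio > PRIO_MEDIUM && !c->is_drm_master && !c->has_cap_sys_nice)
        return -2;

    return 0;
}
```

An unprivileged caller asking for MEDIUM, as Mesa is said to do, still passes the check.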

[1]https://gitlab.freedesktop.org/mesa/mesa/-/blob/f390835074bdf162a63deb0311d1a6de527f9f89/src/gallium/drivers/panfrost/pan_csf.c#L1038

Signed-off-by: Mary Guillemard <[email protected]>
Fixes: de85488 ("drm/panthor: Add the scheduler logical block")
Cc: [email protected]
Reviewed-by: Boris Brezillon <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
The WL-355608-A8 is a 3.5" 640x480@60Hz RGB LCD display from an unknown
OEM used in a number of handheld gaming devices made by Anbernic.
Previously committed using the OEM serial without a vendor prefix,
however following subsequent discussion the preference is to use the
integrating device vendor and name where the OEM is unknown.

There are 4 RG35XX series devices from Anbernic based on an Allwinner
H700 SoC using this panel, with the -Plus variant introduced first.
Therefore the -Plus is used as the fallback for the subsequent -H,
-2024, and -SP devices.

Alter the filename and compatible string to reflect the convention.

Fixes: 45b888a ("dt-bindings: display: panel: Add WL-355608-A8 panel")
Signed-off-by: Ryan Walklin <[email protected]>
Acked-by: Rob Herring (Arm) <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
As per the previous dt-binding commit, update the WL-355608-A8 panel
compatible to reflect the integrating device vendor and name as the
panel OEM is unknown.

Fixes: 62ea2ee ("drm: panel: nv3052c: Add WL-355608-A8 panel")
Signed-off-by: Ryan Walklin <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
In the off-chance that waiting for the firmware to signal its booted status
timed out in the fast reset path, one must flush the cache lines for the
entire FW VM address space before reloading the regions, otherwise stale
values eventually lead to a scheduler job timeout.

Fixes: 647810e ("drm/panthor: Add the MMU/VM logical block")
Cc: [email protected]
Signed-off-by: Adrián Larumbe <[email protected]>
Acked-by: Liviu Dudau <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Document what was discussed multiple times on list and various
virtual / in-person conversations. guard() being okay in functions
<= 20 LoC is a bit of my own invention. If the function is trivial
it should be fine, but feel free to disagree :)

We'll obviously revisit this guidance as time passes and we and other
subsystems get more experience.

Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Deferred I/O requires struct page for framebuffer memory, which is
not guaranteed for all DMA ranges. We thus only install deferred I/O
if we have a framebuffer that requires it.

A reported bug affected the ipu-v3 and pl111 drivers, which have video
memory in either Normal or HighMem zones

[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000010000000-0x000000003fffffff]
[    0.000000]   HighMem  [mem 0x0000000040000000-0x000000004fffffff]

where deferred I/O only works correctly with HighMem. See the Closes
tags for bug reports.

v2:
- test if screen_buffer supports deferred I/O (Sima)

Signed-off-by: Thomas Zimmermann <[email protected]>
Fixes: 808a40b ("drm/fbdev-dma: Implement damage handling and deferred I/O")
Reported-by: Alexander Stein <[email protected]>
Closes: https://lore.kernel.org/all/23636953.6Emhk5qWAg@steina-w/
Reported-by: Linus Walleij <[email protected]>
Closes: https://lore.kernel.org/dri-devel/CACRpkdb+hb9AGavbWpY-=uQQ0apY9en_tWJioPKf_fAbXMP4Hg@mail.gmail.com/
Tested-by: Alexander Stein <[email protected]>
Tested-by: Linus Walleij <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Javier Martinez Canillas <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Reviewed-by: Simona Vetter <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
The pwm devices for a pwm_chip are numbered starting at 0, the first hw
channel however has the number 1. While introducing a parametrised macro
to simplify register bit usage and making that offset explicit, one of
the usages was converted wrongly. This is fixed here.
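
The off-by-one can be illustrated with a parametrised macro that takes the 1-based hardware channel. The register layout here (one 4-bit field per channel, lowest bit meaning "enable") and the names are assumptions for the sketch, not the definitions used by the stm32 driver.

```c
#include <stdint.h>

/*
 * Hypothetical register layout: hardware channels are numbered from 1
 * and each channel owns a 4-bit field whose lowest bit is "enable".
 */
#define CH_ENABLE_BIT(hw_ch) (UINT32_C(1) << (4u * ((hw_ch) - 1u)))

/* Correct conversion: PWM index 0 maps to hardware channel 1. */
uint32_t enable_bit_for_pwm(unsigned int pwm_index)
{
    return CH_ENABLE_BIT(pwm_index + 1u);
}
```

Passing the 0-based PWM index straight into the macro would select the neighbouring channel's field, which is the kind of mistake the commit describes.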

Fixes: 7cea05a ("pwm-stm32: Make use of parametrised register definitions")
Signed-off-by: Uwe Kleine-König <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Uwe Kleine-König <[email protected]>
…ub/scm/linux/kernel/git/qcom/linux into arm/fixes

One more Qualcomm driver fix for v6.11

This resolves a deadlock in the Qualcomm uefisecapp driver that occurs
when an attempt is made to acquire the global context while the device
isn't probed.

* tag 'qcom-drivers-fixes-for-6.11-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  firmware: qcom: uefisecapp: Fix deadlock in qcuefi_acquire()

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
…/linux/kernel/git/mmind/linux-rockchip into arm/fixes

A number of pin fixes for Puma, Rock-Pi-E and rk356x, and as it turns
out the VO0 and VO1 general register files are not identical as suggested
by their original compatible. As there are no users of those yet,
everybody agreed that we should fix the compatibles.

* tag 'v6.11-rockchip-dtsfixes' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: Fix compatibles for RK3588 VO{0,1}_GRF
  dt-bindings: soc: rockchip: Fix compatibles for RK3588 VO{0,1}_GRF
  arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog on RK3399 Puma
  arm64: dts: rockchip: fix eMMC/SPI corruption when audio has been used on RK3399 Puma
  arm64: dts: rockchip: fix PMIC interrupt pin in pinctrl for ROCK Pi E
  arm64: dts: rockchip: Remove broken tsadc pinctrl binding for rk356x

Link: https://lore.kernel.org/r/7602696.A5hrfCrGMc@diego
Signed-off-by: Arnd Bergmann <[email protected]>
roy-hopkins and others added 18 commits November 4, 2024 09:59
In preparation for supporting running vCPUs at different privilege
levels it is necessary to determine the number of VMPL levels that are
supported by the platform. This commit adds an operation that can be
used to get the maximum supported VMPL, with the minimum being 0.

Signed-off-by: Roy Hopkins <[email protected]>
Isolation technologies such as SEV-SNP introduce the concept of
virtual machine privilege levels (VMPLs), separate to the processor
CPL. A guest runs in the context of one of these VMPLs which allows
for a different register context, memory privileges, etc.

KVM must maintain state for each supported VMPL and switch between
these states before entering the guest based on guest requests or
other factors.

This patch introduces the ability to create multiple
struct kvm_vcpus: one for each VMPL related to a single vCPU. This is
achieved by introducing a new structure, struct kvm_vcpu_vmpl_state that
is included in struct kvm_vcpu to track the state of each VMPL for
supported platforms (currently only SEV-SNP). The state for each VMPL is
then stored in its own struct kvm_vcpu. State that is common to all VMPL
kvm_vcpus is managed by vcpu->common, allowing a pointer to the
common fields to be shared amongst all VMPL kvm_vcpu's for a single vCPU
id.

The patch supports switching VMPLs by changing the target_vmpl in the
state structure. However, no code to generate a VMPL switch invokes this
at present.
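
The relationship between the structures can be sketched as follows. All type and field names carry a _sketch suffix because they are illustrative only: the number of levels, the use of pointers for the shared parts and the selection helper are assumptions, not the layout used in the patch.

```c
#include <stddef.h>

#define NR_VMPL_SKETCH 4    /* assumed number of levels for the example */

struct vcpu_sketch;

/* State shared by every per-VMPL instance of one vCPU id. */
struct vcpu_common_sketch {
    int vcpu_id;
};

/* Tracks the per-VMPL instances and which one should run next. */
struct vcpu_vmpl_state_sketch {
    struct vcpu_sketch *vmpl[NR_VMPL_SKETCH];
    unsigned int current_vmpl;
    unsigned int target_vmpl;
};

/* One instance per VMPL; register context would live here. */
struct vcpu_sketch {
    struct vcpu_common_sketch *common;
    struct vcpu_vmpl_state_sketch *vmpl_state;
    unsigned int vmpl;
};

/* Pick the instance to enter next, switching if target_vmpl changed. */
struct vcpu_sketch *vcpu_for_next_entry(struct vcpu_vmpl_state_sketch *s)
{
    if (s->target_vmpl >= NR_VMPL_SKETCH || s->vmpl[s->target_vmpl] == NULL)
        return s->vmpl[s->current_vmpl];    /* no valid target: stay put */

    s->current_vmpl = s->target_vmpl;
    return s->vmpl[s->current_vmpl];
}
```

Changing target_vmpl is the only thing a caller needs to do to request a switch; the shared common pointer keeps vCPU-wide state in one place.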

Signed-off-by: Roy Hopkins <[email protected]>
This commit builds on Tom Lendacky's SEV-SNP support RFC patch series
and reworks the handling of VMPL switching to use multiple struct
kvm_vcpus to store VMPL context.

Signed-off-by: Roy Hopkins <[email protected]>
When a CPU supports multiple VMPLs, injected interrupts need to be sent
to the correct context. This commit adds an operation that determines
the VMPL number that IRQs should be sent to in the absence of an explicit
target VMPL.

Signed-off-by: Roy Hopkins <[email protected]>
Systems that support VMPLs need to decide which VMPL each
IRQ is destined for; each VMPL can support its own set of hardware
devices that generate interrupts.

This commit extends kvm_lapic_irq to include a target_vmpl field that
sends the IRQ to the APIC instance at the target VMPL.
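
Routing by target VMPL, with a fallback to a default level, might look like the sketch below. The structures, the VMPL_UNSPECIFIED sentinel and the pending counter are hypothetical and exist only to make the routing decision visible.

```c
#define NR_VMPL_SKETCH 4        /* assumed number of levels */
#define VMPL_UNSPECIFIED (-1)   /* hypothetical "no explicit target" marker */

/* Stand-in for one APIC instance; counts delivered IRQs. */
struct apic_sketch {
    unsigned int pending;
};

struct lapic_irq_sketch {
    unsigned int vector;
    int target_vmpl;            /* VMPL_UNSPECIFIED means "use the default" */
};

struct vcpu_apics_sketch {
    struct apic_sketch apic[NR_VMPL_SKETCH];
    int default_irq_vmpl;
};

/* Returns the VMPL the IRQ was delivered to, or -1 if out of range. */
int deliver_irq(struct vcpu_apics_sketch *v, const struct lapic_irq_sketch *irq)
{
    int vmpl = irq->target_vmpl;

    if (vmpl == VMPL_UNSPECIFIED)
        vmpl = v->default_irq_vmpl;
    if (vmpl < 0 || vmpl >= NR_VMPL_SKETCH)
        return -1;

    v->apic[vmpl].pending++;
    return vmpl;
}
```

An IRQ without an explicit target lands on the default level, which ties this change to the default-VMPL operation introduced in the previous commit.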

Signed-off-by: Roy Hopkins <[email protected]>
… VMPL

The struct kvm_vcpu for each VMPL manages its own requests. Currently,
kvm_make_all_cpus_request() and kvm_make_vcpus_request_mask() allow
requests to be sent for VMPL0. This patch introduces two new functions:
kvm_make_all_cpus_request_vmpl() and kvm_make_vcpus_request_mask_vmpl()
that allow a particular VMPL to be targeted.

Signed-off-by: Roy Hopkins <[email protected]>
Currently all IRQ requests are targeted to the lowest priority VMPL
which is where the OS will be running. This patch updates the IOAPIC
request scan to use the default IRQ VMPL from the KVM arch
configuration.

Signed-off-by: Roy Hopkins <[email protected]>
The VMSA containing the initial CPU state for an SEV-SNP guest is
measured as part of the launch process. Currently, KVM does this
automatically during the call to KVM_SEV_SNP_LAUNCH_FINISH where the CPU
state is synchronised to the VMSA for every vCPU and measured, and then the
guest is launched.

This poses a problem for guests that want to have full control over the
number and contents of VMSAs, such as when using an SVSM module or
paravisor. In that case you may, for example, want only the BSP VMSA to
be provided, or want full control over non-synced registers.

As soon as the VMSA is measured it is encrypted by hardware so KVM
immediately loses sight and control over the contents. With this in
mind, there is no need to keep the VMSA in sync with KVM's view of
register state. Therefore it makes sense to bypass the sync completely
and provide a way for the VMSA to be directly specified from userspace
if required.

This commit extends the KVM_SEV_SNP_LAUNCH_UPDATE ioctl to allow VMSA
pages to be updated. When encountered, this modifies the behaviour of
KVM_SEV_SNP_LAUNCH_FINISH to prevent the sync and measurement of CPU
state. This allows both legacy functionality and new functionality to
co-exist.
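
The interaction between the two ioctls can be modelled with a single flag. The sketch below is an assumption-laden illustration, not the KVM code: the enum values, struct snp_launch_state and both function names are hypothetical, and the real page-type constants are defined in the KVM UAPI headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Page types for this sketch only; real values live in the KVM UAPI headers. */
enum snp_page_type { PAGE_NORMAL, PAGE_VMSA };

struct snp_launch_state {
	bool vmsa_from_userspace;	/* set once userspace supplies a VMSA page */
	int vmsas_synced;		/* number of VMSAs KVM generated itself */
};

/* Models KVM_SEV_SNP_LAUNCH_UPDATE: remember that a VMSA page was provided. */
static void launch_update(struct snp_launch_state *s, enum snp_page_type type)
{
	if (type == PAGE_VMSA)
		s->vmsa_from_userspace = true;
}

/* Models KVM_SEV_SNP_LAUNCH_FINISH: sync and measure only on the legacy path. */
static void launch_finish(struct snp_launch_state *s, int nr_vcpus)
{
	assert(nr_vcpus > 0);
	if (s->vmsa_from_userspace)
		return;			/* userspace VMSAs were measured on update */
	s->vmsas_synced = nr_vcpus;	/* legacy behaviour: one VMSA per vCPU */
}
```

A VMM that never passes a VMSA page keeps the existing behaviour, which is how legacy and new launch flows can share the same ioctls.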

Signed-off-by: Roy Hopkins <[email protected]>
Restricted injection is a feature which enforces additional interrupt and event
injection security protections for a SEV-SNP guest. It disables all
hypervisor-based interrupt queuing and event injection of all vectors except
a new exception vector, #HV (28), which is reserved for SNP guest use, but
never generated by hardware. #HV is only allowed to be injected into VMSAs that
execute with Restricted Injection.

The guests running with the SNP restricted injection feature active limit the
host to ringing a doorbell with a #HV exception.

Define two fields in the #HV doorbell page: a pending event field, and an
EOI assist.

Create the structure definition for the #HV doorbell page as per GHCB
specification.
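
For orientation, a doorbell page with these two fields could be laid out roughly as below. This is a sketch under stated assumptions: the bit positions are recalled from the GHCB specification and should be checked against it, and the struct, macro and helper names are made up for this example rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits of the 16-bit pending-event field (assumed from the GHCB spec). */
#define HVDB_VECTOR_MASK	0x00ffu
#define HVDB_NMI		(1u << 8)
#define HVDB_MC			(1u << 9)
#define HVDB_NO_FURTHER_SIGNAL	(1u << 15)

/* One 4 KiB doorbell page: pending events, EOI assist, remainder reserved. */
struct hvdb {
	uint16_t pending_events;
	uint8_t no_eoi_required;
	uint8_t reserved[4096 - 3];
};

/* Record an interrupt vector, leaving the flag bits untouched. */
static void hvdb_set_vector(struct hvdb *db, uint8_t vector)
{
	db->pending_events =
		(uint16_t)((db->pending_events & ~HVDB_VECTOR_MASK) | vector);
}

/* Flag a pending NMI. */
static void hvdb_set_nmi(struct hvdb *db)
{
	db->pending_events |= HVDB_NMI;
}
```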

Co-developed-by: Thomas Lendacky <[email protected]>
Signed-off-by: Thomas Lendacky <[email protected]>
Signed-off-by: Melody Wang <[email protected]>
To support the SEV-SNP Restricted Injection feature, the SEV-SNP guest must
register a #HV doorbell page for use with the #HV.

The #HV doorbell page NAE event allows the guest to register a #HV doorbell
page. The NAE event consists of four actions: GET_PREFERRED, SET, QUERY, CLEAR.
Implement the NAE event as per GHCB specification.
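
The four actions lend themselves to a simple dispatch. The following is a minimal sketch, assuming the action codes 0 to 3 in the order listed above (to be verified against the GHCB specification). The handler name, the HVDB_GPA_NONE sentinel and the return convention are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Action codes in the order listed above (assumed numbering). */
enum hvdb_action {
	HVDB_GET_PREFERRED = 0,
	HVDB_SET = 1,
	HVDB_QUERY = 2,
	HVDB_CLEAR = 3,
};

#define HVDB_GPA_NONE	UINT64_MAX	/* "no page" sentinel for this sketch */
#define PAGE_OFFSET_4K	UINT64_C(0xfff)

/*
 * *registered holds the currently registered doorbell GPA and *out receives
 * the value handed back to the guest. Returns 0 on success, -1 if the
 * request is malformed.
 */
static int handle_hvdb_nae(uint64_t *registered, uint64_t action,
			   uint64_t gpa, uint64_t *out)
{
	switch (action) {
	case HVDB_GET_PREFERRED:
		*out = HVDB_GPA_NONE;	/* this sketch has no preference */
		return 0;
	case HVDB_SET:
		if (gpa & PAGE_OFFSET_4K)
			return -1;	/* doorbell must be page aligned */
		*registered = gpa;
		*out = gpa;
		return 0;
	case HVDB_QUERY:
		*out = *registered;
		return 0;
	case HVDB_CLEAR:
		*registered = HVDB_GPA_NONE;
		*out = HVDB_GPA_NONE;
		return 0;
	default:
		return -1;
	}
}
```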

Co-developed-by: Thomas Lendacky <[email protected]>
Signed-off-by: Thomas Lendacky <[email protected]>
Signed-off-by: Melody Wang <[email protected]>
When restricted injection is active, only #HV exceptions can be injected into
the SEV-SNP guest.

Detect that the restricted injection feature is active for the guest, and then
follow the #HV doorbell communication from the GHCB specification to inject the
interrupt or exception.
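
The decision described here can be sketched as below. The feature bit (SEV_FEATURES[3]) and the #HV vector number (28) are taken from the other commit messages in this series; struct inject_ctx, its fields and vector_to_inject() are hypothetical names, and the sketch deliberately leaves out the rest of the doorbell protocol.

```c
#include <assert.h>
#include <stdint.h>

#define SEV_FEAT_RESTRICTED_INJECTION	(UINT64_C(1) << 3)
#define HV_VECTOR			28	/* #HV exception vector */

struct inject_ctx {
	uint64_t sev_features;		/* VMSA SEV features of the guest */
	uint16_t doorbell_pending;	/* stand-in for the doorbell event field */
};

/*
 * Return the vector that is actually injected. With restricted injection
 * the requested vector is only recorded in the doorbell and #HV is raised.
 */
static uint8_t vector_to_inject(struct inject_ctx *c, uint8_t vector)
{
	if (!(c->sev_features & SEV_FEAT_RESTRICTED_INJECTION))
		return vector;
	c->doorbell_pending =
		(uint16_t)((c->doorbell_pending & 0xff00u) | vector);
	return HV_VECTOR;
}
```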

Co-developed-by: Thomas Lendacky <[email protected]>
Signed-off-by: Thomas Lendacky <[email protected]>
Signed-off-by: Melody Wang <[email protected]>
When restricted injection is active, only #HV exceptions can be injected into
the SEV-SNP guest.

Detect that the restricted injection feature is active for the guest, and then
follow the #HV doorbell communication from the GHCB specification to inject
NMIs.

Co-developed-by: Thomas Lendacky <[email protected]>
Signed-off-by: Thomas Lendacky <[email protected]>
Signed-off-by: Melody Wang <[email protected]>
When restricted injection is active, only #HV exceptions can be injected into
the SEV-SNP guest.

Detect that the restricted injection feature is active for the guest, and then
follow the #HV doorbell communication from the GHCB specification to inject
MCEs.

Co-developed-by: Thomas Lendacky <[email protected]>
Signed-off-by: Thomas Lendacky <[email protected]>
Signed-off-by: Melody Wang <[email protected]>
Enable restricted injection in an SEV-SNP guest by setting the restricted
injection bit in the VMSA SEV features field (SEV_FEATURES[3]) from QEMU.

Add restricted injection to the hypervisor-advertised features.

Co-developed-by: Thomas Lendacky <[email protected]>
Signed-off-by: Thomas Lendacky <[email protected]>
Signed-off-by: Melody Wang <[email protected]>
Each vCPU VMPL maintains its own set of pending events. When a vCPU is
kicked or exits back to the host, check using an X86 op to see if events
are pending for a higher priority VMPL and allow switching to that VMPL.

This has been implemented for SEV-SNP where VMPL0 is checked when
running a lower VMPL and the current_vmpl is switched to VMPL0 if there
are pending events.
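
The SEV-SNP policy described above reduces to a small selection function. This is only a sketch: struct vmpl_ctx, its events_pending flag and next_vmpl() are names invented for the example, while the real check goes through an x86 op on the per-VMPL struct kvm_vcpu.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VMPLS 4

/* Per-VMPL event state for one vCPU (hypothetical). */
struct vmpl_ctx {
	bool events_pending;
};

/* After a kick or exit, decide which VMPL should run next. */
static int next_vmpl(const struct vmpl_ctx ctx[NR_VMPLS], int current_vmpl)
{
	assert(current_vmpl >= 0 && current_vmpl < NR_VMPLS);
	if (current_vmpl != 0 && ctx[0].events_pending)
		return 0;		/* VMPL0 has work: switch to it */
	return current_vmpl;		/* otherwise keep running where we are */
}
```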

Signed-off-by: Roy Hopkins <[email protected]>
Each vCPU can maintain paging state for multiple VMPLs. This means that
when the state is updated and the vCPU is requested to free obsolete
roots, every VMPL needs to handle the request.

This commit updates the request to target every VMPL of every vCPU.

Signed-off-by: Roy Hopkins <[email protected]>
This is primarily designed to support an enlightened driver for the
AMD SVSM-based vTPM, but it could be used by any platform which
communicates with a TPM device.  The platform must fill in struct
tpm_platform_ops as the platform_data and set the device name to "tpm"
to have the binding by name work correctly.  The sole sendrecv
function is designed to do a single buffer request/response conforming
to the MSSIM protocol.  For the SVSM vTPM case, this protocol is
transmitted directly to the SVSM, but it could be massaged for other
function type platform interfaces.

Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
If the SNP boot has an SVSM, probe for the vTPM device by sending a
SVSM_VTPM_QUERY call (function 8). The SVSM will return a bitmap with the
TPM_SEND_COMMAND bit set only if the vTPM is present and it is able to handle
TPM commands at runtime.
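
The probe result check can be sketched as a single bit test. This assumes that the bit index in the returned bitmap equals the command number 8 for TPM_SEND_COMMAND; the function name and the way the bitmap is obtained are hypothetical, and the actual SVSM call is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: bit index in the query bitmap equals the command number. */
#define TPM_SEND_COMMAND 8

/* Decide from the SVSM_VTPM_QUERY bitmap whether the vTPM is usable. */
static bool svsm_vtpm_usable(uint64_t platform_cmds)
{
	return (platform_cmds & (UINT64_C(1) << TPM_SEND_COMMAND)) != 0;
}
```

Only when this check succeeds would the platform device be registered, as described in the next paragraph.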

If a vTPM is found, register a platform device as "platform:tpm" so it
can be attached to the tpm_platform.c driver.

Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
[SG] Code adjusted with some changes introduced in 6.11
Signed-off-by: Stefano Garzarella <[email protected]>
@roy-hopkins
Author

@roy-hopkins it looks like we're missing the TPM patches here, so I just opened #8 where I rebased the last two patches of svsm branch on this PR. I just adjusted some code to build it. Feel free to include them in this PR.

Thanks Stefano. I've pushed an update that now includes your TPM patches from PR #8

@roy-hopkins
Author

The installation docs in the SVSM repo say the user should use the svsm branch. Should we change the target branch of this PR?

I think that once this PR has been approved then instead of being merged it will become a new branch. This means that the target branch does not really matter.

Fix an issue which causes a compilation error on certain recent versions
of the compiler.

Signed-off-by: Roy Hopkins <[email protected]>
@roy-hopkins
Author

I've updated the branch with a fix for a problem that was found during testing when the -vga std option is passed to the QEMU launch command. This was (eventually) traced to an issue in updating the nested paging state for each VMPL in the vCPU.

The latest branch also includes the TPM patches from Stefano's PR (thanks Stefano).

Everything seems to work well and reliably now.

@stefano-garzarella
Member

@roy-hopkins it looks like we're missing the TPM patches here, so I just opened #8 where I rebased the last two patches of svsm branch on this PR. I just adjusted some code to build it. Feel free to include them in this PR.

Thanks Stefano. I've pushed an update that now includes your TPM patches from PR #8

@roy-hopkins great, thanks! I'll close #8 and tomorrow I'll try running all together and provide my approval!

@stefano-garzarella
Member

@roy-hopkins I tried this Linux on the host and QEMU from coconut-svsm/qemu#16, but I get this error: qemu-system-x86_64: Convert non guest_memfd backed memory region (0xfee00000 ,+ 0x1000) to private

Full log:

sudo /home/sgarzare/repos/qemu-svsm/build/qemu-system-x86_64 -enable-kvm -cpu EPYC-v4 -machine q35,confidential-guest-support=sev0,memory-backend=mem0,igvm-cfg=igvm0 -object memory-backend-memfd,size=8G,id=mem0,share=true,prealloc=false,reserve=false -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1 -object igvm-cfg,id=igvm0,file=/home/sgarzare/repos/svsm-review/scripts/../bin/coconut-qemu.igvm -smp 4 -no-reboot -netdev user,id=vmnic -device e1000,netdev=vmnic,romfile= -drive file=/home/sgarzare/repos/snp-svsm-vtpm/images/fedora-luks.qcow2,if=none,id=disk0,format=qcow2,snapshot=on -device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=on -device scsi-hd,drive=disk0,bootindex=0 -nographic -monitor none -serial stdio -serial null -serial null -serial null
[Stage2] COCONUT Secure Virtual Machine Service Module
[Stage2] Order-00: total pages:    15 free pages:     0
[Stage2] Order-01: total pages:     0 free pages:     0
[Stage2] Order-02: total pages:     0 free pages:     0
[Stage2] Order-03: total pages:     0 free pages:     0
[Stage2] Order-04: total pages:     2 free pages:     2
[Stage2] Order-05: total pages:     3 free pages:     3
[Stage2] Total memory: 572KiB free memory: 512KiB
[Stage2]   kernel_region_phys_start = 0x0000008000000000
[Stage2]   kernel_region_phys_end   = 0x0000008001000000
[Stage2]   kernel_virtual_base   = 0xffffff8000000000
[SVSM] COCONUT Secure Virtual Machine Service Module
[SVSM] Order-00: total pages:    47 free pages:     0
[SVSM] Order-01: total pages:     1 free pages:     1
[SVSM] Order-02: total pages:     1 free pages:     1
[SVSM] Order-03: total pages:     1 free pages:     1
[SVSM] Order-04: total pages:     0 free pages:     0
[SVSM] Order-05: total pages:   106 free pages:   106
[SVSM] Total memory: 13812KiB free memory: 13624KiB
[SVSM] Boot stack starts        @ 0xffffff8000269000
[SVSM] BSP Runtime stack starts @ 0xffffff000020c000
[SVSM] Guest Memory Regions:
[SVSM]   000000000000000000-000000000080000000
[SVSM]   000000000100000000-000000000280000000
[SVSM] Invalidating boot region 000000000000000000-0000000000000a0000
[SVSM] Invalidating boot region 000000000000800000-000000000000894000
[SVSM] Invalidating boot region 000000000000894000-000000000000af5908
[SVSM] Invalidating boot region 000000000000af6000-000000000000af9000
[SVSM] CPU count is 4
[SVSM] 4 CPU(s) present
[SVSM] Launching AP with APIC-ID 1
[SVSM] AP with APIC-ID 1 is online
[SVSM] Launching AP with APIC-ID 2
[SVSM] Launching request-processing task on CPU 1
[SVSM] AP with APIC-ID 2 is online
[SVSM] Launching AP with APIC-ID 3
[SVSM] Launching request-processing task on CPU 2
[SVSM] AP with APIC-ID 3 is online
[SVSM] Brought 3 AP(s) online
[SVSM] Launching request-processing task on CPU 3
[SVSM] FW Meta Data
[SVSM]   CPUID Page   : 0x0080e000
[SVSM]   Secrets Page : 0x0080d000
[SVSM]   CAA Page     : 0x0080f000
[SVSM]   Pre-Validated Region 0x0000000000800000-0x0000000000809000
[SVSM]   Pre-Validated Region 0x000000000080a000-0x000000000080d000
[SVSM]   Pre-Validated Region 0x0000000000810000-0x0000000000820000
[SVSM] Validating 0x0000000000800000-0x0000000000809000
[SVSM] Validating 0x000000000080a000-0x0000000000820000
[SVSM] Flash region 0 at 0x00000000ffc00000 size 000000000000400000
[SVSM] Size of OBJECT = 1204
[SVSM] Size of components in TPMT_SENSITIVE = 744
[SVSM]     TPMI_ALG_PUBLIC                 2
[SVSM]     TPM2B_AUTH                      50
[SVSM]     TPM2B_DIGEST                    50
[SVSM]     TPMU_SENSITIVE_COMPOSITE        642
[SVSM] MAX_CONTEXT_SIZE can be reduced to 1264 (1344)
[SVSM] VTPM: Microsoft TPM 2.0 initialized
[SVSM] [CPU 0] Virtual memory pages used: 0 * 4K, 0 * 2M
[SVSM] VMSA PA: 0x8000f4f000
[SVSM] Launching Firmware
[SVSM] Launching request-processing task on CPU 0
[SVSM] Failed to launch /init
qemu-system-x86_64: Convert non guest_memfd backed memory region (0xfee00000 ,+ 0x1000) to private
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00800f12
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=c5 5a 08 2d e9 22 ff 90 90 90 90 90 00 00 00 00 56 54 46 00 <0f> 20 c0 a8 01 74 05 e9 21 ff ff ff e9 01 ff 90 00 00 00 00 00 00 00 00 00 00 00 00 00 00
qemu-system-x86_64: terminating on signal 2

Linux commit: fd20cbd4a899 KVM: X86: Fix compile error
QEMU commit: bcb7121d03 sev: Provide VMSA to kvm via KVM_SEV_SNP_LAUNCH_UPDATE

I also tried https://github.com/roy-hopkins/svsm/tree/vmpl0_interrupts with the same behavior.
What am I missing?

@stefano-garzarella
Member

I just tried to build svsm without edk2, and I don't have the issue anymore, but I can't boot Linux.
@roy-hopkins do we also need to update edk2?

@soelangen

soelangen commented Nov 11, 2024

@stefano-garzarella in case you have not figured it out already, and for others: in coconut-svsm/qemu#15 (comment) it was mentioned that you need to use the master branch of https://github.com/tianocore/edk2 due to some incompatibilities. Using either the master branch or edk2-stable202408.01, I am able to boot into the UEFI shell (I have not tried to boot Linux).

I have also tried to adapt the patch for the SVSM vTPM to the current edk2 stable tag (https://github.com/soelangen/edk2/tree/svsm_vTPM); however, OVMF gets stuck during the boot process at the following point.
I do not have the knowledge to debug why this is happening so maybe someone else with more knowledge is able to create an EDK2 patch that boots correctly.

Edit: Booting with the TPM patch (https://github.com/soelangen/edk2/tree/svsm_vTPM) into the UEFI shell is possible; I had some problems on my system.

@CookieComputing

Thanks for pointing this out @soelangen! Using the edk2 master branch and the respective host patches from this PR + the patched QEMU, I was able to use direct boot to launch Linux.

@stefano-garzarella
Member

@soelangen yeah, thanks for that! I confirm that using upstream edk2 master branch everything is fine and I can also boot linux from the disk.

@CookieComputing

I don't know if anyone had issues with applying id-auth and id-block to this set-up, but at least it seems like I'm running into errors with this at the QEMU layer?

/root/qemu-system-x86_64 -cpu EPYC-v4 -smp 4 -m 63240 -enable-kvm -nographic -drive if=virtio,format=qcow2,file=/root/rootfs.qcow2 -monitor tcp:127.0.0.1:50000,server,nowait -qmp tcp:127.0.0.1:50001,server,nowait -object rng-random,filename=/dev/urandom,id=rng0 -machine q35 -serial tcp:127.0.0.1:55555,server,nowait -device virtio-net-pci,netdev=net0,romfile= -netdev user,id=net0 -kernel /root/cvm_linuz -initrd /root/initrd.cpio.gz -append console=ttyS0 audit=0 biosdevname=0 net.ifnames=0 -object igvm-cfg,file=/root/coconut-qemu.igvm,id=igvm0 -object memory-backend-memfd,id=ram1,size=63240M,share=true,reserve=false -machine q35,confidential-guest-support=sev0,memory-backend=ram1,igvm-cfg=igvm0 -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-block=YskhYwAw4c86GMYfBrM4zX7EZ5trk4wT/USxxZcHkA+YoG1bBgvIIcpCbtDRKottAAAAAAAAAAAAAAAAAAAAAGnQ0fvT6yQV2mgjv8+AclsBAAAAAAAAAAAAAwAAAAAA,author-key-enabled=true,kernel-hashes=on,id-auth=AQAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAHivsopavEy10YqZYJk9GstaO6emDzt3Gxfh46hSU/N+/RHJku+JOPE67JY8g4DjPAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAmueAD5T7sUuFBqqxH0uP5Jjtw/bCaG/omrNfhOdpYJZ5803SZ9SHOotvhpUgyGrgAAALnOePm50jaOdGT2kursQgYJoougvKz2puaCY7Vexr4qhU7rp7kFR6f3VfTIi8gmIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALIeAkfrAbln/VsK0jVk9aPmsIiTPo9AGnkJaozePqoEEYZLrO+0g7WBz5hVtSz1CwfonsPXB1YkP3j96wrXkNGPGtegZD5pbCo/7RyLc9z/cbM5IR/+Eg30Eay4/vVHrEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACE/FMIVX73G8r1VZg7bekdKM+midDq37dLZOQDbfthxiXZynEh/m4TdccBK7w8+hfxNrMjV2XSXdR8XQ9dDgAyk6epBhKN77VIvMnp9XO9EBRwKuQqu4SMEgeBbhKbjUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAt570CZAqF7P/6LbhJgmJhDXCZwlXFJGY6fbr9pnBNIaV6WkLP90AAaAb64Zs+evga none
qemu-system-x86_64: ../../qemu/hw/i386/pc_sysfw_ovmf.c:113: pc_system_ovmf_table_find: Assertion `ovmf_flash_parsed' failed.

The same command without those two args, however, launches just fine on my setup:

/root/qemu-system-x86_64 -cpu EPYC-v4 -smp 4 -m 63240 -enable-kvm -nographic -drive if=virtio,format=qcow2,file=/root/rootfs.qcow2 -monitor tcp:127.0.0.1:50000,server,nowait -qmp tcp:127.0.0.1:50001,server,nowait -object rng-random,filename=/dev/urandom,id=rng0 -machine q35 -serial tcp:127.0.0.1:55555,server,nowait -device virtio-net-pci,netdev=net0,romfile= -netdev user,id=net0 -kernel /root/cvm_vmlinuz -initrd /root/initrd.cpio.gz -append console=ttyS0 audit=0 biosdevname=0 net.ifnames=0 -object igvm-cfg,file=/root/coconut-qemu.igvm,id=igvm0 -object memory-backend-memfd,id=ram1,size=63240M,share=true,reserve=false -machine q35,confidential-guest-support=sev0,memory-backend=ram1,igvm-cfg=igvm0 -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,author-key-enabled=false -vga none

Can anyone replicate this issue?
