Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
lib: sbi: Set the state of a hart to START_PENDING after the hart is …
…ready When a boot hart executes sbi_hsm_hart_start() to start a secondary hart, next_arg1, next_addr and next_mode for the latter are stored in the scratch area after the state has been set to SBI_HSM_STATE_START_PENDING. The secondary hart waits in the loop with wfi() in sbi_hsm_hart_wait() at that time. However, "wfi" instruction is not guaranteed to wait for an interrupt to be received by the hart, it is just a hint for the CPU. According to RISC-V Privileged Architectures spec. v20211203, even an implementation of "wfi" as "nop" is legal. So, the secondary might leave the loop in sbi_hsm_hart_wait() as soon as its state has been set to SBI_HSM_STATE_START_PENDING, even if it got no IPI or it got an IPI unrelated to sbi_hsm_hart_start(). This could lead to the following race condition when booting Linux, for example: Boot hart (#0) Secondary hart (starfive-tech#1) runs Linux startup code waits in sbi_hsm_hart_wait() sbi_ecall(SBI_EXT_HSM, SBI_EXT_HSM_HART_START, ...) enters sbi_hsm_hart_start() sets state of hart starfive-tech#1 to START_PENDING leaves sbi_hsm_hart_wait() runs to the end of init_warmboot() returns to scratch->next_addr (next_addr can be garbage here) sets next_addr, etc. for hart starfive-tech#1 (no good: hart starfive-tech#1 has already left) sends IPI to hart starfive-tech#1 (no good either) If this happens, the secondary hart jumps to a wrong next_addr at the end of init_warmboot(), which leads to a system hang or crash. To reproduce the issue more reliably, one could add a delay in sbi_hsm_hart_start() after setting the hart's state but before sending IPI to that hart: hstate = atomic_cmpxchg(&hdata->state, SBI_HSM_STATE_STOPPED, SBI_HSM_STATE_START_PENDING); ... + sbi_timer_mdelay(10); init_count = sbi_init_count(hartid); rscratch->next_arg1 = arg1; rscratch->next_addr = saddr; The issue can be reproduced, for example, in a QEMU VM with '-machine virt' and 2 or more CPUs, with Linux as the guest OS. This patch moves writing of next_arg1, next_addr and next_mode for the secondary hart before setting its state to SBI_HSM_STATE_START_PENDING. In theory, it is possible that two or more harts enter sbi_hsm_hart_start() for the same target hart simultaneously. To make sure the current hart has exclusive access to the scratch area of the target hart at that point, a per-hart 'start_ticket' is used. It is initially 0. The current hart tries to acquire the ticket first (set it to 1) at the beginning of sbi_hsm_hart_start() and only proceeds if it has successfully acquired it. The target hart reads next_addr, etc., and then the releases the ticket (sets it to 0) before calling sbi_hart_switch_mode(). This way, even if some other hart manages to enter sbi_hsm_hart_start() after the ticket has been released but before the target hart jumps to next_addr, it will not cause problems. atomic_cmpxchg() already has "acquire" semantics, among other things, so no additional barriers are needed in hsm_start_ticket_acquire(). No hart can perform or observe the update of *rscratch before setting of 'start_ticket' to 1. atomic_write() only imposes ordering of writes, so an explicit barrier is needed in hsm_start_ticket_release() to ensure its "release" semantics. This guarantees that reads of scratch->next_addr, etc., in sbi_hsm_hart_start_finish() cannot happen after 'start_ticket' has been released. Signed-off-by: Evgenii Shatokhin <[email protected]> Reviewed-by: Anup Patel <[email protected]>
- Loading branch information