Known issue: L2 cache flushing due to StarFive 7100 architecture #1

pdp7 · 2021-04-11T18:59:43Z

Please be aware there is an issue in the Linux 5.10 kernel fork that describes why L2 cache flushing is needed:

Known issue: L2 cache flushing due to StarFive 7100 architecture

Please use this issue to discuss how U-Boot handles L2 cache flushing.

pdp7 · 2021-04-19T21:08:50Z

Refer to starfive-tech/sft-riscv-linux-5.10#2.

pdp7 · 2021-04-28T07:11:49Z

A note on JH7100 cache coherence has been published:
https://github.com/starfive-tech/beaglev_doc/blob/main/JH7100%20Cache%20Coherence%20V1.0.pdf

brucehoult · 2021-05-06T02:35:41Z

Occasionally flushing L2 cache is one thing, but L2 cache is currently not enabled AT ALL.

tekkamanninja · 2021-08-26T01:58:41Z

Quoting brucehoult: "Occasionally flushing L2 cache is one thing, but L2 cache is currently not enabled AT ALL."

We have enabled it in U-Boot, in u-boot/board/starfive/jh7100/jh7100.c (line 1289 in 847782c):

#if CONFIG_IS_ENABLED(STARFIVE_JH7100_CACHE_WAYENABLE)
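For readers unfamiliar with what "enabling" the L2 cache means here, the following is a minimal sketch of a way-enable step. It is not the code behind the guard above. The register offsets and the position of the Ways field come from the SiFive FU540 L2 cache controller documentation and are only assumed to apply to the JH7100; this thread does not confirm them. The helper name `l2_enable_all_ways` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION: offsets and field layout follow the SiFive FU540 L2 cache
 * controller register map. Whether the JH7100 matches is not stated in
 * this thread.
 */
#define L2_CONFIG_OFFSET      0x000u  /* Config register */
#define L2_WAYENABLE_OFFSET   0x008u  /* WayEnable register */
#define L2_CONFIG_WAYS_SHIFT  8       /* Ways field: bits [15:8] of Config */
#define L2_CONFIG_WAYS_MASK   0xffu

/*
 * Hypothetical helper: read the number of ways from Config and write the
 * index of the highest way into WayEnable. Returns the number of ways.
 * `regs` points at the start of the controller's register block.
 */
static uint32_t l2_enable_all_ways(volatile uint32_t *regs)
{
    uint32_t config = regs[L2_CONFIG_OFFSET / 4];
    uint32_t ways = (config >> L2_CONFIG_WAYS_SHIFT) & L2_CONFIG_WAYS_MASK;

    assert(ways >= 1);

    /* WayEnable holds the index of the largest enabled way. */
    regs[L2_WAYENABLE_OFFSET / 4] = ways - 1;

    /* On real hardware a memory barrier would normally follow this write. */
    return ways;
}
```

The helper takes a pointer, not a hard-coded base address, so it can be run against a plain array on a host machine. On a board the pointer would come from the device tree or the SoC memory map.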


pdp7 self-assigned this Apr 11, 2021

pdp7 added known issue 7100 SoC and removed known issue labels Apr 11, 2021

pdp7 mentioned this issue Apr 11, 2021

Known issue: L2 cache flushing due to StarFive 7100 architecture starfive-tech/linux#1

Closed

pdp7 closed this as completed Apr 19, 2021

pdp7 reopened this Apr 21, 2021
