Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modbus Guru Meditation Error: Core 1 panic'ed (Cache disabled but cached memory region accessed) (IDFGH-3775) #5694

Closed
simon-thiebaut opened this issue Aug 4, 2020 · 18 comments
Labels
Status: Done Issue is done internally

Comments

@simon-thiebaut
Copy link

simon-thiebaut commented Aug 4, 2020

Environment

  • Development Kit: Custom
  • Kit version (for WroverKit/PicoKit/DevKitC): Custom
  • Module or chip used: ESP32-WROOM-32D
  • IDF version (run git describe --tags to find it):
    • good behavior => v4.1-dev-1100-g18d435e26
    • bad behavior => v4.1-rc-10-g70675c7d1
  • Build System: CMake
  • Compiler version (run xtensa-esp32-elf-gcc --version to find it):
    • good behavior => (crosstool-NG esp-2019r2) 8.2.0
    • bad behavior => (crosstool-NG esp-2020r2) 8.2.0
  • Operating System: Windows
  • (Windows only) environment type: MSYS2 mingw32
  • Using an IDE?: Yes => eclipse CDT
  • Power Supply: Custom

Problem Description

We were encountering a memory leak in the wifi stack on the previous version (v4.1-dev-1100-g18d435e26). As it was fixed on release/v4.1, we decided to move to this branch head (also, we're going to production so it was better to go to a release branch rather than staying on master).

So now we are on v4.1-rc-10-g70675c7d1.
Problem is that we always get "Core 1 panic'ed (Cache disabled but cached memory region accessed)" errors causing reboot after some minutes, without explanation, as our code has not changed before and after SDK update.

We suspect it comes from Modbus/UART ISR handlers, because we've tried the following independently:

  • disable Modbus stack usage => problem disappears
  • removed UART and Modbus ISR handlers from IRAM by changing SDK config CONFIG_FMB_TIMER_ISR_IN_IRAM and CONFIG_UART_ISR_IN_IRAM to "n" => problem disappears.

Also, we've checked the differences between the 2 versions, and dug a bit in the code, and we've found inconsistencies for code that should be in IRAM, for example:

  • vTimerGroupIsr => runs in IRAM, but can access constants in flash such as usTimerIndex1 (not in DRAM)
  • uart_rx_intr_handler_default => runs in DRAM, but can call functions in flash such as uart_hal_write_txfifo or uart_hal_read_rxfifo (not in IRAM).
    There might be others as well, it is only the ones we've identified.
    So maybe this added to other amount of running code can cause more chances to get cache exceptions. We've tried to move the objects we've identified from flash to IRAM/DRAM, but without success...

Expected Behavior

Code shall not crash with the new SDK version, as before.

Actual Behavior

Code crashes with the new SDK version.

Steps to reproduce

I have not made the effort yet to try the simple Modbus examples. But I'm not sure this will fail... I'm afraid it could be due to the size of the binary: the more it is, the more cache accesses might be required, and so the more the problem could occur.

Code to reproduce this issue

NA

Debug Logs

Guru Meditation Error: Core  1 panic'ed (Cache disabled but cached memory region accessed)
Core 1 register dump:
PC      : 0x40130b2c  PS      : 0x00060034  A0      : 0x40082e64  A1      : 0x3ffbe840  
0x40130b2c: uart_ll_is_tx_idle at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/soc/esp32/include/hal/uart_ll.h:740

0x40082e64: _xt_lowint1 at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/freertos/xtensa_vectors.S:1153

A2      : 0x3ffd8a9c  A3      : 0x3ffc1e00  A4      : 0x3ffd1d6c  A5      : 0x00000020  
A6      : 0x00000001  A7      : 0x00004000  A8      : 0x8008c5ad  A9      : 0x00000080  
A10     : 0x3ff6e000  A11     : 0x00000000  A12     : 0x60d6de31  A13     : 0x60d6de31  
A14     : 0xffffffff  A15     : 0x3ffc8450  SAR     : 0x00000000  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  
Core 1 was running in ISR context:
EPC1    : 0x40089983  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x40130b2c
0x40089983: spi_flash_op_block_func at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/spi_flash/cache_utils.c:101 (discriminator 1)

0x40130b2c: uart_ll_is_tx_idle at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/soc/esp32/include/hal/uart_ll.h:740

ELF file SHA256: 4b09f0aacc1d6396


Backtrace: 0x40130b29:0x3ffbe840 0x40082e61:0x3ffbe8a0 0x40089980:0x3ffb7e60 0x40082514:0x3ffb7e90
0x40130b29: timer_disable_intr at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/driver/timer.c:429

0x40082e61: _xt_lowint1 at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/freertos/xtensa_vectors.S:1153

0x40089980: spi_flash_op_block_func at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/spi_flash/cache_utils.c:101 (discriminator 1)

0x40082514: ipc_task at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/esp_common/src/ipc.c:62


Re-enable cpu cache.
Rebooting...
ets Jun  8 2016 00:22:57

rst:0x3 (SW_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:4
load:0x3fff0034,len:7360
ho 0 tail 12 room 4
load:0x40078000,len:13072
load:0x40080400,len:3896
0x40080400: _init at ??:?
entry 0x40080688

I (30) boot: ESP-IDF v4.1-rc-10-g70675c7d1 2nd stage bootloader
I (30) boot: compile time 09:09:21
I (30) boot: chip revision: 1
I (35) boot_comm: chip revision: 1, min. bootloader chip revision: 0
I (42) boot.esp32: SPI Speed      : 40MHz
I (46) boot.esp32: SPI Mode       : DIO
I (51) boot.esp32: SPI Flash Size : 8MB
I (55) boot: Enabling RNG early entropy source...
I (61) boot: Partition Table:
I (64) boot: ## Label            Usage          Type ST Offset   Length
I (72) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (79) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (87) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (94) boot:  3 factory          factory app      00 00 00010000 00200000
I (102) boot:  4 ota_ihm1         OTA app          00 10 00210000 00200000
I (109) boot:  5 ota_ihm2         OTA app          00 11 00410000 00200000
I (117) boot:  6 ota_lvl2         OTA app          00 12 00610000 00080000
I (124) boot: End of partition table
I (129) boot: Defaulting to factory image
I (133) boot_comm: chip revision: 1, min. application chip revision: 0
I (140) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x56244 (352836) map
I (284) esp_image: segment 1: paddr=0x0006626c vaddr=0x3ffbdb60 size=0x066e4 ( 26340) load
I (295) esp_image: segment 2: paddr=0x0006c958 vaddr=0x40080000 size=0x00404 (  1028) load
0x40080000: _WindowOverflow4 at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/freertos/xtensa_vectors.S:1778

I (296) esp_image: segment 3: paddr=0x0006cd64 vaddr=0x40080404 size=0x032b4 ( 12980) load
I (307) esp_image: segment 4: paddr=0x00070020 vaddr=0x400d0020 size=0xf61c0 (1008064) map
0x400d0020: _stext at ??:?

I (694) esp_image: segment 5: paddr=0x001661e8 vaddr=0x400836b8 size=0x160a4 ( 90276) load
0x400836b8: xMBPortEventPost at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/freemodbus/port/portevent.c:89 (discriminator 1)

I (748) boot: Loaded app from partition at offset 0x10000
I (748) boot: Disabling RNG early entropy source...
I (749) cpu_start: Pro cpu up.
I (752) cpu_start: Application information:
I (757) cpu_start: Project name:     zpce_embedded_zs500_ihm
I (763) cpu_start: App version:      V72I6.0.0-2-g74183db5b-dirty
I (770) cpu_start: Compile time:     Aug  4 2020 09:08:45
I (776) cpu_start: ELF file SHA256:  4b09f0aacc1d6396...
I (782) cpu_start: ESP-IDF:          v4.1-rc-10-g70675c7d1
I (788) cpu_start: Starting app cpu, entry point is 0x400815ec
0x400815ec: call_start_cpu1 at c:\users\simon.thiebaut\documents\projets\zs500_v3\zpce_embedded_zs500_ihm\build/../esp-idf/components/esp32/cpu_start.c:271
I (0) cpu_start: App cpu up.
I (799) heap_init: Initializing. RAM available for dynamic allocation:
I (806) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (812) heap_init: At 3FFB6388 len 00001C78 (7 KiB): DRAM

I (818) heap_init: At 3FFB9A20 len 00004108 (16 KiB): DRAM
I (824) heap_init: At 3FFBDB5C len 00000004 (0 KiB): DRAM
I (830) heap_init: At 3FFD7170 len 00008E90 (35 KiB): DRAM
I (836) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (843) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (849) heap_init: At 4009975C len 000068A4 (26 KiB): IRAM
I (855) cpu_start: Pro cpu start user code
I (874) spi_flash: detected chip: gd
I (875) spi_flash: flash io: dio
I (875) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (887) STARTUP: FW VERSION 6.0.0

Other items if possible

  • good behavior (v4.1-dev-1100-g18d435e26)
    • sdkconfig attached
    • elf file attached
    • map file attached (you can see the mislocated function/constants that should be in IRAM/DRAM from our point of view)
  • bad behavior (v4.1-rc-10-g70675c7d1)
    • sdkconfig attached
    • elf file attached
    • map file attached (you can see the mislocated function/constants that should be in IRAM/DRAM from our point of view)

4.1-dev.zip
4.1-rc.zip

I've also checked tickets #5072 and #4049, but the do not present exactly the same symptoms.
Also, please let us know if we too the wrong branch to upgrade and get the fix of wifi leak, maybe it has an impact...

Thank you for your help.

Best regards,

Simon

@github-actions github-actions bot changed the title Modbus Guru Meditation Error: Core 1 panic'ed (Cache disabled but cached memory region accessed) Modbus Guru Meditation Error: Core 1 panic'ed (Cache disabled but cached memory region accessed) (IDFGH-3775) Aug 4, 2020
@Alvin1Zhang
Copy link
Collaborator

Thanks for the detailed report and sorry for the inconvenience, we will look into.

@alisitsyn
Copy link
Collaborator

alisitsyn commented Aug 5, 2020

Hi @simon-thiebaut,

Thanks for your issue. I investigated the issue and files you provided.

  1. You are using some modification to ESP-IDF. v4.1-rc-10-g70675c7d1 is not an official ESP-IDF version (no commit 70675c7 in official repo) this may cause some inconsistency during investigation.
  2. The modbus port code and other functions in your *.elf file:
    The functions below you mentioned in issue description are located in iram segment and placed there conditionally by the linker.lf file and do not cause the issues:
uart_hal_read_rxfifo in section .iram0.text
uart_hal_write_txfifo in section .iram0.text

I investigated some uart driver, modbus port functions and other as well as their references to .flash.text segment:

uart_hal_rxfifo_rst @0x4008bbf0 (.iram0.text) -> __assert_func @0x400e04f8 (.flash.text)
uart_hal_write_txfifo @0x4008bcac (.iram0.text) -> __assert_func @0x400e04f8 (.flash.text)
uart_hal_read_rxfifo @0x4008bd24 (.iram0.text) -> __assert_func @0x400e04f8 (.flash.text)

xMBMasterRTUTimerExpired @0x40082a70 (.iram0.text) -> __assert_func @0x400e04f8 (.flash.text)
vTimerGroupIsr @0x40082b80 (.iram0.text) -> __assert_func @0x400e04f8 (.flash.text)
vMBMasterPortTimersDisable @0x40082bb0 (.iram0.text) -> timer_pause @0x400d9fbc (.flash.text)
vMBMasterPortTimersDisable @0x40082bb0 (.iram0.text) -> timer_set_counter_value @0x400d9d9c (.flash.text)
vMBMasterPortTimersDisable @0x40082bb0 (.iram0.text) -> timer_disable_intr @0x400da758 (.flash.text)

The references above are ok and do not cause any issue.
Investigation of the timer handler assembler xMBMasterRTUTimerExpired showed it does not have any reference to flash memory.
However I found the issue with uart_ll_is_tx_idle() symbol, the function is inline in soc but this symbol is located in .flash.text segment as per your elf file (v4.1-rc_zpce_embedded_zs500_ihm.elf).
This function was called from interrupt handler uart.c::uart_rx_intr_handler_default located in IRAM and caused an issue in your application.
The issue is confirmed and happens when optimization for size is selected and (crosstool-NG esp-2020r2) 8.2.0 is used.
I continue to work with the issue and will update you with new information.

@simon-thiebaut
Copy link
Author

simon-thiebaut commented Aug 5, 2020

Hi @alisitsyn ,

Thank you for the quick reply.

Indeed my mistake I forgot to mention that we forked the repo. But as it was working before rebasing to release/v4.1 head, I didn't find relevant to state it. However this might be a combination of our changes + SDK changes that create the problem...

Also from your answer, I've been misreading the map files, actually I didn't think to check the elf.
So I've tried your suggestion, to build using latest offical RC (v4.1-rc b3b7fcf, is that correct ?) => still the issue in the elf, uart_ll_is_tx_idle is still in .flash.text though it should be inlined....

By the way, I've found this comment on this link => https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/spi_master.html

Please do take care that the optimization level may affect the compiler behavior of inline, or transforming some code into lookup table in the const data, etc.

Could it be a wrong compiler option in the SDK config that optimizes badly this function location ?
Unless I'm wrong, an inlined function name should not appear in the elf, right ?

I'll continue checking compiler options.

Thank you for your feedback.

@alisitsyn
Copy link
Collaborator

alisitsyn commented Aug 5, 2020

@simon-thiebaut,

So I've tried your suggestion, to build using latest offical RC (v4.1-rc b3b7fcf, is that correct ?) => still the issue in the elf, uart_ll_is_tx_idle is still in .flash.text though it should be inlined....

Yes, this issue exists in even latest master when optimization for size is selected. The correct commit for v4.1-rc is
b3b7fcf. The compiler does not inline the functions always and I think it is known aspect of compiler not an issue with the compiler. I verify this against all release versions.

Unless I'm wrong, an inlined function name should not appear in the elf, right ?

Yes, the inlined function should not appear as a symbol in elf.

For quick fix I propose to change the uart_ll_is_tx_idle as below (it could work even without always_inline):
components/soc/esp32/include/hal/uart_ll.h :

static __attribute__((always_inline)) inline bool uart_ll_is_tx_idle(uart_dev_t *hw)
{
    return ((hw->status.txfifo_cnt == 0) && (hw->status.st_utx_out == 0));
}

In this case it is inlined correctly by both compilers. Let me know if the temporary fix above works for you.
The issue should be resolved reliably to place the function into IRAM conditionally but the same may be happened with other hal(ll*) functions called from UART handler. I try to find a good way to do it withing the scope of current driver architecture.
The resolution of the issue will take some time to complete workflow. I will inform you about progress.

AND Thanks for good finding...

@simon-thiebaut
Copy link
Author

@alisitsyn thank you for these confirmations, I was going crazy trying to put IRAM everywhere without success...

So far with your fix, I don't face the problem anymore after more than 1 hour (whereas before problem was showing up after roughly 4/5 minutes).

I need to do more test, but this is encouraging.
And I agree with you, function in IRAM shall be always put in RAM, whatever the compiler options are.
I understand this is going to take time to fix it. We've committed this patch in our fork, and run our project validation with it, no worry so. Hopefully this will be enough...

AND Thanks for good finding...

AND Thanks for good support 👍

Shall I close the ticket or will you do it after official fix delivery ?

Thank you again.

@alisitsyn
Copy link
Collaborator

@simon-thiebaut,

Good to hear it helps even as temporary solution. Let me know if you have some further results. Please leave the ticket opened, it will be closed automatically once fix is merged.

👍

@AshUK
Copy link
Contributor

AshUK commented Jan 13, 2021

We recently experienced this issue with both the UART (used for modbus) and the RMT peripheral using IRAM on v4.2. We require both of these peripherals be active during an OTA download i.e flash writes. Our project has not got to state yet where we need to fork the IDF to apply the workarounds mentioned above as we are looking to avoid this as much as possible.

Currently changing the compiler settings to -Og instead of -Os resolves the problem. However, this does result in significantly larger binary images which are downloaded on metered connections and there is currently no support for bindiff updates. Also now not everything will fit inside the IRAM and storing application code in IRAM is useful as it can free up RAM used so that you can run BLE, WIFI and TLS simultaneously. TLDR: -Os solution cascades to other problems

Oh, how I just love how embedded development is full of trade-offs, compromises and random limitations you would never could have foreseen, its what I wake up in the morning for.

@AshUK
Copy link
Contributor

AshUK commented Mar 31, 2021

Still present in v4.3-beta-2

@alisitsyn
Copy link
Collaborator

Can try to ask about merge but I do not control it. Unfortunately the MR is still not merged.

@AshUK
Copy link
Contributor

AshUK commented Mar 31, 2021

In the mean time can you share a patch that works with 4.3-beta-2 ?

I tried cherry picking 40636d6 but it has conflicts.

@alisitsyn
Copy link
Collaborator

Yes, sure I can provide the patches:
bugfix_uart_handler_call_inline_func v4.2 patch
bugfix_uart_handler_call_inline_func v4.3 patch

Let me know if you have any issues.

@espressif-bot espressif-bot added the Status: Done Issue is done internally label Apr 1, 2021
@AxelLin
Copy link
Contributor

AxelLin commented Apr 21, 2021

v4.3 fixed by d761226

@Alvin1Zhang
Copy link
Collaborator

@AxelLin Thanks for sharing the updates, fix on release/4.1 is available a4c72cf, feel free to reopen.

@RuslanSpartak
Copy link

Hello ESP32 experts. I have kind a problem, when I initialize I2C driver on core 1 with i2c_driver_install(i2c_port, conf.mode, I2C_MASTER_RX_BUF_DISABLE, I2C_MASTER_TX_BUF_DISABLE, 0); my firmware works, but I sometimes had problem with reading from I2C. When I use i2c_driver_install(i2c_port, conf.mode, I2C_MASTER_RX_BUF_DISABLE, I2C_MASTER_TX_BUF_DISABLE, (ESP_INTR_FLAG_IRAM | ESP_INTR_FLAG_LEVEL3)); I constantly receive Guru Meditation Error: Core 0 panic'ed (Cache disabled but cached memory region accessed). Firmware uses esp-idf version 4.3. With sdk version 4.2.1 it is working without any problem. So are you sure that sdk v4.3 is stable?

@AxelLin
Copy link
Contributor

AxelLin commented Jul 8, 2021

@RuslanSpartak
The fix was in uart which is not related to i2c driver.
I think you should create a new issue instead. (and remember to follow the issue template for required information).

@alisitsyn
Copy link
Collaborator

@RuslanSpartak,
As an addition to comment above: The current issue is related to UART but your issue is about I2C and looking very similar (attributes in low level functions of i2c driver). I would propose you to open separated issue against I2C driver. This will help us to handle this issue appropriately. Please also provide the log and let us know what kind of compiler optimization level you are using in your project.
Thanks.

@AxelLin
Copy link
Contributor

AxelLin commented Jul 12, 2021

I would propose you to open separated issue against I2C driver. This will help us to handle this issue appropriately. Please also provide the log and let us know what kind of compiler optimization level you are using in your project.

It's already reported in #7249

@alisitsyn
Copy link
Collaborator

@AxelLin,

Ok, Thank you for update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

7 participants