Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NVS write crashes with "Cache disabled but cached memory region accessed" (IDFGH-8634) #10079

Closed
3 tasks done
JustinOng opened this issue Oct 30, 2022 · 17 comments
Closed
3 tasks done
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF

Comments

@JustinOng
Copy link
Contributor

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

ESP-IDF v4.4.2-378-g9269a536ac

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

No response

Development Kit.

Custom board with ESP32-WROOM-32E-N8

Power Supply used.

External 3.3V

What is the expected behavior?

The NVS write should succeed.

What is the actual behavior?

Writing to NVS causes a (rare) crash

Steps to reproduce.

A little hard to share my full environment because it involves external hardware peripherals and I'm not sure what are the trigger conditions.

Debug Logs.

Guru Meditation Error: Core  0 panic'ed (Cache disabled but cached memory region accessed). 

Core  0 register dump:
PC      : 0x400fa6d4  PS      : 0x00060034  A0      : 0x8008220c  A1      : 0x3ffb0b50  
0x400fa6d4: i2c_hal_get_intsts_mask at /opt/esp/idf/components/hal/i2c_hal.c:68

A2      : 0x3ffb8648  A3      : 0x3ffb1484  A4      : 0x00000000  A5      : 0x3ffb69e0  
A6      : 0x3ffb6a30  A7      : 0x3ffb69f0  A8      : 0x80083fac  A9      : 0x00000000  
A10     : 0x3ffb1484  A11     : 0x3ffb0b50  A12     : 0x00000010  A13     : 0x00060023  
A14     : 0x40085cec  A15     : 0x3ffb6bf0  SAR     : 0x00000020  EXCCAUSE: 0x00000007  
0x40085cec: flash_end_flush_cache at /opt/esp/idf/components/spi_flash/esp_flash_api.c:166

EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  


Backtrace: 0x400fa6d1:0x3ffb0b50 0x40082209:0x3ffb0b90 0x40082bed:0x3ffb0bb0 0x4000bfed:0x3ffb6ab0 0x40085889:0x3ffb6ac0 0x40085c47:0x3ffb6af0 0x4008eb3a:0x3ffb6b20 0x40085d19:0x3ffb6b40 0x4008649b:0x3ffb6b60 0x400e1dbf:0x3ffb6bb0 0x400e33f2:0x3ffb6bd0 0x400fa4a9:0x3ffb6bf0 0x400e3a2e:0x3ffb6c20 0x400e3c67:0x3ffb6c40 0x400e2e1d:0x3ffb6cb0 0x400e31fa:0x3ffb6d20 0x400fa2ab:0x3ffb6d40 0x400e221e:0x3ffb6d70 0x400e2241:0x3ffb6da0 0x400d5f13:0x3ffb6dc0 0x400d6015:0x3ffb6e40 0x400faf00:0x3ffb6f70 0x4008b6f5:0x3ffb6f90
0x400fa6d1: i2c_hal_disable_intr_mask at ??:?

0x40082209: shared_intr_isr at /opt/esp/idf/components/esp_hw_support/intr_alloc.c:407

0x40082bed: _xt_lowint1 at /opt/esp/idf/components/freertos/port/xtensa/xtensa_vectors.S:1111

0x40085889: is_page_mapped_in_cache at /opt/esp/idf/components/spi_flash/flash_mmap.c:497

0x40085c47: spi_flash_check_and_flush_cache at /opt/esp/idf/components/spi_flash/flash_mmap.c:516

0x4008eb3a: memspi_host_flush_cache at /opt/esp/idf/components/spi_flash/memspi_host_driver.c:134

0x40085d19: flash_end_flush_cache at /opt/esp/idf/components/spi_flash/esp_flash_api.c:176

0x4008649b: esp_flash_write at /opt/esp/idf/components/spi_flash/esp_flash_api.c:929 (discriminator 1)

0x400e1dbf: esp_partition_write_raw at /opt/esp/idf/components/spi_flash/partition.c:517

0x400e33f2: nvs::NVSPartition::write_raw(unsigned int, void const*, unsigned int) at /opt/esp/idf/components/nvs_flash/src/nvs_partition.cpp:50

0x400fa4a9: nvs::Page::alterEntryState(unsigned int, nvs::Page::EntryState) at /opt/esp/idf/components/nvs_flash/src/nvs_page.cpp:821 (discriminator 2)

0x400e3a2e: nvs::Page::writeEntry(nvs::Item const&) at /opt/esp/idf/components/nvs_flash/src/nvs_page.cpp:109

0x400e3c67: nvs::Page::writeItem(unsigned char, nvs::ItemType, char const*, void const*, unsigned int, unsigned char) at /opt/esp/idf/components/nvs_flash/src/nvs_page.cpp:237

0x400e2e1d: nvs::Storage::writeItem(unsigned char, nvs::ItemType, char const*, void const*, unsigned int) at /opt/esp/idf/components/nvs_flash/src/nvs_storage.cpp:358 (discriminator 4)

0x400e31fa: nvs::NVSHandleSimple::set_typed_item(nvs::ItemType, char const*, void const*, unsigned int) at /opt/esp/idf/components/nvs_flash/src/nvs_handle_simple.cpp:29

0x400fa2ab: int nvs::NVSHandle::set_item<int>(char const*, int) at /opt/esp/idf/components/nvs_flash/include/nvs_handle.hpp:270

0x400e221e: int nvs_set<int>(unsigned int, char const*, int) at /opt/esp/idf/components/nvs_flash/src/nvs_api.cpp:349

0x400e2241: nvs_set_i32 at /opt/esp/idf/components/nvs_flash/src/nvs_api.cpp:374

0x400d5f13: sm::markHome(signed char) at /workspaces/project/main/sm.hpp:61
 (inlined by) processLine(char*) at /workspaces/project/main/main.cpp:95

0x400d6015: app_main at /workspaces/project/main/main.cpp:132

0x400faf00: main_task at /opt/esp/idf/components/freertos/port/port_common.c:141 (discriminator 2)

0x4008b6f5: vPortTaskWrapper at /opt/esp/idf/components/freertos/port/xtensa/port.c:131

More Information.

Based on the crash, it appears to be related to the I2C peripheral, which is enabled with

i2c_driver_install(I2C_PORT, I2C_MODE_MASTER, 0, 0, ESP_INTR_FLAG_SHARED | ESP_INTR_FLAG_IRAM)

Is the issue here that i2c_isr_handler_default calls functions like i2c_hal_get_intsts_mask which arn't placed in IRAM (and may not be inlined)?

static void IRAM_ATTR i2c_isr_handler_default(void *arg)
{
    i2c_obj_t *p_i2c = (i2c_obj_t *) arg;
    int i2c_num = p_i2c->i2c_num;
    // Interrupt protection.
    // On C3 and S3 targets, the I2C may trigger a spurious interrupt,
    // in order to detect these false positive, check the I2C's hardware interrupt mask
    uint32_t int_mask;
    i2c_hal_get_intsts_mask(&(i2c_context[i2c_num].hal), &int_mask);
    ....
@JustinOng JustinOng added the Type: Bug bugs in IDF label Oct 30, 2022
@espressif-bot espressif-bot added the Status: Opened Issue is new label Oct 30, 2022
@github-actions github-actions bot changed the title NVS write crashes with "Cache disabled but cached memory region accessed" NVS write crashes with "Cache disabled but cached memory region accessed" (IDFGH-8634) Oct 30, 2022
@kriegste
Copy link

Are you compiling with -Os (optimization for size)? I once had a similar problem with the code for RMT.

You could try to force inline all those low level functions called by the interrupt handler. For example i2c_ll_disable_intr_mask() is declared as:

static inline void i2c_ll_disable_intr_mask(i2c_dev_t *hw, uint32_t mask) {

"inline" alone does not force inlining, i.e. when compiling with -Os the compiler still keeps the function to save space. This is a bit misleading. The solution is to use

__attribute__((always_inline))

instead of just

inline

Try it and tell us about the results.

@JustinOng
Copy link
Contributor Author

JustinOng commented Oct 31, 2022

Thanks for the pointer. I actually ran into your issue #9487 which lead me down the same conclusion since the linked PR only touched RMT code.

Heres code that triggers the crash:

#include <stdio.h>
#include <string.h>

#include "driver/i2c.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "nvs_flash.h"

void task1() {
  while (1) {
    uint8_t i = 0x01;
    i2c_cmd_handle_t cmd = i2c_cmd_link_create();
    i2c_master_start(cmd);
    i2c_master_write_byte(cmd, (i << 1) | I2C_MASTER_WRITE, 1);
    i2c_master_stop(cmd);
    i2c_master_cmd_begin(I2C_NUM_0, cmd, 100 / portTICK_RATE_MS);
    i2c_cmd_link_delete(cmd);
  }
}

void task2() {
  while (1) {
    uint32_t crashme;
    esp_flash_read(NULL, (void *)&crashme, 0, sizeof(crashme));
  }
}

void app_main(void) {
  i2c_config_t conf;
  memset(&conf, 0, sizeof(conf));
  conf.mode = I2C_MODE_MASTER;
  conf.sda_io_num = 4;
  conf.scl_io_num = 5;
  conf.sda_pullup_en = GPIO_PULLUP_ENABLE;
  conf.scl_pullup_en = GPIO_PULLUP_ENABLE;
  conf.master.clk_speed = 100000;

  i2c_param_config(I2C_NUM_0, &conf);
  i2c_driver_install(I2C_NUM_0, conf.mode, 0, 0, ESP_INTR_FLAG_SHARED | ESP_INTR_FLAG_IRAM);

  xTaskCreatePinnedToCore(task1, "task1", 8192, NULL, 10, NULL, 0);
  xTaskCreatePinnedToCore(task2, "task2", 8192, NULL, 10, NULL, 1);
}

Declaring these two functions accordingly dosen't seem to affect the crash, whether compiled with Og, Os or O2.

void __attribute__((always_inline)) inline i2c_hal_get_intsts_mask(i2c_hal_context_t *hal, uint32_t *mask)
{
    *mask = i2c_ll_get_intsts_mask(hal->dev);
}

static __attribute__((always_inline)) inline uint32_t i2c_ll_get_intsts_mask(i2c_dev_t *hw)
{
    return hw->int_status.val;
}

@mahavirj
Copy link
Member

mahavirj commented Nov 2, 2022

@JustinOng Do you have CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH enabled in your case? Can you please share your sdkconfig? (Relevant: #9198)

@JustinOng
Copy link
Contributor Author

CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH is not enabled. I have published the full project at https://github.com/JustinOng/esp-idf-10079

@mahavirj
Copy link
Member

mahavirj commented Nov 2, 2022

@JustinOng

Is the issue here that i2c_isr_handler_default calls functions like i2c_hal_get_intsts_mask which arn't placed in IRAM (and may not be inlined)?

You were right here!

Can you please try following patch on your side?

diff --git components/hal/i2c_hal.c components/hal/i2c_hal.c
index db2dd1a7e1..aec4e604cf 100644
--- components/hal/i2c_hal.c
+++ components/hal/i2c_hal.c
@@ -64,11 +64,6 @@ void i2c_hal_disable_intr_mask(i2c_hal_context_t *hal, uint32_t mask)
     i2c_ll_disable_intr_mask(hal->dev, mask);
 }
 
-void i2c_hal_get_intsts_mask(i2c_hal_context_t *hal, uint32_t *mask)
-{
-    *mask = i2c_ll_get_intsts_mask(hal->dev);
-}
-
 void i2c_hal_set_fifo_mode(i2c_hal_context_t *hal, bool fifo_mode_en)
 {
     i2c_ll_set_fifo_mode(hal->dev, fifo_mode_en);
diff --git components/hal/i2c_hal_iram.c components/hal/i2c_hal_iram.c
index 080d19c919..557a58c79b 100644
--- components/hal/i2c_hal_iram.c
+++ components/hal/i2c_hal_iram.c
@@ -67,3 +67,8 @@ void i2c_hal_get_txfifo_cnt(i2c_hal_context_t *hal, uint32_t *len)
 {
     *len =  i2c_ll_get_txfifo_len(hal->dev);
 }
+
+void i2c_hal_get_intsts_mask(i2c_hal_context_t *hal, uint32_t *mask)
+{
+    *mask = i2c_ll_get_intsts_mask(hal->dev);
+}

@JustinOng
Copy link
Contributor Author

Yep, that stops the PoC from crashing, thoughI'm not sure if the rest of the ISR has to be adjusted as well.

@kriegste
Copy link

kriegste commented Nov 3, 2022

i2c_isr_handler_default() calls i2c_master_cmd_begin_static() which calls i2c_ll_read_rxfifo(), i2c_cmd_is_single_byte(), i2c_ll_write_txfifo(), i2c_ll_write_cmd_reg(), i2c_ll_master_enable_tx_it() and i2c_ll_trans_start(), none of which are in IRAM or FORCE_INLINE_ATTR. I did not look any further.
Could this become a problem when compiled with -Os?

@mahavirj
Copy link
Member

mahavirj commented Nov 3, 2022

@JustinOng @kriegste

I did not find any other symbols from I2C ISR handler that would need IRAM placement. You may use following instructions to cross-check for any symbols that are placed in flash section but have callers in IRAM:

  • Enable CONFIG_COMPILER_DUMP_RTL_FILES
  • Build your project
  • Run following command
esp-idf/tools/ci/check_callgraph.py  --elf-file build/<project_name>.elf  --rtl-dir build/ find-refs  --from-sections=.iram0.text --to=.flash.text --exit-code
  • Look out for I2C symbols from the output of the above script

Alternatively, project .map file can also help to find address of the symbol (IRAM vs Flash)

@chipweinberger
Copy link
Contributor

I am also hitting this, see #10104

The Iram patch fixes my crash! 🎉🎉

@JustinOng
Copy link
Contributor Author

@mahavirj You're right, /opt/esp/idf/tools/ci/check_callgraph.py --elf-file build/app-template.elf --rtl-dir build/ find-refs --from-sections=.iram0.text --to=.flash.text --exit-code | grep i2c returns nothing.

@chipweinberger
Copy link
Contributor

@mahavirj do we have a timeline for getting this merged?

It's too bad it did not make 4.4.3...

@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development Status: Reviewing Issue is being reviewed and removed Status: Opened Issue is new Status: Selected for Development Issue is selected for development labels Nov 11, 2022
@mahavirj
Copy link
Member

@chipweinberger By the time this issue was filed, v4.4.3 testing had almost finished. Sorry but this fix shall be integrated soon and officially would come with 4.4.4.

@chipweinberger
Copy link
Contributor

No problem.

When it gets merged into the v4.4 branch, please let us know!

@AxelLin
Copy link
Contributor

AxelLin commented Dec 1, 2022

... but this fix shall be integrated soon and officially would come with 4.4.4.

@mahavirj
The fix is not yet in master now.
The fix looks not complex, not sure why it takes such a long time to fix the issue. (In the mean time, there are similar issues reported)
BTW, v4.3 branch also needs the fix.

@igrr
Copy link
Member

igrr commented Dec 2, 2022

We have reviewed the driver, and found that the fix is not required for the master branch (v5.1-dev) as the HAL layer has been refactored there.
The fix is required for release/v5.0, the merge request is ready. v5.0 has already been tagged, so this fix will be among the fixes we merge immediately after v5.0 release is done.
The fix for release/v4.4 has been merged a few hours ago. This issue will be closed automatically when the commit gets synced to Github.
The backport to release/v4.3 will be created soon.

@espressif-bot espressif-bot added Resolution: Done Issue is done internally Status: Done Issue is done internally Status: Reviewing Issue is being reviewed and removed Status: Reviewing Issue is being reviewed labels Dec 5, 2022
@espressif-bot espressif-bot removed the Status: Done Issue is done internally label Dec 8, 2022
@AxelLin
Copy link
Contributor

AxelLin commented Dec 16, 2022

v4.3 fix: aac5297

@Alvin1Zhang
Copy link
Collaborator

Thanks for reporting, feel free to reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

9 participants