Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA update panic when flash encryption is enabled (IDFGH-3512) #5467

Closed
AxelLin opened this issue Jun 18, 2020 · 24 comments
Closed

OTA update panic when flash encryption is enabled (IDFGH-3512) #5467

AxelLin opened this issue Jun 18, 2020 · 24 comments

Comments

@AxelLin
Copy link
Contributor

AxelLin commented Jun 18, 2020

Environment

  • Module or chip used: ESP32-WROOM-32
  • IDF version: v4.2-dev-1905-g625bd5eb1806
  • Build System: idf.py
  • Compiler version: xtensa-esp32-elf-gcc (crosstool-NG esp-2020r2) 8.2.0
  • Operating System: Linux
  • Using an IDE?: NO
  • Power Supply: USB

Problem Description

I (44614) esp_https_ota: Starting OTA...
I (44616) esp_https_ota: Writing to partition subtype 17 at offset 0x260000
Guru Meditation Error: Core 0 panic'ed (IllegalInstruction). Exception was unhandled.
Memory dump at 0x40197338: 17193723 00909990 7641dc07
0x40197338: rcUpdateRxDone at ??:?

Core 0 register dump:
PC : 0x4019733f PS : 0x00060f30 A0 : 0x800923da A1 : 0x3ffc66a0
0x4019733f: rcUpdateRxDone at ??:?

A2 : 0x3ffbab70 A3 : 0x0000007f A4 : 0x00000000 A5 : 0x00000001
A6 : 0xffffffff A7 : 0x00000001 A8 : 0x0000003b A9 : 0x000000ab
A10 : 0x0000003b A11 : 0x0000003b A12 : 0x00000006 A13 : 0x00000000
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x00000020 EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000 LBEG : 0x40082371 LEND : 0x40082379 LCOUNT : 0x00000027
0x40082371: esp_timer_impl_get_counter_reg at /home/axel/esp/esp-idf/components/esp_timer/src/esp_timer_impl_lac.c:126 (discriminator 2)

0x40082379: esp_timer_impl_get_counter_reg at /home/axel/esp/esp-idf/components/esp_timer/src/esp_timer_impl_lac.c:136 (discriminator 1)

Backtrace:0x4019733c:0x3ffc66a0 0x400923d7:0x3ffc66c0 0x400924b1:0x3ffc66e0 0x40090837:0x3ffc6720
0x4019733c: rcUpdateRxDone at ??:?

0x400923d7: ppRxProtoProc at ??:?

0x400924b1: ppRxPkt at ??:?

0x40090837: ppTask at ??:?

ELF file SHA256: b23a527b4da3c8f2

Rebooting...
ets Jun 8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:4
load:0x3fff0034,len:6020
load:0x40078000,len:13524
ho 0 tail 12 room 4
load:0x40080400,len:3768
0x40080400: _init at ??:?

entry 0x40080648

Expected Behavior

It should just work.

Actual Behavior

panic.

Steps to reproduce

Just do OTA firmware update.

Code to reproduce this issue

I cannot share my application code, but what I did is the same as simple_ota_example or advanced_https_ota which calls esp_https_ota().

I have enabled flash encryption in development mode.
The firmware I'm updating is about 1.2M size.
Also note OTA was working fine when flash encryption is disabled.

@github-actions github-actions bot changed the title OTA update panic when flash encryption is enabled OTA update panic when flash encryption is enabled (IDFGH-3512) Jun 18, 2020
@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 18, 2020

I got below messages if I download image to ota_1 before start testing ota:

I (27921) esp_https_ota: Starting OTA...
I (27922) esp_https_ota: Writing to partition subtype 17 at offset 0x260000
ets Jun 8 2016 00:22:57

rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:4
load:0x3fff0034,len:6020
load:0x40078000,len:13524
ho 0 tail 12 room 4
load:0x40080400,len:3768
0x40080400: _init at ??:?

entry 0x40080648
W (183) boot.esp32: PRO CPU has been reset by WDT.
W (183) boot.esp32: WDT reset info: PRO CPU PC=0x40087725
0x40087725: panic_print_char at /home/axel/esp/esp-idf/components/esp_system/panic.c:71

W (185) boot.esp32: WDT reset info: APP CPU PC=0x400879e6
0x400879e6: panic_handler at /home/axel/esp/esp-idf/components/esp_system/port/panic_handler.c:487 (discriminator 1)

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 18, 2020

I just tried simple_ota_example with SECURE_FLASH_ENC_ENABLED=y, the panic is reproduceable but with different exception.

I (3747) simple_ota_example: Starting OTA example
I (4027) esp_https_ota: Starting OTA...
I (4027) esp_https_ota: Writing to partition subtype 16 at offset 0x110000
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Core 0 register dump:
PC : 0x4011bf7f PS : 0x00060f30 A0 : 0x8008e832 A1 : 0x3ffc19c0
0x4011bf7f: ic_csi_set_config at ??:?

A2 : 0x3ffb7b34 A3 : 0x00000000 A4 : 0x3ffbb588 A5 : 0x00400000
A6 : 0x00000009 A7 : 0x00000080 A8 : 0x3ffbb5c4 A9 : 0x00000062
A10 : 0x3ffb1608 A11 : 0x004ea5b9 A12 : 0x00000000 A13 : 0x3ffbb5ec
A14 : 0x02000000 A15 : 0x00000001 SAR : 0x00000017 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00400020 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff

Backtrace:0x4011bf7c:0x3ffc19c0 0x4008e82f:0x3ffc1a20 0x4008faca:0x3ffc1a50 0x4008c1c4:0x3ffc1a70 0x400901dd:0x3ffc1b00 0x400877b1:0x3ffc1b30
0x4011bf7c: ic_csi_set_config at ??:?

0x4008e82f: lmacTxFrame at ??:?

0x4008faca: ppProcessTxQ at ??:?

0x4008c1c4: ieee80211_output_process at ??:?

0x400901dd: ppTask at ??:?

0x400877b1: vPortTaskWrapper at /home/axel/esp/esp-idf/components/freertos/xtensa/port.c:143

ELF file SHA256: 984948d004f6930c

Rebooting...
ets Jun 8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:4
load:0x3fff0034,len:7484
load:0x40078000,len:13956
load:0x40080400,len:4000
0x40080400: _init at ??:?

@shubhamkulkarni97
Copy link
Contributor

Hi @AxelLin,
I tried running simple_ota_example, with a firmware of size 860K. It worked fine for me and didn't see find any crash.

Is there any change in configuration which is causing crash? Can you share sdkconfig file?

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 18, 2020

Hi @shubhamkulkarni97
Attached sdkconfig for simple_ota_example.
My change is mainly enable flash encryption.

$ idf.py encrypted-flash monitor
then I got below:

I (4697) simple_ota_example: Starting OTA example
I (4767) esp_https_ota: Starting OTA...
I (4767) esp_https_ota: Writing to partition subtype 16 at offset 0x110000
Guru Meditation Error: Core 0 panic'ed (LoadStoreError). Exception was unhandled.

Core 0 register dump:
PC : 0x4010ba1f PS : 0x00060d30 A0 : 0x00825600 A1 : 0x3ffc1810
0x4010ba1f: wifi_set_config_process at ??:?

A2 : 0x00018cc3 A3 : 0xff5f99ce A4 : 0x00000002 A5 : 0x3ffbb134
A6 : 0x0000000a A7 : 0x0000100a A8 : 0x801159e0 A9 : 0x40082708
0x40082708: _xt_user_exit at /home/axel/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:725

A10 : 0x3ffbb13a A11 : 0x3ffc1806 A12 : 0x00000004 A13 : 0x00000013
A14 : 0x00a066ca A15 : 0x00a066ca SAR : 0x00000010 EXCCAUSE: 0x00000003
EXCVADDR: 0x40082802 LBEG : 0x4000c349 LEND : 0x4000c36b LCOUNT : 0xffffffff
0x40082802: _xt_coproc_exc at /home/axel/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:979

Backtrace:0x4010ba1c:0x3ffc1810 0x008255fd:0x3ffc1830 |<-CORRUPTED
0x4010ba1c: wifi_set_config_process at ??:?

ELF file SHA256: abbbb2bbd0e79f8d

Rebooting...
ets Jun 8 2016 00:22:57

thanks,
Axel
sdkconfig.txt

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 18, 2020

Hi @shubhamkulkarni97
Can you reproduce the issue now with provided sdkconfig?
Just want to know if I need to provide more information for you to reproduce the issue.

@shubhamkulkarni97
Copy link
Contributor

Hi @AxelLin,
simple_ota_example worked with provided sdkconfig as well.
Can you share logs by setting log level to debug(in menuconfig)?

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 19, 2020

Hi @shubhamkulkarni97
Attached my log with debug build.
ps. I have applied 8904.zip patch from #5343

ota-debug.txt

@shubhamkulkarni97
Copy link
Contributor

@AxelLin,
We are not able to decode backtrace in the logs you shared. Can you share the logs by running idf.py monitor?

Also can you run python $IDF_PATH/components/esptool_py/esptool/espefuse.py -p PORT summary and share the output?

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 19, 2020

Hi @shubhamkulkarni97
The previous ota-debug.txt is idf.py monitor output, it seems panic with different reasons.
I attached efuse-summanry and idf-monitor output again
ota-debug-monitor.txt
efuse-summary.txt

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 19, 2020

D (8034) HTTP_CLIENT: need_read=289, byte_to_read=289, rlen=289, ridx=0
D (8034) HTTP_CLIENT: http_on_body 289
D (8034) simple_ota_example: HTTP_EVENT_ON_DATA, len=289
Guru Meditation Error: Core 0 panic'ed (IllegalInstruction). Exception was unhandled.
Memory dump at 0x4012015c: 00000206 995b6399 9d2fe60e
0x4012015c: pm_on_beacon_rx at ??:?

Core 0 register dump:
PC : 0x40120161 PS : 0x00060130 A0 : 0x80091dc4 A1 : 0x3ffc1b60
0x40120161: pm_on_beacon_rx at ??:?

A2 : 0x00000000 A3 : 0x00000ab4 A4 : 0x00019000 A5 : 0x3ffcb073
A6 : 0x3ffb7c68 A7 : 0x3ffb1610 A8 : 0x8012012f A9 : 0x3ffc1b40
A10 : 0x00000000 A11 : 0x00000000 A12 : 0x00000000 A13 : 0x00000000
A14 : 0x00000011 A15 : 0x00000000 SAR : 0x0000000f EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff

Backtrace:0x4012015e:0x3ffc1b60 0x40091dc1:0x3ffc1bd0 0x40091eb1:0x3ffc1bf0 0x40090237:0x3ffc1c30 0x400877f1:0x3ffc1c60
0x4012015e: pm_on_beacon_rx at ??:?

0x40091dc1: ppRxProtoProc at ??:?

0x40091eb1: ppRxPkt at ??:?

0x40090237: ppTask at ??:?

0x400877f1: vPortTaskWrapper at /home/axel/esp/esp-idf/components/freertos/xtensa/port.c:143

ELF file SHA256: 46ec82bc772be69b

Rebooting...
ets Jun 8 2016 00:22:57

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 19, 2020

D (9126) HTTP_CLIENT: http_on_body 289
D (9126) simple_ota_example: HTTP_EVENT_ON_DATA, len=289
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Core 0 register dump:
PC : 0x4011ffe2 PS : 0x00060130 A0 : 0x80091dc4 A1 : 0x3ffc1b60
0x4011ffe2: pm_on_beacon_rx at ??:?

A2 : 0x007323cd A3 : 0x3ffcb720 A4 : 0x3ffcb7f3 A5 : 0x829b2305
A6 : 0x00000001 A7 : 0x00060023 A8 : 0x3ffb7c68 A9 : 0x3ffc1b90
A10 : 0x00000001 A11 : 0x3ffb7c6b A12 : 0x00000006 A13 : 0x3ffc1c38
A14 : 0x00000011 A15 : 0x00000000 SAR : 0x00000018 EXCCAUSE: 0x0000001c
EXCVADDR: 0x829b2305 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff

Backtrace:0x4011ffdf:0x3ffc1b60 0x40091dc1:0x3ffc1bd0 0x40091eb1:0x3ffc1bf0 0x40090237:0x3ffc1c30 0x400877f1:0x3ffc1c60
0x4011ffdf: pm_on_beacon_rx at ??:?

0x40091dc1: ppRxProtoProc at ??:?

0x40091eb1: ppRxPkt at ??:?

0x40090237: ppTask at ??:?

0x400877f1: vPortTaskWrapper at /home/axel/esp/esp-idf/components/freertos/xtensa/port.c:143

ELF file SHA256: 46ec82bc772be69b

Rebooting...
ets Jun 8 2016 00:22:57

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 19, 2020

Just found redirect idf.monitor output does not show backtrace.
I manually copy-paste the output as attached file.
ota-debug-monitor3.txt

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 20, 2020

Hi @shubhamkulkarni97

I tried the other device and it does not hit panic.
I notice this panic device's efuse-summary looks slightly different from the other device.
(Identity fuses now display at the end of the output)

I attached the efuse summary output for comparison.
efuse-summary-good.txt: The other device which is working (for comparison)
efuse-default.txt: This is the efuse summary before enabling flash encryption (of the panic device). It looks quite normal before enabling flash encryption.
For this device, I did below commands before downloading firmware:
$ espefuse.py burn_key flash_encryption my_flash_encryption_key.bin
$ espefuse.py burn_efuse FLASH_CRYPT_CONFIG 15
$ espefuse.py burn_efuse FLASH_CRYPT_CNT 1
$ espefuse.py burn_efuse JTAG_DISABLE
$ espefuse.py burn_efuse DISABLE_DL_DECRYPT
$ espefuse.py burn_efuse DISABLE_DL_CACHE

Do you find anything wrong?

efuse-default.txt
efuse-summary-good.txt

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 22, 2020

Hi @shubhamkulkarni97
I notice if I set FREERTOS_UNICORE=y, I always got "Task watchdog got triggered". See attached log file. This is probably off-topic because FREERTOS_UNICORE is not set in my application firmware. Just FYI.
ota_freertos_unicore.txt

@shubhamkulkarni97
Copy link
Contributor

Hi @AxelLin,

EFUSE summary looks good to me in both the cases, there is only change in order summary is displayed. The commands you used to enable flash encryption are also correct.

We think there might be a race condition in hardware acceleration when flash encryption is enabled.

Can you disable all hardware accelerators in mbedtls and try performing OTA on board which resulted in a crash?

You can disable following options in sdkconfig:

  • MBEDTLS_HARDWARE_AES
  • MBEDTLS_HARDWARE_MPI
  • MBEDTLS_HARDWARE_SHA

Please share console logs after disabling mbedtls hardware acceleration.

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 22, 2020

Hi @shubhamkulkarni97

I have disabled all hw acceleratiors:

# CONFIG_MBEDTLS_ECP_RESTARTABLE is not set
# CONFIG_MBEDTLS_CMAC_C is not set
# CONFIG_MBEDTLS_HARDWARE_AES is not set
# CONFIG_MBEDTLS_HARDWARE_MPI is not set
# CONFIG_MBEDTLS_HARDWARE_SHA is not set
# CONFIG_MBEDTLS_ATCA_HW_ECDSA_SIGN is not set
# CONFIG_MBEDTLS_ATCA_HW_ECDSA_VERIFY is not set

But still got panic, see attached log
ota_disable_hw_accel.txt

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 23, 2020

Hi @shubhamkulkarni97

I run git bisec and found:

dbdce93 is the first bad commit
commit dbdce93
Author: Angus Gratton [email protected]
Date: Thu Apr 16 21:31:13 2020 +1000

spi_flash: Use per-chip flash_ops files for legacy API

Looks like when ESP32-S2 Beta support was merged, the separate files
were dropped by accident.

components/spi_flash/CMakeLists.txt | 1 +
components/spi_flash/esp32/flash_ops_esp32.c | 4 +
components/spi_flash/esp32s2/flash_ops_esp32s2.c | 3 +
components/spi_flash/flash_ops.c | 129 +----------------------
4 files changed, 11 insertions(+), 126 deletions(-)

Please check what's wrong in above commit.

Note this issue also happen on release/v4.2 branch.

BTW, seems you cannot reproduce this issue on your device.
I can help testing it if you post the fix here.

@mahavirj
Copy link
Member

@AxelLin

We have tried on multiple boards/modules on our side but so far have not been able to reproduce this issue.

eFuse summary looks pretty similar for both devices that you had provided. Ordering issue in Identity Fuses will be fixed soon in espefuse.py script.

  • Can you please confirm that this issue is observed on only single ESP32-WROOM-32 module on your side? Any specific hardware or its standard ESP32-DevKitC?
  • Do you see any other issue during firmware execution apart from crash during OTA on that specific module?
  • For commit you mentioned above, can you please try enabling CONFIG_SPI_FLASH_USE_LEGACY_IMPL and provide your observations? (Additionally enabling CONFIG_SPI_FLASH_VERIFY_WRITE and CONFIG_SPI_FLASH_LOG_FAILED_WRITE can also help)
  • Also please help to confirm if you see similar issue with release/v3.3 branch?

Thanks.

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 23, 2020

Hi @mahavirj

  • We use ESP32-WROOM-32 module on our customized board not standard ESP32-DevKitC.
    I only tested on 2 devices (I was worried that if I did something wrong I may damage more devices):
    one hit panic but the other one looks fine.

  • I don't see any other issue running on this panic module.

  • Set SPI_FLASH_USE_LEGACY_IMPL=y does not help (on master tree)
    Set SPI_FLASH_VERIFY_WRITE && SPI_FLASH_LOG_FAILED_WRITE seems does not help, I don't see more log, and it still panic.
    The console output the same as these options are not set.

  • It seems I cannot build with v3.3 with my current gcc (2020r2).
    I checkout to v3.3 branch but the build fails.
    But I confirmed relese/4.0 works.

BTW, I tried revert commit dbdce93 on current master tree, then it works no matter SPI_FLASH_USE_LEGACY_IMPL is set or not.

@AxelLin
Copy link
Contributor Author

AxelLin commented Jun 23, 2020

Hi @mahavirj @shubhamkulkarni97

I'm wondering if esp32 also needs similar fix as esp32s2 (commit bb0a95b)
By comment out esp_rom_spiflash_unlock() in spi_flash_write_encrypted_chip(), then the panic seems gone.
https://gist.github.com/AxelLin/40da2529ad386662261cc84ff9608c68

@mahavirj
Copy link
Member

@AxelLin Thanks for sharing pointers, I am checking on this, will revert back soon.

@mahavirj
Copy link
Member

@AxelLin Update: Fix that you shared looks correct. It is not safe to invoke ROM API for disabling flash write protection whilst cache is enabled (it was regression that happened sometimes back). We are in process to add this fix (and backport upto v4.1), subsequently we will also ensure to have automated test for this.

antmak pushed a commit to espressif/esptool that referenced this issue Jun 30, 2020
When using set(), categories print in a non-deterministic order.

As reported espressif/esp-idf#5467 (comment)
@AxelLin
Copy link
Contributor Author

AxelLin commented Jul 20, 2020

@mahavirj

Just FYI, this issue was closed because the fix is in master tree.
However, the fix is not yet in 4.1 and 4.2 branches now.

@mahavirj
Copy link
Member

@AxelLin

Backport fixes have been merged in v4.2 and v4.1 branches as well. Once these branches pass internal CI, they will sync to github.

projectgus pushed a commit that referenced this issue Jul 22, 2020
In commit 309376f, it seems like regression
was added to use ROM level API for disabling flash write protection. This
started random firmware crashes (on specific modules) with exception
`IllegalInstruction` during encrypted flash writes.

Fix here removes relevant ROM API call, since disabling flash write protection
is already ensured by caller of this API.

Closes #5467
projectgus pushed a commit that referenced this issue Jul 22, 2020
In commit 309376f, it seems like regression
was added to use ROM level API for disabling flash write protection. This
started random firmware crashes (on specific modules) with exception
`IllegalInstruction` during encrypted flash writes.

Fix here removes relevant ROM API call, since disabling flash write protection
is already ensured by caller of this API.

Closes #5467
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants