nrf5 examples stop responding while attempting thread join #2187

Closed
mspang opened this issue Aug 15, 2020 · 4 comments · Fixed by #2189

Comments

@mspang
Contributor

mspang commented Aug 15, 2020

As of openthread/openthread#5299, starting the Thread joiner at boot leaves the device unresponsive. The hang only occurs after several join attempts.

Repro (fails):

  git checkout b15c292ee56a0505d3a3904eb295b38bac71cfc9
  git submodule update --init

  cd third_party/openthread/repo
  git checkout addb1936c3e7e14e59f617751e595596980e3ec8 # fails
  cd ../../../examples/lock-app/nrf5
  make flash
  
  # thread CLI
  factoryreset

Boot and just wait. Device hangs after a few join attempts.

No Repro: (stable)

  cd third_party/openthread/repo
  git revert HEAD

  cd ../../../examples/lock-app/nrf5
  make flash

Boot and just wait. Device stable.

Log:

[00000000] <info> chip: [DL] Setup PIN discriminator not found; using default: f00
[00000000] <info> chip: [DL] Joiner Discerner: 3840
[00000000] <info> chip: [DL] Setup PIN code not found; using default: 12345678
[00000000] <info> chip: [DL] Joiner PSKd: 12345678
[00000000] <info> chip: [DL] Joiner start: No Error
[00000000] <debug> chip: [DL] Thread joiner timer triggered: No Error
<hung>

@lanyuwen @bukepo

@issue-label-bot issue-label-bot bot added the bug Something isn't working label Aug 15, 2020
@issue-label-bot

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.94. Please mark this comment with 👍 or 👎 to give our bot feedback!


@mspang
Contributor Author

mspang commented Aug 15, 2020

Fault stack:

Level,Function,Stack Frame,Source,PC,Return Address,Stack Used
0,,32 @ 0x2001FAE8,,0x00000A60,[0x2001FB00]: 0x0008FFC2,160
0,memcpy, 8 @ 0x2001FB08,,0x0008FFBE,[0x2001FB0C]: 0x0002CC32,128
0,xQueueReceive,48 @ 0x2001FB10,queue.c:1288:5,0x0002CC2E,[0x2001FB3C]: 0x0002DE38,120
0,prvProcessReceivedCommands, 0 @ 0x2001FB40,timers.c:696:7,0x0002DE34,: 0x0002DE98, 72
0,prvTimerTask,72 @ 0x2001FB40,timers.c:644:2,0x0002DE24,[0x2001FB84]: 0x03000000, 72
0,Top of stack - return address out of range: 0x3000000,,,,,

@mspang
Contributor Author

mspang commented Aug 15, 2020

This is a timer task stack overflow. The timer task uses 1k stack by default. Enabling configCHECK_FOR_STACK_OVERFLOW diagnoses this issue.
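
For readers unfamiliar with this kind of check: one common approach, and roughly what FreeRTOS does when configCHECK_FOR_STACK_OVERFLOW is set to 2, is to fill a task's stack with a known byte at creation and later verify that the bytes at the far end of the stack are still intact. The host-side sketch below simulates that idea in plain C. It is not FreeRTOS code: the function names, `FILL_BYTE`, and `GUARD_BYTES` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE   0xA5u /* assumption: illustrative fill pattern */
#define GUARD_BYTES 16u   /* assumption: size of the region checked for damage */

/* Paint the whole simulated stack with the fill pattern (done once at task creation). */
static void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, FILL_BYTE, size);
}

/* Simulate a task consuming `used` bytes. The stack grows downward from
 * stack + size, so the lowest addresses are the last to be touched. */
static void stack_use(uint8_t *stack, size_t size, size_t used)
{
    assert(used <= size);
    memset(stack + (size - used), 0x00, used);
}

/* Count bytes at the low end that still hold the pattern (remaining headroom). */
static size_t stack_untouched(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == FILL_BYTE) {
        n++;
    }
    return n;
}

/* Return 1 if any byte of the guard region at the low end was overwritten. */
static int stack_guard_clobbered(const uint8_t *stack, size_t size)
{
    size_t guard = size < GUARD_BYTES ? size : GUARD_BYTES;
    return stack_untouched(stack, guard) < guard;
}
```

Painting a 1024-byte buffer and then consuming 700 bytes leaves `stack_untouched()` reporting 324 bytes of headroom. Consuming 1020 bytes leaves only 4, and `stack_guard_clobbered()` then reports the stack as exhausted.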

@mspang mspang self-assigned this Aug 15, 2020
@mspang
Contributor Author

mspang commented Aug 16, 2020

@bukepo I believe we should move all of the code from GenericThreadStackManagerImpl_FreeRTOS<ImplClass>::OnJoinerTimer off the timer task. Logging uses a 256 byte stack buffer.

In the meantime I will post a patch to increase the stack size.
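
A minimal sketch of what moving work off the timer task could look like: the timer callback only records a request, and a separate worker loop with a larger stack runs the expensive part (including any logging). Everything here is hypothetical (`work_queue`, `work_post`, `work_drain`, `on_joiner_timer`, `joiner_retry`) and is not the project's API. It is single-threaded with no locking; on FreeRTOS a kernel queue would likely take the place of this ring buffer.

```c
#include <assert.h>
#include <stddef.h>

#define WORK_QUEUE_LEN 8u

typedef void (*work_fn)(void *arg);

typedef struct {
    work_fn fn;
    void   *arg;
} work_item;

typedef struct {
    work_item items[WORK_QUEUE_LEN];
    size_t    head;  /* index of the next item to run */
    size_t    count; /* number of queued items */
} work_queue;

/* Called from the timer callback: only records the request.
 * Returns 1 on success, 0 if the queue is full. */
static int work_post(work_queue *q, work_fn fn, void *arg)
{
    assert(q != NULL && fn != NULL);
    if (q->count == WORK_QUEUE_LEN) {
        return 0;
    }
    size_t tail = (q->head + q->count) % WORK_QUEUE_LEN;
    q->items[tail].fn  = fn;
    q->items[tail].arg = arg;
    q->count++;
    return 1;
}

/* Called from the worker's loop: runs queued items on the worker's own stack.
 * Returns the number of items that were run. */
static size_t work_drain(work_queue *q)
{
    size_t ran = 0;
    while (q->count > 0) {
        work_item it = q->items[q->head];
        q->head = (q->head + 1) % WORK_QUEUE_LEN;
        q->count--;
        it.fn(it.arg);
        ran++;
    }
    return ran;
}

/* Hypothetical stand-in for the stack-hungry joiner retry logic. */
static void joiner_retry(void *arg)
{
    int *attempts = arg;
    (*attempts)++;
}

/* Hypothetical timer callback: stays tiny and does no logging itself. */
static void on_joiner_timer(work_queue *q, int *attempts)
{
    (void) work_post(q, joiner_retry, attempts);
}
```

With this shape, the timer task's stack only has to hold the small `work_post()` frame, regardless of how much stack the retry logic or its logging needs.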

mspang added a commit to mspang/connectedhomeip that referenced this issue Aug 16, 2020
As of b15c292 ("[nrf5-lock] start joiner role on boot (project-chip#1962)"),
we are using too much stack space in timer task. The timer task has a 1k
stack and logging alone uses a 256 byte stack buffer.

The code in
GenericThreadStackManagerImpl_FreeRTOS<ImplClass>::OnJoinerTimer should
be moved off the timer task. In the meantime increase the stack size
to avoid overruns in the thread joiner.

Also enable the option configCHECK_FOR_STACK_OVERFLOW, and while we're
here also enable configUSE_MALLOC_FAILED_HOOK. These diagnostic options
are invaluable for saving debugging time.

Since logging uses significant stack space, try to catch stack overflows
in the platform LogV(). This fires reliably in OnJoinerTimer prior
to enlarging the stack.

Depends on project-chip#2185

Fixes project-chip#2187
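
For reference, the two diagnostic options named in the commit message are ordinary FreeRTOSConfig.h macros. The fragment below is only a sketch: the value 2 for the overflow check is an assumption (the commit message does not say which method was chosen), and the actual patch may differ.

```c
/* FreeRTOSConfig.h fragment (sketch, values are assumptions) */

/* 1 or 2 enables stack overflow checking; 2 is assumed here. */
#define configCHECK_FOR_STACK_OVERFLOW 2

/* 1 makes the kernel call a hook when pvPortMalloc() fails. */
#define configUSE_MALLOC_FAILED_HOOK   1
```

With these enabled, the application is expected to provide `vApplicationStackOverflowHook()` and `vApplicationMallocFailedHook()`; otherwise the build fails at link time.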
BroderickCarlin pushed a commit that referenced this issue Aug 20, 2020
* nrf5: Enlarge stack to fix thread join overruns

As of b15c292 ("[nrf5-lock] start joiner role on boot (#1962)"),
we are using too much stack space in timer task. The timer task has a 1k
stack and logging alone uses a 256 byte stack buffer.

The code in
GenericThreadStackManagerImpl_FreeRTOS<ImplClass>::OnJoinerTimer should
be moved off the timer task. In the meantime increase the stack size
to avoid overruns in the thread joiner.

Also enable the option configCHECK_FOR_STACK_OVERFLOW, and while we're
here also enable configUSE_MALLOC_FAILED_HOOK. These diagnostic options
are invaluable for saving debugging time.

Since logging uses significant stack space, try to catch stack overflows
in the platform LogV(). This fires reliably in OnJoinerTimer prior
to enlarging the stack.

Fixes #2187

* Reduce timer task memory to 2k

* Fix the stack size in EFR32 as well
andy31415 pushed a commit that referenced this issue Aug 20, 2020
* -Include the correct FreeRTOS Cortex files in the EFR32 makefiles corresponding to the defined MCU family (MG12 vs MG21)
-Init all IRQ priorities to a lower priority valid for the FreeRTOS API. An IRQ in the gecko radio libs
 with the default priority 0 (highest) was causing an assert failure in FreeRTOS

* Add the IRQ priority init for all EFR32 boards init
Restyle some file headers and copyright mentions

* Fix build script

* Fix sources for EFR32 platform

* Set mbedtls to external source

* Fix compilation with ninja

* Add BoltLockManager to manage the lock and unlock request and state
Add DataModelHandler to handle bolt actions from the cluster messages
Add Gen folder with the files for silicon lab cluster implementation
Start a server session for UDP messages
Include some mbedtls source files from gsdk 2.7 in gni. TO BE FIXED

* merge upstream

* Added Openthread to the example

* nrf5: Enlarge stack to fix thread join overruns

As of b15c292 ("[nrf5-lock] start joiner role on boot (#1962)"),
we are using too much stack space in timer task. The timer task has a 1k
stack and logging alone uses a 256 byte stack buffer.

The code in
GenericThreadStackManagerImpl_FreeRTOS<ImplClass>::OnJoinerTimer should
be moved off the timer task. In the meantime increase the stack size
to avoid overruns in the thread joiner.

Also enable the option configCHECK_FOR_STACK_OVERFLOW, and while we're
here also enable configUSE_MALLOC_FAILED_HOOK. These diagnostic options
are invaluable for saving debugging time.

Since logging uses significant stack space, try to catch stack overflows
in the platform LogV(). This fires reliably in OnJoinerTimer prior
to enlarging the stack.

Fixes #2187

* Reduce timer task memory to 2k

* Fix the stack size in EFR32 as well

* Add BoltLockManager to manage the lock and unlock request and state
Add DataModelHandler to handle bolt actions from the cluster messages
Add Gen folder with the files for silicon lab cluster implementation
Start a server session for UDP messages
Include some mbedtls source files from gsdk 2.7 in gni. TO BE FIXED

merge upstream

* Add support for Silabs dev board BRD4163A and BRD4164A

Merge Upstream into branch

Add BoltLockManager to manage the lock and unlock request and state
Add DataModelHandler to handle bolt actions from the cluster messages
Add Gen folder with the files for silicon lab cluster implementation
Start a server session for UDP messages
Include some mbedtls source files from gsdk 2.7 in gni. TO BE FIXED

merge upstream

Add the IRQ priority init for all EFR32 boards init
Restyle some file headers and copyright mentions

* Restyled by whitespace

* Fix submodules

* Format GN files

* Clean up & enable Thread on EFR32

* Fix initial thread stack overrun

* Fix entropy provider

* Restyled by clang-format

* Fix openthread commit

* Reformat build files

  gn format $(git ls-files HEAD '*.gn' '*.gni')
  git add $(git ls-files HEAD '*.gn' '*.gni')

Hopefully the last time now that restyled is working.

* Format GN files

* Apply fixes from master

Co-authored-by: jmartinez-silabs <[email protected]>
Co-authored-by: jfpenven <[email protected]>
Co-authored-by: Restyled.io <[email protected]>
Co-authored-by: jmartinez-silabs <[email protected]>