Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CERT-TEST-FAILURE] Chip-tool crashes in interactive mode #29275

Closed
cjandhyala opened this issue Sep 15, 2023 · 3 comments · Fixed by #29277
Closed

[CERT-TEST-FAILURE] Chip-tool crashes in interactive mode #29275

cjandhyala opened this issue Sep 15, 2023 · 3 comments · Fixed by #29277

Comments

@cjandhyala
Copy link
Contributor

Feature Area

Other

Test Case

Chip-tool Interactive Mode

Reproduction steps

ToT (d4cbdbc),

  1. build chip-tool using ./scripts/examples/gn_build_example.sh examples/chip-tool out/chip-tool 'chip_mdns="platform" chip_inet_config_enable_ipv4=false'
  2. build all cluster app (ACA) using./scripts/examples/gn_build_example.sh examples/all-clusters-app/linux examples/all-clusters-app/linux/out/all-cluster-app chip_inet_config_enable_ipv4=false chip_enable_icd_server=true
  3. Launch ACA using ./examples/all-clusters-app/linux/out/all-cluster-app/chip-all-clusters-app --KVS ~/storge.json
  4. launch ./out/chip-tool/chip-tool interactive start
  5. provision using chip-tool using , ./out/chip-tool/chip-tool pairing code 50 MT:-24J042C00KA0648G00
  6. Read attribute like , basicinformation read vendor-name 50 0
  7. kill and restart the ACA , using same cmd in step3
  8. repeat step6, chip-tool crashes after MRP retries, log below

basicinformation read vendor-name 50 0
[1694736729225] [39906:71817125] [TOO] Command: basicinformation read vendor-name 50 0
[1694736729226] [39906:71817127] [TOO] Sending command to node 0x32
[1694736729227] [39906:71817127] [CSM] FindOrEstablishSession: PeerId = [1:0000000000000032]
[1694736729227] [39906:71817127] [CSM] FindOrEstablishSession: No existing OperationalSessionSetup instance found
[1694736729227] [39906:71817127] [DIS] Found an existing secure session to [1:0000000000000032]!
[1694736729227] [39906:71817127] [DIS] OperationalSessionSetup[1:0000000000000032]: State change 1 --> 5
[1694736729227] [39906:71817127] [TOO] Sending ReadAttribute to:
[1694736729227] [39906:71817127] [TOO] cluster 0x0000_0028, attribute: 0x0000_0001, endpoint 0
[1694736729227] [39906:71817127] [DMG] SendReadRequest ReadClient[0x128804bb0]: Sending Read Request
[1694736729227] [39906:71817127] [EM] <<< [E:59723i S:45800 M:116333174] (S) Msg TX to 1:0000000000000032 [E009] --- Type 0001:02 (IM:ReadRequest)
[1694736729227] [39906:71817127] [IN] (S) Sending msg 116333174 on secure session with LSID: 45800
[1694736729228] [39906:71817127] [DMG] MoveToState ReadClient[0x128804bb0]: Moving to [AwaitingIn]
[1694736735322] [39906:71817127] [EM] Retransmitting MessageCounter:116333174 on exchange 59723i Send Cnt 1
[1694736735323] [39906:71817127] [IN] (S) Sending msg 116333174 on secure session with LSID: 45800
[1694736741317] [39906:71818695] [EM] Retransmitting MessageCounter:116333174 on exchange 59723i Send Cnt 2
[1694736741318] [39906:71818695] [IN] (S) Sending msg 116333174 on secure session with LSID: 45800
[1694736752199] [39906:71818917] [EM] Retransmitting MessageCounter:116333174 on exchange 59723i Send Cnt 3
[1694736752200] [39906:71818917] [IN] (S) Sending msg 116333174 on secure session with LSID: 45800
[1694736769466] [39906:71819421] [EM] Retransmitting MessageCounter:116333174 on exchange 59723i Send Cnt 4
[1694736769466] [39906:71819421] [IN] (S) Sending msg 116333174 on secure session with LSID: 45800
[1694736789226] [39906:71817125] [DL] Chip stack locking error at 'src/system/SystemLayerImplSelect.cpp:260'. Code is unsafe/racy
[1694736789226] [39906:71817125] [-] chipDie chipDie chipDie
zsh: abort ./out/chip-tool/chip-tool interactive start

Bug prevalence

every time

GitHub hash of the SDK that was being used

d4cbdbc

Platform

darwin

Anything else?

Same issue happens on linux as well.

@bzbarsky-apple
Copy link
Contributor

Assertion stack:

    frame #5: 0x0000000100623f2c chip-tool`chip::Platform::Internal::AssertChipStackLockedByCurrentThread(file="src/system/SystemLayerImplSelect.cpp", line=260) at LockTracker.cpp:36:9
    frame #6: 0x00000001005860b0 chip-tool`chip::System::LayerImplSelect::CancelTimer(this=0x00000001009db5c0, onComplete=(chip-tool`chip::Messaging::ReliableMessageMgr::Timeout(chip::System::Layer*, void*) at ReliableMessageMgr.cpp:189), appState=0x00000001029065a8)(chip::System::Layer*, void*), void*) at SystemLayerImplSelect.cpp:260:5
    frame #7: 0x00000001006c9e30 chip-tool`chip::Messaging::ReliableMessageMgr::StopTimer(this=0x00000001029065a8) at ReliableMessageMgr.cpp:438:19
...
    frame #14: 0x00000001006cb474 chip-tool`chip::Messaging::ReliableMessageMgr::ClearRetransTable(this=0x00000001029065a8, rc=0x0000600002c08108) at ReliableMessageMgr.cpp:375:19
    frame #15: 0x00000001006c2550 chip-tool`chip::Messaging::ExchangeContext::DoClose(this=0x0000600002c08100, clearRetransTable=true) at ExchangeContext.cpp:256:48
    frame #16: 0x00000001006c2a30 chip-tool`chip::Messaging::ExchangeContext::Abort(this=0x0000600002c08100) at ExchangeContext.cpp:308:5
    frame #17: 0x000000010009fca4 chip-tool`chip::Messaging::ExchangeHolder::Release(this=0x0000000107a06ce0) at ExchangeHolder.h:138:32
    frame #18: 0x000000010009fc24 chip-tool`chip::Messaging::ExchangeHolder::~ExchangeHolder(this=0x0000000107a06ce0) at ExchangeHolder.h:81:9
...
    frame #22: 0x0000000100780fac chip-tool`chip::app::ReadClient::~ReadClient(this=0x0000000107a06cd0) at ReadClient.cpp:91:1
    frame #23: 0x00000001000a96ec chip-tool`std::__1::default_delete<chip::app::ReadClient>::operator(this=0x000060000003c070, __ptr=0x0000000107a06cd0)(chip::app::ReadClient*) const at unique_ptr.h:57:5
...
    frame #31: 0x00000001000a983c chip-tool`std::__1::vector<std::__1::unique_ptr<chip::app::ReadClient, std::__1::default_delete<chip::app::ReadClient> >, std::__1::allocator<std::__1::unique_ptr<chip::app::ReadClient, std::__1::default_delete<chip::app::ReadClient> > > >::clear(this=0x0000000103c6cdb0 size=1) at vector:796:17
    frame #32: 0x00000001000a9804 chip-tool`InteractionModelReports::Shutdown(this=0x0000000103c6cdb0) at InteractionModel.h:123:36
    frame #33: 0x00000001000a5cf0 chip-tool`ReportCommand::Cleanup(this=0x0000000103c6c800) at ReportCommand.h:120:56
    frame #34: 0x00000001004e3038 chip-tool`CHIPCommand::Run(this=0x0000000103c6c800) at CHIPCommand.cpp:235:9

@bzbarsky-apple
Copy link
Contributor

Simpler way to reproduce:

  1. Run chip-tool interactive mode.
  2. Run all-clusters-app (built pretty much any way you want)
  3. In chip-tool: basicinformation read vendor-name 50 0
  4. Restart all-clusters-app
  5. In chip-tool: basicinformation read vendor-name 50 0 --timeout 1

When a command times out in chip-tool, it will try to do cleanup on the non-Matter thread, which will trigger the asserts we see here. We clearly need better asserts higher up the stack, though: the fact that we had to get to timers and MRP before an assert triggered is silly.

@bzbarsky-apple
Copy link
Contributor

Oh, and in non-interactive mode the cleanup happens after platform manager shutdown, so this assert can't get hit there.

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Sep 15, 2023
Since the Matter event loop is still running in interacive mode, we need to do
the command cleanup on that event loop.

* Fixes project-chip#29275
* Fixes project-chip#27535
@mergify mergify bot closed this as completed in #29277 Sep 15, 2023
mergify bot pushed a commit that referenced this issue Sep 15, 2023
…29277)

* Fix chip-tool threading asserts if an interactive command times out.

Since the Matter event loop is still running in interacive mode, we need to do
the command cleanup on that event loop.

* Fixes #29275
* Fixes #27535

* Address review comment.
@github-project-automation github-project-automation bot moved this from Todo to Done in [Platform] Darwin Sep 15, 2023
@github-project-automation github-project-automation bot moved this from Open Cert Blockers to Complete in [Certification] Blockers Sep 15, 2023
HunsupJung pushed a commit to HunsupJung/connectedhomeip that referenced this issue Oct 23, 2023
…roject-chip#29277)

* Fix chip-tool threading asserts if an interactive command times out.

Since the Matter event loop is still running in interacive mode, we need to do
the command cleanup on that event loop.

* Fixes project-chip#29275
* Fixes project-chip#27535

* Address review comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants