Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Crash] CHIP crashes sometimes when receiving a response from a CHIP accessory #6390

Closed
sagar-apple opened this issue Apr 30, 2021 · 8 comments · Fixed by #6398
Closed

[Crash] CHIP crashes sometimes when receiving a response from a CHIP accessory #6390

sagar-apple opened this issue Apr 30, 2021 · 8 comments · Fixed by #6398

Comments

@sagar-apple
Copy link
Contributor

This looks like a stack overflow since PeerAddress is not a pointer and IsInitialized should have just returned false if it wasn't init.

Note - this only happens when writing to the DoorLock Cluster. OnOff Cluster was working ok. Reports are enabled for that Cluster.

Last bit of Controller Logs:

2021-04-29 18:19:00.5785 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540594] [0x77511] CHIP: [ZCL] RX len 4, ep 1, clus 0x101
2021-04-29 18:19:00.5785 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540594] [0x77511] CHIP: [ZCL]  FC 9 seq 5 cmd 1 payload[
2021-04-29 18:19:00.5786 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540594] [0x77511] CHIP: [ZCL] 00
2021-04-29 18:19:00.5786 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540594] [0x77511] CHIP: [ZCL] ]
2021-04-29 18:19:00.5787 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540594] [0x77511] CHIP: [ZCL] UnlockDoorResponse:
2021-04-29 18:19:00.5787 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540594] [0x77511] CHIP: [ZCL]   status: EMBER_ZCL_STATUS_SUCCESS (0x00)
...
2021-04-29 18:19:00.6356 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540651] [0x77511] CHIP: [ZCL] RX len 13, ep 1, clus 0x101
2021-04-29 18:19:00.6357 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540651] [0x77511] CHIP: [ZCL]  FC 9 seq 12 cmd 20 payload[
2021-04-29 18:19:00.6357 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540651] [0x77511] CHIP: [ZCL] 01 02 00 00 00 00 00 00 00 00
2021-04-29 18:19:00.6358 -0700    info 0x77511 0x0 CHIPTool[21618] (CHIP): � [1619745540651] [0x77511] CHIP: [ZCL] ]

Stack Trace:

Thread 4 Crashed:
0   CHIP                          	0x000000010155d450 chip::Transport::PeerAddress::IsInitialized() const + 12
1   CHIP                          	0x000000010155d3dc chip::Transport::PeerConnectionState::IsInitialized() + 36
2   CHIP                          	0x000000010155c134 chip::Transport::PeerConnections<16ul, (chip::Time::Source)0>::FindPeerConnectionState(chip::Optional<unsigned long long>, unsigned short, chip::Transport::PeerConnectionState*) + 132
3   CHIP                          	0x000000010155b8e0 chip::SecureSessionMgr::GetPeerConnectionState(chip::SecureSessionHandle) + 88
4   CHIP                          	0x000000010152d2e8 chip::Messaging::ExchangeContext::SendMessage(chip::Protocols::Id, unsigned char, chip::System::PacketBufferHandle, chip::BitFlags<chip::Messaging::SendMessageFlags, unsigned short> const&) + 124
5   CHIP                          	0x000000010142fb48 chipSendUnicast(unsigned long long, EmberApsFrame*, unsigned short, unsigned char*) + 660
6   CHIP                          	0x00000001014225f8 emAfSend(unsigned char, unsigned long long, EmberApsFrame*, unsigned char, unsigned char*, unsigned char*, unsigned short, unsigned char) + 232
7   CHIP                          	0x00000001014219bc send(unsigned char, unsigned long long, EmberApsFrame*, unsigned short, unsigned char*, bool, unsigned short, unsigned char, void (*)(unsigned char, unsigned long long, EmberApsFrame*, unsigned short, unsigned char*, unsigned char)) + 436
8   CHIP                          	0x0000000101421e14 emberAfSendUnicastWithCallback(unsigned char, unsigned long long, EmberApsFrame*, unsigned short, unsigned char*, void (*)(unsigned char, unsigned long long, EmberApsFrame*, unsigned short, unsigned char*, unsigned char)) + 164
9   CHIP                          	0x0000000101423698 emberAfSendResponseWithCallback(void (*)(unsigned char, unsigned long long, EmberApsFrame*, unsigned short, unsigned char*, unsigned char)) + 336
10  CHIP                          	0x0000000101423904 emberAfSendDefaultResponseWithCallback(EmberAfClusterCommand const*, EmberAfStatus, void (*)(unsigned char, unsigned long long, EmberApsFrame*, unsigned short, unsigned char*, unsigned char)) + 324
11  CHIP                          	0x0000000101423970 emberAfSendDefaultResponse(EmberAfClusterCommand const*, EmberAfStatus) + 40
12  CHIP                          	0x00000001014c2d80 emAfProcessClusterSpecificCommand(EmberAfClusterCommand*) + 208
13  CHIP                          	0x0000000101423354 dispatchZclMessage(EmberAfClusterCommand*) + 412
14  CHIP                          	0x0000000101422f58 emberAfProcessMessage(EmberApsFrame*, unsigned char, unsigned char*, unsigned short, unsigned long long, EmberAfInterpanHeader*) + 436
15  CHIP                          	0x00000001014c2ec0 HandleDataModelMessage(unsigned long long, chip::System::PacketBufferHandle) + 240
16  CHIP                          	0x00000001015086a0 chip::Controller::Device::OnMessageReceived(chip::PacketHeader const&, chip::PayloadHeader const&, chip::System::PacketBufferHandle) + 224
17  CHIP                          	0x000000010150b044 chip::Controller::DeviceController::OnMessageReceived(chip::Messaging::ExchangeContext*, chip::PacketHeader const&, chip::PayloadHeader const&, chip::System::PacketBufferHandle) + 292
18  CHIP                          	0x000000010152e1e4 chip::Messaging::ExchangeContext::HandleMessage(chip::PacketHeader const&, chip::PayloadHeader const&, chip::Transport::PeerAddress const&, chip::System::PacketBufferHandle) + 472
19  CHIP                          	0x000000010152feb8 chip::Messaging::ExchangeManager::OnMessageReceived(chip::PacketHeader const&, chip::PayloadHeader const&, chip::SecureSessionHandle, chip::Transport::PeerAddress const&, chip::System::PacketBufferHandle, chip::SecureSessionMgr*) + 996
20  CHIP                          	0x000000010155cd54 chip::SecureSessionMgr::SecureMessageDispatch(chip::PacketHeader const&, chip::Transport::PeerAddress const&, chip::System::PacketBufferHandle) + 1144
21  CHIP                          	0x000000010155c83c chip::SecureSessionMgr::OnMessageReceived(chip::PacketHeader const&, chip::Transport::PeerAddress const&, chip::System::PacketBufferHandle) + 120
22  CHIP                          	0x000000010155d8e4 chip::TransportMgrBase::HandleMessageReceived(chip::PacketHeader const&, chip::Transport::PeerAddress const&, chip::System::PacketBufferHandle) + 188
23  CHIP                          	0x00000001015ab4ac chip::Transport::Base::HandleMessageReceived(chip::PacketHeader const&, chip::Transport::PeerAddress const&, chip::System::PacketBufferHandle&&) + 156
24  CHIP                          	0x00000001015ab1f0 chip::Transport::UDP::OnUdpReceive(chip::Inet::IPEndPointBasis*, chip::System::PacketBufferHandle, chip::Inet::IPPacketInfo const*) + 144
25  CHIP                          	0x0000000101517744 chip::Inet::IPEndPointBasis::HandlePendingIO(unsigned short) + 1144
26  CHIP                          	0x000000010151e810 chip::Inet::UDPEndPoint::HandlePendingIO() + 88
27  CHIP                          	0x000000010151aa14 chip::Inet::InetLayer::HandleSelectResult(int, fd_set*, fd_set*, fd_set*) + 792
28  CHIP                          	0x00000001015429cc chip::DeviceLayer::Internal::GenericPlatformManagerImpl_POSIX<chip::DeviceLayer::PlatformManagerImpl>::SysProcess() + 300
29  CHIP                          	0x000000010154272c chip::DeviceLayer::Internal::GenericPlatformManagerImpl_POSIX<chip::DeviceLayer::PlatformManagerImpl>::_RunEventLoop() + 52
30  CHIP                          	0x0000000101542d58 chip::DeviceLayer::PlatformManager::RunEventLoop() + 28
31  CHIP                          	0x0000000101542b5c chip::DeviceLayer::Internal::GenericPlatformManagerImpl_POSIX<chip::DeviceLayer::PlatformManagerImpl>::EventLoopTaskMain(void*) + 52
32  libsystem_pthread.dylib       	0x00000001be26a8e4 0x1be269000 + 6372
33  libsystem_pthread.dylib       	0x00000001be273700 0x1be269000 + 42752
@sagar-apple
Copy link
Contributor Author

Accessory Logs(timestamps are not corelatable):

I (4326221) chip[DL]: Confirm received for CHIPoBLE TX characteristic indication (con 0) status= 0 
V (4327171) chip[EM]: ec pos: 1, id: 15097, Delegate: 0x3ffc97a8
I (4327171) chip[SVR]: Packet received from Node:3ffdb930: 1165691760 bytes
V (4327171) chip[ZCL]: APS frame processing success!
I (4327181) chip[ZCL]: RX len 4, ep 1, clus 0x101 
I (4327191) chip[ZCL]:  FC 1 seq 4 cmd 1 payload[
I (4327191) chip[ZCL]: 00 
I (4327191) chip[ZCL]: ]
I (4327201) chip[ZCL]: UNLOCK DOOR 
I (4327201) chip[ZCL]: SUCCESS!
I (4327211) chip[ZCL]: sched report event in 0 ms
I (4327211) app-devicecallbacks: PostAttributeChangeCallback - Cluster ID: '0x0101', EndPoint ID: '0x01', Attribute ID: '0x0000'
I (4327221) app-devicecallbacks: Unhandled cluster ID: 257
I (4327231) app-devicecallbacks: Current free heap: 116096

I (4327241) chip[ZCL]: Failed to read AutoRelockTime attribute: 0x86
V (4327241) chip[ZCL]: Measured APS frame size 11
V (4327251) chip[ZCL]: Successfully encoded 11 bytes
V (4327251) chip[IN]: Secure message was encrypted: Msg ID 505
I (4327261) chip[IN]: Sending msg from 0x0000000000000002 to 0xfb2fa413457b0b70 at utc time: 4326578 msec
I (4327271) chip[IN]: Sending secure msg on generic transport
I (4327281) chip[IN]: Secure msg send status No Error
I (4327281) chip[ZCL]: T   0:TX (0x3f40d20a) Ucast 0x00x3f400b36
I (4327291) chip[ZCL]: TX buffer: [
I (4327291) chip[ZCL]: 09 04 01 00 
I (4327301) chip[ZCL]: ]
V (4327301) chip[ZCL]: Measured APS frame size 11
V (4327301) chip[ZCL]: Successfully encoded 11 bytes
V (4327311) chip[IN]: Secure message was encrypted: Msg ID 506
I (4327311) chip[IN]: Sending msg from 0x0000000000000002 to 0xfb2fa413457b0b70 at utc time: 4326634 msec
I (4327321) chip[IN]: Sending secure msg on generic transport
I (4327331) chip[IN]: Secure msg send status No Error
V (4327341) chip[ZCL]: Data model processing success!
E (4327341) chip[DL]: Long dispatch time: 172 ms
V (4327351) chip[ZCL]: Measured APS frame size 11
V (4327351) chip[ZCL]: Successfully encoded 11 bytes
V (4327361) chip[IN]: Secure message was encrypted: Msg ID 507
I (4327361) chip[IN]: Sending msg from 0x0000000000000002 to 0xfb2fa413457b0b70 at utc time: 4326683 msec
I (4327371) chip[IN]: Sending secure msg on generic transport
I (4327381) chip[IN]: Secure msg send status No Error
I (4327391) chip[ZCL]: sched report event in 9962 ms
I (4328751) CHIP[DL]: Write request received for CHIPoBLE RX characteristic con 0 12
GATT procedure initiated: notify; att_handle=14

@sagar-apple sagar-apple changed the title [Crash] CHIP crashes when receiving a response from a CHIP accessory [Crash] CHIP crashes sometimes when receiving a response from a CHIP accessory Apr 30, 2021
@bzbarsky-apple
Copy link
Contributor

I can reproduce this with the command-line chip-tool running against all-clusters-app on an m5stack in bypass mode. In one terminal run:

./out/debug/standalone/chip-tool pairing bypass m5stack-ip-here 11097  
./out/debug/standalone/chip-tool binding bind 112233 0 1 257 1 
./out/debug/standalone/chip-tool doorlock report lock-state 0 10 1
./out/debug/standalone/chip-tool reporting listen                 

In another terminal run:

./out/debug/standalone/chip-tool doorlock lock-door "" 1

@bzbarsky-apple
Copy link
Contributor

The actual problem is in frame 4, in chip::Messaging::ExchangeContext::SendMessage on this line:

state = mExchangeMgr->GetSessionMgr()->GetPeerConnectionState(mSecureSession);

because mExchangeMgr->GetSessionMgr() is null.

@bzbarsky-apple
Copy link
Contributor

OK, so we are getting an "Operation Event Notification" command from the door lock cluster on the server. That lands on the client, the client is not set up to handle that command (in the sense that the generated process-cluster-message.cpp file does not have handling for it), so it tries to send back a "I don't know what you're talking about" default response. And when it tries to do that, we run into trouble because the ExchangeManager has a null mSessionMgr, and we crash.

It looks like we have two ExchangeManager instances around. One is statically allocated (i.e just a global variable). I don't know yet where that's coming from. The other is allocated in Controller::DeviceController::Init. That second one gets set up with a session manager and whatnot. But it's the first one that we end up using to try to send the default response!

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Apr 30, 2021

And that's because in chipSendUnicast we have this:

  91  	    // TODO: temprary create a handle from node id, will be fix in PR 3602
  92  	    Messaging::ExchangeContext * exchange = ExchangeManager().NewContext({ destination, Transport::kAnyKeyId, 0 }, nullptr);

which is picking up the "wrong" exchange manager.... But we shouldn't have two of them to start with.

It looks like at least for the command-line chip-tool src/app/server/Server.cpp is compiled and linked in, and that's where the statically-allocated exchange manager is coming from.

@vivien-apple should DeviceController just use that (and perhaps the SecureSessionMgr from there as well)?

@vivien-apple
Copy link
Contributor

And that's because in chipSendUnicast we have this:

  91  	    // TODO: temprary create a handle from node id, will be fix in PR 3602
  92  	    Messaging::ExchangeContext * exchange = ExchangeManager().NewContext({ destination, Transport::kAnyKeyId, 0 }, nullptr);

which is picking up the "wrong" exchange manager.... But we shouldn't have two of them to start with.

It looks like at least for the command-line chip-tool src/app/server/Server.cpp is compiled and linked in, and that's where the statically-allocated exchange manager is coming from.

@vivien-apple should DeviceController just use that (and perhaps the SecureSessionMgr from there as well)?

It looks wrong. Server.cpp is definitively intended for accessories. The issue seems to be that src/app/server is included from https://github.com/project-chip/connectedhomeip/blob/master/src/lib/BUILD.gn#L48

If you remove this line and do a few changes to some BUILD.gn files it clearly shows the issue. It fails to build with:

Undefined symbols for architecture x86_64:
  "chip::ExchangeManager()", referenced from:
      chipSendUnicast(unsigned long long, EmberApsFrame*, unsigned short, unsigned char*) in data_model.chip-message-send.cpp.o
diff --git a/examples/bridge-app/bridge-common/BUILD.gn b/examples/bridge-app/bridge-common/BUILD.gn
index 46b94ab8d4..38c003a2a6 100644
--- a/examples/bridge-app/bridge-common/BUILD.gn
+++ b/examples/bridge-app/bridge-common/BUILD.gn
@@ -21,3 +21,4 @@ chip_data_model("bridge-common") {
 
   zap_pregenerated_dir = "gen"
+  is_server = true
 }
diff --git a/examples/chip-tool/BUILD.gn b/examples/chip-tool/BUILD.gn
index 8bd2c3ec41..494a23fdf7 100644
--- a/examples/chip-tool/BUILD.gn
+++ b/examples/chip-tool/BUILD.gn
@@ -45,5 +45,4 @@ executable("chip-tool") {
   deps = [
     ":data_model",
-    "${chip_root}/src/app/server",
     "${chip_root}/src/lib",
     "${chip_root}/src/platform",
diff --git a/examples/platform/linux/BUILD.gn b/examples/platform/linux/BUILD.gn
index ff8dd7cde5..9abbeae74b 100644
--- a/examples/platform/linux/BUILD.gn
+++ b/examples/platform/linux/BUILD.gn
@@ -32,5 +32,8 @@ source_set("app-main") {
   }
 
-  public_deps = [ "${chip_root}/src/lib" ]
+  public_deps = [
+    "${chip_root}/src/lib",
+    "${chip_root}/src/app/server",
+  ]
 
   public_configs = [ ":app-main-config" ]
diff --git a/src/app/chip_data_model.gni b/src/app/chip_data_model.gni
index a08807b283..4f92244c0e 100644
--- a/src/app/chip_data_model.gni
+++ b/src/app/chip_data_model.gni
@@ -62,4 +62,5 @@ template("chip_data_model") {
                              "zap_file",
                              "use_default_client_callbacks",
+                             "is_server",
                            ])
 
@@ -171,4 +172,8 @@ template("chip_data_model") {
     ]
 
+    if (defined(invoker.is_server)) {
+      public_deps += [ "${chip_root}/src/app/server" ]
+    }
+
     if (!defined(public_configs)) {
       public_configs = []
diff --git a/src/lib/BUILD.gn b/src/lib/BUILD.gn
index 19c6a1678d..685704e8cc 100644
--- a/src/lib/BUILD.gn
+++ b/src/lib/BUILD.gn
@@ -14,5 +14,4 @@
 
 import("//build_overrides/chip.gni")
-import("${chip_root}/src/platform/device.gni")
 
 declare_args() {
@@ -46,8 +45,4 @@ static_library("lib") {
   }
 
-  if (chip_device_platform != "none") {
-    public_deps += [ "${chip_root}/src/app/server" ]
-  }
-
   cflags = [ "-Wconversion" ]
``

@vivien-apple
Copy link
Contributor

I would say that we need to carry the ExchangeMgr in some ways so chipSendUnicast can retrieve the correct one.

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Apr 30, 2021

I would say that we need to carry the ExchangeMgr in some ways so chipSendUnicast can retrieve the correct one.

We need to carry through the exchange, not the exchange manager. As in, we should be sending the response on the same exchange as the request. That's been the plan for months, it's not that hard to do, and I don't know why people put in these hacks instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants