Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pairing of M5 board via CHIPTool cli throws Timeout error #19135

Closed
phonnakasturi-apple opened this issue Jun 2, 2022 · 5 comments · Fixed by #19137
Closed

Pairing of M5 board via CHIPTool cli throws Timeout error #19135

phonnakasturi-apple opened this issue Jun 2, 2022 · 5 comments · Fixed by #19137

Comments

@phonnakasturi-apple
Copy link

With SHA 6c4b54e

M5 board cannot be paired via chiptool cli.

Steps to reproduce the issue;

Note: Ensure that M5 board is connected and flashed

  1. cd ~/connectedhomeip
  2. git reset --hard 6c4b54e
  3. source scripts/bootstrap.sh
  4. source scripts/activate.sh
  5. gn gen out/debug --args='chip_mdns="platform" chip_crypto="mbedtls" chip_inet_config_enable_ipv4=false'
  6. ninja -C out/debug
  7. sudo ./out/debug/chip-tool pairing ble-wifi 4 "" 20202021 3840 --paa-trust-store-path ./credentials/development/paa-root-certs

Error:

[1654206140364] [89910:2429897] CHIP: [EM] OnMessageReceived failed, err = ../../src/messaging/ExchangeMgr.cpp:265: CHIP Error 0x00000070: Unsolicited msg with originator bit clear
[1654206140421] [89910:2429898] CHIP: [EM] Received message of type 0x9 with protocolId (0, 1) and MessageCounter:205648405 on exchange 43505i
[1654206140421] [89910:2429898] CHIP: [EM] Found matching exchange: 43505i, Delegate: 0x7fcd34004250
[1654206140421] [89910:2429898] CHIP: [DMG] ICR moving to [ResponseRe]
[1654206140421] [89910:2429898] CHIP: [DMG] InvokeResponseMessage =
[1654206140421] [89910:2429898] CHIP: [DMG] {
[1654206140421] [89910:2429898] CHIP: [DMG] suppressResponse = false,
[1654206140421] [89910:2429898] CHIP: [DMG] InvokeResponseIBs =
[1654206140421] [89910:2429898] CHIP: [DMG] [
[1654206140421] [89910:2429898] CHIP: [DMG] InvokeResponseIB =
[1654206140421] [89910:2429898] CHIP: [DMG] {
[1654206140421] [89910:2429898] CHIP: [DMG] CommandDataIB =
[1654206140421] [89910:2429898] CHIP: [DMG] {
[1654206140421] [89910:2429898] CHIP: [DMG] CommandPathIB =
[1654206140421] [89910:2429898] CHIP: [DMG] {
[1654206140421] [89910:2429898] CHIP: [DMG] EndpointId = 0x0,
[1654206140421] [89910:2429898] CHIP: [DMG] ClusterId = 0x30,
[1654206140421] [89910:2429898] CHIP: [DMG] CommandId = 0x1,
[1654206140421] [89910:2429898] CHIP: [DMG] },
[1654206140421] [89910:2429898] CHIP: [DMG]
[1654206140421] [89910:2429898] CHIP: [DMG] CommandFields =
[1654206140421] [89910:2429898] CHIP: [DMG] {
[1654206140421] [89910:2429898] CHIP: [DMG] 0x0 = 0,
[1654206140421] [89910:2429898] CHIP: [DMG] 0x1 = "",
[1654206140421] [89910:2429898] CHIP: [DMG] },
[1654206140421] [89910:2429898] CHIP: [DMG] },
[1654206140421] [89910:2429898] CHIP: [DMG]
[1654206140421] [89910:2429898] CHIP: [DMG] },
[1654206140421] [89910:2429898] CHIP: [DMG]
[1654206140421] [89910:2429898] CHIP: [DMG] ],
[1654206140421] [89910:2429898] CHIP: [DMG]
[1654206140421] [89910:2429898] CHIP: [DMG] InteractionModelRevision = 1
[1654206140421] [89910:2429898] CHIP: [DMG] },
[1654206140422] [89910:2429898] CHIP: [DMG] Received Command Response Data, Endpoint=0 Cluster=0x0000_0030 Command=0x0000_0001
[1654206140422] [89910:2429898] CHIP: [CTL] Failsafe disarmed
[1654206140422] [89910:2429898] CHIP: [CTL] Successfully finished commissioning step 'Cleanup'
[1654206140422] [89910:2429898] CHIP: [TOO] Device commissioning Failure: ../../src/app/CommandSender.cpp:211: CHIP Error 0x00000032: Timeout
[1654206140422] [89910:2429898] CHIP: [DIS] Closing all BLE connections
[1654206140422] [89910:2429898] CHIP: [IN] Clearing BLE pending packets.
[1654206140422] [89910:2429898] CHIP: [BLE] Auto-closing end point's BLE connection.
[1654206140422] [89910:2429898] CHIP: [DMG] ICR moving to [AwaitingDe]
[1654206140422] [89910:2429893] CHIP: [CTL] Shutting down the commissioner
[1654206140422] [89910:2429893] CHIP: [CTL] Shutting down the controller
[1654206140422] [89910:2429893] CHIP: [CTL] Shutting down the commissioner
[1654206140423] [89910:2429893] CHIP: [CTL] Shutting down the controller
[1654206140423] [89910:2429893] CHIP: [IN] Expiring all connections for fabric 1!!
[1654206140423] [89910:2429893] CHIP: [CTL] Shutting down the commissioner
[1654206140423] [89910:2429893] CHIP: [CTL] Shutting down the controller
[1654206140423] [89910:2429893] CHIP: [IN] Expiring all connections for fabric 2!!
[1654206140423] [89910:2429893] CHIP: [CTL] Shutting down the commissioner
[1654206140423] [89910:2429893] CHIP: [CTL] Shutting down the controller
[1654206140423] [89910:2429893] CHIP: [IN] Expiring all connections for fabric 3!!
[1654206140423] [89910:2429893] CHIP: [CTL] Shutting down the System State, this will teardown the CHIP Stack
[1654206140423] [89910:2429893] CHIP: [DMG] IM WH moving to [Uninitialized]
[1654206140423] [89910:2429893] CHIP: [DMG] IM WH moving to [Uninitialized]
[1654206140423] [89910:2429893] CHIP: [DMG] IM WH moving to [Uninitialized]
[1654206140423] [89910:2429893] CHIP: [DMG] IM WH moving to [Uninitialized]
[1654206140423] [89910:2429893] CHIP: [DMG] All ReadHandler-s are clean, clear GlobalDirtySet
[1654206140423] [89910:2429893] CHIP: [BLE] CancelConnection
[1654206140423] [89910:2429893] CHIP: [DL] Inet Layer shutdown
[1654206140423] [89910:2429893] CHIP: [DL] BLE shutdown
[1654206140423] [89910:2429893] CHIP: [DL] System Layer shutdown
[1654206140423] [89910:2429893] CHIP: [IN] SecureSession Released 0x7fcd32482a80 Type:1 LSID:10578
[1654206140423] [89910:2429893] CHIP: [TOO] Run command failure: ../../src/app/CommandSender.cpp:211: CHIP Error 0x00000032: Timeout

Attaching the complete log file;

M5-cli-error-log.txt

@bzbarsky-apple
Copy link
Contributor

Here is the relevant part of the log:

[1654206138037] [89910:2429898] CHIP: [CTL] Performing next commissioning step 'SendNOC'
...
[1654206138037] [89910:2429898] CHIP: [CTL] Sent operational certificate to the device
...
[1654206140040] [89910:2429898] CHIP: [DMG] Time out! failed to receive invoke command response from Exchange: 43504i
[1654206140040] [89910:2429898] CHIP: [CTL] Device failed to receive the operational certificate Response: ../../src/app/CommandSender.cpp:211: CHIP Error 0x00000032: Timeout

Note that the AddNOC command timed out in 2 seconds, because that's the timeout that #18744 set. There's no way an m5stack can do an AddNOC in 2 seconds. I just tried commissioning an m5stack on the changeset right before that PR merged and here's what a successful log looks like:

[1654210729427] [72482:3413866] CHIP: [CTL] Sent operational certificate to the device
...
[1654210732363] [72482:3413865] CHIP: [EM] Received message of type 0x9 with protocolId (0, 1) and MessageCounter:92935151 on exchange 20798i

which is 2936 ms to get the response.

That said, why did we time out in 2s? I would have expected the exchange timeout to be set to "2s plus however long it takes to do all our MRP resends", not a flat 2s.

@mrjerryjohns @erjiaqing why is the code from #18744 not at least accounting for MRP bits here and doing a hard 2 second cutoff?

@bzbarsky-apple
Copy link
Contributor

Ah, this is BLE, so no MRP involved. So that's why the timeout is a flat 2s.

@bzbarsky-apple
Copy link
Contributor

It looks like for TCP we assume 30s for packet retransmits and whatnot, while for BLE we assume 0.

@bzbarsky-apple
Copy link
Contributor

Fix coming up.

@bzbarsky-apple bzbarsky-apple self-assigned this Jun 3, 2022
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jun 3, 2022
Several fixes here:

1. AutoCommissioner has a comment about how per spec everything during
   commissioning needs at least a 30s timeout, and it was passing that to
   PerformCommissioningStep, but DeviceCommissioner was ignoring the "timeout"
   parameter for a bunch of the cases, including crucially for AddNOC and
   CSRRequest.  Those can take a while to run, and were hitting the
   now-much-lower default 2s timeout.  The fix here is to stop ignoring the
   passed-in value.

2. The passed-in timeout value computation in AutoCommissioner was not quite
   right.  It was set to max(30s, network-connect-time), but the network connect
   time is the processing time on the server, not the total time including
   transport latency.  Fix the computation of the timeout to:
   a. Take the network connect times for the network enable steps, a "slow
      crypto" time of 15s for the AddNOC and CSRRequest steps, and the default
      IM timeout for all other steps and treat that as the server processing
      time.
   b. Add the transport timeout bits from our device's session to that server
      processing time.
   c. If the result is less than the spec-mandated 30s timeout, use 30s,
      otherwise use the result we computed.

3. Assuming that BLE has 0 transport latency is wrong.  Not clear what the right
   value is, but for now setting it to the same as TCP.

Fixes project-chip#19135
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jun 3, 2022
Several fixes here:

1. AutoCommissioner has a comment about how per spec everything during
   commissioning needs at least a 30s timeout, and it was passing that to
   PerformCommissioningStep, but DeviceCommissioner was ignoring the "timeout"
   parameter for a bunch of the cases, including crucially for AddNOC and
   CSRRequest.  Those can take a while to run, and were hitting the
   now-much-lower default 2s timeout.  The fix here is to stop ignoring the
   passed-in value.

2. The passed-in timeout value computation in AutoCommissioner was not quite
   right.  It was set to max(30s, network-connect-time), but the network connect
   time is the processing time on the server, not the total time including
   transport latency.  Fix the computation of the timeout to:
   a. Take the network connect times for the network enable steps, a "slow
      crypto" time of 15s for the AddNOC and CSRRequest steps, and the default
      IM timeout for all other steps and treat that as the server processing
      time.
   b. Add the transport timeout bits from our device's session to that server
      processing time.
   c. If the result is less than the spec-mandated 30s timeout, use 30s,
      otherwise use the result we computed.

3. Assuming that BLE has 0 transport latency is wrong.  Not clear what the right
   value is, but for now setting it to the same as TCP.

Fixes project-chip#19135
@woody-apple
Copy link
Contributor

Cert Blocker Review: Marking cert blocker.

woody-apple pushed a commit that referenced this issue Jun 3, 2022
* Fix command timeouts during commissioning.

Several fixes here:

1. AutoCommissioner has a comment about how per spec everything during
   commissioning needs at least a 30s timeout, and it was passing that to
   PerformCommissioningStep, but DeviceCommissioner was ignoring the "timeout"
   parameter for a bunch of the cases, including crucially for AddNOC and
   CSRRequest.  Those can take a while to run, and were hitting the
   now-much-lower default 2s timeout.  The fix here is to stop ignoring the
   passed-in value.

2. The passed-in timeout value computation in AutoCommissioner was not quite
   right.  It was set to max(30s, network-connect-time), but the network connect
   time is the processing time on the server, not the total time including
   transport latency.  Fix the computation of the timeout to:
   a. Take the network connect times for the network enable steps, a "slow
      crypto" time of 15s for the AddNOC and CSRRequest steps, and the default
      IM timeout for all other steps and treat that as the server processing
      time.
   b. Add the transport timeout bits from our device's session to that server
      processing time.
   c. If the result is less than the spec-mandated 30s timeout, use 30s,
      otherwise use the result we computed.

3. Assuming that BLE has 0 transport latency is wrong.  Not clear what the right
   value is, but for now setting it to the same as TCP.

Fixes #19135

* Address review comments.

* Address review comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants