Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commissioning timeout not reported to application #15409

Closed
caipiblack opened this issue Feb 22, 2022 · 11 comments
Closed

Commissioning timeout not reported to application #15409

caipiblack opened this issue Feb 22, 2022 · 11 comments
Labels
p1 priority 1 work V1.0

Comments

@caipiblack
Copy link
Contributor

Problem

Commit: b0af6ba (Tue Jan 4 02:17:19 2022 +0100)

When I start a BLE-WIFI commissioning with incorrect ssid or password, there is no timeout or callback triggered in application side and the commissioning sessions is not cleaned.

Code:

        CommissioningParameters params;
         params.SetWifiCredentials(WifiCredentials(
                                chip::ByteSpan(.....), chip::ByteSpan(...)));
        SetupPayload payload;
        QRCodeSetupPayloadParser(qrCode).populatePayload(payload);

        RendezvousParameters rendezvous;
        rendezvous.SetDiscriminator(payload.discriminator);
        rendezvous.SetSetupPINCode(payload.setUpPINCode);
        error = commissioner.PairDevice(mNodeId, rendezvous, params);

When the ssid and password are correct, everything is good.
When the ssid and password are wrong the device try to connect to WiFi network and fail, on the commissioner side, OnSessionEstablishmentTimeout() is never called, so StopPairing() and mPairingDelegate->OnPairingComplete(CHIP_ERROR_TIMEOUT) are never called.

The result is:

  1. The application don't know about the timeout (as the callback void OnPairingComplete(CHIP_ERROR err) is not called)
  2. As StopPairing() is also never called FreeRendezvousSession() is never called, so we can't start another pairing until the end of the first one

Some kind of timeout is detected because BLE connexion is closed but the application is not notified and the "RendezvousSession" is not cleaned.

Here some logs:

ble_wifi_wrong_ssid.log

@caipiblack caipiblack changed the title Timeout not reported to higher level Commissioning timeout not reported to application Feb 22, 2022
@caipiblack
Copy link
Contributor Author

Note that the problem of the FreeRendezvousSession() which is not called, also happens when the commissioning failed for another reason (ex: BLE failure)

So most of the times, when pairing fails we have to reboot the software/stack/commissioner.

@bzbarsky-apple
Copy link
Contributor

@tehampson Did #17778 fix this by any chance?

@caipiblack
Copy link
Contributor Author

What I know is that on TE9 commit, we still have an issue when we reproduce what I said but it's possibly not the same problem!

  • Start commissioning a device with wrong SSID or PASSWORD
  • The stack notify a TIMEOUT, during the commissioning stage 16 (kWiFiNetworkEnable) (ok)
  • Wait some seconds and the BLE is disabled
  • Then reset the device and re-try to commission the device
  • We get this kind of error:
[1654084689.145597][26178:26178] CHIP:-: ../../src/controller/CHIPDeviceController.cpp:679: CHIP Error 0x00000003: Incorrect state at ../../src/controller/SetUpCodePairer.cpp:232

@bzbarsky-apple
Copy link
Contributor

That error happens if someone calls DeviceCommissioner::Commission for a device id that we don't have a PASE session to and aren't trying to set up a PASE session for.

@caipiblack is your code still calling into this API:

CHIP_ERROR DeviceCommissioner::PairDevice(NodeId remoteDeviceId, RendezvousParameters & rendezvousParams,
                                          CommissioningParameters & commissioningParams)

? Are you seeing "Invalid device for commissioning" before that error? In general, attaching (not pasting, please) a log would be useful here.

@caipiblack
Copy link
Contributor Author

caipiblack commented Jun 2, 2022

This is how the commissioning is made in my application:

CHIP_ERROR CommandPairing::PairWithCode(NodeId remoteId)
{
    CommissioningParameters commissioningParams = GetCommissioningParameters();
    return CurrentCommissioner().PairDevice(remoteId, mOnboardingPayload, commissioningParams);
}

Yes when I re-start the commissioning I get the error Invalid device for commissioning 0000000000000002

Note that the NodeId is incremented between the two tries and the device is resetted.

Attached you will find full logs: logs-double-commissioning.txt

And here the logs of the second comissioning:

INFO: RX: {"id":0,"jsonrpc":"2.0","method":"matter","params":{"params":{"payload":"MT:-24J042C00KA0648G00","ssid":"TP_9DEA","password":"1234567890"},"cluster":"pairing","command":"qrcode"}}
INFO: TX: {"id":0,"jsonrpc":"2.0","result":{"status":0}}
INFO: Executing command (0)...
DEBUG: CommandPairing::RunCommand(mSSID='TP_9DEA', mPassword='1234567890', mOperationalDataset='???', mOnboardingPayload='MT:-24J042C00KA0648G00')
[1654155261.573464][6987:6987] CHIP:CTL: Setting wifi credentials from parameters
[1654155261.573469][6987:6987] CHIP:CTL: Setting attestation nonce to random value
[1654155261.573480][6987:6987] CHIP:CTL: Setting CSR nonce to random value
[1654155261.573498][6987:6987] CHIP:CTL: Starting commissioning discovery over BLE
[1654155261.573502][6987:6987] CHIP:CTL: Starting commissioning discovery over DNS-SD
[1654155261.573577][6987:6987] CHIP:DIS: Warning: Attempt to mDNS broadcast failed on enp1s0:  ../../src/inet/UDPEndPointImplSockets.cpp:417: OS Error 0x02000065: Network is unreachable
[1654155261.573701][6987:6987] CHIP:DIS: mDNS broadcast had only partial success: 5 successes and 1 failures.
[1654155261.578591][6987:6994] CHIP:BLE: BLE removing known devices.
[1654155261.579031][6987:6994] CHIP:BLE: BLE initiating scan.
[1654155262.170613][6987:6994] CHIP:BLE: Device 35:A3:94:E8:64:89 does not look like a CHIP device.
[1654155262.171921][6987:6994] CHIP:BLE: Device 5C:9C:09:BF:39:7B does not look like a CHIP device.
[1654155262.175959][6987:6994] CHIP:BLE: Device E6:C9:F8:09:50:29 does not look like a CHIP device.
[1654155262.184222][6987:6994] CHIP:BLE: Device 78:7F:87:8B:8C:D9 does not look like a CHIP device.
[1654155262.186234][6987:6994] CHIP:BLE: New device scanned: 7E:1E:1E:79:4F:F7
[1654155262.186254][6987:6994] CHIP:BLE: Device discriminator match. Attempting to connect.
[1654155262.198361][6987:6994] CHIP:BLE: Scan complete notification without an active scan.
[1654155262.305227][6987:6994] CHIP:DL: ConnectDevice complete
[1654155262.573471][6987:6987] CHIP:DIS: Warning: Attempt to mDNS broadcast failed on enp1s0:  ../../src/inet/UDPEndPointImplSockets.cpp:417: OS Error 0x02000065: Network is unreachable
[1654155262.573965][6987:6987] CHIP:DIS: mDNS broadcast had only partial success: 4 successes and 1 failures.
[1654155263.125257][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service0006
[1654155263.125293][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service000a
[1654155263.125310][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service0006
[1654155263.125316][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service000a
[1654155263.125326][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service0006
[1654155263.125332][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service000a
[1654155263.125349][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service000a
[1654155263.125356][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service000a
[1654155263.125369][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service000a
[1654155263.125375][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service000a
[1654155263.125408][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service000a
[1654155263.125416][6987:6994] CHIP:DL: Char1 /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7/service000a
[1654155263.125435][6987:6994] CHIP:DL: New BLE connection 0x7f52c403cf70, device 7E:1E:1E:79:4F:F7, path /org/bluez/hci0/dev_7E_1E_1E_79_4F_F7
[1654155263.125589][6987:6987] CHIP:DL: HandlePlatformSpecificBLEEvent 16385
[1654155263.125624][6987:6987] CHIP:CTL: Discovered device to be commissioned over BLE
[1654155263.125633][6987:6987] CHIP:CTL: Stopping commissioning discovery over DNS-SD
[1654155263.125660][6987:6987] CHIP:CTL: Attempting PASE connection to BLE
[1654155263.125671][6987:6987] CHIP:CTL: Stopping commissioning discovery over DNS-SD
DEBUG: CommandPairing::OnPairingComplete(error=0)
[1654155263.125683][6987:6987] CHIP:TOO: Pairing Success
[1654155263.125690][6987:6987] CHIP:CTL: Invalid device for commissioning 0000000000000002
[1654155263.125710][6987:6987] CHIP:-: ../../src/controller/CHIPDeviceController.cpp:679: CHIP Error 0x00000003: Incorrect state at ../../src/controller/SetUpCodePairer.cpp:232
[1654155264.574502][6987:6987] CHIP:DIS: Warning: Attempt to mDNS broadcast failed on enp1s0:  ../../src/inet/UDPEndPointImplSockets.cpp:417: OS Error 0x02000065: Network is unreachable
[1654155264.574998][6987:6987] CHIP:DIS: mDNS broadcast had only partial success: 4 successes and 1 failures.
[1654155268.575997][6987:6987] CHIP:DIS: Warning: Attempt to mDNS broadcast failed on enp1s0:  ../../src/inet/UDPEndPointImplSockets.cpp:417: OS Error 0x02000065: Network is unreachable
[1654155268.576552][6987:6987] CHIP:DIS: mDNS broadcast had only partial success: 4 successes and 1 failures.
[1654155276.581028][6987:6987] CHIP:DIS: Warning: Attempt to mDNS broadcast failed on enp1s0:  ../../src/inet/UDPEndPointImplSockets.cpp:417: OS Error 0x02000065: Network is unreachable
[1654155276.581578][6987:6987] CHIP:DIS: mDNS broadcast had only partial success: 4 successes and 1 failures.
[1654155292.582008][6987:6987] CHIP:DIS: Timeout waiting for mDNS resolution.
INFO: OnActionTerminated: (0) isPersistent: false, payload: 'null', result: (50) /home/mathieu/Projects/OTODO/matter-daemon/src/Matter/Commands/Common/CommandBase.cpp:11: CHIP Error 0x00000032: Timeout
INFO: TX: {"jsonrpc":"2.0","method":"on-command-terminated","params":{"chipError":{"message":"/home/mathieu/Projects/OTODO/matter-daemon/src/Matter/Commands/Common/CommandBase.cpp:11: CHIP Error 0x00000032: Timeout","status":50},"id":0,"persistent":false,"status":1}}
INFO: Command (0) terminated on error (/home/mathieu/Projects/OTODO/matter-daemon/src/Matter/Commands/Common/CommandBase.cpp:11: CHIP Error 0x00000032: Timeout).
DEBUG: Checking persistent commands, size: 0

Edit: It can be reproduced using linux chip-tool on TE9 commit with lighting-app example for ESP32:

./chip-tool interactive start
... wait for stack init ...
... Reset the device and wait for the device to be ready ...
>>> pairing qrcode-wifi 1 TP_9DEA 8234567890 MT:-24J042C00KA0648G00
... wait for the first commissioning to fail and BLE is stopped ...
... Reset the device and wait for the device to be ready ...
>>> pairing qrcode-wifi 2 TP_9DEA 8234567890 MT:-24J042C00KA0648G00

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Jun 3, 2022

Edit: It can be reproduced using linux chip-tool on TE9 commit with lighting-app example for ESP32:

Perfect. This indeed reproduces things for me.

So what happens is that we land in DeviceCommissioner::EstablishPASEConnection for the second commissioning attempt. Our peer address is "BLE", which is basically a singleton as far as I can tell. We do FindCommissioneeDevice(peerAddress) and find a non-null thing, left over from the previous commissioning attempt. The reason that's still there is that we failed after AddNOC, so we can't just create a new PASE session to the same device, since it won't let us AddNOC again. So the PASE session remains open even on failure. So far so good.

Then we check current->IsSecureConnected() and get true. Presumably we have no way to actually tell that the BLE session was closed? @tcarmelveilleux do you know who might know about those bits? Anyway, this is the first place where we could reasonably recover, if we knew that in fact this BLE connection is dead.

Since IsSecureConnected() is true, EstablishPASEConnection decides we have a PASE session.

Now we call Commission. This looks for a commissionee device for the given device id (2 in this case). It does not find one, because the one we had before was for device id 1. So it errors out as described above.

Now if on the second commissioning we passed in node id 1, not 2, then we would get past this point but then we still think we have a PASE session, but we actually do not, that would time out.

And if I don't reset the device, so the BLE session is still alive, I get #19138 which needs to get sorted out separately.

Anyway, one obvious thing we should do is drop the existing device in EstablishPASEConnection if the device id does not match, since in that case it's not the device we are looking for. But that will not help with the case when the caller does in fact pass in the same device id again...

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jun 3, 2022
Found during investigation of
project-chip#15409 but there
may be more problems left in that issue.

If we get into EstablishPASEConnection and find an existing
commissionee for the peer address (which is easy for BLE, where the
peer address is basically a singleton), but the device id does not
match the passed-in one, we should not reuse that device.  If we try
to, Commission will fail to find the commissionee device by device id
and fail out.
@bzbarsky-apple
Copy link
Contributor

#19141 makes the "try again with a different device id" case work, at least.

bzbarsky-apple added a commit that referenced this issue Jun 6, 2022
)

Found during investigation of
#15409 but there
may be more problems left in that issue.

If we get into EstablishPASEConnection and find an existing
commissionee for the peer address (which is easy for BLE, where the
peer address is basically a singleton), but the device id does not
match the passed-in one, we should not reuse that device.  If we try
to, Commission will fail to find the commissionee device by device id
and fail out.
@caipiblack
Copy link
Contributor Author

Today during the commissioning of a thread device it has failed at the step 18:

../../src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:174: CHIP Error 0x00000032: Timeout

Once we get this error, all subsequent pairing was failing:

../../src/controller/CHIPDeviceController.cpp:679: CHIP Error 0x00000003: Incorrect state at ../../src/controller/SetUpCodePairer.cpp:232

But now I think it's more related to this issue generic issue : #16868

@bzbarsky-apple
Copy link
Contributor

Once we get this error, all subsequent pairing was failing:

When trying to pair with the same node id, or a different one?

Note that the comment in #16868 talks about the "TE9 commit", which does not have the fix from #19141.... Again, that fix should make things work as long as a new node id is used.

@caipiblack
Copy link
Contributor Author

I can confirm that your fix solve the problem in my case (as I increment the nodeId on each commissioning even if it fails).

@franck-apple franck-apple added the p1 priority 1 work label Oct 24, 2022
@woody-apple
Copy link
Contributor

Issue Scrub: Looks like this is resolved, closing. Please re-open if this is still an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p1 priority 1 work V1.0
Projects
None yet
Development

No branches or pull requests

4 participants