Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Commissioning over BLE often fails due to PBKDFResponse timeout #23438

Closed
Damian-Nordic opened this issue Nov 2, 2022 · 9 comments · Fixed by #23638 or #23598
Closed

[BUG] Commissioning over BLE often fails due to PBKDFResponse timeout #23438

Damian-Nordic opened this issue Nov 2, 2022 · 9 comments · Fixed by #23638 or #23598
Labels

Comments

@Damian-Nordic
Copy link
Contributor

Reproduction steps

1. Build nRF Connect Lighting Example
2. Commission it using chip-tool pairing ble-thread command

Bug prevalence

~80% attempts

GitHub hash of the SDK that was being used

fac7aae

Platform

nrf

Platform Version(s)

No response

Anything else?

This is probably caused by https://github.com/project-chip/connectedhomeip/pull/23335/files. After increasing kExpectedLowProcessingTime to 10 seconds the problem is gone. I am not sure but the timeout may not take into account the time needed to establish a connection over BLE.

@Damian-Nordic
Copy link
Contributor Author

@vivien-apple fyi

@bzbarsky-apple
Copy link
Contributor

@Damian-Nordic We currently return "5" from SecureSession::GetAckTimeout in the BLE case. It's possible that number should be higher...

That said, if this is always a problem only with the first message sent on a BLE connection, then maybe raising the timeout in PASE is the right call. I assume that "establish a BLE connection" has not happened yet by the time "discover a BLE device with this discriminator" completes?

@s07641069
Copy link
Contributor

@Damian-Nordic @bzbarsky-apple
Under Telink platform same stability issue.

@Damian-Nordic
Copy link
Contributor Author

@bzbarsky-apple I'll check tomorrow when exactly the timer starts.

@Damian-Nordic
Copy link
Contributor Author

Looks like the retransmission timer is kicked off before even discovering the device and the timeout is set to 2s exactly:

[1668072146.589415][18474:18476] CHIP:DL: XXX Start retransmission 2000
[1668072146.589426][18474:18476] CHIP:EM: <<< [E:48617i M:72150424] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:20 (SecureChannel:PBKDFParamRequest)
[1668072146.589434][18474:18476] CHIP:IN: (U) Sending msg 72150424 to IP address 'BLE'
[1668072146.589439][18474:18476] CHIP:IN: Message appended to BLE send queue
[1668072146.589442][18474:18476] CHIP:SC: Sent PBKDF param request
...
[1668072146.593879][18474:18475] CHIP:BLE: BLE initiating scan.
[1668072146.718725][18474:18476] CHIP:DL: Long dispatch time: 129 ms, for event type 2
[1668072146.732799][18474:18475] CHIP:BLE: Device 68:95:B8:60:D7:13 does not look like a CHIP device.
[1668072146.736408][18474:18475] CHIP:BLE: Device 7F:04:69:1A:A1:47 does not look like a CHIP device.
[1668072146.737894][18474:18475] CHIP:BLE: Device 38:58:65:1E:B9:2E does not look like a CHIP device.
[1668072146.739342][18474:18475] CHIP:BLE: Device 11:1F:6F:B2:F8:77 does not look like a CHIP device.
[1668072146.744634][18474:18475] CHIP:BLE: Device 67:99:1F:A8:B4:2A does not look like a CHIP device.
[1668072146.749724][18474:18475] CHIP:BLE: Device 44:A7:71:82:65:05 does not look like a CHIP device.
[1668072146.750928][18474:18475] CHIP:BLE: Device 1F:B5:B4:9D:73:99 does not look like a CHIP device.
[1668072146.764997][18474:18475] CHIP:BLE: Device 7A:70:83:2A:EB:F5 does not look like a CHIP device.
[1668072146.770383][18474:18475] CHIP:BLE: Device 67:5C:42:95:5D:18 does not look like a CHIP device.
[1668072146.772782][18474:18475] CHIP:BLE: Device 4E:E1:0C:9B:F0:98 does not look like a CHIP device.
[1668072146.775742][18474:18475] CHIP:BLE: Device 53:A9:E2:2F:63:71 does not look like a CHIP device.
[1668072146.778850][18474:18475] CHIP:BLE: Device 46:F5:C1:46:7B:48 does not look like a CHIP device.
[1668072146.780486][18474:18475] CHIP:BLE: Device 49:2B:09:85:D2:DB does not look like a CHIP device.
[1668072146.782582][18474:18475] CHIP:BLE: Device 77:89:8C:05:11:05 does not look like a CHIP device.
[1668072146.784624][18474:18475] CHIP:BLE: Device 4D:31:9C:6D:1C:F3 does not look like a CHIP device.
[1668072146.785327][18474:18475] CHIP:BLE: Device 0D:BF:DB:22:41:3A does not look like a CHIP device.
[1668072146.788366][18474:18475] CHIP:BLE: Device 72:9E:EA:B8:2C:32 does not look like a CHIP device.
[1668072146.797155][18474:18475] CHIP:BLE: Device 69:8D:34:F7:23:00 does not look like a CHIP device.
[1668072146.809114][18474:18475] CHIP:BLE: Device 5B:AA:06:99:2F:52 does not look like a CHIP device.
[1668072146.811766][18474:18475] CHIP:BLE: Device 4C:87:5D:24:64:24 does not look like a CHIP device.
[1668072146.820884][18474:18475] CHIP:BLE: Device 6D:34:F9:4A:42:50 does not look like a CHIP device.
[1668072146.822911][18474:18475] CHIP:BLE: Device 76:B1:0F:BF:46:97 does not look like a CHIP device.
[1668072146.824390][18474:18475] CHIP:BLE: Device 7C:5F:1F:66:7E:C8 does not look like a CHIP device.
[1668072146.828022][18474:18475] CHIP:BLE: Device 6D:DF:50:70:7C:54 does not look like a CHIP device.
[1668072146.829906][18474:18475] CHIP:BLE: Device 71:0B:09:80:21:8F does not look like a CHIP device.
[1668072146.833008][18474:18475] CHIP:BLE: Device F5:C4:D9:3E:13:EE does not look like a CHIP device.
[1668072146.835894][18474:18475] CHIP:BLE: New device scanned: CA:73:5A:47:FD:8A
[1668072146.835913][18474:18475] CHIP:BLE: Device discriminator match. Attempting to connect.
[1668072146.838914][18474:18475] CHIP:BLE: Scan complete notification without an active scan.
[1668072147.504737][18474:18475] CHIP:DL: ConnectDevice complete
...
[1668072148.563341][18474:18475] CHIP:DL: New BLE connection 0x7fe968050790, device CA:73:5A:47:FD:8A, path /org/bluez/hci0/dev_CA_73_5A_47_FD_8A
...
[18474:18476] CHIP:SC: PASESession timed out while waiting for a response from the peer. Expected message type was 33

I guess the problem is that UnauthenticatedSession::GetAckTimeout returns zero for BLE. I guess we need to change it depending on if BLE connection has already been established or not. I'll try to propose something.

@bzbarsky-apple
Copy link
Contributor

UnauthenticatedSession::GetAckTimeout returns zero for BLE

Well, that's clearly no good.... SecureSession does something nonzero. If we can be smart and only return a larger number here if we have no connection, that would be awesome.

@vivien-apple
Copy link
Contributor

Hmm. Looking at the code it seems that the issue is that the PASESession timer is started right away in DeviceCommissioner::EstablishPASEConnection. In the case of chip-tool ble-thread ble discovery is using a discriminator. The search is just started and may takes seconds to discover anything :(

I suspect this is fine in when the the BLE connection is initiated with a connection object, but that is definitively not fine when it is only started with a discriminator.

@vivien-apple
Copy link
Contributor

@Damian-Nordic Can you confirm (or infirm) if #23638 helps ?

@Damian-Nordic
Copy link
Contributor Author

@vivien-apple Thanks for the change! It seems it will work after fixing the dropped pincode issue I mentioned in the PR :) Note that I also added #23598 which fixes GetAckTimeout for BLE sessions, but your changes will definitely help in the case of long BLE scanning/connection establishment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants