-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MRP failures trying to commission all-clusters-app on m5stack #17471
Comments
So on the client side we send the Sigma3 at monotonic time 3597350996 and then do the resends at:
which are ~300ms, ~300ms, and ~400ms apart previous message. Then we decide the message has failed:
We then eventually get the acks and treat them as unsolicited messages. We don't log timestamps on those, but it's after we send the message with message counter 2794651822, which happens at monotonic time 3597354263, so 3200ms or so after we sent Sigma3. On the server, we send sigma2 at monotonic time 35169, then receive a Sigma3 (time not logged), then send a StatusReport responding to the Sigma3 at monotonic time 38607, so 3450ms or so after sending Sigma2. So one plausible hypothesis is that Sigma3 processing takes a long time (several seconds) and happens on the Matter thread, so we never get a chance to send a standalone ack to the Sigma3 even though our processing is taking longer than the standalone ack timeout.... Then we finally finish whatever work it is Sigma3 processing does and all the piled-up resends get delivered... And it's possible that we did more backing off on our unauthenticated resends before, so didn't run into this problem. I expect a similar problem occurs with Sigma2 if the slow device (e.g. the m5stack in this case) is the CASE initiator... |
TL;DR: CASE ultimately succeeds, but sends excessive retries and logs those message failures as warnings. |
The reason being that the crypto work happens on the thread that messages are sent/received on... |
Maybe we should have manual FlushAcks calls before we go off into the long-running crypto bits when handling sigma2/sigma3? Or can we put those crypto bits off on a separate thread? |
Suggestion from @tcarmelveilleux to explicitly call |
Overall, I would recommend the following:
|
It works, but not ideal. Since m5 has a dual core SoC, we should defer the encryption work to another core while the chip core is handling chip events. Blocking chip thread masses up lots of timers, if there is any other on going exchanges, it will suffer the same problem |
@kghost Deferring the crypto requires a ton of changes which are riskier. This is why I suggest enable that feature with a |
Yes, ideally we would do the crypto on a background task on hardware that is multi-core, but we still need something here for slow single-core hardware. The proposed plan in #17471 (comment) sounds good to me. |
… in CASE and PASE. - Added CHIP_CONFIG_SLOW_CRYPTO with default of 1 - Force send a standalone ack response to all CASE and PASE when above flag enabled. - Adjusted expected # of expected packets in CASE and PASE tests.
…PASE. (#18773) * [mrp] Fix #17471 - force send standalone ack for CASE. * [mrp] Adjust expected # of packets in CASE test. * [mrp] Add CHIP_CONFIG_SLOW_CRYPTO and support to standalone ack for PASE. * [mrp] Move CHIP_DEVICE_CONFIG_SLOW_CRYPTO to device layer config. * [mrp] Only send slow crypto ack for crypto ops. * [mrp] Call FlushAcks rather than SendStandaloneAckMessage. Co-authored-by: Boris Zbarsky <[email protected]> * [mrp] Assume fast crypto on linux and darwin. Co-authored-by: Boris Zbarsky <[email protected]>
Problem
Steps to reproduce:
scripts/examples/gn_build_example.sh examples/chip-tool out/debug/standalone/
or equivalent)Expected results: no errors.
Actual results: commissioning does succeed, but before it does that the client prints:
and the server prints:
Git bisect says this started with #17003. @turon
Proposed Solution
Figure out what's going on and why we are not seeing our acks in a timely manner.
The text was updated successfully, but these errors were encountered: