[BUG] CASE timeout is too long on LIT ICD #35354
Comments
This should happen after 30+ seconds, when the device stops waiting for a response to the Sigma2 that it sent. @ldoering when you "Wait some time (at least the expected case timeout)", what was the actual wait time and where did the "expected case timeout" come from? Can you please attach, not paste, the logs from the device that was the CASE responder in this case?
I expected the session to fail after 30s (observing the retries after several seconds) and waited multiple minutes without observing a timeout. To get the actual expected timeout I added logging to SetResponseTimeout in ExchangeContext.cpp. This timeout is much longer than expected: it is based on the CHIP_DEVICE_CONFIG_ICD_SLOW_POLL_INTERVAL, which is set to 3600000ms (1 hour) in the silabs LIT-ICD sample app (examples/lit-icd-app/silabs/openthread.gni), with additional backoff. What kind of timeouts are expected in the LIT case? Is the prolonged interval a bug or by design? (I assume a bug, since it also prevents the ICD from sleeping.)
Hmm.
in
What values are you seeing passed to |
@ldoering And definitely a timeout of 45896846s for a Sigma2 seems like a bug.
For the LIT-Mode:
I also disabled the LIT Mode and received a timeout of 281907 ms for the SIT configuration.
I also attached the full device log for both modes:
@ldoering Is I assume |
Oh, this is interesting. So
Does fixing the type of the |
Ah, this even came up in the review in #33093 but it was missed that this is in fact a serious problem because the actual timestamp might not fit in a 32-bit millisecond counter.... That said, I can't quite reconcile this possible issue with the values for
Sorry to confuse you there. I converted the timestamps to 32-bit timeouts for printing, since the lib did not support direct 64-bit printing. I assumed that this would be fine, since all tests run directly after reboot, so 32 bits should be sufficient to count the runtime in ms. To me it looks like a problem to use the LIT idle time to calculate timeouts.
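For anyone else logging these values: below is a minimal sketch of printing a 64-bit millisecond value without 64-bit printf support, next to what a cast to 32 bits actually keeps. `FormatU64` and `TruncateToU32` are hypothetical helper names made up for this illustration; they are not part of the SDK.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not an SDK function): formats a 64-bit value as
// decimal text using only integer arithmetic, so no 64-bit printf is needed.
std::string FormatU64(uint64_t value)
{
    std::string digits;
    do
    {
        // Prepend the least significant decimal digit.
        digits.insert(digits.begin(), static_cast<char>('0' + (value % 10)));
        value /= 10;
    } while (value != 0);
    return digits;
}

// Hypothetical helper: what a cast to 32 bits keeps. Only the low 32 bits
// survive, so a millisecond counter wraps at 2^32 ms (about 49.7 days).
uint32_t TruncateToU32(uint64_t value)
{
    return static_cast<uint32_t>(value);
}
```

A value just past 2^32 ms, such as `4294967296 + 5`, comes out of `TruncateToU32` as `5`, while `FormatU64` prints all of its digits. Shortly after a reboot the two agree, which is why the truncation was harmless in these tests.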
Sure, but why is it being used? I wonder whether one issue is that the threshold (5000) is so close to the active interval (4000). What that's saying is:
So when computing how MRP will work here, assuming we start in active mode.
What the device actually does is that it never transitions to idle mode at all. But that's not what it claims to do, either to the other side or to itself.... @turon @Damian-Nordic @mkardous-silabs we need to figure out what should actually be happening here, because this is clearly broken.
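To make the interaction between the active threshold and the idle interval concrete, here is a rough sketch of a worst-case response-timeout estimate that switches from the active interval to the idle interval once the accumulated backoff passes the active threshold. The function names are hypothetical, and the constants (margin 1.1, base 1.6, jitter 0.25, backoff threshold 1, 5 transmissions) are assumptions modelled on the MRP backoff parameters. This is not the SDK's actual implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Assumed, illustrative MRP-style backoff parameters (not read from the SDK).
constexpr double kBackoffMargin = 1.1;
constexpr double kBackoffBase   = 1.6;
constexpr double kBackoffJitter = 0.25;
constexpr int kBackoffThreshold = 1;
constexpr int kMaxTransmissions = 5; // initial send plus 4 retransmissions

// Hypothetical helper: upper bound of the backoff for one transmission,
// i.e. with the random jitter at its maximum.
double MaxBackoffMs(double baseIntervalMs, int sendCount)
{
    const int exponent = std::max(0, sendCount - kBackoffThreshold);
    return baseIntervalMs * kBackoffMargin * std::pow(kBackoffBase, exponent) * (1.0 + kBackoffJitter);
}

// Hypothetical helper: sums the worst-case backoffs. The peer is assumed
// active only while the elapsed time is still below the active threshold;
// after that, every remaining backoff uses the idle interval.
double WorstCaseResponseTimeoutMs(double activeIntervalMs, double idleIntervalMs, double activeThresholdMs,
                                  double sinceLastActivityMs)
{
    double totalMs = 0.0;
    for (int n = 0; n < kMaxTransmissions; ++n)
    {
        const bool peerAssumedActive = (sinceLastActivityMs + totalMs) < activeThresholdMs;
        totalMs += MaxBackoffMs(peerAssumedActive ? activeIntervalMs : idleIntervalMs, n);
    }
    return totalMs;
}
```

Under these assumptions, `WorstCaseResponseTimeoutMs(4000, 3600000, 5000, 0)` already exceeds the 5000 ms threshold after the first backoff (5500 ms), so the four remaining backoffs use the 1-hour idle interval and the total comes to roughly 45.8 million ms, more than 12 hours. With a threshold large enough that every backoff stays on the active interval, the same call yields about 56 seconds.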
The question is whether we should treat an active threshold shorter than all the MRP backoffs (using the active interval) as a misconfiguration and warn the user when it is configured, or whether we should just use it to cancel a transmission earlier (before retransmitting a message 4 times). In the latter case, we could make
However, then Btw, |
As @Damian-Nordic pointed out on Slack, |
Reproduction steps
After sending/receiving a Sigma1 (but no follow-up), a CASE session is opened but no timeout is ever triggered. This leads to an error when trying to establish a new CASE session:
[SC] Received error (protocol code 4) during pairing process: src/protocols/secure_channel/CASESession.cpp:2144: CHIP Error 0x000000DB: The Resource is busy and cannot process the request
It also prevents an ICD from leaving active mode, draining the battery (#35355).
I reproduced the issue with the LIT-ICD app on the Silabs EFR32 platform, compiled with the following parameters: --icd --low-power enable_openthread_cli=false chip_enable_ota_requestor=true chip_persist_subscriptions=false
I also enabled the OpenThread log, so I can see poll attempts to check the interval.
The issue can also be observed in the reverse direction, when the DUT triggers the CASE session, but I did not manage to do that with an unmodified example.
Bug prevalence
Every time I ran the listed steps.
GitHub hash of the SDK that was being used
0e3434a
Platform
efr32
Platform Version(s)
No response
Anything else?
No response