-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Follow-up issues for CASE async work #26280
Comments
WorkHelper was introduced to help with CASESession::SendSigma3. Now use it to help with CASESession::HandleSigma3. Part of issue project-chip#26280.
* Use work helper with CASE handle sigma3 WorkHelper was introduced to help with CASESession::SendSigma3. Now use it to help with CASESession::HandleSigma3. Part of issue #26280. * Unscope a block
Regarding "Improve handling if scheduling work fails"... Work is broken into A/B/C. The intent is to do A and C in foreground, and B in background. (If background processing is not enabled, then B will also be in foreground, but it's still scheduled by A, and still needs to schedule C.) If we're in A and scheduling B fails, that's not a problem, because we handle the error (send a status report, reset the session, etc.). If we're in C, there's no more scheduling, and we already have the result of any work (including any failure that occurred during B), so again we can handle the error appropriately, as we're in the foreground. (Sending the status report might fail, but that's a separate problem that was always a problem, even before the A/B/C split.) It's if we're in B, possibly in the background, and we cannot schedule C (on the foreground), that error handling becomes tricky. We can't just call a callback on the CASESession object, as we're not in a position to do so (background vs. foreground). We would need a ScheduleFailureCallback that is safe to call on the CASESession object, even in the background. We can't rely on mutex, because those are platform-specific, and we're in portable code. We can't rely on a timer (being created at this time), because those also would fail to schedule under the same circumstances (i.e. no more slots in event processing queue). One potential idea is to add an atomic zombie flag to CASESession, and set that in a ScheduleFailureCallback. Then, whenever something happens in CASEServer (in the foreground), check the pinned CASESession, and reap it if it's a zombie (including sending a status report). We'd have to check for a good spot to do this, ensure we get all paths, etc. Or, if we can ensure that nothing else happens on that pinned CASESession object in the foreground (e.g. timer goes off and it tries to clean up), then we can be sure it's safe to call a ScheduleFailureCallback from a background thread. I'd like to know (confidently) though that it's safe, and it will have to remain safe in the wake of future changes to CASESession; it should be robust not fragile. Any other ideas? |
Matter stack shutdown can always happen in the foreground, right? So I don't think we can ensure that. |
I guess this would get set specifically when the dispatch fails? That seems like it might work, if we can ensure that the CASESession does not die while we are trying to set the flag. Importantly, the CASE session itself has a pointer to the background work. So the background work could set the flag on itself and the CASE session could read the state from there, right? |
Yes, that's why I ended up making a separate background work object and going through the hassle of trying to manage its lifetime, because the CASESession object could be altered or more importantly destroyed at any time. The issue though is, when is the CASESession object checking this flag? That's what I was alluding to here:
|
We would need to change things to keep the unsolicited message handler registered but drop the messages in CASEServer itself if it's in the middle of a non-zombie handshake, etc. So basically keep ourselves registered, and when Sigma1 received:
|
PR #25695 introduces a refactoring (for CASESession::SendSigma3 as background/async work) that can use a bit more tweaking in subsequent PRs.
Firstly, the background work helper mechanism introduced in that PR should also be applied to the earlier HandleSigma3 changes.
--> done in #26300
WorkHelper constructor should be private, but smart pointers don't like that. Figure out if we can somehow declare sufficient friend-liness to make this build. (Note though that it should work if constructed normally...)
#25695 (comment)
--> done in #26339
There's some subtle/tricky smart/weak ptr logic going on, e.g. in WorkHandler. Document the traps and pitfalls.
#25695 (comment)
--> done in #26339
Document why fabric index is atomic, and why it's changed if we're going to cancel the work anyways. (Mostly because it's another chance to win the cancel race, and if we lose that race, theoretically the fabric index could have been reused with a different fabric, etc.)
#25695 (comment)
#25695 (comment)
--> done in #26339
Consider changing "exit" style to "return" style in SendSigma3b and HandleSigma3b, now that nothing really happens at the exit label.
--> done in #26339
Improve handling if scheduling work fails.
#25695 (comment)
Example failure: #26737
Improve handling if sending status report fails.
#25695 (comment)
The text was updated successfully, but these errors were encountered: