Follow-up issues for CASE async work #26280

mlepage-google · 2023-04-27T15:45:26Z

PR #25695 introduces a refactoring (for CASESession::SendSigma3 as background/async work) that can use a bit more tweaking in subsequent PRs.

Firstly, the background work helper mechanism introduced in that PR should also be applied to the earlier HandleSigma3 changes.
--> done in #26300

WorkHelper constructor should be private, but smart pointers don't like that. Figure out if we can somehow declare sufficient friend-liness to make this build. (Note though that it should work if constructed normally...)
#25695 (comment)
--> done in #26339

There's some subtle/tricky smart/weak ptr logic going on, e.g. in WorkHandler. Document the traps and pitfalls.
#25695 (comment)
--> done in #26339

Document why fabric index is atomic, and why it's changed if we're going to cancel the work anyways. (Mostly because it's another chance to win the cancel race, and if we lose that race, theoretically the fabric index could have been reused with a different fabric, etc.)
#25695 (comment)
#25695 (comment)
--> done in #26339

Consider changing "exit" style to "return" style in SendSigma3b and HandleSigma3b, now that nothing really happens at the exit label.
--> done in #26339

Improve handling if scheduling work fails.
#25695 (comment)
Example failure: #26737

Improve handling if sending status report fails.
#25695 (comment)

WorkHelper was introduced to help with CASESession::SendSigma3. Now use it to help with CASESession::HandleSigma3. Part of issue project-chip#26280.

* Use work helper with CASE handle sigma3 WorkHelper was introduced to help with CASESession::SendSigma3. Now use it to help with CASESession::HandleSigma3. Part of issue #26280. * Unscope a block

mlepage-google · 2023-06-16T20:23:36Z

Regarding "Improve handling if scheduling work fails"...

Work is broken into A/B/C. The intent is to do A and C in foreground, and B in background. (If background processing is not enabled, then B will also be in foreground, but it's still scheduled by A, and still needs to schedule C.)

If we're in A and scheduling B fails, that's not a problem, because we handle the error (send a status report, reset the session, etc.). If we're in C, there's no more scheduling, and we already have the result of any work (including any failure that occurred during B), so again we can handle the error appropriately, as we're in the foreground. (Sending the status report might fail, but that's a separate problem that was always a problem, even before the A/B/C split.)

It's if we're in B, possibly in the background, and we cannot schedule C (on the foreground), that error handling becomes tricky. We can't just call a callback on the CASESession object, as we're not in a position to do so (background vs. foreground). We would need a ScheduleFailureCallback that is safe to call on the CASESession object, even in the background. We can't rely on mutex, because those are platform-specific, and we're in portable code. We can't rely on a timer (being created at this time), because those also would fail to schedule under the same circumstances (i.e. no more slots in event processing queue).

One potential idea is to add an atomic zombie flag to CASESession, and set that in a ScheduleFailureCallback. Then, whenever something happens in CASEServer (in the foreground), check the pinned CASESession, and reap it if it's a zombie (including sending a status report). We'd have to check for a good spot to do this, ensure we get all paths, etc.

Or, if we can ensure that nothing else happens on that pinned CASESession object in the foreground (e.g. timer goes off and it tries to clean up), then we can be sure it's safe to call a ScheduleFailureCallback from a background thread. I'd like to know (confidently) though that it's safe, and it will have to remain safe in the wake of future changes to CASESession; it should be robust not fragile.

Any other ideas?

bzbarsky-apple · 2023-06-19T21:01:07Z

if we can ensure that nothing else happens on that pinned CASESession object in the foreground

Matter stack shutdown can always happen in the foreground, right? So I don't think we can ensure that.

bzbarsky-apple · 2023-06-19T21:03:23Z

one potential idea is to add an atomic zombie flag to CASESession

I guess this would get set specifically when the dispatch fails? That seems like it might work, if we can ensure that the CASESession does not die while we are trying to set the flag.

Importantly, the CASE session itself has a pointer to the background work. So the background work could set the flag on itself and the CASE session could read the state from there, right?

mlepage-google · 2023-06-21T20:01:46Z

Yes, that's why I ended up making a separate background work object and going through the hassle of trying to manage its lifetime, because the CASESession object could be altered or more importantly destroyed at any time.

The issue though is, when is the CASESession object checking this flag? That's what I was alluding to here:

One potential idea is to add an atomic zombie flag to CASESession, and set that in a ScheduleFailureCallback. Then, whenever something happens in CASEServer (in the foreground), check the pinned CASESession, and reap it if it's a zombie (including sending a status report). We'd have to check for a good spot to do this, ensure we get all paths, etc.

bzbarsky-apple · 2023-06-22T04:07:56Z

We would need to change things to keep the unsolicited message handler registered but drop the messages in CASEServer itself if it's in the middle of a non-zombie handshake, etc.

So basically keep ourselves registered, and when Sigma1 received:

If in middle of handshake but it's zombie, tear down the handshake.
If still in middle of handshake, return without responding.
Do a new handshake.

mlepage-google mentioned this issue Apr 27, 2023

Async send sigma3 #25695

Merged

mlepage-google changed the title ~~Follow-up issues for PR #25695~~ Follow-up issues for CASE async work Apr 28, 2023

mlepage-google mentioned this issue Apr 28, 2023

Use work helper with CASE handle sigma3 #26300

Merged

mlepage-google mentioned this issue May 2, 2023

Cleanup in aisle CASESession #26339

Merged

wqx6 mentioned this issue May 23, 2023

[BUG] CASE session establishment is disabled after ScheduleWork fails in WorkHandler #26737

Closed

shubhamdp mentioned this issue Jun 14, 2023

CASE: Handle failure if unable to schedule handle/send sigma3c #27226

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow-up issues for CASE async work #26280

Follow-up issues for CASE async work #26280

mlepage-google commented Apr 27, 2023 •

edited

Loading

mlepage-google commented Jun 16, 2023 •

edited

Loading

bzbarsky-apple commented Jun 19, 2023

bzbarsky-apple commented Jun 19, 2023

mlepage-google commented Jun 21, 2023

bzbarsky-apple commented Jun 22, 2023

Follow-up issues for CASE async work #26280

Follow-up issues for CASE async work #26280

Comments

mlepage-google commented Apr 27, 2023 • edited Loading

mlepage-google commented Jun 16, 2023 • edited Loading

bzbarsky-apple commented Jun 19, 2023

bzbarsky-apple commented Jun 19, 2023

mlepage-google commented Jun 21, 2023

bzbarsky-apple commented Jun 22, 2023

mlepage-google commented Apr 27, 2023 •

edited

Loading

mlepage-google commented Jun 16, 2023 •

edited

Loading