-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Queued deletion setup on Darwin leads to crashes #22320
Comments
This was referenced Aug 31, 2022
bzbarsky-apple
added a commit
to bzbarsky-apple/connectedhomeip
that referenced
this issue
Aug 31, 2022
The basic issue we could run into is that the Matter stack would shut down while our async block was still running on our client queue, and by the time the "delete this object" block was queued on the Matter queue that queue would be paused. Then if the stack was restarted the queue would be unpaused, and the deletion of the ReadClient would happen early in stack startup, when things were not in a good state yet. The fix is to make sure we queue the async deletion without going through the client queue first, and avoid doing the async bits altogether when we can (when the subscription itself errors out). Fixes project-chip#22320
bzbarsky-apple
added a commit
to bzbarsky-apple/connectedhomeip
that referenced
this issue
Aug 31, 2022
The basic issue we could run into is that the Matter stack would shut down while our async block was still running on our client queue, and by the time the "delete this object" block was queued on the Matter queue that queue would be paused. Then if the stack was restarted the queue would be unpaused, and the deletion of the ReadClient would happen early in stack startup, when things were not in a good state yet. The fix is to make sure we queue the async deletion without going through the client queue first, and avoid doing the async bits altogether when we can (when the subscription itself errors out). Fixes project-chip#22320
bzbarsky-apple
added a commit
that referenced
this issue
Sep 1, 2022
#22324) The basic issue we could run into is that the Matter stack would shut down while our async block was still running on our client queue, and by the time the "delete this object" block was queued on the Matter queue that queue would be paused. Then if the stack was restarted the queue would be unpaused, and the deletion of the ReadClient would happen early in stack startup, when things were not in a good state yet. The fix is to make sure we queue the async deletion without going through the client queue first, and avoid doing the async bits altogether when we can (when the subscription itself errors out). Fixes #22320
This was referenced Sep 1, 2022
isiu-apple
pushed a commit
to isiu-apple/connectedhomeip
that referenced
this issue
Sep 16, 2022
project-chip#22324) The basic issue we could run into is that the Matter stack would shut down while our async block was still running on our client queue, and by the time the "delete this object" block was queued on the Matter queue that queue would be paused. Then if the stack was restarted the queue would be unpaused, and the deletion of the ReadClient would happen early in stack startup, when things were not in a good state yet. The fix is to make sure we queue the async deletion without going through the client queue first, and avoid doing the async bits altogether when we can (when the subscription itself errors out). Fixes project-chip#22320
bzbarsky-apple
added a commit
to bzbarsky-apple/connectedhomeip
that referenced
this issue
Oct 7, 2022
project-chip#22978 accidentally reintroduced the crash that project-chip#22324 had fixed. To avoid more issues along these lines: 1) Add unit tests that reproduce the crashes described in project-chip#22320 (with the changes from project-chip#22978) and project-chip#22935 (without those changes). 2) Change MTRBaseSubscriptionCallback to always invoke its callbacks synchronously, on the Matter queue, so that we can clean up the MTRClusterStateCacheContainer's pointer to the ClusterStateCache before it gets deleted on the Matter queue. 3) Move the queueing of callbacks to the client queue into the consumers of MTRBaseSubscriptionCallback, so they can do whatever sync work they need (like the above cleanup) before going async. 4) Update documentation.
andy31415
pushed a commit
that referenced
this issue
Oct 11, 2022
#22978 accidentally reintroduced the crash that #22324 had fixed. To avoid more issues along these lines: 1) Add unit tests that reproduce the crashes described in #22320 (with the changes from #22978) and #22935 (without those changes). 2) Change MTRBaseSubscriptionCallback to always invoke its callbacks synchronously, on the Matter queue, so that we can clean up the MTRClusterStateCacheContainer's pointer to the ClusterStateCache before it gets deleted on the Matter queue. 3) Move the queueing of callbacks to the client queue into the consumers of MTRBaseSubscriptionCallback, so they can do whatever sync work they need (like the above cleanup) before going async. 4) Update documentation.
selissia
pushed a commit
to selissia/connectedhomeip
that referenced
this issue
Oct 12, 2022
…hip#23076) project-chip#22978 accidentally reintroduced the crash that project-chip#22324 had fixed. To avoid more issues along these lines: 1) Add unit tests that reproduce the crashes described in project-chip#22320 (with the changes from project-chip#22978) and project-chip#22935 (without those changes). 2) Change MTRBaseSubscriptionCallback to always invoke its callbacks synchronously, on the Matter queue, so that we can clean up the MTRClusterStateCacheContainer's pointer to the ClusterStateCache before it gets deleted on the Matter queue. 3) Move the queueing of callbacks to the client queue into the consumers of MTRBaseSubscriptionCallback, so they can do whatever sync work they need (like the above cleanup) before going async. 4) Update documentation.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Problem
Steps to reproduce:
sleep(5)
inCHIPCommandBridge::RestartCommissioners
between the controller shutdowns and the controller restarts, unless something has been added since then to make this possible without code changes.sleep(10)
at the beginning ofReadHandler::SendReportData
when!aReadMoreChunks
is true.darwin-framework-tool interactive start
.any subscribe-all-events 1 2 1 0xFFFF
This crashes with the stack observed in #20085
What's going on is the following sequence of events:
ReadClient
's secure session.SubscriptionCallback::ReportError
, which setsmHaveQueuedDeletion
to true and queues a task to do the following:a. Call our error callback.
b. Queue a task to the Matter queue to delete the
SubcriptionCallback
, and hence theReadClient
.SubscriptionCallback::OnDone
and it's a no-op becausemHaveQueuedDeletion
is true.SubscriptionCallback
on the Matter queue. But that queue is paused, so the block sits there.Proposed Solution
Still figuring this out.
The text was updated successfully, but these errors were encountered: