Fix lifetime of Darwin SubscriptionCallback to avoid shutdown crashes. #22324
Merged: bzbarsky-apple merged 1 commit into project-chip:master from bzbarsky-apple:fix-darwin-subscription-callback on Sep 1, 2022
The basic issue we could run into is that the Matter stack would shut down while our async block was still running on our client queue, and by the time the "delete this object" block was queued on the Matter queue, that queue would be paused. Then, if the stack was restarted, the queue would be unpaused, and the deletion of the ReadClient would happen early in stack startup, when things were not in a good state yet.
The fix is to make sure we queue the async deletion without going through the client queue first, and avoid doing the async bits altogether when we can (when the subscription itself errors out).
Fixes project-chip#22320
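The ordering problem described above can be illustrated with a small single-threaded model. This is only a sketch, not the Darwin framework code: `PausableQueue`, `SimulateTeardown`, and the log strings are hypothetical names, and the model assumes that work already sitting on the Matter queue runs before shutdown pauses it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serial queue that can be paused and resumed.
class PausableQueue {
public:
    void Enqueue(std::function<void()> block) { mBlocks.push_back(std::move(block)); }
    void Pause() { mPaused = true; }
    void Resume() { mPaused = false; }

    // Runs queued blocks in FIFO order; does nothing while paused.
    void Drain() {
        while (!mPaused && !mBlocks.empty()) {
            auto block = std::move(mBlocks.front());
            mBlocks.pop_front();
            block();
        }
    }

private:
    std::deque<std::function<void()>> mBlocks;
    bool mPaused = false;
};

// Returns the ordered event log for a subscription teardown that races
// with a stack shutdown followed by a restart.
std::vector<std::string> SimulateTeardown(bool deleteViaClientQueue) {
    std::vector<std::string> log;
    PausableQueue matterQueue;
    PausableQueue clientQueue;

    auto deletion = [&log] { log.push_back("delete ReadClient"); };

    if (deleteViaClientQueue) {
        // Old ordering: the deletion is only queued once the client block runs.
        clientQueue.Enqueue([&] {
            log.push_back("client notified");
            matterQueue.Enqueue(deletion);
        });
    } else {
        // Fixed ordering: the deletion goes straight onto the Matter queue.
        matterQueue.Enqueue(deletion);
        clientQueue.Enqueue([&] { log.push_back("client notified"); });
    }

    // Assumption: work already queued on the Matter queue runs before shutdown.
    matterQueue.Drain();
    log.push_back("stack shutdown");
    matterQueue.Pause();

    // The client block runs only after the stack has shut down.
    clientQueue.Drain();

    // Restart: resuming the queue runs whatever was left on it.
    log.push_back("stack startup begins");
    matterQueue.Resume();
    matterQueue.Drain();
    log.push_back("stack startup complete");
    return log;
}
```

In this model, `SimulateTeardown(true)` logs the deletion between "stack startup begins" and "stack startup complete", which is the bad state the description refers to. `SimulateTeardown(false)` logs the deletion before "stack shutdown".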
pullapprove bot requested review from anush-apple, arkq, Byungjoo-Lee, carol-apple, chrisdecenzo, chshu, chulspro, Damian-Nordic, dhrishi, electrocucaracha, emargolis, erjiaqing, franck-apple, gjc13, harimau-qirex, harsha-rajendran, hawk248, isiu-apple, jelderton, jepenven-silabs, jmartinez-silabs, jtung-apple, kpschoedel, lazarkov, LuDuda, mlepage-google, mrjerryjohns, msandstedt and rgoliver (August 31, 2022 21:52)
pullapprove bot requested review from saurabhst, selissia, tcarmelveilleux, tecimovic, tehampson, vijs, vivien-apple, wbschiller, woody-apple, xylophone21 and yufengwangca (August 31, 2022 21:52)
PR #22324: Size comparison from 8873a20 to e3bb221
Increases (3 builds for cc13x2_26x2, psoc6, telink)
Decreases (4 builds for cc13x2_26x2, psoc6, qpg)
Full report (45 builds for bl602, cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, psoc6, qpg, telink)
jmartinez-silabs approved these changes on Aug 31, 2022
jtung-apple approved these changes on Sep 1, 2022
isiu-apple pushed a commit to isiu-apple/connectedhomeip that referenced this pull request on Sep 16, 2022
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this pull request on Oct 7, 2022
project-chip#22978 accidentally reintroduced the crash that project-chip#22324 had fixed. To avoid more issues along these lines:
1) Add unit tests that reproduce the crashes described in project-chip#22320 (with the changes from project-chip#22978) and project-chip#22935 (without those changes).
2) Change MTRBaseSubscriptionCallback to always invoke its callbacks synchronously, on the Matter queue, so that we can clean up the MTRClusterStateCacheContainer's pointer to the ClusterStateCache before it gets deleted on the Matter queue.
3) Move the queueing of callbacks to the client queue into the consumers of MTRBaseSubscriptionCallback, so they can do whatever sync work they need (like the above cleanup) before going async.
4) Update documentation.
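Items 2) and 3) above can be sketched roughly as follows. This is an illustrative model rather than the actual MTRBaseSubscriptionCallback code: `Cache`, `CacheContainer`, `BaseCallback`, and `RunConsumerSketch` are hypothetical stand-ins, the client queue is a plain deque drained by hand, and the sketch assumes the callback object owns the cache, which the commit message does not state.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

using Block = std::function<void()>;

// Hypothetical stand-ins; these are not the real Matter types.
struct Cache {};
struct CacheContainer {
    Cache * cache = nullptr; // non-owning pointer that must not outlive the cache
};

class BaseCallback {
public:
    BaseCallback(std::unique_ptr<Cache> cache, Block onDone)
        : mCache(std::move(cache)), mOnDone(std::move(onDone)) {}

    // Conceptually runs on the Matter queue.
    void Finish() {
        mOnDone();      // invoked synchronously, before anything is freed
        mCache.reset(); // the cache is deleted only after the handler returns
    }

private:
    std::unique_ptr<Cache> mCache;
    Block mOnDone;
};

// Returns true if the container's pointer was cleared by the time the cache
// was deleted and the client was still notified afterwards.
bool RunConsumerSketch() {
    std::deque<Block> clientQueue; // modeled client queue, drained manually below
    CacheContainer container;
    bool clientNotified = false;

    auto cache = std::make_unique<Cache>();
    container.cache = cache.get();

    BaseCallback callback(std::move(cache), [&] {
        // Consumer's synchronous work: drop the soon-to-dangle pointer first.
        container.cache = nullptr;
        // Only then go async to the client queue.
        clientQueue.push_back([&] { clientNotified = true; });
    });

    callback.Finish();
    bool pointerCleared = (container.cache == nullptr);

    while (!clientQueue.empty()) {
        Block block = std::move(clientQueue.front());
        clientQueue.pop_front();
        block();
    }
    return pointerCleared && clientNotified;
}
```

The point of the split is that the consumer, not the base class, decides when to hop to the client queue, so any cleanup that must precede deletion can happen in the synchronous part of the handler.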
andy31415 pushed a commit that referenced this pull request on Oct 11, 2022
selissia pushed a commit to selissia/connectedhomeip that referenced this pull request on Oct 12, 2022
…hip#23076)
Problem
Common shutdown crashes.
Change overview
See above.
Testing
Used the steps in #22320 (comment) to verify that the crash happens without these changes and does not happen with these changes.