Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lifetime of Darwin SubscriptionCallback to avoid shutdown crashes. #22324

Conversation

bzbarsky-apple
Copy link
Contributor

The basic issue we could run into is that the Matter stack would shut down
while our async block was still running on our client queue, and by the time
the "delete this object" block was queued on the Matter queue that queue would
be paused. Then if the stack was restarted the queue would be unpaused, and
the deletion of the ReadClient would happen early in stack startup, when things
were not in a good state yet.

The fix is to make sure we queue the async deletion without going through the
client queue first, and avoid doing the async bits altogether when we can (when
the subscription itself errors out).

Fixes #22320

Problem

Common shutdown crashes.

Change overview

See above.

Testing

Used the steps in #22320 (comment) to verify that the crash happens without these changes and does not happen with these changes.

The basic issue we could run into is that the Matter stack would shut down
while our async block was still running on our client queue, and by the time
the "delete this object" block was queued on the Matter queue that queue would
be paused.  Then if the stack was restarted the queue would be unpaused, and
the deletion of the ReadClient would happen early in stack startup, when things
were not in a good state yet.

The fix is to make sure we queue the async deletion without going through the
client queue first, and avoid doing the async bits altogether when we can (when
the subscription itself errors out).

Fixes project-chip#22320
@github-actions
Copy link

github-actions bot commented Aug 31, 2022

PR #22324: Size comparison from 8873a20 to e3bb221

Increases (3 builds for cc13x2_26x2, psoc6, telink)
platform target config section 8873a20 e3bb221 change % change
cc13x2_26x2 pump-controller-app LP_CC2652R7 (read/write) 173044 173052 8 0.0
psoc6 lock cy8ckit_062s2_43012 .debug_info 22287270 22287271 1 0.0
telink lighting-app tlsr9518adk80d text 589456 589458 2 0.0
Decreases (4 builds for cc13x2_26x2, psoc6, qpg)
platform target config section 8873a20 e3bb221 change % change
cc13x2_26x2 pump-controller-app LP_CC2652R7 (read only) 669491 669483 -8 -0.0
.text 583504 583496 -8 -0.0
psoc6 all-clusters cy8ckit_062s2_43012 .debug_info 26708620 26708618 -2 -0.0
all-clusters-minimal cy8ckit_062s2_43012 .debug_info 26445243 26445242 -1 -0.0
qpg lighting-app qpg6105+debug (read/write) 1129092 1129084 -8 -0.0
.text 576188 576180 -8 -0.0
Full report (45 builds for bl602, cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, psoc6, qpg, telink)
platform target config section 8873a20 e3bb221 change % change
bl602 lighting-app bl602 (read/write) 1385934 1385934 0 0.0
.bss 120298 120298 0 0.0
.data 4488 4488 0 0.0
.text 1052316 1052316 0 0.0
bl602+rpc (read/write) 1431582 1431582 0 0.0
.bss 127730 127730 0 0.0
.data 4600 4600 0 0.0
.text 1084076 1084076 0 0.0
cc13x2_26x2 all-clusters-app LP_CC2652R7 (read only) 675271 675271 0 0.0
(read/write) 176136 176136 0 0.0
.bss 74300 74300 0 0.0
.data 3380 3380 0 0.0
.rodata 89231 89231 0 0.0
.text 585728 585728 0 0.0
all-clusters-minimal-app LP_CC2652R7 (read only) 639975 639975 0 0.0
(read/write) 157868 157868 0 0.0
.bss 73572 73572 0 0.0
.data 3380 3380 0 0.0
.rodata 78375 78375 0 0.0
.text 561280 561280 0 0.0
lock-ftd LP_CC2652R7 (read only) 676323 676323 0 0.0
(read/write) 165396 165396 0 0.0
.bss 71500 71500 0 0.0
.data 3304 3304 0 0.0
.rodata 77075 77075 0 0.0
.text 598768 598768 0 0.0
lock-mtd LP_CC2652R7 (read only) 659307 659307 0 0.0
(read/write) 178100 178100 0 0.0
.bss 67188 67188 0 0.0
.data 3304 3304 0 0.0
.rodata 102339 102339 0 0.0
.text 556488 556488 0 0.0
pump-app LP_CC2652R7 (read only) 684975 684975 0 0.0
(read/write) 157448 157448 0 0.0
.bss 71436 71436 0 0.0
.data 3296 3296 0 0.0
.rodata 89951 89951 0 0.0
.text 594540 594540 0 0.0
pump-controller-app LP_CC2652R7 (read only) 669491 669483 -8 -0.0
(read/write) 173044 173052 8 0.0
.bss 71548 71548 0 0.0
.data 3292 3292 0 0.0
.rodata 85507 85507 0 0.0
.text 583504 583496 -8 -0.0
shell LP_CC2652R7 (read only) 665914 665914 0 0.0
(read/write) 181012 181012 0 0.0
.bss 76620 76620 0 0.0
.data 3376 3376 0 0.0
.rodata 85786 85786 0 0.0
.text 579812 579812 0 0.0
cyw30739 light cyw930739m2evb_01 (read/write) 586786 586786 0 0.0
.app_xip_area 463444 463444 0 0.0
.bss 65776 65776 0 0.0
.data 744 744 0 0.0
.rodata 0 0 0 0.0
.text 112 112 0 0.0
lock cyw930739m2evb_01 (read/write) 592538 592538 0 0.0
.app_xip_area 464412 464412 0 0.0
.bss 70560 70560 0 0.0
.data 748 748 0 0.0
.rodata 0 0 0 0.0
.text 112 112 0 0.0
ota-requestor-no-progress-logging cyw930739m2evb_01 (read/write) 599570 599570 0 0.0
.app_xip_area 476948 476948 0 0.0
.bss 65088 65088 0 0.0
.data 716 716 0 0.0
.rodata 0 0 0 0.0
.text 112 112 0 0.0
efr32 lighting-app BRD4161A (read/write) 1108016 1108016 0 0.0
.bss 136332 136332 0 0.0
.data 2072 2072 0 0.0
.text 969592 969592 0 0.0
BRD4161A+rpc (read/write) 971892 971892 0 0.0
.bss 150844 150844 0 0.0
.data 2252 2252 0 0.0
.text 818776 818776 0 0.0
BRD4161A+rs911x (read/write) 1001708 1001708 0 0.0
.bss 169168 169168 0 0.0
.data 2064 2064 0 0.0
.text 830456 830456 0 0.0
lock-app BRD4161A+wf200 (read/write) 1150008 1150008 0 0.0
.bss 152248 152248 0 0.0
.data 2072 2072 0 0.0
.text 995668 995668 0 0.0
window-app BRD4161A (read/write) 1099272 1099272 0 0.0
.bss 137772 137772 0 0.0
.data 2096 2096 0 0.0
.text 959384 959384 0 0.0
esp32 all-clusters-app c3devkit (read only) 1033730 1033730 0 0.0
(read/write) 1493518 1493518 0 0.0
.dram0.bss 71120 71120 0 0.0
.dram0.data 13696 13696 0 0.0
.flash.rodata 218032 218032 0 0.0
.flash.text 1033730 1033730 0 0.0
.iram0.text 65204 65204 0 0.0
m5stack (read only) 1086083 1086083 0 0.0
(read/write) 490804 490804 0 0.0
.dram0.bss 76640 76640 0 0.0
.dram0.data 34152 34152 0 0.0
.flash.rodata 247344 247344 0 0.0
.flash.text 1080699 1080699 0 0.0
.iram0.text 123939 123939 0 0.0
k32w light k32w0+release (read/write) 648108 648108 0 0.0
.bss 70712 70712 0 0.0
.data 2068 2068 0 0.0
.text 572600 572600 0 0.0
lock k32w0+release (read/write) 705112 705112 0 0.0
.bss 71160 71160 0 0.0
.data 2076 2076 0 0.0
.text 629148 629148 0 0.0
linux all-clusters-app debug (read only) 3043945 3043945 0 0.0
(read/write) 156032 156032 0 0.0
.bss 61792 61792 0 0.0
.data 2096 2096 0 0.0
.data.rel.ro 85768 85768 0 0.0
.dynamic 608 608 0 0.0
.got 4568 4568 0 0.0
.init 27 27 0 0.0
.init_array 1176 1176 0 0.0
.rodata 275307 275307 0 0.0
.text 2589202 2589202 0 0.0
all-clusters-minimal-app debug (read only) 2879745 2879745 0 0.0
(read/write) 147632 147632 0 0.0
.bss 61024 61024 0 0.0
.data 2064 2064 0 0.0
.data.rel.ro 78264 78264 0 0.0
.dynamic 608 608 0 0.0
.got 4488 4488 0 0.0
.init 27 27 0 0.0
.init_array 1160 1160 0 0.0
.rodata 275467 275467 0 0.0
.text 2427618 2427618 0 0.0
bridge-app debug+rpc (read only) 2378009 2378009 0 0.0
(read/write) 127752 127752 0 0.0
.bss 50656 50656 0 0.0
.data 3600 3600 0 0.0
.data.rel.ro 67640 67640 0 0.0
.dynamic 608 608 0 0.0
.got 4392 4392 0 0.0
.init 27 27 0 0.0
.init_array 832 832 0 0.0
.rodata 204168 204168 0 0.0
.text 2010930 2010930 0 0.0
chip-tool debug (read only) 10909881 10909881 0 0.0
(read/write) 657320 657320 0 0.0
.bss 25240 25240 0 0.0
.data 3266 3266 0 0.0
.data.rel.ro 622288 622288 0 0.0
.dynamic 608 608 0 0.0
.got 5096 5096 0 0.0
.init 27 27 0 0.0
.init_array 776 776 0 0.0
.rodata 564245 564245 0 0.0
.text 8824820 8824820 0 0.0
chip-tool-ipv6only arm64 (read only) 10290484 10290484 0 0.0
(read/write) 705169 705169 0 0.0
.bss 33297 33297 0 0.0
.data 3280 3280 0 0.0
.data.rel.ro 649784 649784 0 0.0
.dynamic 560 560 0 0.0
.got 13840 13840 0 0.0
.init 24 24 0 0.0
.init_array 200 200 0 0.0
.rodata 494708 494708 0 0.0
.text 8143364 8143364 0 0.0
lighting-app debug+rpc (read only) 2602937 2602937 0 0.0
(read/write) 130536 130536 0 0.0
.bss 49792 49792 0 0.0
.data 2096 2096 0 0.0
.data.rel.ro 72680 72680 0 0.0
.dynamic 608 608 0 0.0
.got 4392 4392 0 0.0
.init 27 27 0 0.0
.init_array 928 928 0 0.0
.rodata 221008 221008 0 0.0
.text 2210738 2210738 0 0.0
lock-app debug (read only) 2585905 2585905 0 0.0
(read/write) 125712 125712 0 0.0
.bss 48288 48288 0 0.0
.data 1712 1712 0 0.0
.data.rel.ro 69688 69688 0 0.0
.dynamic 608 608 0 0.0
.got 4464 4464 0 0.0
.init 27 27 0 0.0
.init_array 904 904 0 0.0
.rodata 238000 238000 0 0.0
.text 2180962 2180962 0 0.0
ota-provider-app debug (read only) 2363161 2363161 0 0.0
(read/write) 119144 119144 0 0.0
.bss 47808 47808 0 0.0
.data 1936 1936 0 0.0
.data.rel.ro 63512 63512 0 0.0
.dynamic 608 608 0 0.0
.got 4488 4488 0 0.0
.init 27 27 0 0.0
.init_array 768 768 0 0.0
.rodata 209976 209976 0 0.0
.text 1989426 1989426 0 0.0
ota-requestor-app debug (read only) 2528409 2528409 0 0.0
(read/write) 127552 127552 0 0.0
.bss 50368 50368 0 0.0
.data 2304 2304 0 0.0
.data.rel.ro 68920 68920 0 0.0
.dynamic 608 608 0 0.0
.got 4480 4480 0 0.0
.init 27 27 0 0.0
.init_array 856 856 0 0.0
.rodata 216704 216704 0 0.0
.text 2138802 2138802 0 0.0
shell debug (read only) 2612249 2612249 0 0.0
(read/write) 142184 142184 0 0.0
.bss 57704 57704 0 0.0
.data 1264 1264 0 0.0
.data.rel.ro 77376 77376 0 0.0
.dynamic 608 608 0 0.0
.got 4136 4136 0 0.0
.init 27 27 0 0.0
.init_array 1048 1048 0 0.0
.rodata 235410 235410 0 0.0
.text 2218130 2218130 0 0.0
thermostat-no-ble arm64 (read only) 2361956 2361956 0 0.0
(read/write) 141857 141857 0 0.0
.bss 55233 55233 0 0.0
.data 1680 1680 0 0.0
.data.rel.ro 76112 76112 0 0.0
.dynamic 560 560 0 0.0
.got 5056 5056 0 0.0
.init 24 24 0 0.0
.init_array 416 416 0 0.0
.rodata 141276 141276 0 0.0
.text 1982640 1982640 0 0.0
tv-app debug (read only) 3190409 3190409 0 0.0
(read/write) 258040 258040 0 0.0
.bss 167352 167352 0 0.0
.data 4752 4752 0 0.0
.data.rel.ro 79368 79368 0 0.0
.dynamic 608 608 0 0.0
.got 4856 4856 0 0.0
.init 27 27 0 0.0
.init_array 1080 1080 0 0.0
.rodata 259976 259976 0 0.0
.text 2740402 2740402 0 0.0
tv-casting-app debug (read only) 5508801 5508801 0 0.0
(read/write) 160536 160536 0 0.0
.bss 51352 51352 0 0.0
.data 2432 2432 0 0.0
.data.rel.ro 100304 100304 0 0.0
.dynamic 608 608 0 0.0
.got 4776 4776 0 0.0
.init 27 27 0 0.0
.init_array 1048 1048 0 0.0
.rodata 344913 344913 0 0.0
.text 4891986 4891986 0 0.0
mbed lock-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2454936 2454936 0 0.0
.bss 215044 215044 0 0.0
.data 5872 5872 0 0.0
.text 1417580 1417580 0 0.0
nrfconnect all-clusters-app nrf52840dk_nrf52840 (read/write) 1180835 1180835 0 0.0
bss 143641 143641 0 0.0
rodata 143380 143380 0 0.0
text 814872 814872 0 0.0
all-clusters-minimal-app nrf52840dk_nrf52840 (read/write) 1160015 1160015 0 0.0
bss 142868 142868 0 0.0
rodata 134968 134968 0 0.0
text 803264 803264 0 0.0
psoc6 all-clusters cy8ckit_062s2_43012 (read only) 841960 841960 0 0.0
(read/write) 1742236 1742236 0 0.0
.ARM.attributes 46 46 0 0.0
.ARM.exidx 8 8 0 0.0
.bss 188720 188720 0 0.0
.comment 204 204 0 0.0
.copy.table 24 24 0 0.0
.cy_m0p_image 6216 6216 0 0.0
.cy_sharedmem 8 8 0 0.0
.data 2664 2664 0 0.0
.debug_abbrev 1221468 1221468 0 0.0
.debug_aranges 111712 111712 0 0.0
.debug_frame 372900 372900 0 0.0
.debug_info 26708620 26708618 -2 -0.0
.debug_line 3655730 3655730 0 0.0
.debug_loc 3569692 3569692 0 0.0
.debug_ranges 337616 337616 0 0.0
.debug_str 3426920 3426920 0 0.0
.heap 841960 841960 0 0.0
.noinit 148 148 0 0.0
.ramVectors 736 736 0 0.0
.shstrtab 288 288 0 0.0
.stab 156 156 0 0.0
.stabstr 335 335 0 0.0
.stack_dummy 4096 4096 0 0.0
.strtab 570576 570576 0 0.0
.symtab 421488 421488 0 0.0
.text 1542464 1542464 0 0.0
.zero.table 8 8 0 0.0
text 0 0 0 0.0
all-clusters-minimal cy8ckit_062s2_43012 (read only) 842696 842696 0 0.0
(read/write) 1685420 1685420 0 0.0
.ARM.attributes 46 46 0 0.0
.ARM.exidx 8 8 0 0.0
.bss 187984 187984 0 0.0
.comment 204 204 0 0.0
.copy.table 24 24 0 0.0
.cy_m0p_image 6216 6216 0 0.0
.cy_sharedmem 8 8 0 0.0
.data 2664 2664 0 0.0
.debug_abbrev 1213307 1213307 0 0.0
.debug_aranges 111184 111184 0 0.0
.debug_frame 375980 375980 0 0.0
.debug_info 26445243 26445242 -1 -0.0
.debug_line 3676246 3676246 0 0.0
.debug_loc 3557329 3557329 0 0.0
.debug_ranges 336232 336232 0 0.0
.debug_str 3415925 3415925 0 0.0
.heap 842696 842696 0 0.0
.noinit 148 148 0 0.0
.ramVectors 736 736 0 0.0
.shstrtab 288 288 0 0.0
.stab 156 156 0 0.0
.stabstr 335 335 0 0.0
.stack_dummy 4096 4096 0 0.0
.strtab 535050 535050 0 0.0
.symtab 408080 408080 0 0.0
.text 1486384 1486384 0 0.0
.zero.table 0 0 0 0.0
8 8 0 0.0
light cy8ckit_062s2_43012 (read only) 850928 850928 0 0.0
(read/write) 1602700 1602700 0 0.0
.ARM.attributes 46 46 0 0.0
.ARM.exidx 8 8 0 0.0
.bss 179960 179960 0 0.0
.comment 204 204 0 0.0
.copy.table 24 24 0 0.0
.cy_m0p_image 6216 6216 0 0.0
.cy_sharedmem 8 8 0 0.0
.data 2456 2456 0 0.0
.debug_abbrev 1048126 1048126 0 0.0
.debug_aranges 103360 103360 0 0.0
.debug_frame 346248 346248 0 0.0
.debug_info 21907524 21907524 0 0.0
.debug_line 3246711 3246711 0 0.0
.debug_loc 3255672 3255672 0 0.0
.debug_ranges 301704 301704 0 0.0
.debug_str 3221145 3221145 0 0.0
.heap 850928 850928 0 0.0
.noinit 148 148 0 0.0
.ramVectors 736 736 0 0.0
.shstrtab 288 288 0 0.0
.stab 156 156 0 0.0
.stabstr 335 335 0 0.0
.stack_dummy 4096 4096 0 0.0
.strtab 468349 468349 0 0.0
.symtab 375168 375168 0 0.0
.text 1411896 1411896 0 0.0
.zero.table 0 0 0 0.0
8 8 0 0.0
lock cy8ckit_062s2_43012 (read only) 845896 845896 0 0.0
(read/write) 1640380 1640380 0 0.0
.ARM.attributes 46 46 0 0.0
.ARM.exidx 8 8 0 0.0
.bss 184976 184976 0 0.0
.comment 204 204 0 0.0
.copy.table 24 24 0 0.0
.cy_m0p_image 6216 6216 0 0.0
.cy_sharedmem 8 8 0 0.0
.data 2472 2472 0 0.0
.debug_abbrev 1055561 1055561 0 0.0
.debug_aranges 104032 104032 0 0.0
.debug_frame 349076 349076 0 0.0
.debug_info 22287270 22287271 1 0.0
.debug_line 3255532 3255532 0 0.0
.debug_loc 3295525 3295525 0 0.0
.debug_ranges 305048 305048 0 0.0
.debug_str 3248566 3248566 0 0.0
.heap 845896 845896 0 0.0
.noinit 148 148 0 0.0
.ramVectors 736 736 0 0.0
.shstrtab 288 288 0 0.0
.stab 156 156 0 0.0
.stabstr 335 335 0 0.0
.stack_dummy 4096 4096 0 0.0
.strtab 474564 474564 0 0.0
.symtab 378352 378352 0 0.0
.text 1444544 1444544 0 0.0
.zero.table 0 0 0 0.0
8 8 0 0.0
qpg lighting-app qpg6105+debug (read/write) 1129092 1129084 -8 -0.0
.bss 106112 106112 0 0.0
.data 1028 1028 0 0.0
.text 576188 576180 -8 -0.0
lock-app qpg6105+debug (read/write) 1100080 1100080 0 0.0
.bss 102344 102344 0 0.0
.data 1032 1032 0 0.0
.text 547180 547180 0 0.0
telink light-switch-app tlsr9518adk80d (read/write) 808744 808744 0 0.0
bss 71344 71344 0 0.0
noinit 43488 43488 0 0.0
text 571346 571346 0 0.0
lighting-app tlsr9518adk80d (read/write) 830644 830644 0 0.0
bss 72200 72200 0 0.0
noinit 43488 43488 0 0.0
text 589456 589458 2 0.0

@bzbarsky-apple bzbarsky-apple merged commit e7486d2 into project-chip:master Sep 1, 2022
@bzbarsky-apple bzbarsky-apple deleted the fix-darwin-subscription-callback branch September 1, 2022 04:14
isiu-apple pushed a commit to isiu-apple/connectedhomeip that referenced this pull request Sep 16, 2022
project-chip#22324)

The basic issue we could run into is that the Matter stack would shut down
while our async block was still running on our client queue, and by the time
the "delete this object" block was queued on the Matter queue that queue would
be paused.  Then if the stack was restarted the queue would be unpaused, and
the deletion of the ReadClient would happen early in stack startup, when things
were not in a good state yet.

The fix is to make sure we queue the async deletion without going through the
client queue first, and avoid doing the async bits altogether when we can (when
the subscription itself errors out).

Fixes project-chip#22320
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this pull request Oct 7, 2022
project-chip#22978 accidentally
reintroduced the crash that
project-chip#22324 had fixed.  To avoid
more issues along these lines:

1) Add unit tests that reproduce the crashes described in
   project-chip#22320 (with the
   changes from project-chip#22978) and
   project-chip#22935 (without those
   changes).
2) Change MTRBaseSubscriptionCallback to always invoke its callbacks
   synchronously, on the Matter queue, so that we can clean up the
   MTRClusterStateCacheContainer's pointer to the ClusterStateCache before it
   gets deleted on the Matter queue.
3) Move the queueing of callbacks to the client queue into the consumers of
   MTRBaseSubscriptionCallback, so they can do whatever sync work they need
   (like the above cleanup) before going async.
4) Update documentation.
andy31415 pushed a commit that referenced this pull request Oct 11, 2022
#22978 accidentally
reintroduced the crash that
#22324 had fixed.  To avoid
more issues along these lines:

1) Add unit tests that reproduce the crashes described in
   #22320 (with the
   changes from #22978) and
   #22935 (without those
   changes).
2) Change MTRBaseSubscriptionCallback to always invoke its callbacks
   synchronously, on the Matter queue, so that we can clean up the
   MTRClusterStateCacheContainer's pointer to the ClusterStateCache before it
   gets deleted on the Matter queue.
3) Move the queueing of callbacks to the client queue into the consumers of
   MTRBaseSubscriptionCallback, so they can do whatever sync work they need
   (like the above cleanup) before going async.
4) Update documentation.
selissia pushed a commit to selissia/connectedhomeip that referenced this pull request Oct 12, 2022
…hip#23076)

project-chip#22978 accidentally
reintroduced the crash that
project-chip#22324 had fixed.  To avoid
more issues along these lines:

1) Add unit tests that reproduce the crashes described in
   project-chip#22320 (with the
   changes from project-chip#22978) and
   project-chip#22935 (without those
   changes).
2) Change MTRBaseSubscriptionCallback to always invoke its callbacks
   synchronously, on the Matter queue, so that we can clean up the
   MTRClusterStateCacheContainer's pointer to the ClusterStateCache before it
   gets deleted on the Matter queue.
3) Move the queueing of callbacks to the client queue into the consumers of
   MTRBaseSubscriptionCallback, so they can do whatever sync work they need
   (like the above cleanup) before going async.
4) Update documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Queued deletion setup on Darwin leads to crashes
3 participants