Stop dropping reads on the floor if more than one happens at once. #15338

Merged

bzbarsky-apple

Fixes #15304

The fix for #15304 is the change to the while loop condition in
Engine::Run. Before that change, we compared numReadHandled to the
current count of allocated read handlers. But processing a read
handler deallocates it, so we only ended up processing about half of
the read handlers that were outstanding on entry to Run: once half
had been handled, numReadHandled was no longer smaller than the
number still allocated.
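
The loop condition problem described above can be reproduced in a toy model. The following is a minimal sketch, not the actual Engine::Run code: HandlerPool, RunWithLiveCount and RunWithSnapshot are hypothetical names, and snapshotting the count on entry is only one way to avoid the shrinking bound; it is not necessarily the exact change made in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for the pool of allocated read handlers.
using HandlerPool = std::deque<int>;

// Buggy shape: the bound is re-read from the pool on every iteration, but
// handling a read removes it from the pool, so the bound shrinks as we go.
std::size_t RunWithLiveCount(HandlerPool pool)
{
    std::size_t numReadHandled = 0;
    while (numReadHandled < pool.size())
    {
        pool.pop_front(); // the handled read is deallocated
        ++numReadHandled;
    }
    return numReadHandled;
}

// One possible fix (an assumption for illustration): remember how many
// handlers were outstanding on entry and compare against that instead.
std::size_t RunWithSnapshot(HandlerPool pool)
{
    const std::size_t countOnEntry = pool.size();
    std::size_t numReadHandled     = 0;
    while (numReadHandled < countOnEntry)
    {
        assert(!pool.empty()); // every counted handler is still there to handle
        pool.pop_front();
        ++numReadHandled;
    }
    return numReadHandled;
}
```

With four pending reads, RunWithLiveCount handles only two of them, while RunWithSnapshot handles all four.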

The change in InteractionModelEngine::OnDone and the management
of mRunningReadHandler handle a slightly more complicated situation
I ran into while writing some tests for this code. Suppose at least
two subscriptions, followed by some number of reads, are pending when
Engine::Run is entered. When we processed the first read, it would
be deallocated, mCurReadHandlerIdx would get reset to 0, we would
then increment it by 1, and we would end up walking all but the
first subscription again. As a result, numReadHandled would increase
to the point where the loop terminated before we had gotten to all
the read handlers. If we had N subscriptions we would miss N-1
read handlers. Those could then get stuck in limbo indefinitely,
until either a subscription heartbeat had to happen or someone else
issued a read.
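
The cursor reset described above can be illustrated with a simplified model. This is a sketch under stated assumptions: handlers live in a compacting vector rather than the real handler pool, subscriptions are idle, the pass has a budget equal to the handler count on entry, and Kind and RunOnce are hypothetical names. It shows how re-walking subscriptions burns the budget; it does not try to reproduce the exact number of missed reads in the real engine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Subscription, Read };

// Returns how many reads get handled in one pass whose budget is the
// number of handlers present on entry.
std::size_t RunOnce(std::vector<Kind> handlers, bool resetCursorOnRemoval)
{
    const std::size_t budget = handlers.size();
    std::size_t cursor       = 0;
    std::size_t readsHandled = 0;

    for (std::size_t visited = 0; visited < budget && !handlers.empty(); ++visited)
    {
        if (cursor >= handlers.size())
        {
            cursor = 0; // wrap around to the start of the list
        }
        assert(cursor < handlers.size());

        if (handlers[cursor] == Kind::Read)
        {
            // Handling a read deallocates it.
            handlers.erase(handlers.begin() + static_cast<std::ptrdiff_t>(cursor));
            ++readsHandled;
            if (resetCursorOnRemoval)
            {
                cursor = 0; // buggy: removal resets the cursor...
                ++cursor;   // ...and the loop then advances it to 1
            }
            // Otherwise keep the cursor: the next handler slid into this slot.
        }
        else
        {
            ++cursor; // idle subscription: nothing to send, move on
        }
    }
    return readsHandled;
}
```

With two subscriptions followed by three reads, the resetting variant handles two reads in this model, and the variant that keeps its position handles all three.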

The change in Engine::OnReportConfirm handles the case where we
have more than CHIP_IM_MAX_REPORTS_IN_FLIGHT subscriptions that all
need reporting: we would fire off the first
CHIP_IM_MAX_REPORTS_IN_FLIGHT of them and then never call
ScheduleRun() after that, so all the other subscriptions got stuck.
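
A toy model of the in-flight cap makes the stall easier to see. This is a sketch only: ToyEngine, SendAll and kMaxReportsInFlight are hypothetical stand-ins, the cap of 2 is an arbitrary value chosen for illustration, and the real OnReportConfirm may decide whether to schedule a run under different conditions than shown here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CHIP_IM_MAX_REPORTS_IN_FLIGHT; the value is arbitrary.
constexpr std::size_t kMaxReportsInFlight = 2;

struct ToyEngine
{
    ToyEngine(std::size_t pendingSubs, bool reschedule) : pending(pendingSubs), rescheduleOnConfirm(reschedule) {}

    void ScheduleRun() { runScheduled = true; }

    // Send reports until the cap is hit or nothing is left to report.
    void Run()
    {
        runScheduled = false;
        while (inFlight < kMaxReportsInFlight && pending > 0)
        {
            --pending;
            ++inFlight;
            ++sent;
        }
    }

    // A report was confirmed, which frees one in-flight slot.
    void OnReportConfirm()
    {
        assert(inFlight > 0);
        --inFlight;
        if (rescheduleOnConfirm)
        {
            ScheduleRun(); // without this, nothing ever uses the freed slot
        }
    }

    std::size_t pending;
    bool rescheduleOnConfirm;
    std::size_t inFlight = 0;
    std::size_t sent     = 0;
    bool runScheduled    = false;
};

// Drives the engine like a tiny event loop and returns how many reports went out.
std::size_t SendAll(std::size_t subscriptions, bool rescheduleOnConfirm)
{
    ToyEngine engine(subscriptions, rescheduleOnConfirm);
    engine.ScheduleRun();
    while (engine.runScheduled)
    {
        engine.Run();
        while (engine.inFlight > 0)
        {
            engine.OnReportConfirm();
        }
    }
    return engine.sent;
}
```

With five subscriptions that all need reporting, SendAll(5, false) sends only the first two reports, while SendAll(5, true) sends all five.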

The issue with CHIP_IM_MAX_REPORTS_IN_FLIGHT was not caught by the
existing tests because those tests manually called Run() on the
reporting engine (in a loop, in fact). That was needed because
DrainAndServiceIO() could finish processing all the pending messages
while a queued task (from ScheduleRun) that is not a message was
still waiting and would never get run, so Engine::Run was not getting
called via the "normal" codepath in the test. This is fixed by
removing all the manual Run() calls and making DrainAndServiceIO()
do a better job of handling async work queued by message reception.
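
The difference between draining only messages and draining messages plus queued tasks can be sketched as follows. This is not the real DrainAndServiceIO() implementation: ToyLoop, DrainMessagesOnly, DrainMessagesAndTasks and ReportsAfterDrain are hypothetical names, and the two queues are an assumed simplification of the test harness.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical test loop: delivered messages on one queue, async tasks
// (such as a scheduled engine run) on another.
struct ToyLoop
{
    std::deque<std::function<void()>> messages;
    std::deque<std::function<void()>> tasks;
};

// Old shape: stop as soon as there are no messages left, even if message
// delivery queued a task.
void DrainMessagesOnly(ToyLoop & loop)
{
    while (!loop.messages.empty())
    {
        auto deliver = std::move(loop.messages.front());
        loop.messages.pop_front();
        assert(deliver); // a queued entry must be callable
        deliver();
    }
}

// New shape: keep going until both queues are empty, since a task may in
// turn produce more messages.
void DrainMessagesAndTasks(ToyLoop & loop)
{
    while (!loop.messages.empty() || !loop.tasks.empty())
    {
        DrainMessagesOnly(loop);
        if (!loop.tasks.empty())
        {
            auto task = std::move(loop.tasks.front());
            loop.tasks.pop_front();
            task();
        }
    }
}

// Receiving a read request queues an "engine run" task; the task sends a report.
int ReportsAfterDrain(bool serviceTasks)
{
    ToyLoop loop;
    int reports = 0;
    loop.messages.push_back([&loop, &reports] { loop.tasks.push_back([&reports] { ++reports; }); });
    if (serviceTasks)
    {
        DrainMessagesAndTasks(loop);
    }
    else
    {
        DrainMessagesOnly(loop);
    }
    return reports;
}
```

In this model ReportsAfterDrain(false) returns 0 because the queued run never executes, which is the gap the manual Run() calls were papering over, and ReportsAfterDrain(true) returns 1.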

TestReadAttributeTimeout had to be modified slightly. In the new
setup, calling DrainAndServiceIO() after we send the reads triggers
the reports, so we can no longer rely on getting a chance to tear
down the session before the reports get triggered. So instead of
first expiring the client-to-server session, we expire the
server-to-client one before doing DrainAndServiceIO. This ensures
that we never get replies to our reads, and then we can proceed to
expire the client-to-server session, which gets treated as a timeout.

Problem

See above

Change overview

See above

Testing

Yes, that's what took up most of the time for this PR.

github-actions bot commented Feb 18, 2022

PR #15338: Size comparison from d7bcb65 to 1154ed4

Increases (37 builds for cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, qpg, telink)
platform target config section d7bcb65 1154ed4 change % change
cyw30739 light cyw930739m2evb_01 (read/write) 599234 599274 40 0.0
.app_xip_area 503160 503200 40 0.0
lock cyw930739m2evb_01 (read/write) 557270 557310 40 0.0
.app_xip_area 462740 462780 40 0.0
ota-requestor cyw930739m2evb_01 (read/write) 578506 578546 40 0.0
.app_xip_area 474552 474592 40 0.0
efr32 lighting-app BRD4161A (read only) 916160 916208 48 0.0
.text 916152 916200 48 0.0
BRD4161A+rpc (read only) 944868 944916 48 0.0
.text 944860 944908 48 0.0
window-app BRD4161A (read only) 850000 850032 32 0.0
.text 849992 850024 32 0.0
esp32 all-clusters-app c3devkit (read only) 950350 950392 42 0.0
.flash.text 950350 950392 42 0.0
m5stack (read only) 999843 999871 28 0.0
.flash.text 994459 994487 28 0.0
k32w light k32w061+release (read/write) 692416 692448 32 0.0
.text 606312 606344 32 0.0
lock k32w061+release (read/write) 694956 695004 48 0.0
.text 608596 608644 48 0.0
linux all-clusters-app debug (read only) 2386281 2386425 144 0.0
.text 2015682 2015826 144 0.0
bridge-app debug+rpc (read only) 1734805 1734949 144 0.0
.text 1475237 1475381 144 0.0
chip-tool debug (read only) 8988461 8988605 144 0.0
.text 7856149 7856293 144 0.0
chip-tool-ipv6only arm64 (read only) 8714068 8714196 128 0.0
.text 7352164 7352292 128 0.0
door-lock-app debug (read only) 1947257 1947401 144 0.0
.text 1623554 1623698 144 0.0
lighting-app debug+rpc (read only) 2073745 2073889 144 0.0
.text 1751298 1751442 144 0.0
ota-provider-app debug (read only) 1880713 1880857 144 0.0
.text 1570002 1570146 144 0.0
ota-requestor-app debug (read only) 1893745 1893873 128 0.0
.text 1590322 1590450 128 0.0
shell debug (read only) 2361169 2361313 144 0.0
.text 1995666 1995810 144 0.0
thermostat-no-ble arm64 (read only) 2167564 2167676 112 0.0
(read/write) 151137 151153 16 0.0
.bss 67505 67521 16 0.0
.text 1814032 1814144 112 0.0
tv-app debug (read only) 2542969 2543113 144 0.0
.text 2169506 2169650 144 0.0
mbed lighting-app CY8CPROTO_062_4343W+release (read/write) 2393028 2393092 64 0.0
.text 1355600 1355664 64 0.0
shell CY8CPROTO_062_4343W+release (read/write) 2319428 2319492 64 0.0
.text 1282000 1282064 64 0.0
nrfconnect lighting-app nrf52840dk_nrf52840 (read/write) 1023275 1023323 48 0.0
text 699824 699864 40 0.0
nrf52840dk_nrf52840+rpc (read/write) 992587 992619 32 0.0
text 679724 679764 40 0.0
nrf52840dongle_nrf52840 (read/write) 1038031 1038063 32 0.0
text 703636 703676 40 0.0
nrf5340dk_nrf5340_cpuapp (read/write) 929886 929918 32 0.0
text 614848 614888 40 0.0
lock-app nrf52840dk_nrf52840 (read/write) 952039 952087 48 0.0
text 641940 641980 40 0.0
nrf5340dk_nrf5340_cpuapp (read/write) 859518 859550 32 0.0
text 557740 557784 44 0.0
pump-app nrf52840dk_nrf52840 (read/write) 950615 950663 48 0.0
text 641760 641800 40 0.0
pump-controller-app nrf52840dk_nrf52840 (read/write) 946579 946611 32 0.0
text 638000 638040 40 0.0
p6 all-clusters-app default (read/write) 2489304 2489336 32 0.0
.text 1447568 1447600 32 0.0
light-app default (read/write) 2394568 2394600 32 0.0
.text 1352832 1352864 32 0.0
lock-app default (read/write) 2358128 2358160 32 0.0
.text 1316392 1316424 32 0.0
qpg lighting-app qpg6105+debug (read only) 600060 600100 40 0.0
.text 594740 594780 40 0.0
lock-app qpg6105+debug (read only) 565828 565868 40 0.0
.text 560508 560548 40 0.0
telink lighting-app tlsr9518adk80d (read/write) 878662 878702 40 0.0
text 618826 618868 42 0.0
Full report (43 builds for cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, qpg, telink)
platform target config section d7bcb65 1154ed4 change % change
cyw30739 light cyw930739m2evb_01 (read/write) 599234 599274 40 0.0
.app_xip_area 503160 503200 40 0.0
.bss 78772 78772 0 0.0
.data 644 644 0 0.0
.rodata 0 0 0 0.0
.text 0 0 0 0.0
lock cyw930739m2evb_01 (read/write) 557270 557310 40 0.0
.app_xip_area 462740 462780 40 0.0
.bss 77268 77268 0 0.0
.data 608 608 0 0.0
.rodata 0 0 0 0.0
.text 0 0 0 0.0
ota-requestor cyw930739m2evb_01 (read/write) 578506 578546 40 0.0
.app_xip_area 474552 474592 40 0.0
.bss 86364 86364 0 0.0
.data 552 552 0 0.0
.rodata 0 0 0 0.0
.text 112 112 0 0.0
efr32 lighting-app BRD4161A (read only) 916160 916208 48 0.0
(read/write) 129512 129512 0 0.0
.bss 127472 127472 0 0.0
.data 2036 2036 0 0.0
.text 916152 916200 48 0.0
BRD4161A+rpc (read only) 944868 944916 48 0.0
(read/write) 146424 146424 0 0.0
.bss 144248 144248 0 0.0
.data 2176 2176 0 0.0
.text 944860 944908 48 0.0
window-app BRD4161A (read only) 850000 850032 32 0.0
(read/write) 127424 127424 0 0.0
.bss 125520 125520 0 0.0
.data 1904 1904 0 0.0
.text 849992 850024 32 0.0
esp32 all-clusters-app c3devkit (read only) 950350 950392 42 0.0
(read/write) 1401938 1401938 0 0.0
.dram0.bss 68512 68512 0 0.0
.dram0.data 14156 14156 0 0.0
.flash.rodata 200376 200376 0 0.0
.flash.text 950350 950392 42 0.0
.iram0.text 62056 62056 0 0.0
m5stack (read only) 999843 999871 28 0.0
(read/write) 467216 467216 0 0.0
.dram0.bss 73656 73656 0 0.0
.dram0.data 34064 34064 0 0.0
.flash.rodata 227368 227368 0 0.0
.flash.text 994459 994487 28 0.0
.iram0.text 123399 123399 0 0.0
k32w light k32w061+release (read/write) 692416 692448 32 0.0
.bss 78392 78392 0 0.0
.data 1912 1912 0 0.0
.text 606312 606344 32 0.0
lock k32w061+release (read/write) 694956 695004 48 0.0
.bss 78608 78608 0 0.0
.data 1952 1952 0 0.0
.text 608596 608644 48 0.0
linux all-clusters-app debug (read only) 2386281 2386425 144 0.0
(read/write) 151456 151456 0 0.0
.bss 65376 65376 0 0.0
.data 1328 1328 0 0.0
.data.rel.ro 79048 79048 0 0.0
.dynamic 592 592 0 0.0
.got 4160 4160 0 0.0
.init 27 27 0 0.0
.init_array 920 920 0 0.0
.rodata 207013 207013 0 0.0
.text 2015682 2015826 144 0.0
bridge-app debug+rpc (read only) 1734805 1734949 144 0.0
(read/write) 94856 94856 0 0.0
.bss 49296 49296 0 0.0
.data 2034 2034 0 0.0
.data.rel.ro 38376 38376 0 0.0
.dynamic 592 592 0 0.0
.got 3952 3952 0 0.0
.init 27 27 0 0.0
.init_array 560 560 0 0.0
.rodata 142732 142732 0 0.0
.text 1475237 1475381 144 0.0
chip-tool debug (read only) 8988461 8988605 144 0.0
(read/write) 319568 319568 0 0.0
.bss 40728 40728 0 0.0
.data 1184 1184 0 0.0
.data.rel.ro 271624 271624 0 0.0
.dynamic 608 608 0 0.0
.got 4784 4784 0 0.0
.init 27 27 0 0.0
.init_array 624 624 0 0.0
.rodata 474037 474037 0 0.0
.text 7856149 7856293 144 0.0
chip-tool-ipv6only arm64 (read only) 8714068 8714196 128 0.0
(read/write) 431377 431377 0 0.0
.bss 58977 58977 0 0.0
.data 1216 1216 0 0.0
.data.rel.ro 316816 316816 0 0.0
.dynamic 560 560 0 0.0
.got 50568 50568 0 0.0
.init 24 24 0 0.0
.init_array 200 200 0 0.0
.rodata 450444 450444 0 0.0
.text 7352164 7352292 128 0.0
door-lock-app debug (read only) 1947257 1947401 144 0.0
(read/write) 120952 120952 0 0.0
.bss 52016 52016 0 0.0
.data 1010 1010 0 0.0
.data.rel.ro 62488 62488 0 0.0
.dynamic 592 592 0 0.0
.got 4136 4136 0 0.0
.init 27 27 0 0.0
.init_array 672 672 0 0.0
.rodata 174194 174194 0 0.0
.text 1623554 1623698 144 0.0
lighting-app debug+rpc (read only) 2073745 2073889 144 0.0
(read/write) 125944 125944 0 0.0
.bss 53024 53024 0 0.0
.data 1400 1400 0 0.0
.data.rel.ro 65984 65984 0 0.0
.dynamic 608 608 0 0.0
.got 4168 4168 0 0.0
.init 27 27 0 0.0
.init_array 720 720 0 0.0
.rodata 166801 166801 0 0.0
.text 1751298 1751442 144 0.0
ota-provider-app debug (read only) 1880713 1880857 144 0.0
(read/write) 116568 116568 0 0.0
.bss 51872 51872 0 0.0
.data 1224 1224 0 0.0
.data.rel.ro 57816 57816 0 0.0
.dynamic 608 608 0 0.0
.got 4392 4392 0 0.0
.init 27 27 0 0.0
.init_array 624 624 0 0.0
.rodata 159099 159099 0 0.0
.text 1570002 1570146 144 0.0
ota-requestor-app debug (read only) 1893745 1893873 128 0.0
(read/write) 117952 117952 0 0.0
.bss 52288 52288 0 0.0
.data 1128 1128 0 0.0
.data.rel.ro 59080 59080 0 0.0
.dynamic 592 592 0 0.0
.got 4192 4192 0 0.0
.init 27 27 0 0.0
.init_array 632 632 0 0.0
.rodata 153484 153484 0 0.0
.text 1590322 1590450 128 0.0
shell debug (read only) 2361169 2361313 144 0.0
(read/write) 153872 153872 0 0.0
.bss 73728 73728 0 0.0
.data 832 832 0 0.0
.data.rel.ro 73632 73632 0 0.0
.dynamic 592 592 0 0.0
.got 4168 4168 0 0.0
.init 27 27 0 0.0
.init_array 904 904 0 0.0
.rodata 208050 208050 0 0.0
.text 1995666 1995810 144 0.0
thermostat-no-ble arm64 (read only) 2167564 2167676 112 0.0
(read/write) 151137 151153 16 0.0
.bss 67505 67521 16 0.0
.data 1032 1032 0 0.0
.data.rel.ro 75384 75384 0 0.0
.dynamic 560 560 0 0.0
.got 4224 4224 0 0.0
.init 24 24 0 0.0
.init_array 336 336 0 0.0
.rodata 134060 134060 0 0.0
.text 1814032 1814144 112 0.0
tv-app debug (read only) 2542969 2543113 144 0.0
(read/write) 152064 152064 0 0.0
.bss 69248 69248 0 0.0
.data 3200 3200 0 0.0
.data.rel.ro 73576 73576 0 0.0
.dynamic 592 592 0 0.0
.got 4552 4552 0 0.0
.init 27 27 0 0.0
.init_array 888 888 0 0.0
.rodata 199117 199117 0 0.0
.text 2169506 2169650 144 0.0
mbed all-clusters-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2431252 2431252 0 0.0
.bss 195924 195924 0 0.0
.data 5328 5328 0 0.0
.text 1393824 1393824 0 0.0
lighting-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2393028 2393092 64 0.0
.bss 188432 188432 0 0.0
.data 5632 5632 0 0.0
.text 1355600 1355664 64 0.0
lock-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2328536 2328536 0 0.0
.bss 187432 187432 0 0.0
.data 5608 5608 0 0.0
.text 1291136 1291136 0 0.0
pigweed-app CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 1139840 1139840 0 0.0
.bss 11796 11796 0 0.0
.data 4368 4368 0 0.0
.text 103224 103224 0 0.0
shell CY8CPROTO_062_4343W+release (read only) 6224 6224 0 0.0
(read/write) 2319428 2319492 64 0.0
.bss 185980 185980 0 0.0
.data 5440 5440 0 0.0
.text 1282000 1282064 64 0.0
nrfconnect lighting-app nrf52840dk_nrf52840 (read/write) 1023275 1023323 48 0.0
bss 123532 123532 0 0.0
rodata 120904 120904 0 0.0
text 699824 699864 40 0.0
nrf52840dk_nrf52840+rpc (read/write) 992587 992619 32 0.0
bss 120720 120720 0 0.0
rodata 112448 112448 0 0.0
text 679724 679764 40 0.0
nrf52840dongle_nrf52840 (read/write) 1038031 1038063 32 0.0
bss 124752 124752 0 0.0
rodata 119736 119736 0 0.0
text 703636 703676 40 0.0
nrf5340dk_nrf5340_cpuapp (read/write) 929886 929918 32 0.0
bss 120092 120092 0 0.0
rodata 114160 114160 0 0.0
text 614848 614888 40 0.0
lock-app nrf52840dk_nrf52840 (read/write) 952039 952087 48 0.0
bss 121760 121760 0 0.0
rodata 109740 109740 0 0.0
text 641940 641980 40 0.0
nrf5340dk_nrf5340_cpuapp (read/write) 859518 859550 32 0.0
bss 118352 118352 0 0.0
rodata 102912 102912 0 0.0
text 557740 557784 44 0.0
pigweed-app nrf52840dk_nrf52840 (read/write) 527595 527595 0 0.0
bss 53632 53632 0 0.0
rodata 49976 49976 0 0.0
text 361016 361016 0 0.0
pump-app nrf52840dk_nrf52840 (read/write) 950615 950663 48 0.0
bss 121480 121480 0 0.0
rodata 108692 108692 0 0.0
text 641760 641800 40 0.0
pump-controller-app nrf52840dk_nrf52840 (read/write) 946579 946611 32 0.0
bss 121484 121484 0 0.0
rodata 108392 108392 0 0.0
text 638000 638040 40 0.0
shell nrf52840dk_nrf52840 (read/write) 811335 811335 0 0.0
bss 113328 113328 0 0.0
rodata 79676 79676 0 0.0
text 540684 540684 0 0.0
p6 all-clusters-app default (read/write) 2489304 2489336 32 0.0
.bss 124240 124240 0 0.0
.data 2672 2672 0 0.0
.text 1447568 1447600 32 0.0
light-app default (read/write) 2394568 2394600 32 0.0
.bss 113896 113896 0 0.0
.data 2528 2528 0 0.0
.text 1352832 1352864 32 0.0
lock-app default (read/write) 2358128 2358160 32 0.0
.bss 113648 113648 0 0.0
.data 2488 2488 0 0.0
.text 1316392 1316424 32 0.0
qpg lighting-app qpg6105+debug (read only) 600060 600100 40 0.0
(read/write) 146940 146940 0 0.0
.bss 90952 90952 0 0.0
.data 1112 1112 0 0.0
.text 594740 594780 40 0.0
lock-app qpg6105+debug (read only) 565828 565868 40 0.0
(read/write) 146940 146940 0 0.0
.bss 90960 90960 0 0.0
.data 1064 1064 0 0.0
.text 560508 560548 40 0.0
persistent-storage-app qpg6105+debug (read only) 99536 99536 0 0.0
(read/write) 146941 146941 0 0.0
.bss 24001 24001 0 0.0
.data 180 180 0 0.0
.text 94216 94216 0 0.0
telink lighting-app tlsr9518adk80d (read/write) 878662 878702 40 0.0
bss 87504 87504 0 0.0
noinit 37160 37160 0 0.0
text 618826 618868 42 0.0

@yunhanw-google yunhanw-google merged commit 149c271 into project-chip:master Feb 18, 2022
@bzbarsky-apple bzbarsky-apple deleted the stop-dropping-reads branch February 18, 2022 19:04