Stop dropping reads on the floor if more than one happens at once. #15338
Merged: yunhanw-google merged 1 commit into project-chip:master from bzbarsky-apple:stop-dropping-reads on Feb 18, 2022
Conversation
bzbarsky-apple requested review from msandstedt, mrjerryjohns and yunhanw-google on February 18, 2022 05:51
pullapprove bot requested review from andy31415, anush-apple, austinh0, balducci-apple, Byungjoo-Lee, carol-apple, chrisdecenzo, chshu, chulspro, Damian-Nordic, dhrishi, electrocucaracha, emargolis, franck-apple, gjc13, hawk248, holbrookt, harsha-rajendran, isiu-apple, jelderton, jepenven-silabs, jmartinez-silabs, kghost and lazarkov on February 18, 2022 05:52
pullapprove bot requested review from LuDuda, lzgrablic02, sagar-apple, saurabhst, selissia, tecimovic, turon, vijs, vivien-apple, wbschiller, woody-apple, xylophone21 and yufengwangca on February 18, 2022 05:52
PR #15338: Size comparison from d7bcb65 to 1154ed4 Increases (37 builds for cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, qpg, telink)
Full report (43 builds for cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, qpg, telink)
yunhanw-google approved these changes on Feb 18, 2022
jmartinez-silabs approved these changes on Feb 18, 2022
Fixes #15304
The fix for #15304 is the change to the while-loop condition in
Engine::Run. Before that change, we compared numReadHandled to the
current count of allocated read handlers. But processing a read
handler deallocates it, so we would only end up processing half of the
read handlers that were outstanding on entry to Run: partway through,
numReadHandled would exceed the shrinking number still allocated and
the loop would exit early.
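To make the early exit concrete, here is a minimal standalone simulation (not the actual Engine::Run code; the container and the "fixed" variant are illustrative only, and the real new loop condition may differ):

```cpp
#include <cstdio>
#include <vector>

int main()
{
    // Six pending read handlers; "processing" one deallocates it.
    std::vector<int> readHandlers = { 1, 2, 3, 4, 5, 6 };
    size_t numReadHandled         = 0;

    // Old-style condition: compare the running counter against the *current*
    // allocated count, which shrinks as handlers are processed.
    while (!readHandlers.empty() && numReadHandled < readHandlers.size())
    {
        readHandlers.erase(readHandlers.begin());
        numReadHandled++;
    }
    std::printf("old condition handled %u of 6\n", static_cast<unsigned>(numReadHandled)); // 3

    // One possible fix: snapshot the number outstanding on entry instead.
    readHandlers             = { 1, 2, 3, 4, 5, 6 };
    const size_t numToHandle = readHandlers.size();
    numReadHandled           = 0;
    while (!readHandlers.empty() && numReadHandled < numToHandle)
    {
        readHandlers.erase(readHandlers.begin());
        numReadHandled++;
    }
    std::printf("fixed condition handled %u of 6\n", static_cast<unsigned>(numReadHandled)); // 6
    return 0;
}
```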
The change in InteractionModelEngine::OnDone and the management of
mRunningReadHandler handle a slightly more complicated situation I ran
into while writing tests for this code. Suppose at least two
subscriptions, followed by some number of reads, are pending when
Engine::Run is entered. When we processed the first read, it would be
deallocated, mCurReadHandlerIdx would get reset to 0 and then
incremented by 1, and we would end up walking all but the first
subscription again. That means numReadHandled would increase to the
point where the loop terminated before we had gotten to all the read
handlers: with N subscriptions we would miss N-1 read handlers. Those
could then get stuck in limbo indefinitely, until a subscription
heartbeat happened or someone else issued a read.
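This second effect can also be simulated in isolation. In the toy model below, 'S' stands for a subscription and 'R' for a pending read; the walk index resets to 0 when a read is deallocated and is then incremented, so every subscription except the first gets walked again and the handled counter runs out with reads still queued. This only illustrates the behavior described above and is not the engine's real iteration logic:

```cpp
#include <cstdio>
#include <string>

int main()
{
    std::string handlers  = "SSRRR"; // 2 subscriptions followed by 3 pending reads
    const size_t toHandle = handlers.size();
    size_t numHandled     = 0;
    size_t idx            = 0; // stand-in for mCurReadHandlerIdx

    while (numHandled < toHandle && idx < handlers.size())
    {
        if (handlers[idx] == 'R')
        {
            handlers.erase(idx, 1); // read processed -> handler deallocated
            idx = 0;                // index reset on deallocation...
        }
        numHandled++;
        idx++;                      // ...then incremented, so only slot 0 is skipped
    }

    size_t readsStuck = 0;
    for (char c : handlers)
    {
        if (c == 'R')
        {
            readsStuck++;
        }
    }
    // With N = 2 subscriptions, N - 1 = 1 read is left stuck.
    std::printf("reads still stuck after the run: %u\n", static_cast<unsigned>(readsStuck));
}
```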
The change in Engine::OnReportConfirm handles the case where more than
CHIP_IM_MAX_REPORTS_IN_FLIGHT subscriptions all need reporting: we
would fire off the first CHIP_IM_MAX_REPORTS_IN_FLIGHT of them and
then never call ScheduleRun() again, so all the other subscriptions
got stuck.
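One plausible shape of that change, as a hedged sketch (the class and member names below are illustrative stand-ins, not the real reporting engine API; only CHIP_IM_MAX_REPORTS_IN_FLIGHT corresponds to an actual constant, mimicked here by a local one): when a confirm frees an in-flight slot and we had previously been at the cap, schedule another engine run so the deferred subscriptions eventually get serviced.

```cpp
#include <cstdint>
#include <cstdio>

class SketchReportingEngine
{
public:
    // Local stand-in for CHIP_IM_MAX_REPORTS_IN_FLIGHT.
    static constexpr uint32_t kMaxReportsInFlight = 4;

    void OnReportSent() { mNumReportsInFlight++; }

    void OnReportConfirm()
    {
        // A report just completed, freeing one "in flight" slot.
        mNumReportsInFlight--;

        // If we were at the cap before this confirm, some subscriptions may
        // have been skipped earlier; without an explicit ScheduleRun() here,
        // nothing would ever come back for them.
        if (mNumReportsInFlight == kMaxReportsInFlight - 1)
        {
            ScheduleRun();
        }
    }

private:
    void ScheduleRun() { std::puts("queued another Engine::Run() pass"); }

    uint32_t mNumReportsInFlight = 0;
};

int main()
{
    SketchReportingEngine engine;
    for (uint32_t i = 0; i < SketchReportingEngine::kMaxReportsInFlight; i++)
    {
        engine.OnReportSent(); // reach the in-flight cap
    }
    engine.OnReportConfirm();  // prints: queued another Engine::Run() pass
}
```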
The issue with CHIP_IM_MAX_REPORTS_IN_FLIGHT was not being caught by
the existing tests because those tests manually called Run() on the
reporting engine (in a loop, in fact). That was needed because we
could end up in a situation where DrainAndServiceIO() had processed
all the pending messages, but a task queued by ScheduleRun() was not a
message and would never run, so Engine::Run was not getting called via
the "normal" codepath in the test. This was fixed by removing all the
manual Run() calls and making DrainAndServiceIO() do a better job of
handling async work queued by message reception.
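The DrainAndServiceIO() point is easiest to see as a queue-draining loop. The sketch below is purely illustrative (the real helper lives in the CHIP test support code and drives an actual network and event loop): the important property is that it keeps alternating between pending messages and queued non-message tasks until both are empty, so work queued via ScheduleRun() during message handling is not left behind.

```cpp
#include <functional>
#include <queue>

// Toy stand-in for the test I/O context described above.
struct FakeIOContext
{
    std::queue<std::function<void()>> messages; // simulated inbound messages
    std::queue<std::function<void()>> tasks;    // simulated event-loop tasks (e.g. from ScheduleRun)

    void DrainAndServiceIO()
    {
        // Servicing a message can enqueue a task and vice versa, so keep
        // going until *both* queues are empty, not just the message queue.
        while (!messages.empty() || !tasks.empty())
        {
            if (!messages.empty())
            {
                auto work = std::move(messages.front());
                messages.pop();
                work();
            }
            else
            {
                auto work = std::move(tasks.front());
                tasks.pop();
                work();
            }
        }
    }
};

int main()
{
    FakeIOContext ctx;
    // A "message" whose handler queues a non-message task, like ScheduleRun() would.
    ctx.messages.push([&ctx] { ctx.tasks.push([] { /* deferred Engine::Run() */ }); });
    ctx.DrainAndServiceIO(); // drains the message *and* the task it queued
}
```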
TestReadAttributeTimeout had to be modified slightly: in the new setup
we could no longer rely on the DrainAndServiceIO() call after sending
the reads leaving the reports untriggered, which previously gave us a
chance to tear down the session before any reports went out. So
instead of first expiring the client-to-server session, we expire the
server-to-client one before doing DrainAndServiceIO. This ensures that
we never get replies to our reads; we can then expire the
client-to-server session, which gets treated as a timeout.
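The reordered test flow, as a compilable sketch; the types and helper names below are hypothetical stand-ins that only mirror the steps described above, not the actual test harness API.

```cpp
#include <cstdio>

// Minimal stand-in types so the sketch compiles; the real test uses the CHIP
// test messaging context and its session-expiry helpers (exact names differ).
struct TestContext
{
    void ExpireServerToClientSession() { std::puts("server->client session expired"); }
    void ExpireClientToServerSession() { std::puts("client->server session expired"); }
    void DrainAndServiceIO()           { std::puts("drained pending messages and tasks"); }
};

static void SendReadRequests(TestContext &) { std::puts("reads sent"); }

// Sketch of the reordered TestReadAttributeTimeout flow described above.
void TestReadAttributeTimeoutSketch(TestContext & ctx)
{
    SendReadRequests(ctx);

    // Expire the server-to-client session *before* draining, so the reports
    // the server generates can never be delivered back to the client.
    ctx.ExpireServerToClientSession();
    ctx.DrainAndServiceIO();

    // With no replies possible, expiring the client-to-server session now
    // causes the outstanding reads to be treated as timeouts.
    ctx.ExpireClientToServerSession();
    ctx.DrainAndServiceIO();
}

int main()
{
    TestContext ctx;
    TestReadAttributeTimeoutSketch(ctx);
}
```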
Problem
See above
Change overview
See above
Testing
Yes, that's what took up most of the time for this PR.