Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Stop dropping reads on the floor if more than one happens at once.
Fixes #15304 The fix for #15304 is the change in the while loop condition in Engine::Run. Before that change, we would compare numReadHandled to the current count of allocated read handlers. But processing a read handler would deallocate it, so we would only end up processing half the read handlers that were outstanding on entry to Run (because after that numReadHandled would be larger than the remaining number allocated). The change in InteractionModelEngine::OnDone and the management of mRunningReadHandler are to handle a slightly more complicated situation I ran into while writing some tests for this code. If we have at least two subscriptions and some number of reads after them pending when Engine::Run is entered, when we would process the first read, it would be deallocated, mCurReadHandlerIdx would get reset to 0, we would then increment it by 1, and end up walking all but the first subscription again. Which means that our numReadHandled would increase to the point where the loop terminates before we had gotten to all the read handlers. If we had N subscriptions we would miss N-1 read handlers. Those could then get stuck in limbo indefinitely, until either a subscription heartbeat had to happen or someone else issued a read. The change in Engine::OnReportConfirm is to handle the case when we have more than CHIP_IM_MAX_REPORTS_IN_FLIGHT subscriptions that all need reporting, fire off the first CHIP_IM_MAX_REPORTS_IN_FLIGHT of them, and then never call ScheduleRun() after that, so all the other subscriptions get stuck. The issue with CHIP_IM_MAX_REPORTS_IN_FLIGHT was not being caught by the existing tests because those tests manually called Run() on the reporting engine (in a loop, in fact). That was needed because we could end up in a situation where DrainAndServiceIO() has processed all the pending messages, but we have a queued task (from ScheduleRun) that is not a message and will not get run, so Engine::Run was not getting called properly via the "normal" codepath in the test. This was fixed by removing all the manual Run() calls and making DrainAndServiceIO() do a better job of handling async things queued by message reception. TestReadAttributeTimeout had to be modified slightly because in the new setup we could no longer rely on DrainAndServiceIO() after we send the reads not triggering the reports and giving us a chance to tear down the session before the reports did get triggered. So instead of first expiring the client-to-server session, we expire the server-to-client one _before_ doing DrainAndServiceIO. This ensures that we never get replies to our reads, and then we can proceed to expire the client-to-server session, which gets treated as a timeout.
- Loading branch information