Skip to content

Commit

Permalink
Stop dropping reads on the floor if more than one happens at once.
Browse files Browse the repository at this point in the history
Fixes #15304

The fix for #15304 is the change in the while loop condition in
Engine::Run.  Before that change, we would compare numReadHandled to
the current count of allocated read handlers.  But processing a read
handler would deallocate it, so we would only end up processing half
the read handlers that were outstanding on entry to Run (because after
that numReadHandled would be larger than the remaining number
allocated).

The change in InteractionModelEngine::OnDone and the management
of mRunningReadHandler are to handle a slightly more complicated
situation I ran into while writing some tests for this code.  If we
have at least two subscriptions and some number of reads after them
pending when Engine::Run is entered, when we would process the first
read, it would be deallocated, mCurReadHandlerIdx would get reset to
0, we would then increment it by 1, and end up walking all but the
first subscription again.  Which means that our numReadHandled would
increase to the point where the loop terminates before we had gotten
to all the read handlers.  If we had N subscriptions we would miss N-1
read handlers.  Those could then get stuck in limbo indefinitely,
until either a subscription heartbeat had to happen or someone else
issued a read.

The change in Engine::OnReportConfirm is to handle the case when we
have more than CHIP_IM_MAX_REPORTS_IN_FLIGHT subscriptions that all
need reporting, fire off the first CHIP_IM_MAX_REPORTS_IN_FLIGHT of
them, and then never call ScheduleRun() after that, so all the other
subscriptions get stuck.

The issue with CHIP_IM_MAX_REPORTS_IN_FLIGHT was not being caught by
the existing tests because those tests manually called Run() on the
reporting engine (in a loop, in fact).  That was needed because we
could end up in a situation where DrainAndServiceIO() has processed
all the pending messages, but we have a queued task (from ScheduleRun)
that is not a message and will not get run, so Engine::Run was not
getting called properly via the "normal" codepath in the test.  This
was fixed by removing all the manual Run() calls and making
DrainAndServiceIO() do a better job of handling async things queued by
message reception.

TestReadAttributeTimeout had to be modified slightly because in the
new setup we could no longer rely on DrainAndServiceIO() after we send
the reads not triggering the reports and giving us a chance to tear
down the session before the reports did get triggered.  So instead of
first expiring the client-to-server session, we expire the
server-to-client one _before_ doing DrainAndServiceIO.  This ensures
that we never get replies to our reads, and then we can proceed to
expire the client-to-server session, which gets treated as a timeout.
  • Loading branch information
bzbarsky-apple committed Feb 18, 2022
1 parent 72896fc commit 1154ed4
Show file tree
Hide file tree
Showing 5 changed files with 235 additions and 76 deletions.
10 changes: 5 additions & 5 deletions src/app/InteractionModelEngine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -210,14 +210,14 @@ void InteractionModelEngine::OnDone(CommandHandler & apCommandObj)

void InteractionModelEngine::OnDone(ReadHandler & apReadObj)
{
mReadHandlers.ReleaseObject(&apReadObj);

//
// Deleting an item can shift down the contents of the underlying pool storage,
// rendering any tracker using positional indexes invalid. Let's reset it and
// have it start from index 0.
// rendering any tracker using positional indexes invalid. Let's reset it,
// based on which readHandler we are getting rid of.
//
mReportingEngine.ResetReadHandlerTracker();
mReportingEngine.ResetReadHandlerTracker(&apReadObj);

mReadHandlers.ReleaseObject(&apReadObj);
}

CHIP_ERROR InteractionModelEngine::OnInvokeCommandRequest(Messaging::ExchangeContext * apExchangeContext,
Expand Down
18 changes: 16 additions & 2 deletions src/app/reporting/Engine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -519,21 +519,29 @@ void Engine::Run()

mRunScheduled = false;

while ((mNumReportsInFlight < CHIP_IM_MAX_REPORTS_IN_FLIGHT) && (numReadHandled < imEngine->mReadHandlers.Allocated()))
// We may be deallocating read handlers as we go. Track how many we had
// initially, so we make sure to go through all of them.
size_t initialAllocated = imEngine->mReadHandlers.Allocated();
while ((mNumReportsInFlight < CHIP_IM_MAX_REPORTS_IN_FLIGHT) && (numReadHandled < initialAllocated))
{
ReadHandler * readHandler = imEngine->ActiveHandlerAt(mCurReadHandlerIdx % (uint32_t) imEngine->mReadHandlers.Allocated());
VerifyOrDie(readHandler != nullptr);

if (readHandler->IsReportable())
{
CHIP_ERROR err = BuildAndSendSingleReportData(readHandler);
mRunningReadHandler = readHandler;
CHIP_ERROR err = BuildAndSendSingleReportData(readHandler);
mRunningReadHandler = nullptr;
if (err != CHIP_NO_ERROR)
{
return;
}
}

numReadHandled++;
// If readHandler removed itself from our list, we also decremented
// mCurReadHandlerIdx to account for that removal, so it's safe to
// increment here.
mCurReadHandlerIdx++;
}

Expand Down Expand Up @@ -669,6 +677,12 @@ void Engine::OnReportConfirm()
{
VerifyOrDie(mNumReportsInFlight > 0);

if (mNumReportsInFlight == CHIP_IM_MAX_REPORTS_IN_FLIGHT)
{
// We could have other things waiting to go now that this report is no
// longer in flight.
ScheduleRun();
}
mNumReportsInFlight--;
ChipLogDetail(DataManagement, "<RE> OnReportConfirm: NumReports = %" PRIu32, mNumReportsInFlight);
}
Expand Down
26 changes: 24 additions & 2 deletions src/app/reporting/Engine.h
Original file line number Diff line number Diff line change
Expand Up @@ -98,8 +98,25 @@ class Engine

/*
* Resets the tracker that tracks the currently serviced read handler.
*/
void ResetReadHandlerTracker() { mCurReadHandlerIdx = 0; }
* apReadHandler can be non-null to indicate that the reset is due to a
* specific ReadHandler being deallocated.
*/
void ResetReadHandlerTracker(ReadHandler * apReadHandlerBeingDeleted)
{
if (apReadHandlerBeingDeleted == mRunningReadHandler)
{
// Just decrement, so our increment after we finish running it will
// do the right thing.
--mCurReadHandlerIdx;
}
else
{
// No idea what to do here to make the indexing sane. Just start at
// the beginning. We need to do better here; see
// https://github.com/project-chip/connectedhomeip/issues/13809
mCurReadHandlerIdx = 0;
}
}

private:
friend class TestReportingEngine;
Expand Down Expand Up @@ -174,6 +191,11 @@ class Engine
*/
uint32_t mCurReadHandlerIdx = 0;

/**
* The read handler we're calling BuildAndSendSingleReportData on right now.
*/
ReadHandler * mRunningReadHandler = nullptr;

/**
* mGlobalDirtySet is used to track the set of attribute/event paths marked dirty for reporting purposes.
*
Expand Down
Loading

0 comments on commit 1154ed4

Please sign in to comment.