-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Canceling timers does not work reliably in the LayerImplSelect implementation #19387
Comments
|
Once this is fixed, the workaround for this in |
It can't, because in the Select implementation a timer callback that re-adds itself with immediate effect (e.g. using |
Ah, and the FreeRTOS setup does not have that problem? |
On FreeRTOS, I don't know why the Select version doesn't use |
It turns out there's another subtlety with the Select implementation and being able to Cancel from the expired timers. Suppose timers A and B expire, and A's callback wants to schedule B. This cancels B from the expired timer list and re-adds it to the main list, so it doesn't execute in the current batch. This can happen forever. For |
To minimize risk, the changes here keep the "grab all the timers we should fire, then fire them" setup instead of switching to the "fire the timers one at a time" approach LayerImplFreeRTOS uses. The fix consists of the following parts: 1) Store the list of timers to fire in a member, so we can cancel things from that list as needed. 2) To avoid canceling things scheduled by ScheduleWork, remove the CancelTimer call in ScheduleWork. This does mean we now allow multiple timers for the same callback+appState in the timer list, if they were created by ScheduleWork, but that should be OK, since the only reason that pair needs to be unique is to allow cancellation and we never want to cancel the things ScheduleWork schedules. As a followup we should stop using the timer list for ScheduleWork altogether and use ScheduleLambda like LayerImplFreeRTOS does, but that involves fixing the unit tests that call ScheduleWork without actually running the platfor manager event loop and expect it to work somehow. TestRead was failing the sanity assert for not losing track of timers to fire, because it was spinning a nested event loop. The changes to that test stop it from doing that. Fixes project-chip#19387 Fixes project-chip#22160
To minimize risk, the changes here keep the "grab all the timers we should fire, then fire them" setup instead of switching to the "fire the timers one at a time" approach LayerImplFreeRTOS uses. The fix consists of the following parts: 1) Store the list of timers to fire in a member, so we can cancel things from that list as needed. 2) To avoid canceling things scheduled by ScheduleWork, remove the CancelTimer call in ScheduleWork. This does mean we now allow multiple timers for the same callback+appState in the timer list, if they were created by ScheduleWork, but that should be OK, since the only reason that pair needs to be unique is to allow cancellation and we never want to cancel the things ScheduleWork schedules. As a followup we should stop using the timer list for ScheduleWork altogether and use ScheduleLambda like LayerImplFreeRTOS does, but that involves fixing the unit tests that call ScheduleWork without actually running the platfor manager event loop and expect it to work somehow. TestRead was failing the sanity assert for not losing track of timers to fire, because it was spinning a nested event loop. The changes to that test stop it from doing that. Fixes project-chip#19387 Fixes project-chip#22160
To minimize risk, the changes here keep the "grab all the timers we should fire, then fire them" setup instead of switching to the "fire the timers one at a time" approach LayerImplFreeRTOS uses. The fix consists of the following parts: 1) Store the list of timers to fire in a member, so we can cancel things from that list as needed. 2) To avoid canceling things scheduled by ScheduleWork, remove the CancelTimer call in ScheduleWork. This does mean we now allow multiple timers for the same callback+appState in the timer list, if they were created by ScheduleWork, but that should be OK, since the only reason that pair needs to be unique is to allow cancellation and we never want to cancel the things ScheduleWork schedules. As a followup we should stop using the timer list for ScheduleWork altogether and use ScheduleLambda like LayerImplFreeRTOS does, but that involves fixing the unit tests that call ScheduleWork without actually running the platfor manager event loop and expect it to work somehow. TestRead was failing the sanity assert for not losing track of timers to fire, because it was spinning a nested event loop. The changes to that test stop it from doing that. Fixes project-chip#19387 Fixes project-chip#22160
* Duplicate src/system/tests/TestSystemScheduleLambda.cpp history in src/system/tests/TestSystemScheduleWork.cpp history. * Fix timer cancellation to be reliable in LayerImplSelect. To minimize risk, the changes here keep the "grab all the timers we should fire, then fire them" setup instead of switching to the "fire the timers one at a time" approach LayerImplFreeRTOS uses. The fix consists of the following parts: 1) Store the list of timers to fire in a member, so we can cancel things from that list as needed. 2) To avoid canceling things scheduled by ScheduleWork, remove the CancelTimer call in ScheduleWork. This does mean we now allow multiple timers for the same callback+appState in the timer list, if they were created by ScheduleWork, but that should be OK, since the only reason that pair needs to be unique is to allow cancellation and we never want to cancel the things ScheduleWork schedules. As a followup we should stop using the timer list for ScheduleWork altogether and use ScheduleLambda like LayerImplFreeRTOS does, but that involves fixing the unit tests that call ScheduleWork without actually running the platfor manager event loop and expect it to work somehow. TestRead was failing the sanity assert for not losing track of timers to fire, because it was spinning a nested event loop. The changes to that test stop it from doing that. Fixes #19387 Fixes #22160 * Add a unit test that timer cancelling works even for currently "in progress" timers * Address review comments. * Fix shadowing problem. * Turn off TestSystemScheduleWork on esp32 QEMU for now. * Turn off TestSystemScheduleWork on the fake platform too. The fake platform's event loop does not actually process the SystemLayer events. Co-authored-by: Andrei Litvin <[email protected]>
…ip#22375) * Duplicate src/system/tests/TestSystemScheduleLambda.cpp history in src/system/tests/TestSystemScheduleWork.cpp history. * Fix timer cancellation to be reliable in LayerImplSelect. To minimize risk, the changes here keep the "grab all the timers we should fire, then fire them" setup instead of switching to the "fire the timers one at a time" approach LayerImplFreeRTOS uses. The fix consists of the following parts: 1) Store the list of timers to fire in a member, so we can cancel things from that list as needed. 2) To avoid canceling things scheduled by ScheduleWork, remove the CancelTimer call in ScheduleWork. This does mean we now allow multiple timers for the same callback+appState in the timer list, if they were created by ScheduleWork, but that should be OK, since the only reason that pair needs to be unique is to allow cancellation and we never want to cancel the things ScheduleWork schedules. As a followup we should stop using the timer list for ScheduleWork altogether and use ScheduleLambda like LayerImplFreeRTOS does, but that involves fixing the unit tests that call ScheduleWork without actually running the platfor manager event loop and expect it to work somehow. TestRead was failing the sanity assert for not losing track of timers to fire, because it was spinning a nested event loop. The changes to that test stop it from doing that. Fixes project-chip#19387 Fixes project-chip#22160 * Add a unit test that timer cancelling works even for currently "in progress" timers * Address review comments. * Fix shadowing problem. * Turn off TestSystemScheduleWork on esp32 QEMU for now. * Turn off TestSystemScheduleWork on the fake platform too. The fake platform's event loop does not actually process the SystemLayer events. Co-authored-by: Andrei Litvin <[email protected]>
Problem
When we process timers, we grab a batch out of the list before we start calling them, and now the timers in that batch cannot be canceled. This can lead to use-after-free.
Proposed Solution
Either stop doing the batch thing and process timers one by one (maybe with a cap on how far down the list we go), or make the batch a member so that canceling can remove from the batch too. Discussion on Slack tended toward the former solution.
We should also consider making sure that the Select and FreeRTOS implementations are the same, so our CI tells us something about the FreeRTOS case.
@kpschoedel
The text was updated successfully, but these errors were encountered: