-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
event: scaled timers using dynamic queues #13129
event: scaled timers using dynamic queues #13129
Conversation
Add an interface for timers that can be enabled for some timeout within a range. The actual choice of when the timer should be triggered is not specified by the interface. Possible implementations of the interface include - timers that are triggered when a thread is not busy - timers that are triggered early/late in response to load Signed-off-by: Alex Konradi <[email protected]>
The Google Test repository has endorsed Abseil's "live at head" philosophy. Bump to a newer commit that includes google/googletest/envoyproxy#2350 for ease of writing tests. Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
Add a ScaledRangeTimerManager class that produces RangeTimers that can be adjusted after the fact by changing the scaling factor on the manager. This class implements the dynamic queues approach suggested in envoyproxy#11427#issuecomment-691154144. The code is tested but not yet integrated anywhere. Signed-off-by: Alex Konradi <[email protected]>
/assign @antoniovicente |
DoAll now needs to be pulled into scope explicitly with a `using` declaration. Signed-off-by: Alex Konradi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Alex, this seems like huge progress towards implementation of overload manager aware timeouts. Sorry for the laundry list of comments, it sometimes unavoidable on PRs of this size and complexity.
ASSERT(!queue.range_timers.empty()); | ||
auto item = std::move(queue.range_timers.front()); | ||
queue.range_timers.pop_front(); | ||
item.timer.trigger(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to trigger timers in a loop in cases where there are multiple timers at the head of the queue that have already expired?
Note that real timers always execute in the next iteration of the event loop. This implies that in the current implmentation at most 1 scaled timer can trigger per event loop iteration.
Another thing to consider is how to integrate with #13104. Ideally we'ld touch the watchdogs between each scaled timer execution; you'll need help from the dispatcher to do that.
Please add a test that verifies that multiple timers that are older than the current monotonic time fire as a group even if their exact trigger times are different. The way to verify that the timers fire on the same iteration requires hooking into the event loop's prepare callback as done in test/common/event/dispatcher_impl_test.cc test cases including SchedulableCallbackImplTest.RescheduleNext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to fire multiple timers as a group? It seems like we can get the same watchdog petting between callback executions by rescheduling the queue's timer with duration 0. I guess that might be extra expensive, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timers with 0 duration run on the next iteration of the event loop after #11823
In order to schedule a second call in the same loop iteration you would need to use to use SchedulableCallback::scheduleCallbackCurrentIteration to schedule in cases where remaining max-min duration is 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Triggering expired timers as a group seems like a good way to implement this. We can address the watchdog petting issues in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to trigger multiple timers when a queue timer expires.
WaitingForMin(std::chrono::milliseconds duration) : duration(duration) {} | ||
|
||
// The number for the bucket this timer will be placed in. | ||
const std::chrono::milliseconds duration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could use a more descriptive name like scalable_duration. Also, the comment seems out of date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
ScaledRangeTimerManager& manager_; | ||
const TimerCb callback_; | ||
const TimerPtr pending_timer_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Possilbly better name: min_duration_timer_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
const ScopeTrackedObject* scope_; | ||
}; | ||
|
||
ScaledRangeTimerManager::ScaledRangeTimerManager(Dispatcher& dispatcher, double scale_factor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected scale factor to always be 1.0 on construction. Consider removing scale_factor constructor argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
EXPECT_FALSE(timer->enabled()); | ||
} | ||
|
||
TEST_F(ScaledRangeTimerManagerTest, DisableWhileActive) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does pending and active mean in this context? Seems different from the states encoded in the RangeTimerImpl absl::variant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending was from an old implementation, should be fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "active" mean in this context?
I see other references to "active" below, it seems that it refers to ScalingMax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
timer->enableTimer(std::chrono::seconds(5), std::chrono::seconds(100)); | ||
|
||
simTime().advanceTimeAsync(std::chrono::seconds(5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding some assertions about the queue state after the advance and run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not part of the public interface, so I'm hesitant to test it directly.
|
||
TEST_F(ScaledRangeTimerManagerTest, ScheduledWithScalingFactorZero) { | ||
ScaledRangeTimerManager manager(dispatcher_, 0.0); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test that covers use of setScaleFactor. There seems to be no tests that exercise that function.
Also, prefer use of setScaleFactor instead of passing in a scale value to the constructor.
I recommend exercising some some in-between scale factors like 0.5 with timers that are expected to fire with different orders when scale = 1.0 and scale = 0.5
Cover both the case of timers added before the scale change and timers added after the scale change. Bonus points for mixing the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MultipleTimersWithScaling does some of this. I've added some more tests.
|
||
EXPECT_THAT(*timers[0].trigger_times, ElementsAre(T + std::chrono::seconds(3))); | ||
EXPECT_THAT(*timers[1].trigger_times, ElementsAre(T + std::chrono::seconds(3))); | ||
EXPECT_THAT(*timers[2].trigger_times, ElementsAre(T + std::chrono::seconds(3))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment I added to ScaledRangeTimerManager::onQueueTimerFired
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bit about heterogeneous lookup? Please clarify.
Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
…aled-timer Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there. Thanks again for this very exciting change.
timers.emplace_back(manager, simTime()); | ||
} | ||
|
||
const MonotonicTime T = simTime().monotonicTime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name 'T' doesn't follow style guidelines. I recommend changing the name to "start"'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
|
||
EXPECT_THAT(*timers[0].trigger_times, ElementsAre(T + std::chrono::seconds(2))); | ||
EXPECT_THAT(*timers[1].trigger_times, ElementsAre(T + std::chrono::seconds(2))); | ||
EXPECT_THAT(*timers[2].trigger_times, ElementsAre(T + std::chrono::seconds(2))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test that verifies that multiple timers that are older than the current monotonic time fire as a group even if their exact trigger times are different. The way to verify that the timers fire on the same iteration requires hooking into the event loop's prepare callback as done in test/common/event/dispatcher_impl_test.cc test cases including SchedulableCallbackImplTest.RescheduleNext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointer. Added a test.
ScaledRangeTimerManager::ScalingTimerHandle handle_; | ||
}; | ||
|
||
void onPendingTimerComplete() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit about naming consistency: I think you used to refer to WaitingForMin as Pending. Consider changing name of this function to onMinTimerComplete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Implementation of RangeTimer that can be scaled by the backing manager object. | ||
* | ||
* Instances of this class exist in one of 3 states: | ||
* - disabled: not enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change disabled to inactive for consistency with name of state_ alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Instances of this class exist in one of 3 states: | ||
* - disabled: not enabled | ||
* - waiting-for-min: enabled, min timeout not elapsed | ||
* - scaling-max: enabled, min timeout elapsed, max timeout not elapsed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth commenting about the expected transitions from inactive to waiting-for-min to scaling-max; usually we start inactive, enter wait-for-min for the requested min duration and transition to scaling-max when the min duration timer expires.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
EXPECT_FALSE(timer->enabled()); | ||
} | ||
|
||
TEST_F(ScaledRangeTimerManagerTest, DisableWhileWaitingForMax) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant: DisableWhileWaitingForMin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
timer->enableTimer(std::chrono::seconds(1), std::chrono::seconds(1)); | ||
|
||
simTime().advanceTimeAndRun(std::chrono::seconds(1), dispatcher_, Dispatcher::RunType::Block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add: EXPECT_FALSE(timer->enabled());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
EXPECT_CALL(callback, Call).WillOnce([&] { EXPECT_EQ(dispatcher_.scope_, getScope()); }); | ||
|
||
timer->enableTimer(std::chrono::seconds(0), std::chrono::seconds(1), getScope()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the only test case that has min time set to 0. Should we have some regular trigger and disable tests with min time of 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
simTime().advanceTimeAndRun(std::chrono::seconds(5), dispatcher_, Dispatcher::RunType::Block); | ||
simTime().advanceTimeAndRun(std::chrono::seconds(5), dispatcher_, Dispatcher::RunType::Block); | ||
|
||
timer1->disableTimer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that you want to cover behavior in cases where the timer scaling max which is responsible for the queue timer is disabled, and check that the next timer in the queue takes over. There are no assertions on changes to the queue timer.
Possible suggestion: Advance time to T=30 and verify that nothing triggers, move to T=35 and verify that a timer fires.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
EXPECT_THAT(*timer.trigger_times, ElementsAre(T + std::chrono::seconds(4))); | ||
} | ||
|
||
TEST_F(ScaledRangeTimerManagerTest, ScheduledWithMaxBeforeMin) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment explaining expected behavior; cases where max < min are treated as if max == min.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
@@ -515,6 +514,7 @@ TEST_F(ScaledRangeTimerManagerTest, MultipleTimersWithChangeInScalingFactor) { | |||
const MonotonicTime start = simTime().monotonicTime(); | |||
|
|||
std::vector<TrackedTimer> timers; | |||
timers.reserve(3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean 4 instead of 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 thanks, fixed.
* | ||
* Some methods combine multiple state transitions; enableTimer(0, max) on a | ||
* timer in the scaling-max state will logically execute the transition sequence | ||
* [scaling-max -> inactive -> waiting-for-min -> scaling-max] in a single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the transition from scaling-max to inactive come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the timer is already enabled, the first step is to disable it. Then we can enable it again for the new duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed that the timer described in the comment is initially in the scaling-max state. The example may have been easier to follow if the initial state had been waiting-for-min
dispatcher_.createSchedulableCallback([&] { schedulable_watcher.ready(); }); | ||
|
||
testing::Expectation first_prepare = EXPECT_CALL(prepare_watcher, ready()); | ||
testing::ExpectationSet after_first_prepare; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.After is new to me. Cool stuff!
@@ -440,45 +532,45 @@ TEST_F(ScaledRangeTimerManagerTest, MultipleTimersWithChangeInScalingFactor) { | |||
|
|||
manager.setScaleFactor(0.5); | |||
|
|||
// Now that the scale factor is 0.5, fire times are 0: T+10, 1: T+13, 2: T+14, 3: T+13. | |||
// Advance to timer 2's min. | |||
// Now that the scale factor is 0.5, fire times are 0: start 10, 1: start 13, 2: start 14, 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You lost some of the + after the T to start rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
Signed-off-by: Alex Konradi <[email protected]>
…aled-timer Signed-off-by: Alex Konradi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks really good. Thanks for sticking with it until the end, looking forward to softer overload actions that protect the proxy while reducing query loss.
Off to @mattklein123 or @envoyproxy/senior-maintainers for review and merge.
* | ||
* Some methods combine multiple state transitions; enableTimer(0, max) on a | ||
* timer in the scaling-max state will logically execute the transition sequence | ||
* [scaling-max -> inactive -> waiting-for-min -> scaling-max] in a single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed that the timer described in the comment is initially in the scaling-max state. The example may have been easier to follow if the initial state had been waiting-for-min
/cc @htuch since you had left comments on the doc |
…aled-timer Signed-off-by: Alex Konradi <[email protected]>
@mattklein123 @htuch this implements the first scheme from the doc, which it sounds like might be sufficient for our needs. Tagging you both since you left feedback on the doc, PTaL. |
Sure happy to take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer to others on implementation review, but I'm happy with this after a high-level read over the PR, nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just reviewed the code. Overall LGTM with a few small mostly comment comments. Thanks!
/wait
include/envoy/event/range_timer.h
Outdated
virtual void enableTimer(const std::chrono::milliseconds& min_ms, | ||
const std::chrono::milliseconds& max_ms, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perf nit: pass by value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, and fixed Event::Timer::enableTimer as well.
/** | ||
* Class for creating RangeTimer objects that can be adjusted towards either the minimum or maximum | ||
* of their range by the owner of the manager object. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a brief summary of the design doc here so the reader has some idea of the underlying implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/** | ||
* Sets the scale factor for all timers created through this manager. The value should be between | ||
* 0 and 1, inclusive. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some comments on the effect this API has on new/existing timers? Possibly covered by the top level comment I asked for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
double value() const { return value_; } | ||
|
||
private: | ||
double value_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const, same elsewhere where you can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish, but then assignment doesn't work.
ASSERT(dispatcher_.isThreadSafe()); | ||
auto it = queues_.find(duration); | ||
if (it == queues_.end()) { | ||
auto queue = std::make_unique<Queue>(duration, *this, dispatcher_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to explain this code back to you to see if I understand it:
- Every range timer has an individual min timer that fires independently.
- Once the min timer fires, if there is a gap between min and max, we take the duration of the min/max delta, and put it in a queue unique to that delta. The assumption here is that there will be a small number of min/max configured so the number of queues is not that big.
- Each queue has a timer that fires for the min/max duration scaled, and flushes all timers that have expired, and then resets for the next duration that needs to fire.
If so that makes sense, but per my other comments can you add a bunch more comments in these two files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's correct. Added plenty of comments.
Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks for the comments. LGTM modulo format fix.
/wait
Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
/retest |
Retrying Azure Pipelines, to retry CircleCI checks, use |
Signed-off-by: Alex Konradi <[email protected]>
Looks like the failing test is |
Commit Message: Add ScaledRangeTimerManager using dynamic queues
Additional Description:
The ScaledRangeTimerManager will be used to implement overload actions
that scale timeouts, as suggested for #11427. The timer objects returned remain
attached to the manager so that their actual trigger points can be adjusted via
changes to the manager's scale factor.
Risk Level: low (unused code)
Testing: ran unit tests
Docs Changes: none
Release Notes: none
#11427
/cc @antoniovicente