Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

event: scaled timers using dynamic queues #13129

Merged
merged 21 commits into from
Oct 5, 2020

Conversation

akonradi
Copy link
Contributor

Commit Message: Add ScaledRangeTimerManager using dynamic queues
Additional Description:
The ScaledRangeTimerManager will be used to implement overload actions
that scale timeouts, as suggested for #11427. The timer objects returned remain
attached to the manager so that their actual trigger points can be adjusted via
changes to the manager's scale factor.

Risk Level: low (unused code)
Testing: ran unit tests
Docs Changes: none
Release Notes: none

#11427
/cc @antoniovicente

Add an interface for timers that can be enabled for some timeout within
a range. The actual choice of when the timer should be triggered is
not specified by the interface. Possible implementations of the
interface include

 - timers that are triggered when a thread is not busy
 - timers that are triggered early/late in response to load

Signed-off-by: Alex Konradi <[email protected]>
The Google Test repository has endorsed Abseil's "live at head"
philosophy. Bump to a newer commit that includes google/googletest/envoyproxy#2350
for ease of writing tests.

Signed-off-by: Alex Konradi <[email protected]>
Add a ScaledRangeTimerManager class that produces RangeTimers that can
be adjusted after the fact by changing the scaling factor on the
manager. This class implements the dynamic queues approach suggested
in envoyproxy#11427#issuecomment-691154144. The code is tested but not yet
integrated anywhere.

Signed-off-by: Alex Konradi <[email protected]>
@repokitteh-read-only
Copy link

CC @envoyproxy/dependency-watchers: FYI only for changes made to (bazel/repository_locations\.bzl)|(api/bazel/repository_locations\.bzl)|(.*/requirements\.txt).

🐱

Caused by: #13129 was opened by akonradi.

see: more, trace.

@antoniovicente
Copy link
Contributor

/assign @antoniovicente

DoAll now needs to be pulled into scope explicitly with a `using`
declaration.

Signed-off-by: Alex Konradi <[email protected]>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alex, this seems like huge progress towards implementation of overload manager aware timeouts. Sorry for the laundry list of comments, it sometimes unavoidable on PRs of this size and complexity.

ASSERT(!queue.range_timers.empty());
auto item = std::move(queue.range_timers.front());
queue.range_timers.pop_front();
item.timer.trigger();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to trigger timers in a loop in cases where there are multiple timers at the head of the queue that have already expired?

Note that real timers always execute in the next iteration of the event loop. This implies that in the current implmentation at most 1 scaled timer can trigger per event loop iteration.

Another thing to consider is how to integrate with #13104. Ideally we'ld touch the watchdogs between each scaled timer execution; you'll need help from the dispatcher to do that.

Please add a test that verifies that multiple timers that are older than the current monotonic time fire as a group even if their exact trigger times are different. The way to verify that the timers fire on the same iteration requires hooking into the event loop's prepare callback as done in test/common/event/dispatcher_impl_test.cc test cases including SchedulableCallbackImplTest.RescheduleNext

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to fire multiple timers as a group? It seems like we can get the same watchdog petting between callback executions by rescheduling the queue's timer with duration 0. I guess that might be extra expensive, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timers with 0 duration run on the next iteration of the event loop after #11823

In order to schedule a second call in the same loop iteration you would need to use to use SchedulableCallback::scheduleCallbackCurrentIteration to schedule in cases where remaining max-min duration is 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Triggering expired timers as a group seems like a good way to implement this. We can address the watchdog petting issues in a followup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to trigger multiple timers when a queue timer expires.

WaitingForMin(std::chrono::milliseconds duration) : duration(duration) {}

// The number for the bucket this timer will be placed in.
const std::chrono::milliseconds duration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could use a more descriptive name like scalable_duration. Also, the comment seems out of date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


ScaledRangeTimerManager& manager_;
const TimerCb callback_;
const TimerPtr pending_timer_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Possilbly better name: min_duration_timer_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

const ScopeTrackedObject* scope_;
};

ScaledRangeTimerManager::ScaledRangeTimerManager(Dispatcher& dispatcher, double scale_factor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected scale factor to always be 1.0 on construction. Consider removing scale_factor constructor argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

EXPECT_FALSE(timer->enabled());
}

TEST_F(ScaledRangeTimerManagerTest, DisableWhileActive) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does pending and active mean in this context? Seems different from the states encoded in the RangeTimerImpl absl::variant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending was from an old implementation, should be fixed now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "active" mean in this context?

I see other references to "active" below, it seems that it refers to ScalingMax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


timer->enableTimer(std::chrono::seconds(5), std::chrono::seconds(100));

simTime().advanceTimeAsync(std::chrono::seconds(5));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding some assertions about the queue state after the advance and run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not part of the public interface, so I'm hesitant to test it directly.

test/common/event/scaled_range_timer_manager_test.cc Outdated Show resolved Hide resolved

TEST_F(ScaledRangeTimerManagerTest, ScheduledWithScalingFactorZero) {
ScaledRangeTimerManager manager(dispatcher_, 0.0);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test that covers use of setScaleFactor. There seems to be no tests that exercise that function.
Also, prefer use of setScaleFactor instead of passing in a scale value to the constructor.

I recommend exercising some some in-between scale factors like 0.5 with timers that are expected to fire with different orders when scale = 1.0 and scale = 0.5
Cover both the case of timers added before the scale change and timers added after the scale change. Bonus points for mixing the two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultipleTimersWithScaling does some of this. I've added some more tests.


EXPECT_THAT(*timers[0].trigger_times, ElementsAre(T + std::chrono::seconds(3)));
EXPECT_THAT(*timers[1].trigger_times, ElementsAre(T + std::chrono::seconds(3)));
EXPECT_THAT(*timers[2].trigger_times, ElementsAre(T + std::chrono::seconds(3)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment I added to ScaledRangeTimerManager::onQueueTimerFired

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bit about heterogeneous lookup? Please clarify.

Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. Thanks again for this very exciting change.

timers.emplace_back(manager, simTime());
}

const MonotonicTime T = simTime().monotonicTime();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name 'T' doesn't follow style guidelines. I recommend changing the name to "start"'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.


EXPECT_THAT(*timers[0].trigger_times, ElementsAre(T + std::chrono::seconds(2)));
EXPECT_THAT(*timers[1].trigger_times, ElementsAre(T + std::chrono::seconds(2)));
EXPECT_THAT(*timers[2].trigger_times, ElementsAre(T + std::chrono::seconds(2)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test that verifies that multiple timers that are older than the current monotonic time fire as a group even if their exact trigger times are different. The way to verify that the timers fire on the same iteration requires hooking into the event loop's prepare callback as done in test/common/event/dispatcher_impl_test.cc test cases including SchedulableCallbackImplTest.RescheduleNext

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer. Added a test.

ScaledRangeTimerManager::ScalingTimerHandle handle_;
};

void onPendingTimerComplete() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit about naming consistency: I think you used to refer to WaitingForMin as Pending. Consider changing name of this function to onMinTimerComplete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* Implementation of RangeTimer that can be scaled by the backing manager object.
*
* Instances of this class exist in one of 3 states:
* - disabled: not enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change disabled to inactive for consistency with name of state_ alternatives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* Instances of this class exist in one of 3 states:
* - disabled: not enabled
* - waiting-for-min: enabled, min timeout not elapsed
* - scaling-max: enabled, min timeout elapsed, max timeout not elapsed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth commenting about the expected transitions from inactive to waiting-for-min to scaling-max; usually we start inactive, enter wait-for-min for the requested min duration and transition to scaling-max when the min duration timer expires.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

EXPECT_FALSE(timer->enabled());
}

TEST_F(ScaledRangeTimerManagerTest, DisableWhileWaitingForMax) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant: DisableWhileWaitingForMin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


timer->enableTimer(std::chrono::seconds(1), std::chrono::seconds(1));

simTime().advanceTimeAndRun(std::chrono::seconds(1), dispatcher_, Dispatcher::RunType::Block);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add: EXPECT_FALSE(timer->enabled());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


EXPECT_CALL(callback, Call).WillOnce([&] { EXPECT_EQ(dispatcher_.scope_, getScope()); });

timer->enableTimer(std::chrono::seconds(0), std::chrono::seconds(1), getScope());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only test case that has min time set to 0. Should we have some regular trigger and disable tests with min time of 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

simTime().advanceTimeAndRun(std::chrono::seconds(5), dispatcher_, Dispatcher::RunType::Block);
simTime().advanceTimeAndRun(std::chrono::seconds(5), dispatcher_, Dispatcher::RunType::Block);

timer1->disableTimer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that you want to cover behavior in cases where the timer scaling max which is responsible for the queue timer is disabled, and check that the next timer in the queue takes over. There are no assertions on changes to the queue timer.

Possible suggestion: Advance time to T=30 and verify that nothing triggers, move to T=35 and verify that a timer fires.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

EXPECT_THAT(*timer.trigger_times, ElementsAre(T + std::chrono::seconds(4)));
}

TEST_F(ScaledRangeTimerManagerTest, ScheduledWithMaxBeforeMin) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment explaining expected behavior; cases where max < min are treated as if max == min.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

antoniovicente
antoniovicente previously approved these changes Sep 23, 2020
@@ -515,6 +514,7 @@ TEST_F(ScaledRangeTimerManagerTest, MultipleTimersWithChangeInScalingFactor) {
const MonotonicTime start = simTime().monotonicTime();

std::vector<TrackedTimer> timers;
timers.reserve(3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean 4 instead of 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 thanks, fixed.

*
* Some methods combine multiple state transitions; enableTimer(0, max) on a
* timer in the scaling-max state will logically execute the transition sequence
* [scaling-max -> inactive -> waiting-for-min -> scaling-max] in a single
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the transition from scaling-max to inactive come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the timer is already enabled, the first step is to disable it. Then we can enable it again for the new duration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that the timer described in the comment is initially in the scaling-max state. The example may have been easier to follow if the initial state had been waiting-for-min

dispatcher_.createSchedulableCallback([&] { schedulable_watcher.ready(); });

testing::Expectation first_prepare = EXPECT_CALL(prepare_watcher, ready());
testing::ExpectationSet after_first_prepare;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.After is new to me. Cool stuff!

@@ -440,45 +532,45 @@ TEST_F(ScaledRangeTimerManagerTest, MultipleTimersWithChangeInScalingFactor) {

manager.setScaleFactor(0.5);

// Now that the scale factor is 0.5, fire times are 0: T+10, 1: T+13, 2: T+14, 3: T+13.
// Advance to timer 2's min.
// Now that the scale factor is 0.5, fire times are 0: start 10, 1: start 13, 2: start 14, 3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You lost some of the + after the T to start rename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

antoniovicente
antoniovicente previously approved these changes Sep 23, 2020
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks really good. Thanks for sticking with it until the end, looking forward to softer overload actions that protect the proxy while reducing query loss.

Off to @mattklein123 or @envoyproxy/senior-maintainers for review and merge.

*
* Some methods combine multiple state transitions; enableTimer(0, max) on a
* timer in the scaling-max state will logically execute the transition sequence
* [scaling-max -> inactive -> waiting-for-min -> scaling-max] in a single
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that the timer described in the comment is initially in the scaling-max state. The example may have been easier to follow if the initial state had been waiting-for-min

@akonradi
Copy link
Contributor Author

/cc @htuch since you had left comments on the doc

@akonradi
Copy link
Contributor Author

@mattklein123 @htuch this implements the first scheme from the doc, which it sounds like might be sufficient for our needs. Tagging you both since you left feedback on the doc, PTaL.

@mattklein123
Copy link
Member

Sure happy to take a look.

@mattklein123 mattklein123 self-assigned this Sep 30, 2020
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to others on implementation review, but I'm happy with this after a high-level read over the PR, nice work!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just reviewed the code. Overall LGTM with a few small mostly comment comments. Thanks!

/wait

Comment on lines 35 to 36
virtual void enableTimer(const std::chrono::milliseconds& min_ms,
const std::chrono::milliseconds& max_ms,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perf nit: pass by value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and fixed Event::Timer::enableTimer as well.

Comment on lines 13 to 16
/**
* Class for creating RangeTimer objects that can be adjusted towards either the minimum or maximum
* of their range by the owner of the manager object.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a brief summary of the design doc here so the reader has some idea of the underlying implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 29 to 32
/**
* Sets the scale factor for all timers created through this manager. The value should be between
* 0 and 1, inclusive.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comments on the effect this API has on new/existing timers? Possibly covered by the top level comment I asked for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

double value() const { return value_; }

private:
double value_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const, same elsewhere where you can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish, but then assignment doesn't work.

ASSERT(dispatcher_.isThreadSafe());
auto it = queues_.find(duration);
if (it == queues_.end()) {
auto queue = std::make_unique<Queue>(duration, *this, dispatcher_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try to explain this code back to you to see if I understand it:

  1. Every range timer has an individual min timer that fires independently.
  2. Once the min timer fires, if there is a gap between min and max, we take the duration of the min/max delta, and put it in a queue unique to that delta. The assumption here is that there will be a small number of min/max configured so the number of queues is not that big.
  3. Each queue has a timer that fires for the min/max duration scaled, and flushes all timers that have expired, and then resets for the next duration that needs to fire.

If so that makes sense, but per my other comments can you add a bunch more comments in these two files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's correct. Added plenty of comments.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for the comments. LGTM modulo format fix.

/wait

Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
@akonradi
Copy link
Contributor Author

akonradi commented Oct 2, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13129 (comment) was created by @akonradi.

see: more, trace.

@akonradi
Copy link
Contributor Author

akonradi commented Oct 5, 2020

Looks like the failing test is //test/extensions/transport_sockets/tls:handshaker_test which is flaky under Windows.

@mattklein123 mattklein123 merged commit 80138b2 into envoyproxy:master Oct 5, 2020
@akonradi akonradi deleted the dynamic-queues-scaled-timer branch October 5, 2020 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants