-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
overload: scale transport socket connect timeout #13800
Conversation
Splitting these will prevent circular dependencies when the Dispatcher references thread-local overload state. Signed-off-by: Alex Konradi <[email protected]>
Thread the ThreadLocalOverloadState down into the ConnectionHandlerImpl and ConnectionImpl and use it to create the timer for the transport socket handshake timeout. Signed-off-by: Alex Konradi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; most of it was refactoring / interface changes.
@@ -143,6 +142,8 @@ OverloadTimerType parseTimerType( | |||
switch (config_timer_type) { | |||
case Config::HTTP_DOWNSTREAM_CONNECTION_IDLE: | |||
return OverloadTimerType::HttpDownstreamIdleConnectionTimeout; | |||
case Config::TRANSPORT_SOCKET_CONNECT: | |||
return OverloadTimerType::HttpDownstreamIdleConnectionTimeout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think you meant TransportSocketConnectTimeout
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's embarrassing, and that there wasn't a test that caught it is problematic. I've added a test to verify that the correct scale factors are applied for the different timer types.
format check failed |
Add tests to verify that the correct scale factors are being applied based on the overload manager action config. Signed-off-by: Alex Konradi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed formatting too.
@@ -143,6 +142,8 @@ OverloadTimerType parseTimerType( | |||
switch (config_timer_type) { | |||
case Config::HTTP_DOWNSTREAM_CONNECTION_IDLE: | |||
return OverloadTimerType::HttpDownstreamIdleConnectionTimeout; | |||
case Config::TRANSPORT_SOCKET_CONNECT: | |||
return OverloadTimerType::HttpDownstreamIdleConnectionTimeout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's embarrassing, and that there wasn't a test that caught it is problematic. I've added a test to verify that the correct scale factors are applied for the different timer types.
This addresses the clang-tidy issue Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see this happening. First set of comments related just to PR size and unrelated looking changes.
// Test-only functions that need access to the internals of | ||
// ScaledTimerMinimum. These are declared here but only defined in tests. | ||
friend bool operator==(const ScaledTimerMinimum&, const ScaledTimerMinimum&); | ||
friend std::ostream& operator<<(std::ostream&, const ScaledTimerMinimum&); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to implement these functions outside tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if we want to write these as output anywhere. We don't need to yet, and this avoids (a tiny bit of) binary bloat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems very unusual to implement these functions in a different library. Consider the case of "ForTest" methods. We tend to implement those in the same library as the main object despite they only being used in tests. I think that modern linkers are capable of removing unused code from the resulting binaries, but don't know if they do so by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sold, the confusion isn't worth the tiny bit of size gains. Moved these definitions inline.
#include "envoy/event/timer.h" | ||
#include "envoy/thread_local/thread_local.h" | ||
#include "envoy/event/dispatcher.h" | ||
#include "envoy/server/overload/thread_local_overload_state.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactor to add a directory for overload manager components in the server subdirectory and thread_local_object.h adds a lot of noise to this review. Worth doing it as a separate change instead? Also see comments about forward declaring instead.
Example:
envoy/include/envoy/network/connection.h
Line 19 in db756af
class Dispatcher; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to do the refactor in a separate PR. I don't have strong opinions about forward declarations vs #include
s other than an inclination to follow the style guide.
|
||
void ServerConnectionImpl::setTransportSocketConnectTimeout(std::chrono::milliseconds timeout) { | ||
if (!transport_connect_pending_) { | ||
return; | ||
} | ||
if (transport_socket_connect_timer_ == nullptr) { | ||
transport_socket_connect_timer_ = | ||
dispatcher_.createTimer([this] { onTransportSocketConnectTimeout(); }); | ||
overload_state_.createScaledTimer(Server::OverloadTimerType::TransportSocketConnectTimeout, | ||
[this] { onTransportSocketConnectTimeout(); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should revisit the possibility of having the dispatcher be the one that is aware of the overload state and provide a method to create scaled timers. Dispatcher is plumbed in widely.
I can see a counter argument involving our desire to bring overload manager closer to connections as a starting point for adjusting work done on wakeup based on overload state; something like smaller allocations/reads when under memory pressure. So this new plumbing may be more generally useful. But I think the direction we're taking there is to measure memory usage by connection, which actually doesn't require involvement from the overload mananger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned elsewhere, I'd rather keep scaled timer creation going through the Overload Manager since it is ultimately responsible for determining the scale factor of those timers. We can revisit making the Dispatcher aware of the OM later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of plumbing the OM widely, could the dispatcher hold a reference to it? I dislike that there are now two very different places to create timers: OM and dispatcher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a chicken-and-egg problem there since the thread-local overload state requires a reference to the dispatcher. We could work around this, though, if that sounds preferable. It seems like the larger issue is that there is a divide between the dispatcher, which represents a thread worker, and thread-local state, which is managed elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dispatcher is also owned by the thread. I wonder if the solution would be to have the dispatcher own the thread overload state instead of having it be part of the thread local storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dispatcher is also owned by the thread. I wonder if the solution would be to have the dispatcher own the thread overload state instead of having it be part of the thread local storage.
Yeah I think that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started working on this. It's doable but introduces some weirdness; see #14401 for more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a fresh attempt at this in #14679 which feels reasonably clean.
@@ -6,30 +6,13 @@ | |||
|
|||
#include "envoy/common/pure.h" | |||
#include "envoy/event/dispatcher.h" | |||
#include "envoy/thread_local/thread_local_object.h" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the dispatcher could be forward declared and dispatcher interface library depedency removed as a dependency if that would make some of these changes easier.
@@ -9,6 +9,7 @@ | |||
#include "envoy/api/api.h" | |||
#include "envoy/network/listen_socket.h" | |||
#include "envoy/network/listener.h" | |||
#include "envoy/server/overload/overload_manager.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this include needed? It seems that envoy/server/overload/thread_local_overload_state.h would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither is needed since thread_local_overload_state.h is included in the interface header.
/lgtm api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to do the refactoring in a separate PR, or to talk about making the thread-local overload state accessible via the dispatcher.
// Test-only functions that need access to the internals of | ||
// ScaledTimerMinimum. These are declared here but only defined in tests. | ||
friend bool operator==(const ScaledTimerMinimum&, const ScaledTimerMinimum&); | ||
friend std::ostream& operator<<(std::ostream&, const ScaledTimerMinimum&); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if we want to write these as output anywhere. We don't need to yet, and this avoids (a tiny bit of) binary bloat.
#include "envoy/event/timer.h" | ||
#include "envoy/thread_local/thread_local.h" | ||
#include "envoy/event/dispatcher.h" | ||
#include "envoy/server/overload/thread_local_overload_state.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to do the refactor in a separate PR. I don't have strong opinions about forward declarations vs #include
s other than an inclination to follow the style guide.
See #13800 (comment) If you decide to expose this via dispatcher instead, I suggest you add a createRangeTimer method to the dispatcher interface instead of providing a way to access thread locals via the dispatcher interface. |
As discussed offline, I think I'd prefer keeping the scaled timer accessible via the OverloadManager, not the Dispatcher, since the scaling factor is determined by the OM. To that end, I've sent #13999 to do the refactoring requested above separate from this PR. |
…-scaled-timeout Signed-off-by: Alex Konradi <[email protected]>
@@ -93,6 +93,11 @@ message ScaleTimersOverloadActionConfig { | |||
// Adjusts the idle timer for downstream HTTP connections that takes effect when there are no active streams. | |||
// This affects the value of :ref:`RouteAction.idle_timeout <envoy_v3_api_field_config.route.v3.RouteAction.idle_timeout>`. | |||
HTTP_DOWNSTREAM_CONNECTION_IDLE = 1; | |||
|
|||
// Adjusts the timer for how long downstream clients have to finish transport-level negotiations before the connection is closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This line seem very long. Can you rerun format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed manually since the formatter doesn't seem to touch proto comments.
// Test-only functions that need access to the internals of | ||
// ScaledTimerMinimum. These are declared here but only defined in tests. | ||
friend bool operator==(const ScaledTimerMinimum&, const ScaledTimerMinimum&); | ||
friend std::ostream& operator<<(std::ostream&, const ScaledTimerMinimum&); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems very unusual to implement these functions in a different library. Consider the case of "ForTest" methods. We tend to implement those in the same library as the main object despite they only being used in tests. I think that modern linkers are capable of removing unused code from the resulting binaries, but don't know if they do so by default.
return ConnectionMocks{std::move(dispatcher), timer, std::move(transport_socket), file_event, | ||
&file_ready_cb_}; | ||
return ConnectionMocks{std::move(dispatcher), {}, timer, | ||
std::move(transport_socket), file_event, &file_ready_cb_}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unusual looking formatting, can you rerun format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reran format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that the {}'s around the arguments are confusing the formatter, ugh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: will explicitly constructing Server::MockThreadLocalOverloadState()
make formatter happy?
)YAML"; | ||
|
||
// These are the timer types according to the reduced timeouts config above. | ||
constexpr std::pair<OverloadTimerType, Event::ScaledTimerMinimum> kReducedTimeoutsMinimums[]{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please move this list of cases inside the relevant test case to avoid having a large separation between these definitions and the test body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list matches exactly what kReducedTimeoutsConfig produces, so I wanted to keep them close together. Please push back if that's not a good enough reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know which of the two separations is more problematic. I would consider having both in the test case so everything is in 1 place, or move OverloadManagerImplTest.TimerTypesProduceCorrectMinimums so it is just below this list definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the test case; that's a good option.
I'm trying to figure out what integrating the thread-local overload state into the dispatcher would look like. Should I consider that a blocker for getting this PR merged? |
I defer to @ggreenway 's opinion. Seems like you accumulated some merge conflicts as your other PRs which modify some of the same files got merged in. |
Yes, I'd like to have that sorted out before we merge this. |
Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
I've merged main. This is now using Dispatcher::createScaledTimer, which obviates the need for plumbing around the ThreadLocalOverloadState. |
/retest |
Retrying Azure Pipelines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great otherwise.
@ggreenway could you take a look?
…caled-timeout Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, aside from one nit.
…caled-timeout Signed-off-by: Alex Konradi <[email protected]>
Signed-off-by: Alex Konradi <[email protected]>
/retest |
Retrying Azure Pipelines: |
@envoyproxy/api-shepherds for the new timer type |
Commit Message: Scale transport socket connect timeout in response to load
Additional Description:
Add support for scaling the transport socket connect timeout with load.
Risk Level: low
Testing: added tests and ran affected tests
Docs Changes: none
Release Notes: none
Platform Specific Features: none
Fixes: #11426
/assign @KBaichoo