Introduce Least Request LB active request bias config #11252
Documentation diff:

```diff
@@ -41,11 +41,25 @@ same or different weights.
   less than or equal to all of the other hosts.
 * *all weights not equal*: If two or more hosts in the cluster have different load balancing
   weights, the load balancer shifts into a mode where it uses a weighted round robin schedule in
-  which weights are dynamically adjusted based on the host's request load at the time of selection
-  (weight is divided by the current active request count. For example, a host with weight 2 and an
-  active request count of 4 will have a synthetic weight of 2 / 4 = 0.5). This algorithm provides
-  good balance at steady state but may not adapt to load imbalance as quickly. Additionally, unlike
-  P2C, a host will never truly drain, though it will receive fewer requests over time.
+  which weights are dynamically adjusted based on the host's request load at the time of selection.
+
+  In this case the weights are calculated at the time a host is picked using the following formula:
+
+  `weight = load_balancing_weight / (active_requests + 1)^active_request_bias`.
+
+  :ref:`active_request_bias<envoy_v3_api_field_config.cluster.v3.Cluster.LeastRequestLbConfig.active_request_bias>`
+  can be configured via runtime and defaults to 1.0. It must be greater than or equal to 0.0.
+
+  The larger the active request bias is, the more aggressively active requests will lower the
+  effective weight.
+
+  If `active_request_bias` is set to 0.0, the least request load balancer behaves like the round
+  robin load balancer and ignores the active request count at the time of picking.
+
+  For example, if active_request_bias is 1.0, a host with weight 2 and an active request count of 4
+  will have an effective weight of 2 / (4 + 1)^1 = 0.4. This algorithm provides good balance at
+  steady state but may not adapt to load imbalance as quickly. Additionally, unlike P2C, a host will
+  never truly drain, though it will receive fewer requests over time.
+
 .. _arch_overview_load_balancing_types_ring_hash:
```

Review thread on the round robin fallback paragraph:

> Might want to make it clear that this only happens if weights are set for various endpoints.

> @tonya11en this whole section talks about what happens when weights are not equal (the section starts with *all weights not equal*). Do you think we should still add a clarification or would it be redundant?

> oh, it's redundant
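To make the effective-weight formula above concrete, here is a standalone C++ sketch. It is not part of the PR; the function name and the sample weight, request count, and bias values are invented for illustration:

```cpp
#include <cmath>
#include <cstdio>

// Effective weight as described in the docs:
// weight = load_balancing_weight / (active_requests + 1)^active_request_bias
double effectiveWeight(double load_balancing_weight, unsigned active_requests,
                       double active_request_bias) {
  return load_balancing_weight / std::pow(active_requests + 1, active_request_bias);
}

int main() {
  // The example from the docs: weight 2, 4 active requests, bias 1.0 -> 0.4.
  std::printf("%.2f\n", effectiveWeight(2, 4, 1.0)); // 0.40
  // Bias 0.0 ignores active requests entirely (round robin behavior).
  std::printf("%.2f\n", effectiveWeight(2, 4, 0.0)); // 2.00
  // A larger bias penalizes loaded hosts more aggressively.
  std::printf("%.2f\n", effectiveWeight(2, 4, 2.0)); // 0.08
  return 0;
}
```

Note how a bias of 0.0 reduces the formula to the plain configured weight, which matches the docs' claim that the balancer then behaves like round robin.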
Header diff (load balancer implementation):

```diff
@@ -1,6 +1,8 @@
 #pragma once

+#include <cmath>
 #include <cstdint>
+#include <memory>
 #include <queue>
 #include <set>
 #include <vector>

@@ -11,6 +13,7 @@
 #include "envoy/upstream/upstream.h"

 #include "common/protobuf/utility.h"
+#include "common/runtime/runtime_protos.h"
 #include "common/upstream/edf_scheduler.h"

 namespace Envoy {

@@ -367,6 +370,8 @@ class EdfLoadBalancerBase : public ZoneAwareLoadBalancerBase {

   void initialize();

+  virtual void refresh(uint32_t priority);
+
   // Seed to allow us to desynchronize load balancers across a fleet. If we don't
   // do this, multiple Envoys that receive an update at the same time (or even
   // multiple load balancers on the same host) will send requests to

@@ -375,7 +380,6 @@ class EdfLoadBalancerBase : public ZoneAwareLoadBalancerBase {
   const uint64_t seed_;

 private:
-  void refresh(uint32_t priority);
   virtual void refreshHostSource(const HostsSource& source) PURE;
   virtual double hostWeight(const Host& host) PURE;
   virtual HostConstSharedPtr unweightedHostPick(const HostVector& hosts_to_use,
```
```diff
@@ -437,7 +441,8 @@ class RoundRobinLoadBalancer : public EdfLoadBalancerBase {
  * The benefit of the Maglev table is at the expense of resolution, memory usage is capped.
  * Additionally, the Maglev table can be shared amongst all threads.
  */
-class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
+class LeastRequestLoadBalancer : public EdfLoadBalancerBase,
+                                 Logger::Loggable<Logger::Id::upstream> {
 public:
   LeastRequestLoadBalancer(
       const PrioritySet& priority_set, const PrioritySet* local_priority_set, ClusterStats& stats,
```

Review thread on the added `Logger::Loggable` base:

> I wonder if you save some bytes by using ENVOY_LOG_MISC here rather than adding a second inheritance here, or by putting the …

> I just tried using …

> one quick point about that: the EXPECT_MEMORY_EQ macro will be a no-op on Mac. It only runs on linux release builds. There's a bazel flag controlling an ifdef whether to enable that check. EXPECT_MEMORY_LE will run on Mac if tcmalloc is being used, but the memory checking for non-canonical platforms is much looser, and so you might not notice a benefit from dropping the multiple inheritance on Mac.
```diff
@@ -450,26 +455,71 @@ class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
         choice_count_(
             least_request_config.has_value()
                 ? PROTOBUF_GET_WRAPPED_OR_DEFAULT(least_request_config.value(), choice_count, 2)
-                : 2) {
+                : 2),
+        active_request_bias_runtime_(
+            least_request_config.has_value() && least_request_config->has_active_request_bias()
+                ? std::make_unique<Runtime::Double>(least_request_config->active_request_bias(),
+                                                    runtime)
+                : nullptr) {
     initialize();
   }
```

Review thread on the `active_request_bias_runtime_` initializer:

> also test for the value differing from the default?

> Do you mean we should initialize … ? This would prevent users from making the bias overridable via runtime while still defaulting it to …

> Didn't mean to limit functionality. I'm confused though if the StatsIntegrationTest flow (a) creates this LB, which is not the default, and (b) overrides this new field. Otherwise I'd expect the overhead to be no more than 64 bytes. How did we get 256 byte overhead per cluster with the default setup?

> Sorry, got confused. I think this extra optional field, stored as a unique pointer, should only cost 8 bytes per cluster if it's being used. Do we think it's worth investigating why the overhead is 256 per cluster? Maybe we'll decide it's needed but it'd be great to understand why.

> The overhead was 256 bytes before switching to unique_ptr. Now it's 8 bytes.

> GitHub is giving me a confusing view; from https://github.com/envoyproxy/envoy/pull/11252/files this is what I see at the bottom of that web-page: Left-hand-side: … Do you see something different?

> The overhead is still 256 bytes =/. I just tried making …

> So my suspicion is that it's actually creating the optional object for some reason. Maybe it's worth throwing some log statements or firing up the debugger to check?

> @jmarantz I reverted the changes to … My guess is that the overhead comes from the protobuf message descriptor, but I wouldn't expect each cluster to hold a copy of it. What do you think?
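For context on the `Runtime::Double` helper used in the initializer, here is a rough standalone sketch of the pattern it implements: a proto-supplied default plus a runtime key consulted at read time. This is an illustration of the concept, not Envoy's actual implementation, and all names in it are invented:

```cpp
#include <functional>
#include <string>

// Illustrative stand-in for a runtime snapshot lookup: returns the override
// registered under `key` if one is set, otherwise `default_value`.
using RuntimeLookup = std::function<double(const std::string& key, double default_value)>;

// Rough sketch of a Runtime::Double-like wrapper: it stores the configured
// default and the runtime key, and consults the runtime on every value() call.
// The string member is one reason it costs more than a plain double.
class RuntimeDoubleSketch {
public:
  RuntimeDoubleSketch(double default_value, std::string runtime_key, RuntimeLookup lookup)
      : default_value_(default_value), runtime_key_(std::move(runtime_key)),
        lookup_(std::move(lookup)) {}

  double value() const { return lookup_(runtime_key_, default_value_); }
  const std::string& runtimeKey() const { return runtime_key_; }

private:
  const double default_value_;
  const std::string runtime_key_;
  const RuntimeLookup lookup_;
};
```

This is also why the PR caches the looked-up value instead of calling `value()` on every host pick: the lookup happens once per refresh rather than once per request.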
```diff
+ protected:
+  void refresh(uint32_t priority) override {
+    active_request_bias_ =
+        active_request_bias_runtime_ != nullptr ? active_request_bias_runtime_->value() : 1.0;
+
+    if (active_request_bias_ < 0.0) {
+      ENVOY_LOG(warn, "upstream: invalid active request bias supplied (runtime key {}), using 1.0",
+                active_request_bias_runtime_->runtimeKey());
+      active_request_bias_ = 1.0;
+    }
+
+    EdfLoadBalancerBase::refresh(priority);
+  }
```

Review thread on the warning log:

> Might want to also validate it shows the log message in your unit test. Something like: … If for some reason TSAN builds start failing after you add that test, protect this thing with a mutex and that should fix it: envoy/test/test_common/logging.h, line 61 in 841ad99.

> Arguably this could also be a stat. Generally I think we prefer having stats to accompany warnings as logs aren't necessarily tied to operations alerting. OTOH, this particular case is probably going to be pretty rare/unlikely if ever, so I don't think we need to do a stat.
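The test snippet suggested in the first comment was elided from the page. As a hedged guess at its shape: Envoy's `test/test_common/logging.h` provides an `EXPECT_LOG_CONTAINS(level, substring, expression)` macro, so the suggestion was presumably along these lines. The fixture and the way the refresh is triggered here are hypothetical, not taken from the PR:

```cpp
// Hypothetical test fragment: `hostSet()` and the empty-update trigger are
// illustrative stand-ins for however the test forces LeastRequestLoadBalancer
// to run refresh() and hit the invalid-bias branch.
EXPECT_LOG_CONTAINS("warn", "invalid active request bias supplied",
                    hostSet().runCallbacks({}, {}));
```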
```diff
 private:
   void refreshHostSource(const HostsSource&) override {}
   double hostWeight(const Host& host) override {
-    // Here we scale host weight by the number of active requests at the time we do the pick. We
-    // always add 1 to avoid division by 0. It might be possible to do better by picking two hosts
-    // off of the schedule, and selecting the one with fewer active requests at the time of
-    // selection.
-    // TODO(mattklein123): @htuch brings up the point that how we are scaling weight here might not
-    // be the only/best way of doing this. Essentially, it makes weight and active requests equally
-    // important. Are they equally important in practice? There is no right answer here and we might
-    // want to iterate on this as we gain more experience.
-    return static_cast<double>(host.weight()) / (host.stats().rq_active_.value() + 1);
+    // This method is called to calculate the dynamic weight as following when all load balancing
+    // weights are not equal:
+    //
+    // `weight = load_balancing_weight / (active_requests + 1)^active_request_bias`
+    //
+    // `active_request_bias` can be configured via runtime and its value is cached in
+    // `active_request_bias_` to avoid having to do a runtime lookup each time a host weight is
+    // calculated.
+    //
+    // When `active_request_bias == 0.0` we behave like `RoundRobinLoadBalancer` and return the
+    // host weight without considering the number of active requests at the time we do the pick.
+    //
+    // When `active_request_bias > 0.0` we scale the host weight by the number of active
+    // requests at the time we do the pick. We always add 1 to avoid division by 0.
+    //
+    // It might be possible to do better by picking two hosts off of the schedule, and selecting
+    // the one with fewer active requests at the time of selection.
+    if (active_request_bias_ == 0.0) {
+      return host.weight();
+    }
+
+    if (active_request_bias_ == 1.0) {
+      return static_cast<double>(host.weight()) / (host.stats().rq_active_.value() + 1);
+    }
+
+    return static_cast<double>(host.weight()) /
+           std::pow(host.stats().rq_active_.value() + 1, active_request_bias_);
   }
```

Review thread on the exact comparison `active_request_bias_ == 1.0`:

> Is this just an optimization? I'm not a C++ floating point expert, but generally prefer to avoid comparing floats for exact value, since the representation might not allow for precise representation, so comparing with epsilon ranges is better. Probably safe for 1.0 though?

> Yes, this is just an optimization to avoid having to call … This field isn't set with the result of a calculation, so I don't think there is much risk of float representation problems. I expect most users to not specify any bias, so … Users can override this value via runtime, but since no calculation is done, they should be able to easily set it to exactly …
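A standalone aside, not from the PR, illustrating the floating point point made in that thread: values assigned directly, like 0.0 and 1.0, round-trip exactly and compare safely, while computed values may not:

```cpp
#include <cstdio>

int main() {
  // Assigned literals are represented exactly: this comparison is safe.
  double bias = 1.0;
  std::printf("%d\n", bias == 1.0); // 1

  // Computed values can pick up representation error: this is why exact
  // comparison of calculation results is discouraged.
  double sum = 0.1 + 0.2;
  std::printf("%d\n", sum == 0.3); // 0 on typical IEEE-754 doubles
  return 0;
}
```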
```diff
   HostConstSharedPtr unweightedHostPick(const HostVector& hosts_to_use,
                                         const HostsSource& source) override;

   const uint32_t choice_count_;

+  // The exponent used to calculate host weights can be configured via runtime. We cache it for
+  // performance reasons and refresh it in `LeastRequestLoadBalancer::refresh(uint32_t priority)`
+  // whenever a `HostSet` is updated.
+  double active_request_bias_{};
+
+  const std::unique_ptr<Runtime::Double> active_request_bias_runtime_;
 };

 /**
```

Review thread on the two new members:

> oh there's these 8 bytes too.

> Sorry; late to the party here but would also be happy if this is a follow-up. This is the only additional memory field, right? This is per-cluster and also replicated per-thread (I don't have a clear picture of the topology of the LB structures in my head yet).

> @jmarantz these are the only two additional fields. So in theory this change should only use 128 bits of overhead if no bias is specified. However it looks like 256 are being added. I wonder if that might be caused by padding or something like that?
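A standalone sanity check, not from the PR, of the arithmetic in that thread: the two new members should add 16 bytes, i.e. 128 bits, per load balancer instance on a typical 64-bit platform, before any padding elsewhere in the class:

```cpp
#include <cstdio>
#include <memory>

// Stand-in for Runtime::Double; only the pointer to it is counted as a member.
struct RuntimeDoubleStandIn {};

struct NewMembers {
  double active_request_bias{};                                       // 8 bytes
  std::unique_ptr<RuntimeDoubleStandIn> active_request_bias_runtime;  // 8 bytes
};

int main() {
  // Typically prints 16: two 8-byte fields with no padding between them.
  std::printf("%zu\n", sizeof(NewMembers));
  return 0;
}
```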
Review thread on the use of `Runtime::Double` (per-cluster memory bloat):

> So this must be the cause of the bloat.
>
> I'm somewhat familiar with the Runtime infrastructure but not deeply so. If we have 10k clusters, would you runtime-override the active_request_bias for a subset of them? Or is the runtime override supposed to be a kill-switch to broadly disable a feature?
>
> Maybe we are paying a lot for functionality that we don't need? WDYT?
>
> BTW I'm fine if we want to submit this for now given the benefits, and then consider how we can better represent the flexibility we want from the Runtime system. @alyssawilk would be curious for your take here. Thank you!

> Captured this question in #11872

> not sure if this addresses the problem but I'd encourage using xds for changing per-cluster values rather than using runtime overrides. IMO runtime changes make some sense for global things which you want fast changes on (DoS protection knobs) but something like this could be done by CDS.

> I just pushed a commit that uses `double` instead of `RuntimeDouble` to see whether not using `RuntimeDouble` reduces the memory overhead. I'm waiting for the CI results to see how things go on Linux, but it did not help on macOS.
>
> As far as I can tell, the helper used by the memory usage tests to create clusters doesn't create any instances of the `LeastRequestLbConfig` message (envoy/test/integration/stats_integration_test.cc, lines 173 to 203 in e71e4dc), so I don't understand why the per-cluster memory footprint increases.

> The new CI result shows that the per-cluster memory footprint of the integration tests using `double` is the same as using `RuntimeDouble`. `RuntimeDouble` does contain a string (`runtime_key`), so I do expect it to use a little more memory for clusters that specify a bias.
>
> Making it runtime overridable would make it easier to change the bias without having to update the xDS server. But any of the approaches would be enough to improve the load balancing behavior during squeeze (fka red line) tests at Lyft, so I don't have a very strong opinion towards any of them.
>
> @tonya11en and @mattklein123 said in this Slack thread that they preferred to use `RuntimeDouble`, but we were only considering the potential performance overhead and not the memory overhead.
>
> @jmarantz / @alyssawilk: given the new data, do you still believe that it would be better not to use `RuntimeDouble`?