
Introduce Least Request LB active request bias config #11252

Merged (27 commits, Jul 15, 2020)

Commits:
a1b52fd Add support for making Least Requests LB behave like Round Robin in w… (gkleiman, May 19, 2020)
762e61b Address feedback (gkleiman, May 19, 2020)
4cf5f19 Perf/logging improvements (gkleiman, May 20, 2020)
0a2b1fc Address feedback and cleanup BUILD file (gkleiman, May 20, 2020)
6f9b89c Merge branch 'master' into alternative-lr-weights (gkleiman, Jun 16, 2020)
6864b50 Make active requests exponent configurable via CDS/runtime (gkleiman, Jun 11, 2020)
25f7c98 Address feedback (gkleiman, Jun 22, 2020)
f4b2da3 Validate log message (gkleiman, Jun 23, 2020)
e96d52d Update cluster memory test golden values (gkleiman, Jun 23, 2020)
a5d7782 Merge remote-tracking branch 'origin/master' into alternative-lr-weights (gkleiman, Jun 24, 2020)
cfca8f7 Fix method name (gkleiman, Jun 26, 2020)
ce291d6 Merge remote-tracking branch 'origin/master' into alternative-lr-weights (gkleiman, Jun 26, 2020)
b487a5e Explicitly initialize active_request_bias_ (gkleiman, Jun 29, 2020)
d7cf64c Merge remote-tracking branch 'origin/master' into alternative-lr-weights (gkleiman, Jun 29, 2020)
3fc99ea Try to make clang-tidy happy (gkleiman, Jun 29, 2020)
9be22f0 Use unique_ptr instead of optional (gkleiman, Jun 30, 2020)
f2d8924 Merge remote-tracking branch 'origin/master' into alternative-lr-weights (gkleiman, Jul 1, 2020)
71fcad8 Update stats integration test (gkleiman, Jul 1, 2020)
2690778 Check whether memory footprint is reduced without LB changes (gkleiman, Jul 1, 2020)
be7c000 Use plain double for active request bias (gkleiman, Jul 6, 2020)
a4e7b39 Revert back to approved implementation using RuntimeDouble (gkleiman, Jul 7, 2020)
1010883 Merge remote-tracking branch 'origin/master' into alternative-lr-weights (gkleiman, Jul 7, 2020)
a6a285d Add extra fields to CDS cluster proto to check memory usage (gkleiman, Jul 8, 2020)
2676928 Revert "Add extra fields to CDS cluster proto to check memory usage" (gkleiman, Jul 13, 2020)
100c3db Merge remote-tracking branch 'origin/master' into alternative-lr-weights (gkleiman, Jul 13, 2020)
cf06a76 Merge remote-tracking branch 'origin/master' into alternative-lr-weights (gkleiman, Jul 14, 2020)
eb1b857 Add changelog entry (gkleiman, Jul 15, 2020)
api/envoy/config/cluster/v3/cluster.proto (25 additions, 0 deletions)

@@ -319,6 +319,31 @@ message Cluster {
// The number of random healthy hosts from which the host with the fewest active requests will
// be chosen. Defaults to 2 so that we perform two-choice selection if the field is not set.
google.protobuf.UInt32Value choice_count = 1 [(validate.rules).uint32 = {gte: 2}];

// The following formula is used to calculate the dynamic weights when hosts have different load
// balancing weights:
//
// `weight = load_balancing_weight / (active_requests + 1)^active_request_bias`
//
// The larger the active request bias is, the more aggressively active requests will lower the
// effective weight when all host weights are not equal.
//
// `active_request_bias` must be greater than or equal to 0.0.
//
// When `active_request_bias == 0.0` the Least Request Load Balancer doesn't consider the number
// of active requests at the time it picks a host and behaves like the Round Robin Load
// Balancer.
//
// When `active_request_bias > 0.0` the Least Request Load Balancer scales the load balancing
// weight by the number of active requests at the time it does a pick.
//
// The value is cached for performance reasons and refreshed whenever one of the Load Balancer's
// host sets changes, e.g., whenever there is a host membership update or a host load balancing
// weight change.
//
// .. note::
// This setting only takes effect if all host weights are not equal.
core.v3.RuntimeDouble active_request_bias = 2;
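
A configuration sketch of how this field might be set on a cluster. The cluster name and runtime key below are made up for illustration; the `default_value`/`runtime_key` shape follows the `core.v3.RuntimeDouble` message used by the field:

```yaml
clusters:
- name: example_service            # illustrative name, not from this PR
  connect_timeout: 0.25s
  type: STRICT_DNS
  lb_policy: LEAST_REQUEST
  least_request_lb_config:
    choice_count: 2
    active_request_bias:
      default_value: 1.0
      runtime_key: example_service.lb.active_request_bias  # illustrative key
```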
jmarantz (Contributor), Jul 2, 2020:
So this must be the cause of the bloat.

I'm somewhat familiar with the Runtime infrastructure but not deeply so. If we have 10k clusters would you runtime-override the active_request_bias for a subset of them? Or is the runtime override supposed to be a kill-switch to broadly disable a feature?

Maybe we are paying a lot for functionality that we don't need? WDYT?

BTW I'm fine if we want to submit this for now given the benefits, and then consider how we can better represent the flexibility we want from the Runtime system. @alyssawilk would be curious for your take here. Thank you!

Contributor:
Captured this question in #11872

Contributor:
not sure if this addresses the problem but I'd encourage using xds for changing per-cluster values rather than using runtime overrides. IMO runtime changes make some sense for global things which you want fast changes on (DoS protection knobs) but something like this could be done by CDS.

gkleiman (Author):
I just pushed a commit that uses double instead of RuntimeDouble to see whether not using RuntimeDouble reduces the memory overhead — I'm waiting for the CI results to see how things go on Linux, but it did not help on macOS.

As far as I can tell, the helper used by the memory usage tests to create clusters doesn't create any instances of the LeastRequestLbConfig message:

size_t clusterMemoryHelper(int num_clusters, int num_hosts, bool allow_stats) {
  Stats::TestUtil::MemoryTest memory_test;
  config_helper_.addConfigModifier([&](envoy::config::bootstrap::v3::Bootstrap& bootstrap) {
    if (!allow_stats) {
      bootstrap.mutable_stats_config()->mutable_stats_matcher()->set_reject_all(true);
    }
    for (int i = 1; i < num_clusters; ++i) {
      auto* cluster = bootstrap.mutable_static_resources()->add_clusters();
      cluster->set_name(absl::StrCat("cluster_", i));
    }
    for (int i = 0; i < num_clusters; ++i) {
      auto* cluster = bootstrap.mutable_static_resources()->mutable_clusters(i);
      for (int j = 0; j < num_hosts; ++j) {
        auto* host = cluster->mutable_load_assignment()
                         ->mutable_endpoints(0)
                         ->add_lb_endpoints()
                         ->mutable_endpoint()
                         ->mutable_address();
        auto* socket_address = host->mutable_socket_address();
        socket_address->set_protocol(envoy::config::core::v3::SocketAddress::TCP);
        socket_address->set_address("0.0.0.0");
        socket_address->set_port_value(80);
      }
    }
  });
  initialize();
  return memory_test.consumedBytes();
}

So I don't understand why the per-cluster memory footprint increases.

gkleiman (Author):
The new CI result shows that the per-cluster memory footprint of the integration tests using double is the same as using RuntimeDouble.

RuntimeDouble does contain a string (runtime_key), so I do expect it to use a little more memory for clusters that specify a bias.

Making it runtime overridable would make it easier to change the bias without having to update the xDS server. But any of the approaches would be enough to improve the load balancing behavior during squeeze (fka red line) tests at Lyft, so I don't have a very strong opinion towards any of them.

@tonya11en and @mattklein123 said in this Slack thread that they preferred to use RuntimeDouble, but we were only considering the potential performance overhead and not the memory overhead.

@jmarantz / @alyssawilk: given the new data do you still believe that it would be better not to use RuntimeDouble?

}

// Specific configuration for the :ref:`RingHash<arch_overview_load_balancing_types_ring_hash>`
api/envoy/config/cluster/v4alpha/cluster.proto (25 additions, 0 deletions; generated file, diff not rendered)

@@ -41,11 +41,25 @@
same or different weights.
less than or equal to all of the other hosts.
* *all weights not equal*: If two or more hosts in the cluster have different load balancing
weights, the load balancer shifts into a mode where it uses a weighted round robin schedule in
which weights are dynamically adjusted based on the host's request load at the time of selection
(weight is divided by the current active request count. For example, a host with weight 2 and an
active request count of 4 will have a synthetic weight of 2 / 4 = 0.5). This algorithm provides
good balance at steady state but may not adapt to load imbalance as quickly. Additionally, unlike
P2C, a host will never truly drain, though it will receive fewer requests over time.
which weights are dynamically adjusted based on the host's request load at the time of selection.

In this case the weights are calculated at the time a host is picked using the following formula:

`weight = load_balancing_weight / (active_requests + 1)^active_request_bias`.

:ref:`active_request_bias<envoy_v3_api_field_config.cluster.v3.Cluster.LeastRequestLbConfig.active_request_bias>`
can be configured via runtime and defaults to 1.0. It must be greater than or equal to 0.0.

The larger the active request bias is, the more aggressively active requests will lower the
effective weight.

If `active_request_bias` is set to 0.0, the least request load balancer behaves like the round
tonya11en (Member):
Might want to make it clear that this only happens if weights are set for various endpoints.

gkleiman (Author):
@tonya11en this whole section talks about what happens when weights are not equal (the section starts with *all weights not equal*:).

Do you think we should still add a clarification or would it be redundant?

tonya11en (Member):
oh, it's redundant

robin load balancer and ignores the active request count at the time of picking.

For example, if active_request_bias is 1.0, a host with weight 2 and an active request count of 4
will have an effective weight of 2 / (4 + 1)^1 = 0.4. This algorithm provides good balance at
steady state but may not adapt to load imbalance as quickly. Additionally, unlike P2C, a host will
never truly drain, though it will receive fewer requests over time.
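
The arithmetic above can be checked with a small sketch of the formula (the function name is illustrative, not Envoy's API):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Effective weight under the formula above:
//   weight = load_balancing_weight / (active_requests + 1)^active_request_bias
double effectiveWeight(uint32_t load_balancing_weight, uint64_t active_requests,
                       double active_request_bias) {
  return static_cast<double>(load_balancing_weight) /
         std::pow(static_cast<double>(active_requests) + 1.0, active_request_bias);
}
```

With the doc's example (weight 2, 4 active requests, bias 1.0) this yields 2 / 5 = 0.4, while bias 0.0 returns the plain weight and a larger bias lowers the effective weight further.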

.. _arch_overview_load_balancing_types_ring_hash:

generated_api_shadow/envoy/config/cluster/v3/cluster.proto (25 additions, 0 deletions; generated file, diff not rendered)

generated_api_shadow/envoy/config/cluster/v4alpha/cluster.proto (25 additions, 0 deletions; generated file, diff not rendered)

source/common/runtime/BUILD (1 addition, 0 deletions)

@@ -34,6 +34,7 @@ envoy_cc_library(
],
deps = [
"//include/envoy/runtime:runtime_interface",
"//source/common/protobuf:utility_lib",
"@envoy_api//envoy/config/core/v3:pkg_cc_proto",
"@envoy_api//envoy/type/v3:pkg_cc_proto",
],
source/common/runtime/runtime_protos.h (2 additions, 0 deletions)

@@ -35,6 +35,8 @@ class Double {
: runtime_key_(double_proto.runtime_key()), default_value_(double_proto.default_value()),
runtime_(runtime) {}

const std::string& runtimeKey() const { return runtime_key_; }

double value() const { return runtime_.snapshot().getDouble(runtime_key_, default_value_); }

private:
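The `Double` helper's lookup-with-default behavior can be modeled in isolation. This is a simplified stand-in, not Envoy's actual `Runtime::Snapshot`; the class and method names other than `runtimeKey()`/`value()` are invented for the sketch:

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Simplified stand-in for a runtime snapshot: a key/value map of overrides.
class FakeSnapshot {
public:
  void set(const std::string& key, double value) { overrides_[key] = value; }
  double getDouble(const std::string& key, double default_value) const {
    auto it = overrides_.find(key);
    return it == overrides_.end() ? default_value : it->second;
  }

private:
  std::unordered_map<std::string, double> overrides_;
};

// Mirrors the Runtime::Double pattern: a runtime key plus a proto default.
class RuntimeDoubleModel {
public:
  RuntimeDoubleModel(std::string runtime_key, double default_value)
      : runtime_key_(std::move(runtime_key)), default_value_(default_value) {}
  const std::string& runtimeKey() const { return runtime_key_; }
  // Returns the runtime override if present, else the proto default.
  double value(const FakeSnapshot& snapshot) const {
    return snapshot.getDouble(runtime_key_, default_value_);
  }

private:
  const std::string runtime_key_;
  const double default_value_;
};
```

This is why the real class needs the string `runtime_key_` member: every `value()` call is a fresh snapshot lookup, which is also why the LB caches the result.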
source/common/upstream/BUILD (1 addition, 0 deletions)

@@ -181,6 +181,7 @@ envoy_cc_library(
"//include/envoy/upstream:upstream_interface",
"//source/common/common:assert_lib",
"//source/common/protobuf:utility_lib",
"//source/common/runtime:runtime_protos_lib",
"@envoy_api//envoy/config/cluster/v3:pkg_cc_proto",
],
)
source/common/upstream/load_balancer_impl.h (62 additions, 12 deletions)

@@ -1,6 +1,8 @@
#pragma once

#include <cmath>
#include <cstdint>
#include <memory>
#include <queue>
#include <set>
#include <vector>
@@ -11,6 +13,7 @@
#include "envoy/upstream/upstream.h"

#include "common/protobuf/utility.h"
#include "common/runtime/runtime_protos.h"
#include "common/upstream/edf_scheduler.h"

namespace Envoy {
@@ -367,6 +370,8 @@ class EdfLoadBalancerBase : public ZoneAwareLoadBalancerBase {

void initialize();

virtual void refresh(uint32_t priority);

// Seed to allow us to desynchronize load balancers across a fleet. If we don't
// do this, multiple Envoys that receive an update at the same time (or even
// multiple load balancers on the same host) will send requests to
@@ -375,7 +380,6 @@ class EdfLoadBalancerBase : public ZoneAwareLoadBalancerBase {
const uint64_t seed_;

private:
void refresh(uint32_t priority);
virtual void refreshHostSource(const HostsSource& source) PURE;
virtual double hostWeight(const Host& host) PURE;
virtual HostConstSharedPtr unweightedHostPick(const HostVector& hosts_to_use,
@@ -437,7 +441,8 @@ class RoundRobinLoadBalancer : public EdfLoadBalancerBase {
* The benefit of the Maglev table is at the expense of resolution, memory usage is capped.
* Additionally, the Maglev table can be shared amongst all threads.
*/
class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
class LeastRequestLoadBalancer : public EdfLoadBalancerBase,
Logger::Loggable<Logger::Id::upstream> {
Contributor:
I wonder if you save some bytes by using ENVOY_LOG_MISC here rather than adding a second inheritance here, or by putting the Logger inheritance at the root of this inheritance tree, avoiding multiple-inheritance?

gkleiman (Author):
I just tried using ENVOY_LOG_MISC and it didn't improve memory usage on macOS =/.

Contributor:
one quick point about that: the EXPECT_MEMORY_EQ macro will be a no-op on Mac. It only runs on linux release builds. There's a bazel flag controlling an ifdef whether to enable that check. EXPECT_MEMORY_LE will run on Mac if tcmalloc is being used, but the memory checking for non-canonical platforms is much looser, and so you might not notice a benefit from dropping the multiple inheritance on Mac.

public:
LeastRequestLoadBalancer(
const PrioritySet& priority_set, const PrioritySet* local_priority_set, ClusterStats& stats,
@@ -450,26 +455,71 @@ class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
choice_count_(
least_request_config.has_value()
? PROTOBUF_GET_WRAPPED_OR_DEFAULT(least_request_config.value(), choice_count, 2)
: 2) {
: 2),
active_request_bias_runtime_(
least_request_config.has_value() && least_request_config->has_active_request_bias()
Contributor:
also test for the value differing from the default?

gkleiman (Author):
Do you mean we should initialize active_request_bias_runtime_ with nullptr if least_request_config->active_request_bias()->default_value() == 1.0?

This would prevent users from making the bias overridable via runtime while still defaulting it to 1.0.

Contributor:
DIdn't mean to limit functionality. I'm confused though if the StatsIntegrationTest flow (a) creates this LB, which is not the default, and (b) overrides this new field. Otherwise I'd expect the overhead to be no more than 64 bytes. How did we get 256 byte overhead per cluster with the default setup?

Contributor:
Sorry, got confused. I think this extra optional field, stored as a unique pointer should only cost 8 bytes per cluster if it's being used. DO we think it's worth investigating why the overhead is 256 per cluster? Maybe we'll decide it's needed but it'd be great to understand why.

Member:
The overhead was 256 bytes before switching to unique_ptr. Now it's 8 bytes.

Contributor:
github is giving me a confusing view; from https://github.com/envoyproxy/envoy/pull/11252/files this is what I see at the bottom of that web-page:

Left-hand-side: EXPECT_MEMORY_EQ(m_per_cluster, 36603);
right-hand side: EXPECT_MEMORY_EQ(m_per_cluster, 36859);

Do you see something different?

gkleiman (Author):
The overhead is still 256 bytes =/.

I just tried making LeastRequestLoadBalancer use ENVOY_LOG_MISC and not inherit from Logger::Loggable, but it didn't lead to any memory savings on macOS.

Contributor:
So my suspicion is that it's actually creating the optional object for some reason. Maybe it's it worth throwing some log statements or firing up the debugger to check?

gkleiman (Author):
@jmarantz I reverted the LeastRequestLoadBalancer changes but kept the new proto field, and the integration test still complains about those extra 256 bytes.

My guess is that the overhead comes from the protobuf message descriptor, but I wouldn't expect each cluster to hold a copy of it. What do you think?

? std::make_unique<Runtime::Double>(least_request_config->active_request_bias(),
runtime)
: nullptr) {
initialize();
}

protected:
void refresh(uint32_t priority) override {
active_request_bias_ =
active_request_bias_runtime_ != nullptr ? active_request_bias_runtime_->value() : 1.0;

if (active_request_bias_ < 0.0) {
ENVOY_LOG(warn, "upstream: invalid active request bias supplied (runtime key {}), using 1.0",
Member:
Might want to also validate it shows the log message in your unit test. Something like:

EXPECT_LOG_CONTAINS("warn", "Using deprecated XXX extension name 'deprecated' for 'canonical'.",

If for some reason TSAN builds start failing after you add that test, protect this thing with a mutex and that should fix it:

std::vector<std::string> messages_;

Member:
Arguably this could also be a stat. Generally I think we prefer having stats to accompany warnings as logs aren't necessarily tied to operations alerting. OTOH, this particular case is probably going to be pretty rare/unlikely if ever, so I don't think we need to do a stat.

active_request_bias_runtime_->runtimeKey());
active_request_bias_ = 1.0;
}

EdfLoadBalancerBase::refresh(priority);
}

private:
void refreshHostSource(const HostsSource&) override {}
double hostWeight(const Host& host) override {
// Here we scale host weight by the number of active requests at the time we do the pick. We
// always add 1 to avoid division by 0. It might be possible to do better by picking two hosts
// off of the schedule, and selecting the one with fewer active requests at the time of
// selection.
// TODO(mattklein123): @htuch brings up the point that how we are scaling weight here might not
// be the only/best way of doing this. Essentially, it makes weight and active requests equally
// important. Are they equally important in practice? There is no right answer here and we might
// want to iterate on this as we gain more experience.
return static_cast<double>(host.weight()) / (host.stats().rq_active_.value() + 1);
// This method is called to calculate the dynamic weight as follows when all load balancing
// weights are not equal:
//
// `weight = load_balancing_weight / (active_requests + 1)^active_request_bias`
//
// `active_request_bias` can be configured via runtime and its value is cached in
// `active_request_bias_` to avoid having to do a runtime lookup each time a host weight is
// calculated.
//
// When `active_request_bias == 0.0` we behave like `RoundRobinLoadBalancer` and return the
// host weight without considering the number of active requests at the time we do the pick.
//
// When `active_request_bias > 0.0` we scale the host weight by the number of active
// requests at the time we do the pick. We always add 1 to avoid division by 0.
//
// It might be possible to do better by picking two hosts off of the schedule, and selecting the
// one with fewer active requests at the time of selection.
if (active_request_bias_ == 0.0) {
return host.weight();
}

if (active_request_bias_ == 1.0) {
Member:
Is this just an optimization? I'm not a C++ floating point expert, but generally prefer to avoid comparing floats for exact value, since the representation might not allow for precise representation, so comparing with epsilon ranges is better. Probably safe for 1.0 though?

gkleiman (Author):
Yes, this is just an optimization to avoid having to call std::pow() for the default value of active_request_bias.

This field isn't set with the result of a calculation, so I don't think there is much risk of float representation problems.

I expect most users to not specify any bias, so LeastRequestLoadBalancer::refresh(uint32_t) will keep the current behavior and set it to exactly 1.0, in which case this optimization would be used.

Users can override this value via runtime, but since no calculation is done, they should be able to easily set it to exactly 1.0.

return static_cast<double>(host.weight()) / (host.stats().rq_active_.value() + 1);
}

return static_cast<double>(host.weight()) /
std::pow(host.stats().rq_active_.value() + 1, active_request_bias_);
}
HostConstSharedPtr unweightedHostPick(const HostVector& hosts_to_use,
const HostsSource& source) override;

const uint32_t choice_count_;

// The exponent used to calculate host weights can be configured via runtime. We cache it for
// performance reasons and refresh it in `LeastRequestLoadBalancer::refresh(uint32_t priority)`
// whenever a `HostSet` is updated.
double active_request_bias_{};
Contributor:
oh there's these 8 bytes too.


const std::unique_ptr<Runtime::Double> active_request_bias_runtime_;
Contributor:
Sorry; late to the party here but would also be happy if this is a follow-up. This is the only additional memory-field right? This is per-cluster and also replicated per-thread (I don't have a clear picture of the topology of the LB structures in my head yet).

gkleiman (Author):
@jmarantz these are the only two additional fields.

active_request_bias_ will always add 64 bits regardless of whether LeastRequestLbConfig::active_request_bias is set or not.

active_request_bias_runtime_ should be nullptr unless LeastRequestLbConfig::active_request_bias is set.

So in theory this change should only use 128 bits of overhead if no bias is specified. However it looks like 256 are being added. I wonder if that might be caused by padding or something like that?
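
One way to sanity-check the padding theory is to measure the two added fields in isolation. On a typical 64-bit (LP64) ABI a `double` and a `std::unique_ptr` with the default deleter are 8 bytes each, with no padding needed between them, so the pair accounts for 16 bytes and the remaining ~240 bytes must come from somewhere else:

```cpp
#include <memory>

// The two fields added to LeastRequestLoadBalancer, in isolation. Both are
// 8-byte, 8-aligned members on LP64, so the struct should need no padding.
struct AddedFields {
  double active_request_bias{};
  std::unique_ptr<double> active_request_bias_runtime;
};
```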

};
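
The validation and fast-path behavior discussed in this thread can be sketched outside the class (illustrative names; this mirrors the `refresh()` clamping and `hostWeight()` logic above, not Envoy's actual signatures):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// A negative bias is rejected and replaced by the default of 1.0.
double sanitizeBias(double bias) { return bias < 0.0 ? 1.0 : bias; }

double hostWeightSketch(uint32_t weight, uint64_t rq_active, double bias) {
  bias = sanitizeBias(bias);
  if (bias == 0.0) {
    // Round Robin behavior: ignore active requests entirely.
    return weight;
  }
  if (bias == 1.0) {
    // Default-bias fast path: avoid the std::pow call.
    return static_cast<double>(weight) / static_cast<double>(rq_active + 1);
  }
  return static_cast<double>(weight) /
         std::pow(static_cast<double>(rq_active) + 1.0, bias);
}
```

The exact `== 0.0` / `== 1.0` comparisons are safe here for the reason given in the thread: the bias is taken directly from config or runtime, never computed, so the common values are represented exactly.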

/**
test/common/upstream/BUILD (2 additions, 0 deletions)

@@ -192,6 +192,8 @@ envoy_cc_test(
"//source/common/upstream:upstream_lib",
"//test/mocks/runtime:runtime_mocks",
"//test/mocks/upstream:upstream_mocks",
"//test/test_common:logging_lib",
"//test/test_common:test_runtime_lib",
"@envoy_api//envoy/config/cluster/v3:pkg_cc_proto",
],
)