Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Least Request LB active request bias config #11252

Merged
merged 27 commits into from
Jul 15, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
a1b52fd
Add support for making Least Requests LB behave like Round Robin in w…
gkleiman May 19, 2020
762e61b
Address feedback
gkleiman May 19, 2020
4cf5f19
Perf/logging improvements
gkleiman May 20, 2020
0a2b1fc
Address feedback and cleanup BUILD file
gkleiman May 20, 2020
6f9b89c
Merge branch 'master' into alternative-lr-weights
gkleiman Jun 16, 2020
6864b50
Make active requests exponent configurable via CDS/runtime
gkleiman Jun 11, 2020
25f7c98
Address feedback
gkleiman Jun 22, 2020
f4b2da3
Validate log message
gkleiman Jun 23, 2020
e96d52d
Update cluster memory test golden values
gkleiman Jun 23, 2020
a5d7782
Merge remote-tracking branch 'origin/master' into alternative-lr-weights
gkleiman Jun 24, 2020
cfca8f7
Fix method name
gkleiman Jun 26, 2020
ce291d6
Merge remote-tracking branch 'origin/master' into alternative-lr-weights
gkleiman Jun 26, 2020
b487a5e
Explicitly initialize active_request_bias_
gkleiman Jun 29, 2020
d7cf64c
Merge remote-tracking branch 'origin/master' into alternative-lr-weights
gkleiman Jun 29, 2020
3fc99ea
Try to make clang-tidy happy
gkleiman Jun 29, 2020
9be22f0
Use unique_ptr instead of optional
gkleiman Jun 30, 2020
f2d8924
Merge remote-tracking branch 'origin/master' into alternative-lr-weights
gkleiman Jul 1, 2020
71fcad8
Update stats integration test
gkleiman Jul 1, 2020
2690778
Check whether memory footprint is reduced without LB changes
gkleiman Jul 1, 2020
be7c000
Use plain double for active request bias
gkleiman Jul 6, 2020
a4e7b39
Revert back to approved implementation using RuntimeDouble
gkleiman Jul 7, 2020
1010883
Merge remote-tracking branch 'origin/master' into alternative-lr-weights
gkleiman Jul 7, 2020
a6a285d
Add extra fields to CDS cluster proto to check memory usage
gkleiman Jul 8, 2020
2676928
Revert "Add extra fields to CDS cluster proto to check memory usage"
gkleiman Jul 13, 2020
100c3db
Merge remote-tracking branch 'origin/master' into alternative-lr-weights
gkleiman Jul 13, 2020
cf06a76
Merge remote-tracking branch 'origin/master' into alternative-lr-weights
gkleiman Jul 14, 2020
eb1b857
Add changelog entry
gkleiman Jul 15, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions source/common/upstream/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,7 @@ envoy_cc_library(
"//include/envoy/upstream:upstream_interface",
"//source/common/common:assert_lib",
"//source/common/protobuf:utility_lib",
"//source/common/runtime:runtime_features_lib",
"@envoy_api//envoy/config/cluster/v3:pkg_cc_proto",
],
)
Expand Down
47 changes: 43 additions & 4 deletions source/common/upstream/load_balancer_impl.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
#pragma once

#include <cmath>
#include <cstdint>
#include <queue>
#include <set>
Expand All @@ -16,6 +17,9 @@
namespace Envoy {
namespace Upstream {

static const std::string RuntimeLeastRequestsActiveRequestsExponent =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absl::string_view to avoid static fiasco.

"upstream.least_requests.active_requests_exponent";

// Priority levels and localities are considered overprovisioned with this factor.
static constexpr uint32_t kDefaultOverProvisioningFactor = 140;

Expand Down Expand Up @@ -365,7 +369,9 @@ class EdfLoadBalancerBase : public ZoneAwareLoadBalancerBase {
std::unique_ptr<EdfScheduler<const Host>> edf_;
};

void initialize();
virtual void initialize();

virtual void refresh(uint32_t priority);

// Seed to allow us to desynchronize load balancers across a fleet. If we don't
// do this, multiple Envoys that receive an update at the same time (or even
Expand All @@ -375,7 +381,6 @@ class EdfLoadBalancerBase : public ZoneAwareLoadBalancerBase {
const uint64_t seed_;

private:
void refresh(uint32_t priority);
virtual void refreshHostSource(const HostsSource& source) PURE;
virtual double hostWeight(const Host& host) PURE;
virtual HostConstSharedPtr unweightedHostPick(const HostVector& hosts_to_use,
Expand Down Expand Up @@ -437,7 +442,8 @@ class RoundRobinLoadBalancer : public EdfLoadBalancerBase {
* The benefit of the Maglev table is at the expense of resolution, memory usage is capped.
* Additionally, the Maglev table can be shared amongst all threads.
*/
class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
class LeastRequestLoadBalancer : public EdfLoadBalancerBase,
Logger::Loggable<Logger::Id::upstream> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you save some bytes by using ENVOY_LOG_MISC here rather than adding a second inheritance here, or by putting the Logger inheritance at the root of this inheritance tree, avoiding multiple-inheritance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried using ENVOY_LOG_MISC and it didn't improve memory usage on macOS =/.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one quick point about that: the EXPECT_MEMORY_EQ macro will be a no-op on Mac. It only runs on linux release builds. There's a bazel flag controlling an ifdef whether to enable that check. EXPECT_MEMORY_LE will run on Mac if tcmalloc is being used, but the memory checking for non-canonical platforms is much looser, and so you might not notice a benefit from dropping the multiple inheritance on Mac.

public:
LeastRequestLoadBalancer(
const PrioritySet& priority_set, const PrioritySet* local_priority_set, ClusterStats& stats,
Expand All @@ -454,6 +460,13 @@ class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
initialize();
}

protected:
void refresh(uint32_t priority) override {
active_requests_exponent_ =
runtime_.snapshot().getDouble(RuntimeLeastRequestsActiveRequestsExponent, 1.0);
EdfLoadBalancerBase::refresh(priority);
}

private:
void refreshHostSource(const HostsSource&) override {}
double hostWeight(const Host& host) override {
Expand All @@ -465,11 +478,37 @@ class LeastRequestLoadBalancer : public EdfLoadBalancerBase {
// be the only/best way of doing this. Essentially, it makes weight and active requests equally
// important. Are they equally important in practice? There is no right answer here and we might
// want to iterate on this as we gain more experience.
return static_cast<double>(host.weight()) / (host.stats().rq_active_.value() + 1);

// TODO(gkleiman): the final PR will not log anything and will return directly instead of using
// a temporary variable.
double weight;

if (active_requests_exponent_ == 0.0) {
weight = host.weight();
} else if (active_requests_exponent_ == 1.0) {
weight = static_cast<double>(host.weight()) / (host.stats().rq_active_.value() + 1);
} else {
weight = static_cast<double>(host.weight()) /
std::pow(host.stats().rq_active_.value() + 1, active_requests_exponent_);
}

ENVOY_LOG(trace, "cluster={} address={} active_requests_exponent={} weight={}",
host.cluster().name(), host.address()->asString(), active_requests_exponent_, weight);

return weight;
}
HostConstSharedPtr unweightedHostPick(const HostVector& hosts_to_use,
const HostsSource& source) override;

const uint32_t choice_count_;

// When hosts have different weights, the host weight is calculated as:
//
// host_weight = (configured_weight / active_requests^k). k is configured via runtime and its
// value is cached to avoid having to do a runtime lookup each time a host weight is generated.
//
// The cached value is refreshed in `LeastRequestLoadBalancer::refresh(uint32_t priority)`.
double active_requests_exponent_;
};

/**
Expand Down
1 change: 1 addition & 0 deletions test/common/upstream/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -189,6 +189,7 @@ envoy_cc_test(
"//source/common/upstream:upstream_lib",
"//test/mocks/runtime:runtime_mocks",
"//test/mocks/upstream:upstream_mocks",
"//test/test_common:test_runtime_lib",
"@envoy_api//envoy/config/cluster/v3:pkg_cc_proto",
],
)
Expand Down
25 changes: 25 additions & 0 deletions test/common/upstream/load_balancer_impl_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include "test/common/upstream/utility.h"
#include "test/mocks/runtime/mocks.h"
#include "test/mocks/upstream/mocks.h"
#include "test/test_common/test_runtime.h"

#include "gmock/gmock.h"
#include "gtest/gtest.h"
Expand Down Expand Up @@ -1532,6 +1533,30 @@ TEST_P(LeastRequestLoadBalancerTest, WeightImbalance) {
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
}

TEST_P(LeastRequestLoadBalancerTest, WeightImbalanceWithCustomExponent) {
EXPECT_CALL(runtime_.snapshot_,
getDouble("upstream.least_requests.active_requests_exponent", 1.0))
.WillRepeatedly(Return(0.0));

hostSet().healthy_hosts_ = {makeTestHost(info_, "tcp://127.0.0.1:80", 1),
makeTestHost(info_, "tcp://127.0.0.1:81", 2)};
stats_.max_host_weight_.set(2UL);

hostSet().hosts_ = hostSet().healthy_hosts_;
hostSet().runCallbacks({}, {}); // Trigger callbacks. The added/removed lists are not relevant.

EXPECT_CALL(random_, random()).WillRepeatedly(Return(0));

// We should see 2:1 ratio for hosts[1] to hosts[0], regardless of the active request count.
hostSet().healthy_hosts_[1]->stats().rq_active_.set(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this works. Are the stats affecting the outstanding requests perceived by the LB when the exponent is non-zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXPECT_EQ(hostSet().healthy_hosts_[1], lb_.chooseHost(nullptr));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
EXPECT_EQ(hostSet().healthy_hosts_[1], lb_.chooseHost(nullptr));
EXPECT_EQ(hostSet().healthy_hosts_[1], lb_.chooseHost(nullptr));
EXPECT_EQ(hostSet().healthy_hosts_[0], lb_.chooseHost(nullptr));
EXPECT_EQ(hostSet().healthy_hosts_[1], lb_.chooseHost(nullptr));
}

TEST_P(LeastRequestLoadBalancerTest, WeightImbalanceCallbacks) {
hostSet().healthy_hosts_ = {makeTestHost(info_, "tcp://127.0.0.1:80", 1),
makeTestHost(info_, "tcp://127.0.0.1:81", 2)};
Expand Down