Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend load balancer context interface to provide override host #17848

Merged
merged 8 commits into from
Sep 8, 2021

Conversation

wbpcode
Copy link
Member

@wbpcode wbpcode commented Aug 25, 2021

Signed-off-by: wbpcode [email protected]

Commit Message: extend load balancer context interface to provide override host
Additional Description:

Continuation of #17290. Part of #16698. In the #17290, we implement a cross priority host map for fast host searching and add it to all types of Cluster. This PR extends load balancer context interface to provider override host. Then the load balancer can select host based on this override host directly and bypass algorithm selection.

Get more detailed design of stateful session sticky by ref https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing

Risk Level: Mid.
Testing: Add.
Docs Changes: N/A.
Release Notes: N/A.
Platform Specific Features: N/A.

@wbpcode wbpcode requested a review from snowp as a code owner August 25, 2021 16:29
@mattklein123 mattklein123 self-assigned this Aug 25, 2021
Signed-off-by: wbpcode <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, flushing out some initial comments.

/wait

using ExpectedHost = std::pair<std::string, ExpectedHostStatus>;

/**
* Returns the host the load balancer shouled slected directly. If the expected host exists and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Returns the host the load balancer shouled slected directly. If the expected host exists and
* Returns the host the load balancer should select directly. If the expected host exists and

@@ -83,6 +83,16 @@ class LoadBalancerContext {
* Returns the transport socket options which should be applied on upstream connections
*/
virtual Network::TransportSocketOptionsConstSharedPtr upstreamTransportSocketOptions() const PURE;

using ExpectedHostStatus = uint32_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be uin32_t? Can't you just use the health status enum directly? I think I see that you did this because you want to allow multiple status as a logical OR? Is that right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. For the logical OR. 👍

@@ -83,6 +83,16 @@ class LoadBalancerContext {
* Returns the transport socket options which should be applied on upstream connections
*/
virtual Network::TransportSocketOptionsConstSharedPtr upstreamTransportSocketOptions() const PURE;

using ExpectedHostStatus = uint32_t;
using ExpectedHost = std::pair<std::string, ExpectedHostStatus>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this OverrideHost

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok 👌 . It will make the code more uniform.

Comment on lines 90 to 92
// ThreadAwareLoadBalancerBase inherits LoadBalancerBase, but in fact the `chooseHost` method
// should never be called.
HostConstSharedPtr chooseHost(LoadBalancerContext*) override { NOT_IMPLEMENTED_GCOVR_EXCL_LINE; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have gone wrong somewhere with the interfaces if we have to do this. Can you figure out a different way of expressing this hierarchy? I'm also confused why this is now needed given no interface changes that I can see?

Copy link
Member Author

@wbpcode wbpcode Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is code is not actually not directly related to what the PR hopes to do.

I just happened to find that ThreadAwareLoadBalancerBase inherited LoadBalancerBase and its chooseHost method. But in fact, chooseHost of ThreadAwareLoadBalancerBase should never be used.

So this additional override method was added to ensure that no one would abuse ThreadAwareLoadBalancerBase.

There is no problem at all to remove this code, because chooseHostOnce can also achieve similar constraints. 🤔

@@ -1,5 +1,6 @@
#include "source/common/upstream/thread_aware_lb_impl.h"

#include <atomic>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This is added by the clangd. Maybe it's because I tried atomic_store before.

@@ -141,6 +142,12 @@ ThreadAwareLoadBalancerBase::LoadBalancerImpl::chooseHost(LoadBalancerContext* c
return nullptr;
}

HostConstSharedPtr host;
host = LoadBalancerBase::selectOverrideHost(cross_priority_host_map_.get(), context);
if (host != nullptr && !context->shouldSelectAnotherHost(*host)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think mixing the override and the "should select another host" concept is worth the complexity. If someone sets an override host I would just use it.

Copy link
Member Author

@wbpcode wbpcode Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design was originally designed to correctly select a new host when a request is retried. 🤔 But I think you are right. When retrying, LoadBalancerContext should set the override host to null.

@@ -276,6 +279,12 @@ void SubsetLoadBalancer::initSelectorFallbackSubset(
}

HostConstSharedPtr SubsetLoadBalancer::chooseHost(LoadBalancerContext* context) {
HostConstSharedPtr override_host =
LoadBalancerBase::selectOverrideHost(cross_priority_host_map_.get(), context);
if (override_host != nullptr && !context->shouldSelectAnotherHost(*override_host)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about shouldSelectAnotherHost

@@ -507,8 +515,76 @@ bool ZoneAwareLoadBalancerBase::earlyExitNonLocalityRouting() {
return false;
}

LoadBalancerContext::ExpectedHostStatus LoadBalancerContextBase::createExpectedHostStatus(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only used in test code, please move it to test code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this is only be used in the test code for now. But it would be a useful util function in the next PR.
We can jut move it to test code temporary and move it back when we actually use it.

HostConstSharedPtr LoadBalancerBase::chooseHost(LoadBalancerContext* context) {
HostConstSharedPtr host;

host = selectOverrideHost(cross_priority_host_map_.get(), context);
if (host != nullptr && !context->shouldSelectAnotherHost(*host)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about shouldSelectAnotherHost

// This leads to the possibility of simultaneous reading and writing of cross_priority_host_map_
// in different threads. For this reason, an additional mutex is necessary to guard
// cross_priority_host_map_.
absl::Mutex cross_priority_host_map_mutex_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not crazy about having this mutex be away from the underlying variable so we can't use thread annotations. One option here would be to not have cross_priority_host_map_ in the load balancer base and have it in all derived classes? What do you think? Another option I suppose would be to just move the mutex to the base class and have it always be accessed there. Not sure how often that would create a perf issue or not (maybe not really ever if using read/write mutex).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will re-thinking about it. I currently prefer to place the cross_priority_host_map_ in derived classes. Because in addition to ThreadAwareLoadBalancerBase, no one else has to pay any performance overhead (or even a simple R/W Lock).

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this LGTM with a small doc comment. Nice work. Please double check that you have test coverage on all of the new functionality across all of the different LB types. (You can check the coverage report in CI.)

@snowp can you take a pass also?

/wait

@@ -83,6 +83,16 @@ class LoadBalancerContext {
* Returns the transport socket options which should be applied on upstream connections
*/
virtual Network::TransportSocketOptionsConstSharedPtr upstreamTransportSocketOptions() const PURE;

using OverrideHostStatus = uint32_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs documentation on how the bitfield works. It's not clear from reading this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it.

Signed-off-by: wbpcode <[email protected]>
@wbpcode
Copy link
Member Author

wbpcode commented Sep 1, 2021

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17848 (comment) was created by @wbpcode.

see: more, trace.

Signed-off-by: wbpcode <[email protected]>
mattklein123
mattklein123 previously approved these changes Sep 1, 2021
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM pending @snowp review.

Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this looks pretty good to me, just a few comments

}

auto host_iter = host_map->find(override_host.value().first);
HostConstSharedPtr host = host_iter != host_map->end() ? host_iter->second : nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing an early return here and breaking up L549 into two if checks would make it a bit easier to follow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it.

Comment on lines +542 to +545
if (host_map == nullptr) {
return nullptr;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this be nullptr? Don't we always have a host_map?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Before the first priority update, it is nullptr.
  2. The host maps of all sub-LBs of subset lb are nullptr.

So I think keeping this check will make the code safer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we document this via a comment somewhere? Would be good to let people reading the code understand the need for a null host_map, thanks!

@@ -40,24 +40,7 @@ class LoadBalancerBase : public LoadBalancer {
choosePriority(uint64_t hash, const HealthyLoad& healthy_per_priority_load,
const DegradedLoad& degraded_per_priority_load);

HostConstSharedPtr chooseHost(LoadBalancerContext* context) override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I understand, why did you move this down in the class hierarchy? Thanks

Copy link
Member Author

@wbpcode wbpcode Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question is a bit more complicated.

Both ThreadAwareLoadBalancerBase and ZoneAwareLoadBalancerBase need a cross priority host map. But the use of the two is completely different. The former needs to ensure that the host map is thread-safe, while the latter does not need to consider this issue at all.

This led me to finally decide to add a member of cross_priority_host_map_ to ThreadAwareLoadBalancerBase and ZoneAwareLoadBalancerBase separately, instead of adding a member of cross_priority_host_map_ to the common base class LoadBalancerBase of the two.

But this brings a new problem, that is, the chooseHost method in LoadBalancerBase needs to access cross_priority_host_map_ to search for override host.
Although I know that the base class can access the members of the derived class through virtual functions, but there is no need to be so complicated.

The chooseHost method in LoadBalancerBase is only meaningful to ZoneAwareLoadBalancerBase and ZoneAwareLoadBalancerBase's derived classes, so there is no need to implement it in LoadBalancerBase.

I finally moved chooseHost to ZoneAwareLoadBalancerBase, so that it will not affect the function, but also can directly access cross_priority_host_map_ in chooseHost, and it is simpler and cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining! This makes sense to me

Comment on lines 147 to 148
HostConstSharedPtr host;
if (host = LoadBalancerContextBase::selectOverrideHost(cross_priority_host_map_.get(), context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would either initialize host outside the if-block, or move the declaration inside. Right now I'm not seeing much benefit to this pattern

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it.

Comment on lines +154 to +155
static HostConstSharedPtr selectOverrideHost(const HostMap* host_map,
LoadBalancerContext* context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes sense to have this be on the LoadBalancerBase, it just seems a tad bit odd to have a static function on a class that accepts a pointer to said class. Not a big deal if you prefer it here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found out a better position to place these util functions. I have no particular insistence on this, I think we can move it as needed in the future.

priority_set_.cross_priority_host_map_ = host_map;
host_set_.runCallbacks({}, {});

// Expected host is not exist in the host map and then `chooseHostOnce` will be called.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd do this as // The expected host does not exist in the host map, therefore ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

priority_set_.cross_priority_host_map_ = host_map;
host_set_.runCallbacks({}, {});

// Host status does not match the expected host status and then `chooseHostOnce` will be called.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then -> therefore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@@ -1904,6 +2025,135 @@ TEST(LoadBalancerSubsetInfoImplTest, KeysSubsetEqualKeysInvalid) {
"fallback_keys_subset cannot be equal to keys");
}

TEST(LoadBalancerContextBaseTest, LoadBalancerContextBaseTest) {
{
auto context = LoadBalancerContextBase();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoadBalancerContextBase context; would be more idiomatic here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it.

static constexpr uint32_t DegradedStatus = 1u << static_cast<size_t>(Host::Health::Degraded);
static constexpr uint32_t HealthyStatus = 1u << static_cast<size_t>(Host::Health::Healthy);

LoadBalancerContext::OverrideHostStatus createOverrideHostStatus(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something but this doesn't seem to be used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check it. Thanks.

Copy link
Member Author

@wbpcode wbpcode Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used by some new added test case.

  {
    Protobuf::RepeatedPtrField<envoy::config::core::v3::HealthStatus> statuses;
    statuses.Add(envoy::config::core::v3::HealthStatus::UNKNOWN);
    statuses.Add(envoy::config::core::v3::HealthStatus::HEALTHY);
    EXPECT_EQ(createOverrideHostStatus(statuses), HealthyStatus);
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests seem to be testing createOverrideHostStatus itself, can we just remove it nothing is actually using it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, you are right. 🤣

Signed-off-by: wbpcode <[email protected]>
@wbpcode
Copy link
Member Author

wbpcode commented Sep 3, 2021

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17848 (comment) was created by @wbpcode.

see: more, trace.

@snowp snowp added the waiting label Sep 3, 2021
Signed-off-by: wbpcode <[email protected]>
@wbpcode wbpcode requested a review from snowp September 8, 2021 01:35
Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@snowp snowp merged commit 86bf636 into envoyproxy:main Sep 8, 2021
tyxia pushed a commit to tyxia/envoy that referenced this pull request Sep 21, 2021
dio pushed a commit that referenced this pull request Jan 12, 2022
Introduce a new stateful session extension. This is a change after #17848 #17290. 

In #17290, we added a new cross-priority host map for fast host searching. In #17848, we extend `LoadBalancerContext` interface to provide an override host and to select the upstream host by the override host.

Finally, in this change, we expand a new API to allow users to extract the state of the session from the request and change the result of load balancing by setting the override host value.

Related doc: https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing.

Risk Level: Medium
Testing: Added
Docs Changes: Added
Release Notes: Added
Platform-Specific Features: N/A.

Signed-off-by: wbpcode <[email protected]>
joshperry pushed a commit to joshperry/envoy that referenced this pull request Feb 13, 2022
Introduce a new stateful session extension. This is a change after envoyproxy#17848 envoyproxy#17290. 

In envoyproxy#17290, we added a new cross-priority host map for fast host searching. In envoyproxy#17848, we extend `LoadBalancerContext` interface to provide an override host and to select the upstream host by the override host.

Finally, in this change, we expand a new API to allow users to extract the state of the session from the request and change the result of load balancing by setting the override host value.

Related doc: https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing.

Risk Level: Medium
Testing: Added
Docs Changes: Added
Release Notes: Added
Platform-Specific Features: N/A.

Signed-off-by: wbpcode <[email protected]>
Signed-off-by: Josh Perry <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants