-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
extend load balancer context interface to provide override host #17848
Conversation
Signed-off-by: wbpcode <[email protected]>
Signed-off-by: wbpcode <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, flushing out some initial comments.
/wait
envoy/upstream/load_balancer.h
Outdated
using ExpectedHost = std::pair<std::string, ExpectedHostStatus>; | ||
|
||
/** | ||
* Returns the host the load balancer shouled slected directly. If the expected host exists and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Returns the host the load balancer shouled slected directly. If the expected host exists and | |
* Returns the host the load balancer should select directly. If the expected host exists and |
envoy/upstream/load_balancer.h
Outdated
@@ -83,6 +83,16 @@ class LoadBalancerContext { | |||
* Returns the transport socket options which should be applied on upstream connections | |||
*/ | |||
virtual Network::TransportSocketOptionsConstSharedPtr upstreamTransportSocketOptions() const PURE; | |||
|
|||
using ExpectedHostStatus = uint32_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be uin32_t? Can't you just use the health status enum directly? I think I see that you did this because you want to allow multiple status as a logical OR? Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For the logical OR. 👍
envoy/upstream/load_balancer.h
Outdated
@@ -83,6 +83,16 @@ class LoadBalancerContext { | |||
* Returns the transport socket options which should be applied on upstream connections | |||
*/ | |||
virtual Network::TransportSocketOptionsConstSharedPtr upstreamTransportSocketOptions() const PURE; | |||
|
|||
using ExpectedHostStatus = uint32_t; | |||
using ExpectedHost = std::pair<std::string, ExpectedHostStatus>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this OverrideHost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok 👌 . It will make the code more uniform.
// ThreadAwareLoadBalancerBase inherits LoadBalancerBase, but in fact the `chooseHost` method | ||
// should never be called. | ||
HostConstSharedPtr chooseHost(LoadBalancerContext*) override { NOT_IMPLEMENTED_GCOVR_EXCL_LINE; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have gone wrong somewhere with the interfaces if we have to do this. Can you figure out a different way of expressing this hierarchy? I'm also confused why this is now needed given no interface changes that I can see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is code is not actually not directly related to what the PR hopes to do.
I just happened to find that ThreadAwareLoadBalancerBase
inherited LoadBalancerBase and its chooseHost
method. But in fact, chooseHost
of ThreadAwareLoadBalancerBase
should never be used.
So this additional override method was added to ensure that no one would abuse ThreadAwareLoadBalancerBase
.
There is no problem at all to remove this code, because chooseHostOnce
can also achieve similar constraints. 🤔
@@ -1,5 +1,6 @@ | |||
#include "source/common/upstream/thread_aware_lb_impl.h" | |||
|
|||
#include <atomic> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This is added by the clangd. Maybe it's because I tried atomic_store before.
@@ -141,6 +142,12 @@ ThreadAwareLoadBalancerBase::LoadBalancerImpl::chooseHost(LoadBalancerContext* c | |||
return nullptr; | |||
} | |||
|
|||
HostConstSharedPtr host; | |||
host = LoadBalancerBase::selectOverrideHost(cross_priority_host_map_.get(), context); | |||
if (host != nullptr && !context->shouldSelectAnotherHost(*host)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think mixing the override and the "should select another host" concept is worth the complexity. If someone sets an override host I would just use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This design was originally designed to correctly select a new host when a request is retried. 🤔 But I think you are right. When retrying, LoadBalancerContext should set the override host to null.
source/common/upstream/subset_lb.cc
Outdated
@@ -276,6 +279,12 @@ void SubsetLoadBalancer::initSelectorFallbackSubset( | |||
} | |||
|
|||
HostConstSharedPtr SubsetLoadBalancer::chooseHost(LoadBalancerContext* context) { | |||
HostConstSharedPtr override_host = | |||
LoadBalancerBase::selectOverrideHost(cross_priority_host_map_.get(), context); | |||
if (override_host != nullptr && !context->shouldSelectAnotherHost(*override_host)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about shouldSelectAnotherHost
@@ -507,8 +515,76 @@ bool ZoneAwareLoadBalancerBase::earlyExitNonLocalityRouting() { | |||
return false; | |||
} | |||
|
|||
LoadBalancerContext::ExpectedHostStatus LoadBalancerContextBase::createExpectedHostStatus( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is only used in test code, please move it to test code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is only be used in the test code for now. But it would be a useful util function in the next PR.
We can jut move it to test code temporary and move it back when we actually use it.
HostConstSharedPtr LoadBalancerBase::chooseHost(LoadBalancerContext* context) { | ||
HostConstSharedPtr host; | ||
|
||
host = selectOverrideHost(cross_priority_host_map_.get(), context); | ||
if (host != nullptr && !context->shouldSelectAnotherHost(*host)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about shouldSelectAnotherHost
// This leads to the possibility of simultaneous reading and writing of cross_priority_host_map_ | ||
// in different threads. For this reason, an additional mutex is necessary to guard | ||
// cross_priority_host_map_. | ||
absl::Mutex cross_priority_host_map_mutex_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not crazy about having this mutex be away from the underlying variable so we can't use thread annotations. One option here would be to not have cross_priority_host_map_ in the load balancer base and have it in all derived classes? What do you think? Another option I suppose would be to just move the mutex to the base class and have it always be accessed there. Not sure how often that would create a perf issue or not (maybe not really ever if using read/write mutex).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will re-thinking about it. I currently prefer to place the cross_priority_host_map_ in derived classes. Because in addition to ThreadAwareLoadBalancerBase, no one else has to pay any performance overhead (or even a simple R/W Lock).
Signed-off-by: wbpcode <[email protected]>
Signed-off-by: wbpcode <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this LGTM with a small doc comment. Nice work. Please double check that you have test coverage on all of the new functionality across all of the different LB types. (You can check the coverage report in CI.)
@snowp can you take a pass also?
/wait
@@ -83,6 +83,16 @@ class LoadBalancerContext { | |||
* Returns the transport socket options which should be applied on upstream connections | |||
*/ | |||
virtual Network::TransportSocketOptionsConstSharedPtr upstreamTransportSocketOptions() const PURE; | |||
|
|||
using OverrideHostStatus = uint32_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs documentation on how the bitfield works. It's not clear from reading this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it.
Signed-off-by: wbpcode <[email protected]>
/retest |
Retrying Azure Pipelines: |
Signed-off-by: wbpcode <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks LGTM pending @snowp review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this looks pretty good to me, just a few comments
} | ||
|
||
auto host_iter = host_map->find(override_host.value().first); | ||
HostConstSharedPtr host = host_iter != host_map->end() ? host_iter->second : nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think doing an early return here and breaking up L549 into two if checks would make it a bit easier to follow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get it.
if (host_map == nullptr) { | ||
return nullptr; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this be nullptr? Don't we always have a host_map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Before the first priority update, it is nullptr.
- The host maps of all sub-LBs of subset lb are nullptr.
So I think keeping this check will make the code safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we document this via a comment somewhere? Would be good to let people reading the code understand the need for a null host_map, thanks!
@@ -40,24 +40,7 @@ class LoadBalancerBase : public LoadBalancer { | |||
choosePriority(uint64_t hash, const HealthyLoad& healthy_per_priority_load, | |||
const DegradedLoad& degraded_per_priority_load); | |||
|
|||
HostConstSharedPtr chooseHost(LoadBalancerContext* context) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I understand, why did you move this down in the class hierarchy? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This question is a bit more complicated.
Both ThreadAwareLoadBalancerBase
and ZoneAwareLoadBalancerBase
need a cross priority host map. But the use of the two is completely different. The former needs to ensure that the host map is thread-safe, while the latter does not need to consider this issue at all.
This led me to finally decide to add a member of cross_priority_host_map_ to ThreadAwareLoadBalancerBase
and ZoneAwareLoadBalancerBase
separately, instead of adding a member of cross_priority_host_map_ to the common base class LoadBalancerBase
of the two.
But this brings a new problem, that is, the chooseHost
method in LoadBalancerBase
needs to access cross_priority_host_map_ to search for override host.
Although I know that the base class can access the members of the derived class through virtual functions, but there is no need to be so complicated.
The chooseHost
method in LoadBalancerBase
is only meaningful to ZoneAwareLoadBalancerBase
and ZoneAwareLoadBalancerBase
's derived classes, so there is no need to implement it in LoadBalancerBase
.
I finally moved chooseHost
to ZoneAwareLoadBalancerBase
, so that it will not affect the function, but also can directly access cross_priority_host_map_ in chooseHost
, and it is simpler and cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining! This makes sense to me
HostConstSharedPtr host; | ||
if (host = LoadBalancerContextBase::selectOverrideHost(cross_priority_host_map_.get(), context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would either initialize host outside the if-block, or move the declaration inside. Right now I'm not seeing much benefit to this pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get it.
static HostConstSharedPtr selectOverrideHost(const HostMap* host_map, | ||
LoadBalancerContext* context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it makes sense to have this be on the LoadBalancerBase, it just seems a tad bit odd to have a static function on a class that accepts a pointer to said class. Not a big deal if you prefer it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't found out a better position to place these util functions. I have no particular insistence on this, I think we can move it as needed in the future.
priority_set_.cross_priority_host_map_ = host_map; | ||
host_set_.runCallbacks({}, {}); | ||
|
||
// Expected host is not exist in the host map and then `chooseHostOnce` will be called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do this as // The expected host does not exist in the host map, therefore ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
priority_set_.cross_priority_host_map_ = host_map; | ||
host_set_.runCallbacks({}, {}); | ||
|
||
// Host status does not match the expected host status and then `chooseHostOnce` will be called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then -> therefore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
@@ -1904,6 +2025,135 @@ TEST(LoadBalancerSubsetInfoImplTest, KeysSubsetEqualKeysInvalid) { | |||
"fallback_keys_subset cannot be equal to keys"); | |||
} | |||
|
|||
TEST(LoadBalancerContextBaseTest, LoadBalancerContextBaseTest) { | |||
{ | |||
auto context = LoadBalancerContextBase(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoadBalancerContextBase context;
would be more idiomatic here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get it.
static constexpr uint32_t DegradedStatus = 1u << static_cast<size_t>(Host::Health::Degraded); | ||
static constexpr uint32_t HealthyStatus = 1u << static_cast<size_t>(Host::Health::Healthy); | ||
|
||
LoadBalancerContext::OverrideHostStatus createOverrideHostStatus( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something but this doesn't seem to be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used by some new added test case.
{
Protobuf::RepeatedPtrField<envoy::config::core::v3::HealthStatus> statuses;
statuses.Add(envoy::config::core::v3::HealthStatus::UNKNOWN);
statuses.Add(envoy::config::core::v3::HealthStatus::HEALTHY);
EXPECT_EQ(createOverrideHostStatus(statuses), HealthyStatus);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those tests seem to be testing createOverrideHostStatus
itself, can we just remove it nothing is actually using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, you are right. 🤣
Signed-off-by: wbpcode <[email protected]>
/retest |
Retrying Azure Pipelines: |
Signed-off-by: wbpcode <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
…host (envoyproxy#17848) Signed-off-by: wbpcode <[email protected]> Signed-off-by: code <[email protected]>
Introduce a new stateful session extension. This is a change after #17848 #17290. In #17290, we added a new cross-priority host map for fast host searching. In #17848, we extend `LoadBalancerContext` interface to provide an override host and to select the upstream host by the override host. Finally, in this change, we expand a new API to allow users to extract the state of the session from the request and change the result of load balancing by setting the override host value. Related doc: https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing. Risk Level: Medium Testing: Added Docs Changes: Added Release Notes: Added Platform-Specific Features: N/A. Signed-off-by: wbpcode <[email protected]>
Introduce a new stateful session extension. This is a change after envoyproxy#17848 envoyproxy#17290. In envoyproxy#17290, we added a new cross-priority host map for fast host searching. In envoyproxy#17848, we extend `LoadBalancerContext` interface to provide an override host and to select the upstream host by the override host. Finally, in this change, we expand a new API to allow users to extract the state of the session from the request and change the result of load balancing by setting the override host value. Related doc: https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing. Risk Level: Medium Testing: Added Docs Changes: Added Release Notes: Added Platform-Specific Features: N/A. Signed-off-by: wbpcode <[email protected]> Signed-off-by: Josh Perry <[email protected]>
Signed-off-by: wbpcode [email protected]
Commit Message: extend load balancer context interface to provide override host
Additional Description:
Continuation of #17290. Part of #16698. In the #17290, we implement a cross priority host map for fast host searching and add it to all types of Cluster. This PR extends load balancer context interface to provider override host. Then the load balancer can select host based on this override host directly and bypass algorithm selection.
Get more detailed design of stateful session sticky by ref https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing
Risk Level: Mid.
Testing: Add.
Docs Changes: N/A.
Release Notes: N/A.
Platform Specific Features: N/A.