A34 weighted_round_robin lb_policy for per endpoint weight from ClusterLoadAssignment response #202
Open: yishuT wants to merge 11 commits into grpc:master from yishuT:A34
Commits (all by yishuT):
- 0b0adb0 Draft
- 519d3ab add grpc-io thread
- 5a694c0 wrr part
- d1e010e xDS part
- 311cb4e description of wrr
- d858803 change file name, and address CR
- fe7b887 add connectivity management section
- 577f84d add open issues and store weight as uint32
- e8e0c9b better describe the issue
- 76c0ecd switch to wrsq
- 45169f7 fix wording
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
`edf_weighted_round_robin` lb_policy for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response | ||
---- | ||
* Author(s): Yi-Shu Tai ([email protected]) | ||
* Approver: markdroth | ||
* Status: In Review | ||
* Implemented in: N/A | ||
* Last updated: 2020-09-20 | ||
* Discussion at: https://groups.google.com/g/grpc-io/c/j76bnPgpHYo | ||
|
||
## Abstract | ||
This proposal carries the per-endpoint weight from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) in a per-address attribute and introduces an `edf_weighted_round_robin` policy, based on the [earliest deadline first scheduling algorithm](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling), that takes advantage of the information the per-endpoint weight provides. | ||
|
||
This proposal is based on [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). | ||
|
||
## Background | ||
[A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md) describes the resolver/LB architecture and xDS client behavior. This proposal specifically extends the behavior described in the EDS section of [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). We pass the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of each [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) to the `lb_policy` by carrying it in a per-address attribute. The `lb_policy` can then use the per-endpoint `load_balancing_weight` for better load balancing. | ||
|
||
To best utilize the information, we also propose a new `lb_policy`, `edf_weighted_round_robin`, which works on the `LbEndpoint`s within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116). | ||
|
||
This proposal has two parts. The first part is the new `lb_policy`, `edf_weighted_round_robin`. The second part discusses how we handle the per-endpoint `load_balancing_weight` from the `ClusterLoadAssignment` response. | ||
|
||
### Related Proposals: | ||
* [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). | ||
|
||
## Proposal | ||
The proposal is to carry the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of each [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from the [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to the `lb_policy`, and to introduce a new `edf_weighted_round_robin` policy whose picker is based on the earliest deadline first ([EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling)) scheduling algorithm. Of the traffic routed to a locality, each endpoint receives a fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of its [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` values of all `LbEndpoint`s within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116). | ||
|
||
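As a small worked example of the fraction described above, the following sketch computes each endpoint's share of a locality's traffic. The endpoint names and the `traffic_fractions` helper are illustrative assumptions, not part of the proposal.

```python
from fractions import Fraction


def traffic_fractions(weights):
    """Return each endpoint's share of the traffic routed to its locality.

    `weights` maps a (hypothetical) endpoint name to its load_balancing_weight.
    """
    total = sum(weights.values())
    return {name: Fraction(weight, total) for name, weight in weights.items()}


# Three illustrative endpoints in the same LocalityLbEndpoints.
shares = traffic_fractions({"a": 1, "b": 2, "c": 5})
for name, share in shares.items():
    print(name, share)
```

With weights 1, 2 and 5, the endpoints receive 1/8, 1/4 and 5/8 of the locality's traffic respectively.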
### Overview of `edf_weighted_round_robin` policy | ||
`edf_weighted_round_robin` distributes traffic so that each endpoint receives a fraction of the traffic equal to its weight divided by the sum of the weights of all endpoints. The core of `edf_weighted_round_robin` is the [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. | ||
|
||
#### Weight of each endpoint | ||
`edf_weighted_round_robin` needs one extra piece of information, the `weight` of each endpoint. We pass the weight to the `lb_policy` as a per-address attribute. | ||
|
||
#### Overview of EDF scheduler | ||
The EDF picker maintains a priority queue of `EdfEntry`. The key of the priority queue is the pair `(deadline, order_offset)`. The entry at the top of the queue is the one with the lowest `deadline`. `order_offset` is the tiebreaker that maintains FIFO order when two entries have the same deadline: the entry with the smaller `order_offset` has higher priority. | ||
|
||
Proposed `EdfEntry` | ||
``` | ||
struct EdfEntry { | ||
// primary key for the priority queue. The entry with least deadline is the top of the queue. | ||
double deadline; | ||
|
||
// secondary key for the priority queue. Used as a tiebreaker for same deadline to | ||
// maintain FIFO order. If there is a tie on deadline of two entries, the one with | ||
// smaller `order_offset` will have higher priority. `order_offset` is assigned to this | ||
// entry on constructing the priority queue for the first time and it's immutable. | ||
// Also the `order_offset` assigned to entries is strictly increasing, in other words, | ||
// no two entries have same `order_offset`. | ||
uint64 order_offset; | ||
|
||
// `load_balancing_weight` of this endpoint from address attribute of this endpoint. | ||
uint32 weight; | ||
|
||
// Subchannel data structure of this endpoint. | ||
Subchannel subchannel; | ||
} | ||
``` | ||
Initialization | ||
- At the very beginning, the `deadline` of an entry `e` is equal to `1.0/e.weight`. | ||
- We assign an `order_offset` to each entry while constructing the priority queue. Each `order_offset` is a distinct nonnegative integer. The `order_offset` of an entry does not change during the whole lifecycle of the picker. | ||
|
||
Pick | ||
- On each call to `Pick`, the EDF picker picks the entry `e` at the top of the queue and returns the subchannel associated with it. The picker then updates the `deadline` of `e` to `e.deadline + 1.0/e.weight` and either pops the entry and pushes it back onto the queue or performs an increase-key operation. | ||
|
||
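The initialization and pick steps above can be sketched as follows. This is a minimal illustration rather than the proposed implementation: it uses Python's `heapq` as the priority queue, represents a subchannel by a plain string, and takes the pop-and-push route instead of an increase-key operation. The `EdfPicker` class name and its constructor signature are assumptions made for the example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class EdfEntry:
    # Ordering uses (deadline, order_offset) only; the other fields are payload.
    deadline: float
    order_offset: int
    weight: int = field(compare=False)
    subchannel: str = field(compare=False)


class EdfPicker:
    def __init__(self, endpoints):
        """`endpoints` is a list of (subchannel, weight) pairs (illustrative)."""
        # Initial deadline is 1.0/weight; order_offset is the list position.
        self._queue = [
            EdfEntry(1.0 / weight, offset, weight, subchannel)
            for offset, (subchannel, weight) in enumerate(endpoints)
        ]
        heapq.heapify(self._queue)

    def pick(self):
        # Take the entry with the lowest (deadline, order_offset) key.
        entry = heapq.heappop(self._queue)
        # Push it back with its deadline advanced by 1.0/weight.
        entry.deadline += 1.0 / entry.weight
        heapq.heappush(self._queue, entry)
        return entry.subchannel


picker = EdfPicker([("a", 2), ("b", 1)])
print([picker.pick() for _ in range(6)])
```

With weights 2 and 1, endpoint `a` is picked twice for every pick of `b`. The weights are chosen so that the deadlines are exact in floating point; with other weights, accumulated rounding error could affect how ties are broken.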
Notes | ||
- If all endpoints have the same `load_balancing_weight`, the `EDF` picker degenerates to a `round_robin` picker. The order in which subchannels are picked is then decided purely by `order_offset`. This is easier to reason about and consistent with Envoy. | ||
- Endpoints that do not have a `load_balancing_weight` are assigned a weight of 1 (the smallest possible weight). This is consistent with the [behavior of Envoy on missing weight assignment](https://github.com/envoyproxy/envoy/blob/5d95032baa803f853e9120048b56c8be3dab4b0d/source/common/upstream/upstream_impl.cc#L359). | ||
|
||
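Both notes can be checked with a compact simulation. The sketch below assumes the weight is stored under a `load_balancing_weight` key in a per-address attribute dictionary; that layout, and the `resolve_weight` and `pick_order` helpers, are hypothetical.

```python
import heapq

DEFAULT_WEIGHT = 1  # smallest possible weight, used when none is provided


def resolve_weight(attributes):
    """Read the weight from a per-address attribute dict (hypothetical layout)."""
    return attributes.get("load_balancing_weight", DEFAULT_WEIGHT)


def pick_order(weights, picks):
    """Return the indices chosen by an EDF queue keyed on (deadline, order_offset)."""
    heap = [(1.0 / w, offset) for offset, w in enumerate(weights)]
    heapq.heapify(heap)
    order = []
    for _ in range(picks):
        deadline, offset = heapq.heappop(heap)
        order.append(offset)
        heapq.heappush(heap, (deadline + 1.0 / weights[offset], offset))
    return order


# The second endpoint carries no weight, so it falls back to DEFAULT_WEIGHT.
attributes = [{"load_balancing_weight": 1}, {}, {"load_balancing_weight": 1}]
weights = [resolve_weight(a) for a in attributes]
order = pick_order(weights, 6)
print(weights, order)
```

Because all three weights resolve to 1, every deadline ties and the picks cycle through the endpoints in `order_offset` order, which is plain round robin.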
|
||
#### Subchannel connectivity management | ||
`edf_weighted_round_robin` proactively monitors the connectivity of each subchannel. `edf_weighted_round_robin` always tries to keep one connection open to each address in the address list at all times. When `edf_weighted_round_robin` is first instantiated, it immediately tries to connect to all addresses, and whenever a subchannel becomes disconnected, it immediately tries to reconnect. | ||
|
||
#### Service Config | ||
The service config for the `edf_weighted_round_robin` LB policy is an empty proto message: | ||
``` | ||
{ | ||
load_balancing_config: { edf_weighted_round_robin: {}} | ||
} | ||
``` | ||
|
||
### Handling per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response | ||
|
||
This part extends the behavior described in the EDS section of [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Instead of discarding the per-endpoint `load_balancing_weight`, we want to add it to a per-address attribute and pass it along to the `lb_policy`. | ||
|
||
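One way to picture the hand-off is the sketch below, which copies the weight of each endpoint into an attribute map on the resolved address. The `Address` type, the attribute key, and the dictionary shape used for the endpoints are all assumptions for illustration; each gRPC implementation has its own address and attribute types.

```python
from dataclasses import dataclass, field

WEIGHT_ATTRIBUTE = "load_balancing_weight"  # hypothetical attribute key


@dataclass(frozen=True)
class Address:
    host: str
    port: int
    attributes: dict = field(default_factory=dict)


def addresses_from_locality(lb_endpoints):
    """Build addresses from dicts that mimic the relevant LbEndpoint fields."""
    addresses = []
    for endpoint in lb_endpoints:
        attributes = {}
        weight = endpoint.get("load_balancing_weight")
        if weight is not None:
            # Carry the weight along instead of discarding it.
            attributes[WEIGHT_ATTRIBUTE] = weight
        addresses.append(Address(endpoint["host"], endpoint["port"], attributes))
    return addresses


addresses = addresses_from_locality([
    {"host": "10.0.0.1", "port": 8080, "load_balancing_weight": 3},
    {"host": "10.0.0.2", "port": 8080},  # no weight set by the control plane
])
```

An endpoint without a weight simply has no attribute here, which leaves the choice of default to the `lb_policy`.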
#### `lb_policy` for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` | ||
When the `lb_policy` field in CDS response is `ROUND_ROBIN`, we use `edf_weighted_round_robin` as the `lb_policy`. | ||
|
||
As of today, we only accept `ROUND_ROBIN` as `lb_policy` in CDS response per [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Therefore, `edf_weighted_round_robin` will always be used. | ||
|
||
#### On update of `ClusterLoadAssignment` | ||
When an EDS update is received, an update will be sent to the `lb_policy`, which will create a new picker. This is slightly different from [Envoy](https://github.com/envoyproxy/envoy/blob/51551ae944c642e6fc61563cbea8653087e70f1f/source/common/upstream/load_balancer_impl.cc#L733-L737). We'd like to update the EDF priority queue so that new weights are applied immediately, even if the endpoint list has not changed. | ||
|
||
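The update behavior can be sketched as a policy that rebuilds its picker on every EDS update, including updates that change only weights. The `EdfWrrPolicy` class, the `on_eds_update` method, and the `make_picker` helper are hypothetical names used for this illustration.

```python
import heapq


def make_picker(endpoints):
    """Return a pick() function over a fresh EDF queue.

    `endpoints` maps an illustrative endpoint name to its weight.
    """
    heap = [
        (1.0 / weight, offset, name, weight)
        for offset, (name, weight) in enumerate(endpoints.items())
    ]
    heapq.heapify(heap)

    def pick():
        deadline, offset, name, weight = heapq.heappop(heap)
        heapq.heappush(heap, (deadline + 1.0 / weight, offset, name, weight))
        return name

    return pick


class EdfWrrPolicy:
    def __init__(self):
        self.pick = None

    def on_eds_update(self, endpoints):
        # Always build a new picker, even if only the weights changed.
        self.pick = make_picker(endpoints)


policy = EdfWrrPolicy()
policy.on_eds_update({"a": 1, "b": 1})
before = [policy.pick() for _ in range(4)]
policy.on_eds_update({"a": 1, "b": 4})  # same endpoints, new weight for b
after = [policy.pick() for _ in range(5)]
print(before, after)
```

After the second update, `b` receives four of the next five picks, so the new weight takes effect on the very first pick. The trade-off is that scheduling state is reset whenever a picker is replaced.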
#### NOTE | ||
- `edf_weighted_round_robin` should always be updated to the latest `ClusterLoadAssignment`. It is the xDS server's responsibility to maintain consistency. | ||
|
||
## Rationale | ||
|
||
Several applications can be built upon this feature, e.g. utilization load balancing, blackholing erroring endpoints, and load testing. | ||
|
||
Unlike Envoy, we refresh the EDF picker even when only the weights of some endpoints change, because we want traffic to shift in real time for use cases such as load testing and blackholing erroring endpoints. | ||
|
||
The reasons to introduce a new algorithm instead of reusing the algorithm of the `weighted_target` policy are: | ||
- [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) maintains FIFO order for endpoints with the same weight, which is easier to reason about. | ||
- We want to be consistent with Envoy in how the lb_policy picks the backend for a request. However, there is one difference between `edf_weighted_round_robin` and Envoy's `edf_scheduler`: `edf_weighted_round_robin` actively monitors the connectivity of each subchannel, whereas Envoy's `edf_scheduler` does not. | ||
|
||
## Implementation | ||
|
||
N/A | ||
|
||
## Open issues (if applicable) | ||
* How do we desynchronize clients so that they do not all make synchronized picks?
I don't understand what this is referring to. Can you explain the problem here?
Updated. Please let me know if it's still not clear.
Okay, I understand the problem. I see two possible solutions here:
- The simple approach is, as you suggest, randomize the starting point in the list whenever we create a new picker, just like `round_robin` does. This will avoid having all clients hammer the same endpoints at the same time (and also, keep in mind that the control plane could actually send different results to different clients -- each client could see different endpoints or the same endpoints in a different order or with different weights). But it will disrupt the expected scheduling whenever the picker changes.
- A more complicated solution would be to maintain the current scheduler state in the LB policy and use a mutex to synchronize access to it between the picker and the LB policy. This is more complicated to implement, and the synchronization imposes a performance penalty (because you have to acquire the mutex for every pick), but it should work.
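For illustration only, the two options discussed in this comment could look roughly like the sketch below. The `rotate` helper and the `SharedEdfScheduler` class are hypothetical names, and endpoints are represented as `(name, weight)` pairs.

```python
import heapq
import random
import threading


def rotate(endpoints, start):
    """Option 1: begin assigning order_offset at `start` instead of index 0."""
    return endpoints[start:] + endpoints[:start]


class SharedEdfScheduler:
    """Option 2: scheduler state owned by the LB policy and shared by pickers."""

    def __init__(self, endpoints):
        self._lock = threading.Lock()
        self._heap = [
            (1.0 / weight, offset, name, weight)
            for offset, (name, weight) in enumerate(endpoints)
        ]
        heapq.heapify(self._heap)

    def pick(self):
        # Every pick acquires the mutex, which is the performance cost
        # mentioned in the comment.
        with self._lock:
            deadline, offset, name, weight = heapq.heappop(self._heap)
            heapq.heappush(self._heap, (deadline + 1.0 / weight, offset, name, weight))
            return name


endpoints = [("a", 1), ("b", 1), ("c", 1)]
# Option 1: each new picker starts from a random position in the list.
randomized = rotate(endpoints, random.randrange(len(endpoints)))
# Option 2: one scheduler outlives picker replacement and is safe to share.
scheduler = SharedEdfScheduler(endpoints)
```

Option 1 keeps the picker lock-free but resets the schedule on every picker change; option 2 preserves the schedule at the cost of a lock acquisition per pick.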