From 0b0adb05affbf9a0aad26dfed3ab71faf4411e7f Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Wed, 5 Aug 2020 21:57:27 -0700 Subject: [PATCH 01/11] Draft --- A34-weighted-round-robin.md | 111 ++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 A34-weighted-round-robin.md diff --git a/A34-weighted-round-robin.md b/A34-weighted-round-robin.md new file mode 100644 index 000000000..17c477bea --- /dev/null +++ b/A34-weighted-round-robin.md @@ -0,0 +1,111 @@ +`weighted_round_robin` lb_policy for per endpoint weight from `ClusterLoadAssignment` response +---- +* Author(s): Yi-Shu Tai (echo80313@gmail.com) +* Approver: a11r, Mark D. Roth (roth@google.com) +* Status: Draft +* Implemented in: N/A +* Last updated: 2020-08-16 +* Discussion at: (filled after thread exists) + +## Abstract + +The proposal introduces `weighted_round_robin` policy based on [earliest deadline first scheduling algorithm](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) for per lb_endpoint weight from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34). + +This proposal is based on [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). + +## Background + +[A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md) describes resolver/LB architecture and xDS client behavior. 
This proposal specifically extends the behavior of EDS to take [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) into account, and perform `weighted_round_robin` policy on `LbEndpoint`s within same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116). + + +### Related Proposals: +* [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). + +## Proposal + +The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and implement a `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). 
Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality. + +### Overview of `weighted_round_robin` policy + +`weighted_round_robin` policy is powered by [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. EDF picker maintains a priority queue of `EdfEntry`. On the top of the queue, it’s the entry with lowest `deadline`. `order_offset` is the tie breaker when two entries have same deadline to maintain FIFO order. + +Proposed `EdfEntry` +``` +struct EdfEntry { + // primary key for the priority queue. The entry with least deadline is the top of the queue. + double deadline; + // secondary key for the priority queue. Used as a tiebreaker for same deadline to maintain FIFO order. + uint64 order_offset; + // `load_balancing_weight` of this endpoint from the latest `ClusterLoadAssignment`. + double weight; + // Subchannel data structure of this endpoint. + Subchannel subchannel; +} +``` + +- On each call to the `Pick`, EDF picker picks the entry `e` on the top of the queue, returns the subchannel associated with the entry. After that, picker updates the `deadline` of `e` to `e.deadline + 1/weight` and either performs a pop and push the entry back to the queue or key increase operation. +- `weighted_round_robin` updates the entries in EDF priority queue on any change of subchannel `ConnnectiveState` or endpoint `load_balancing_weight`. 
+- If all endpoints have the same `load_balancing_weight`, `EDF` picker degenerates to `round_robin` picker. It's easier to reason and consistent with envoy. +- Endpoints do not have `load_balancing_weight` assigned are discarded. +- `weighted_round_robin` should always be updated to the lastest `ClusterLoadAssignment`. It's xDS server's responsibility to maintain consistency. + +#### EDF picker interface +``` +/* +Add new SubChannel to EDF picker. + +weighted_round_robin sees a new SubChannel in the most recent ClusterLoadAssignment or ConnectivityState of SubChannel is READY again. +*/ +Add(SubChannel, weight) + +/* +Remove Subchannel from EDF picker. + +Subchannel is removed from the most recent ClusterLoadAssignment comparing to last ClusterLoadAssignment or +ConnectivityState of SubChannel becomes not READY. +*/ +Remove(SubChannel) + +/* +Update the weight of SubChannel + +There is weight change on Subchannel in the most recent ClusterLoadAssignment comparing to last ClusterLoadAssignment +*/ +Update(SubChannel, newWeight) + +/* +`Pick` picks the EdfEntry on top of the queue, returns the underlying SubChannel and updates deadline of the EdfEntry. +*/ +PickResult Pick(PickArgs args) +``` + +### On new `ClusterLoadAssignment` + +#### endpoints have weight change +- Update EDF priority queue by updating entries have weight change. +- Different from [Envoy](https://github.com/envoyproxy/envoy/blob/51551ae944c642e6fc61563cbea8653087e70f1f/source/common/upstream/load_balancer_impl.cc#L733-L737), we'd like to udpate EDF priority queue so that new weights applied immediately even endpoints list is not changed. + +#### endpoints list change +- Udpate EDF priority queue by adding/removing entries to reflect the change immediately. + +### On endpoint ConnectiveState Update +- Udpate EDF priority queue by adding/removing entries to reflect the change immediately. + +## Rationale + +Several applications can be built upon this feature, e.g. 
utilization load balancing, blackhole erroring endpoints, load testing,... etc. + +The reason to refresh EDF picker even there is only weight change on some endpoints which is different from envoy is because we'd like real time traffic shift for use cases like load testing, blackhole erroring endpoints. + +The reasons to introduce a new algorithm instead of using existing `weighted_target` policy are +- [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) maintains FIFO order for endpoints with same weight which is easier to reason. +- We want to be consistent with the behavior of Envoy. + +## Implementation + +N/A + +## Open issues (if applicable) + +- Replace `round_robin` with `weighted_round_robin`? +- Following first issue, if not, do we want to use `weighted_round_robin` as default lb_policy for eDS response? \ No newline at end of file From 519d3ab2411d594c575b9b37e4d236e4ac46a2ba Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Sun, 16 Aug 2020 22:41:54 -0700 Subject: [PATCH 02/11] add grpc-io thread --- A34-weighted-round-robin.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/A34-weighted-round-robin.md b/A34-weighted-round-robin.md index 17c477bea..f2815d550 100644 --- a/A34-weighted-round-robin.md +++ b/A34-weighted-round-robin.md @@ -5,7 +5,7 @@ * Status: Draft * Implemented in: N/A * Last updated: 2020-08-16 -* Discussion at: (filled after thread exists) +* Discussion at: https://groups.google.com/g/grpc-io/c/j76bnPgpHYo ## Abstract From 5a694c0e4cde36d6cf189107a8b9251d1187c1dc Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Sun, 20 Sep 2020 22:34:40 -0700 Subject: [PATCH 03/11] wrr part --- A34-weighted-round-robin.md | 91 ++++++++++++++++--------------------- 1 file changed, 38 insertions(+), 53 deletions(-) diff --git a/A34-weighted-round-robin.md b/A34-weighted-round-robin.md index f2815d550..4740e62c5 100644 --- a/A34-weighted-round-robin.md +++ b/A34-weighted-round-robin.md @@ -1,21 +1,22 @@ 
`weighted_round_robin` lb_policy for per endpoint weight from `ClusterLoadAssignment` response ---- * Author(s): Yi-Shu Tai (echo80313@gmail.com) -* Approver: a11r, Mark D. Roth (roth@google.com) -* Status: Draft +* Approver: a11r, markdroth +* Status: In Review * Implemented in: N/A -* Last updated: 2020-08-16 +* Last updated: 2020-09-20 * Discussion at: https://groups.google.com/g/grpc-io/c/j76bnPgpHYo ## Abstract - -The proposal introduces `weighted_round_robin` policy based on [earliest deadline first scheduling algorithm](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) for per lb_endpoint weight from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34). +This proposal is for carrying per endpoint weight in address attribute from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) and introducing `weighted_round_robin` policy based on [earliest deadline first scheduling algorithm](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) for taking advantage of the information which per endpoint weight provides. This proposal is based on [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). ## Background -[A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md) describes resolver/LB architecture and xDS client behavior. 
This proposal specifically extends the behavior of EDS to take [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) into account, and perform `weighted_round_robin` policy on `LbEndpoint`s within same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116). +[A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md) describes resolver/LB architecture and xDS client behavior. This proposal specifically extends the behavior of EDS to pass [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) in per-address attribute to `lb_policy`, so that `lb_policy` can make use of the information for better load balancing. + +To best utilize the information, we also propose a new `lb_policy`, `weighted_round_robin` which works on `LbEndpoint`s within same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116). ### Related Proposals: @@ -26,70 +27,57 @@ This proposal is based on [A27: xDS-Based Global Load Balancing](https://github. 
The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and implement a `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality. ### Overview of `weighted_round_robin` policy +The core of `weighted_round_robin` is [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. -`weighted_round_robin` policy is powered by [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. EDF picker maintains a priority queue of `EdfEntry`. On the top of the queue, it’s the entry with lowest `deadline`. `order_offset` is the tie breaker when two entries have same deadline to maintain FIFO order. +#### Overview of EDF scheduler +EDF picker maintains a priority queue of `EdfEntry`. 
The key of the priority queue is `(deadline, order_offset)` pair. On the top of the queue, it’s the entry with lowest `deadline`. `order_offset` is the tie breaker when two entries have same deadline to maintain FIFO order. If there is a tie on deadline of two entries, the one with smaller `order_offset` will have higher priority. Proposed `EdfEntry` ``` struct EdfEntry { // primary key for the priority queue. The entry with least deadline is the top of the queue. double deadline; - // secondary key for the priority queue. Used as a tiebreaker for same deadline to maintain FIFO order. + + // secondary key for the priority queue. Used as a tiebreaker for same deadline to + // maintain FIFO order. If there is a tie on deadline of two entries, the one with + // smaller `order_offset` will have higher priority. `order_offset` is assigned to this + // entry on constructing the priority queue for the first time and it's immutable. + // Also the `order_offset` assigned to entries is strictly increasing, in other words, + // no two entries have same `order_offset`. uint64 order_offset; - // `load_balancing_weight` of this endpoint from the latest `ClusterLoadAssignment`. + + // `load_balancing_weight` of this endpoint from address attribute of this endpoint. double weight; + // Subchannel data structure of this endpoint. Subchannel subchannel; } ``` +Initialization +- At the very beginning, `deadline` of an entry `e` is equal to `1/e.weight`. +- We assign `order_offset` to each entry while constructing the priority queue. `order_offset` assigned to an entry is distinct and nonnegative interger. During the whole lifecycle of this picker, `order_offset` of an entry is unchanged. +Pick - On each call to the `Pick`, EDF picker picks the entry `e` on the top of the queue, returns the subchannel associated with the entry. 
After that, picker updates the `deadline` of `e` to `e.deadline + 1/weight` and either performs a pop and push the entry back to the queue or key increase operation. -- `weighted_round_robin` updates the entries in EDF priority queue on any change of subchannel `ConnnectiveState` or endpoint `load_balancing_weight`. -- If all endpoints have the same `load_balancing_weight`, `EDF` picker degenerates to `round_robin` picker. It's easier to reason and consistent with envoy. -- Endpoints do not have `load_balancing_weight` assigned are discarded. -- `weighted_round_robin` should always be updated to the lastest `ClusterLoadAssignment`. It's xDS server's responsibility to maintain consistency. - -#### EDF picker interface -``` -/* -Add new SubChannel to EDF picker. - -weighted_round_robin sees a new SubChannel in the most recent ClusterLoadAssignment or ConnectivityState of SubChannel is READY again. -*/ -Add(SubChannel, weight) -/* -Remove Subchannel from EDF picker. +Notes +- If all endpoints have the same `load_balancing_weight`, `EDF` picker degenerates to `round_robin` picker. The order of picked subschannel is purely decided by `order_offset`. It's easier to reason and consistent with envoy. +- Endpoints do not have `load_balancing_weight` is assigned to 1 (the smallest possible weight). This is to be consistent with the [behavior of envoy on missing weight assignment](https://github.com/envoyproxy/envoy/blob/5d95032baa803f853e9120048b56c8be3dab4b0d/source/common/upstream/upstream_impl.cc#L359) -Subchannel is removed from the most recent ClusterLoadAssignment comparing to last ClusterLoadAssignment or -ConnectivityState of SubChannel becomes not READY. 
-*/ -Remove(SubChannel) - -/* -Update the weight of SubChannel - -There is weight change on Subchannel in the most recent ClusterLoadAssignment comparing to last ClusterLoadAssignment -*/ -Update(SubChannel, newWeight) - -/* -`Pick` picks the EdfEntry on top of the queue, returns the underlying SubChannel and updates deadline of the EdfEntry. -*/ -PickResult Pick(PickArgs args) +#### Service Config +The service config for `weighted_round_robin` is very similar to `round_robin` +``` +{ + load_balancing_config: { weighted_round_robin: {}} +} ``` -### On new `ClusterLoadAssignment` - -#### endpoints have weight change -- Update EDF priority queue by updating entries have weight change. -- Different from [Envoy](https://github.com/envoyproxy/envoy/blob/51551ae944c642e6fc61563cbea8653087e70f1f/source/common/upstream/load_balancer_impl.cc#L733-L737), we'd like to udpate EDF priority queue so that new weights applied immediately even endpoints list is not changed. +### On update of `ClusterLoadAssignment` +When an EDS update is received, an update will be sent to the `lb_policy`. The `lb_policy` will create a new picker. This is slightly Different from [Envoy](https://github.com/envoyproxy/envoy/blob/51551ae944c642e6fc61563cbea8653087e70f1f/source/common/upstream/load_balancer_impl.cc#L733-L737). We'd like to udpate EDF priority queue so that new weights applied immediately even endpoints list is not changed. -#### endpoints list change -- Udpate EDF priority queue by adding/removing entries to reflect the change immediately. -### On endpoint ConnectiveState Update -- Udpate EDF priority queue by adding/removing entries to reflect the change immediately. +### NOTE +- `weighted_round_robin` should always be updated to the lastest `ClusterLoadAssignment`. It's xDS server's responsibility to maintain consistency. ## Rationale @@ -97,7 +85,7 @@ Several applications can be built upon this feature, e.g. 
utilization load balan The reason to refresh EDF picker even there is only weight change on some endpoints which is different from envoy is because we'd like real time traffic shift for use cases like load testing, blackhole erroring endpoints. -The reasons to introduce a new algorithm instead of using existing `weighted_target` policy are +The reasons to introduce a new algorithm instead of re-using the same algorithm of `weighted_target` policy are - [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) maintains FIFO order for endpoints with same weight which is easier to reason. - We want to be consistent with the behavior of Envoy. @@ -106,6 +94,3 @@ The reasons to introduce a new algorithm instead of using existing `weighted_tar N/A ## Open issues (if applicable) - -- Replace `round_robin` with `weighted_round_robin`? -- Following first issue, if not, do we want to use `weighted_round_robin` as default lb_policy for eDS response? \ No newline at end of file From d1e010e50fbc9c59676e78d2d1bf1b0936d74157 Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Sun, 20 Sep 2020 23:15:37 -0700 Subject: [PATCH 04/11] xDS part --- A34-weighted-round-robin.md | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/A34-weighted-round-robin.md b/A34-weighted-round-robin.md index 4740e62c5..aa4571554 100644 --- a/A34-weighted-round-robin.md +++ b/A34-weighted-round-robin.md @@ -1,4 +1,4 @@ -`weighted_round_robin` lb_policy for per endpoint weight from `ClusterLoadAssignment` response +`weighted_round_robin` lb_policy for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response ---- * Author(s): Yi-Shu Tai (echo80313@gmail.com) * Approver: a11r, markdroth @@ -13,18 +13,17 @@ This proposal is for carrying per endpoint weight in address attribute from [`Cl This proposal is based on [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). 
## Background - -[A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md) describes resolver/LB architecture and xDS client behavior. This proposal specifically extends the behavior of EDS to pass [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) in per-address attribute to `lb_policy`, so that `lb_policy` can make use of the information for better load balancing. +[A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md) describes resolver/LB architecture and xDS client behavior. This proposal specifically extends the behavior of EDS section in [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). We pass [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) to `lb_policy` by carrying per endpoint `load_balancing_weight` in per-address attribute. `lb_policy` can make use of the information provided by per endpoint `load_balancing_weight` for better load balancing. To best utilize the information, we also propose a new `lb_policy`, `weighted_round_robin` which works on `LbEndpoint`s within same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116). +This proposal has two parts. 
The first part is the new `lb_policy`, `weighted_round_robin`. The second part discuss how we handle per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response. ### Related Proposals: * [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). ## Proposal - -The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and implement a `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality. 
+The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and introduce a new `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality. ### Overview of `weighted_round_robin` policy The core of `weighted_round_robin` is [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. @@ -72,11 +71,19 @@ The service config for `weighted_round_robin` is very similar to `round_robin` } ``` -### On update of `ClusterLoadAssignment` -When an EDS update is received, an update will be sent to the `lb_policy`. The `lb_policy` will create a new picker. This is slightly Different from [Envoy](https://github.com/envoyproxy/envoy/blob/51551ae944c642e6fc61563cbea8653087e70f1f/source/common/upstream/load_balancer_impl.cc#L733-L737). 
We'd like to udpate EDF priority queue so that new weights applied immediately even endpoints list is not changed. +### Handling per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response + +This part extends the behavior of EDS section in [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Instead of discarding the per endpoint `load_balancing_weight`, we want to add it to per-address attibute and pass it along to `lb_policy`. +#### `lb_policy` for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` +When the `lb_policy` field in CDS response is `ROUND_ROBIN`, we use `weighted_round_robin` as the `lb_policy`. + +As of today, we only accept `ROUND_ROBIN` as `lb_policy` in CDS response per [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Therefore, `weighted_round_robin` will always be used. + +#### On update of `ClusterLoadAssignment` +When an EDS update is received, an update will be sent to the `lb_policy`. The `lb_policy` will create a new picker. This is slightly Different from [Envoy](https://github.com/envoyproxy/envoy/blob/51551ae944c642e6fc61563cbea8653087e70f1f/source/common/upstream/load_balancer_impl.cc#L733-L737). We'd like to udpate EDF priority queue so that new weights applied immediately even endpoints list is not changed. -### NOTE +#### NOTE - `weighted_round_robin` should always be updated to the lastest `ClusterLoadAssignment`. It's xDS server's responsibility to maintain consistency. ## Rationale @@ -94,3 +101,4 @@ The reasons to introduce a new algorithm instead of re-using the same algorithm N/A ## Open issues (if applicable) +- Do we need a way to opt out WRR even per endpoint weight is assigned by ClusterLoadAssignment? 
\ No newline at end of file From 311cb4e8300458ff90aef07798ddbf1aec14ac44 Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Mon, 21 Sep 2020 00:01:05 -0700 Subject: [PATCH 05/11] description of wrr --- A34-weighted-round-robin.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/A34-weighted-round-robin.md b/A34-weighted-round-robin.md index aa4571554..083d1e208 100644 --- a/A34-weighted-round-robin.md +++ b/A34-weighted-round-robin.md @@ -26,7 +26,10 @@ This proposal has two parts. The first part is the new `lb_policy`, `weighted_ro The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and introduce a new `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality. 
### Overview of `weighted_round_robin` policy -The core of `weighted_round_robin` is [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. +`weighted_round_robin` distribute traffic to each endpoint in a way that each endpoint will get fraction traffic equal to the weight associated with the endpoint divided by the sum of the weight of all endpoints. The core of `weighted_round_robin` is [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. + +#### Weight of each endpoint +`weighted_round_robin` needs extra information `weight` of each endpoint. We pass weight to `lb_policy` as per address attribute. #### Overview of EDF scheduler EDF picker maintains a priority queue of `EdfEntry`. The key of the priority queue is `(deadline, order_offset)` pair. On the top of the queue, it’s the entry with lowest `deadline`. `order_offset` is the tie breaker when two entries have same deadline to maintain FIFO order. If there is a tie on deadline of two entries, the one with smaller `order_offset` will have higher priority. 
From d8588032b2c34175441c25da84b037c05c9f2668 Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Sun, 8 Nov 2020 21:53:01 -0800 Subject: [PATCH 06/11] change file name, and address CR --- ...obin.md => A34-edf-weighted-round-robin.md | 30 +++++++++---------- 1 file changed, 15 insertions(+), 15 deletions(-) rename A34-weighted-round-robin.md => A34-edf-weighted-round-robin.md (72%) diff --git a/A34-weighted-round-robin.md b/A34-edf-weighted-round-robin.md similarity index 72% rename from A34-weighted-round-robin.md rename to A34-edf-weighted-round-robin.md index 083d1e208..c75a1182e 100644 --- a/A34-weighted-round-robin.md +++ b/A34-edf-weighted-round-robin.md @@ -1,35 +1,35 @@ -`weighted_round_robin` lb_policy for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response +`edf_weighted_round_robin` lb_policy for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response ---- * Author(s): Yi-Shu Tai (echo80313@gmail.com) -* Approver: a11r, markdroth +* Approver: markdroth * Status: In Review * Implemented in: N/A * Last updated: 2020-09-20 * Discussion at: https://groups.google.com/g/grpc-io/c/j76bnPgpHYo ## Abstract -This proposal is for carrying per endpoint weight in address attribute from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) and introducing `weighted_round_robin` policy based on [earliest deadline first scheduling algorithm](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) for taking advantage of the information which per endpoint weight provides. 
+This proposal is for carrying per endpoint weight in address attribute from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) and introducing `edf_weighted_round_robin` policy based on [earliest deadline first scheduling algorithm](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) for taking advantage of the information which per endpoint weight provides. This proposal is based on [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). ## Background [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md) describes resolver/LB architecture and xDS client behavior. This proposal specifically extends the behavior of EDS section in [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). We pass [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) to `lb_policy` by carrying per endpoint `load_balancing_weight` in per-address attribute. `lb_policy` can make use of the information provided by per endpoint `load_balancing_weight` for better load balancing. -To best utilize the information, we also propose a new `lb_policy`, `weighted_round_robin` which works on `LbEndpoint`s within same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116). 
+To best utilize the information, we also propose a new `lb_policy`, `edf_weighted_round_robin` which works on `LbEndpoint`s within same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116). -This proposal has two parts. The first part is the new `lb_policy`, `weighted_round_robin`. The second part discuss how we handle per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response. +This proposal has two parts. The first part is the new `lb_policy`, `edf_weighted_round_robin`. The second part discuss how we handle per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response. ### Related Proposals: * [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). ## Proposal -The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and introduce a new `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). 
Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality. +The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and introduce a new `edf_weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). 
Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality. -### Overview of `weighted_round_robin` policy -`weighted_round_robin` distribute traffic to each endpoint in a way that each endpoint will get fraction traffic equal to the weight associated with the endpoint divided by the sum of the weight of all endpoints. The core of `weighted_round_robin` is [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. +### Overview of `edf_weighted_round_robin` policy +`edf_weighted_round_robin` distribute traffic to each endpoint in a way that each endpoint will get fraction traffic equal to the weight associated with the endpoint divided by the sum of the weight of all endpoints. The core of `edf_weighted_round_robin` is [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. #### Weight of each endpoint -`weighted_round_robin` needs extra information `weight` of each endpoint. We pass weight to `lb_policy` as per address attribute. +`edf_weighted_round_robin` needs extra information `weight` of each endpoint. We pass weight to `lb_policy` as per address attribute. #### Overview of EDF scheduler EDF picker maintains a priority queue of `EdfEntry`. The key of the priority queue is `(deadline, order_offset)` pair. On the top of the queue, it’s the entry with lowest `deadline`. 
`order_offset` is the tie breaker when two entries have same deadline to maintain FIFO order. If there is a tie on deadline of two entries, the one with smaller `order_offset` will have higher priority. @@ -57,7 +57,7 @@ struct EdfEntry { ``` Initialization - At the very beginning, `deadline` of an entry `e` is equal to `1/e.weight`. -- We assign `order_offset` to each entry while constructing the priority queue. `order_offset` assigned to an entry is distinct and nonnegative interger. During the whole lifecycle of this picker, `order_offset` of an entry is unchanged. +- We assign `order_offset` to each entry while constructing the priority queue. `order_offset` assigned to an entry is distinct and nonnegative integer. During the whole lifecycle of this picker, `order_offset` of an entry is unchanged. Pick - On each call to the `Pick`, EDF picker picks the entry `e` on the top of the queue, returns the subchannel associated with the entry. After that, picker updates the `deadline` of `e` to `e.deadline + 1/weight` and either performs a pop and push the entry back to the queue or key increase operation. @@ -67,10 +67,10 @@ Notes - Endpoints do not have `load_balancing_weight` is assigned to 1 (the smallest possible weight). 
This is to be consistent with the [behavior of envoy on missing weight assignment](https://github.com/envoyproxy/envoy/blob/5d95032baa803f853e9120048b56c8be3dab4b0d/source/common/upstream/upstream_impl.cc#L359) #### Service Config -The service config for `weighted_round_robin` is very similar to `round_robin` +The service config for the `edf_weighted_round_robin` LB policy is an empty proto message ``` { - load_balancing_config: { weighted_round_robin: {}} + load_balancing_config: { edf_weighted_round_robin: {}} } ``` @@ -79,15 +79,15 @@ The service config for `weighted_round_robin` is very similar to `round_robin` This part extends the behavior of EDS section in [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Instead of discarding the per endpoint `load_balancing_weight`, we want to add it to per-address attibute and pass it along to `lb_policy`. #### `lb_policy` for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` -When the `lb_policy` field in CDS response is `ROUND_ROBIN`, we use `weighted_round_robin` as the `lb_policy`. +When the `lb_policy` field in CDS response is `ROUND_ROBIN`, we use `edf_weighted_round_robin` as the `lb_policy`. -As of today, we only accept `ROUND_ROBIN` as `lb_policy` in CDS response per [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Therefore, `weighted_round_robin` will always be used. +As of today, we only accept `ROUND_ROBIN` as `lb_policy` in CDS response per [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Therefore, `edf_weighted_round_robin` will always be used. #### On update of `ClusterLoadAssignment` When an EDS update is received, an update will be sent to the `lb_policy`. The `lb_policy` will create a new picker. 
This is slightly Different from [Envoy](https://github.com/envoyproxy/envoy/blob/51551ae944c642e6fc61563cbea8653087e70f1f/source/common/upstream/load_balancer_impl.cc#L733-L737). We'd like to udpate EDF priority queue so that new weights applied immediately even endpoints list is not changed. #### NOTE -- `weighted_round_robin` should always be updated to the lastest `ClusterLoadAssignment`. It's xDS server's responsibility to maintain consistency. +- `edf_weighted_round_robin` should always be updated to the lastest `ClusterLoadAssignment`. It's xDS server's responsibility to maintain consistency. ## Rationale From fe7b887b3697f5eae59528f46f89ec183ea7637b Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Mon, 9 Nov 2020 00:32:22 -0800 Subject: [PATCH 07/11] add connectivity management section --- A34-edf-weighted-round-robin.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/A34-edf-weighted-round-robin.md b/A34-edf-weighted-round-robin.md index c75a1182e..8d5f2a09e 100644 --- a/A34-edf-weighted-round-robin.md +++ b/A34-edf-weighted-round-robin.md @@ -66,6 +66,9 @@ Notes - If all endpoints have the same `load_balancing_weight`, `EDF` picker degenerates to `round_robin` picker. The order of picked subschannel is purely decided by `order_offset`. It's easier to reason and consistent with envoy. - Endpoints do not have `load_balancing_weight` is assigned to 1 (the smallest possible weight). This is to be consistent with the [behavior of envoy on missing weight assignment](https://github.com/envoyproxy/envoy/blob/5d95032baa803f853e9120048b56c8be3dab4b0d/source/common/upstream/upstream_impl.cc#L359) +#### Subchannel connectivity management +`edf_weighted_round_robin` proactively monitors the connectivity of each subchannel. `edf_weighted_round_robin` always tries to keep one connection open to each address in the address list at all times. 
When `edf_weighted_round_robin` is first instantiated, it immediately tries to connect to all addresses, and whenever a subchannel becomes disconnected, it immediately tries to reconnect. + #### Service Config The service config for the `edf_weighted_round_robin` LB policy is an empty proto message ``` @@ -97,11 +100,11 @@ The reason to refresh EDF picker even there is only weight change on some endpoi The reasons to introduce a new algorithm instead of re-using the same algorithm of `weighted_target` policy are - [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) maintains FIFO order for endpoints with same weight which is easier to reason. -- We want to be consistent with the behavior of Envoy. +- We want to be consistent with the behavior of Envoy on how lb_policy picks the backend for sending the request. However, there is a difference between `edf_weighted_round_robin` and `edf_scheduler` of Envoy. `edf_weighted_round_robin` actively monitors the connectivity of each subchannel but `edf_scheduler` of Envoy does not. ## Implementation N/A ## Open issues (if applicable) -- Do we need a way to opt out WRR even per endpoint weight is assigned by ClusterLoadAssignment? \ No newline at end of file +N/A From 577f84d5f3433f5ff6ec62a3c8519abb6bc173ec Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Tue, 10 Nov 2020 16:58:00 -0800 Subject: [PATCH 08/11] add open issues and store weight as uint32 --- A34-edf-weighted-round-robin.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/A34-edf-weighted-round-robin.md b/A34-edf-weighted-round-robin.md index 8d5f2a09e..c1900ba6e 100644 --- a/A34-edf-weighted-round-robin.md +++ b/A34-edf-weighted-round-robin.md @@ -49,23 +49,24 @@ struct EdfEntry { uint64 order_offset; // `load_balancing_weight` of this endpoint from address attribute of this endpoint. - double weight; + uint32 weight; // Subchannel data structure of this endpoint. 
Subchannel subchannel; } ``` Initialization -- At the very beginning, `deadline` of an entry `e` is equal to `1/e.weight`. +- At the very beginning, `deadline` of an entry `e` is equal to `1.0/e.weight`. - We assign `order_offset` to each entry while constructing the priority queue. `order_offset` assigned to an entry is distinct and nonnegative integer. During the whole lifecycle of this picker, `order_offset` of an entry is unchanged. Pick -- On each call to the `Pick`, EDF picker picks the entry `e` on the top of the queue, returns the subchannel associated with the entry. After that, picker updates the `deadline` of `e` to `e.deadline + 1/weight` and either performs a pop and push the entry back to the queue or key increase operation. +- On each call to the `Pick`, EDF picker picks the entry `e` on the top of the queue, returns the subchannel associated with the entry. After that, picker updates the `deadline` of `e` to `e.deadline + 1.0/weight` and either performs a pop and push the entry back to the queue or key increase operation. Notes - If all endpoints have the same `load_balancing_weight`, `EDF` picker degenerates to `round_robin` picker. The order of picked subschannel is purely decided by `order_offset`. It's easier to reason and consistent with envoy. - Endpoints do not have `load_balancing_weight` is assigned to 1 (the smallest possible weight). This is to be consistent with the [behavior of envoy on missing weight assignment](https://github.com/envoyproxy/envoy/blob/5d95032baa803f853e9120048b56c8be3dab4b0d/source/common/upstream/upstream_impl.cc#L359) + #### Subchannel connectivity management `edf_weighted_round_robin` proactively monitors the connectivity of each subchannel. `edf_weighted_round_robin` always tries to keep one connection open to each address in the address list at all times. 
When `edf_weighted_round_robin` is first instantiated, it immediately tries to connect to all addresses, and whenever a subchannel becomes disconnected, it immediately tries to reconnect. @@ -107,4 +108,4 @@ The reasons to introduce a new algorithm instead of re-using the same algorithm N/A ## Open issues (if applicable) -N/A +* How to desync to avoid all clients do synchronized pick? From e8e0c9ba66a29f2d55066cc549195a3fb0233a4c Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Wed, 11 Nov 2020 13:22:36 -0800 Subject: [PATCH 09/11] better describe the issue --- A34-edf-weighted-round-robin.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/A34-edf-weighted-round-robin.md b/A34-edf-weighted-round-robin.md index c1900ba6e..987907067 100644 --- a/A34-edf-weighted-round-robin.md +++ b/A34-edf-weighted-round-robin.md @@ -74,7 +74,7 @@ Notes The service config for the `edf_weighted_round_robin` LB policy is an empty proto message ``` { - load_balancing_config: { edf_weighted_round_robin: {}} + load_balancing_config: {edf_weighted_round_robin: {}} } ``` @@ -108,4 +108,4 @@ The reasons to introduce a new algorithm instead of re-using the same algorithm N/A ## Open issues (if applicable) -* How to desync to avoid all clients do synchronized pick? +* How to desync to avoid all clients do synchronized pick? i.e. xDS server sends out a new list of endpoints, then `edf_weighted_round_robin` rebuilds the picker. All clients will start sending requests to endpoints in the same order, so instead of spreading load, clients start slamming endpoints one by one. To fix the issue, we can randomize the starting index in simpler round robin, but can we do the same thing in `edf_weighted_round_robin`? 
From 76c0ecd3d77ae6ef668a33803e26381f99d034f2 Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Mon, 8 Feb 2021 01:54:02 -0800 Subject: [PATCH 10/11] switch to wrsq --- A34-edf-weighted-round-robin.md | 111 ------------------------------- A34-wrsq-weighted-round-robin.md | 93 ++++++++++++++++++++++++++ 2 files changed, 93 insertions(+), 111 deletions(-) delete mode 100644 A34-edf-weighted-round-robin.md create mode 100644 A34-wrsq-weighted-round-robin.md diff --git a/A34-edf-weighted-round-robin.md b/A34-edf-weighted-round-robin.md deleted file mode 100644 index 987907067..000000000 --- a/A34-edf-weighted-round-robin.md +++ /dev/null @@ -1,111 +0,0 @@ -`edf_weighted_round_robin` lb_policy for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response ----- -* Author(s): Yi-Shu Tai (echo80313@gmail.com) -* Approver: markdroth -* Status: In Review -* Implemented in: N/A -* Last updated: 2020-09-20 -* Discussion at: https://groups.google.com/g/grpc-io/c/j76bnPgpHYo - -## Abstract -This proposal is for carrying per endpoint weight in address attribute from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) and introducing `edf_weighted_round_robin` policy based on [earliest deadline first scheduling algorithm](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) for taking advantage of the information which per endpoint weight provides. - -This proposal is based on [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). - -## Background -[A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md) describes resolver/LB architecture and xDS client behavior. 
This proposal specifically extends the behavior of EDS section in [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). We pass [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) to `lb_policy` by carrying per endpoint `load_balancing_weight` in per-address attribute. `lb_policy` can make use of the information provided by per endpoint `load_balancing_weight` for better load balancing. - -To best utilize the information, we also propose a new `lb_policy`, `edf_weighted_round_robin` which works on `LbEndpoint`s within same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116). - -This proposal has two parts. The first part is the new `lb_policy`, `edf_weighted_round_robin`. The second part discuss how we handle per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response. - -### Related Proposals: -* [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). 
- -## Proposal -The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and introduce a new `edf_weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality. - -### Overview of `edf_weighted_round_robin` policy -`edf_weighted_round_robin` distribute traffic to each endpoint in a way that each endpoint will get fraction traffic equal to the weight associated with the endpoint divided by the sum of the weight of all endpoints. The core of `edf_weighted_round_robin` is [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) picker. - -#### Weight of each endpoint -`edf_weighted_round_robin` needs extra information `weight` of each endpoint. We pass weight to `lb_policy` as per address attribute. 
- -#### Overview of EDF scheduler -EDF picker maintains a priority queue of `EdfEntry`. The key of the priority queue is `(deadline, order_offset)` pair. On the top of the queue, it’s the entry with lowest `deadline`. `order_offset` is the tie breaker when two entries have same deadline to maintain FIFO order. If there is a tie on deadline of two entries, the one with smaller `order_offset` will have higher priority. - -Proposed `EdfEntry` -``` -struct EdfEntry { - // primary key for the priority queue. The entry with least deadline is the top of the queue. - double deadline; - - // secondary key for the priority queue. Used as a tiebreaker for same deadline to - // maintain FIFO order. If there is a tie on deadline of two entries, the one with - // smaller `order_offset` will have higher priority. `order_offset` is assigned to this - // entry on constructing the priority queue for the first time and it's immutable. - // Also the `order_offset` assigned to entries is strictly increasing, in other words, - // no two entries have same `order_offset`. - uint64 order_offset; - - // `load_balancing_weight` of this endpoint from address attribute of this endpoint. - uint32 weight; - - // Subchannel data structure of this endpoint. - Subchannel subchannel; -} -``` -Initialization -- At the very beginning, `deadline` of an entry `e` is equal to `1.0/e.weight`. -- We assign `order_offset` to each entry while constructing the priority queue. `order_offset` assigned to an entry is distinct and nonnegative integer. During the whole lifecycle of this picker, `order_offset` of an entry is unchanged. - -Pick -- On each call to the `Pick`, EDF picker picks the entry `e` on the top of the queue, returns the subchannel associated with the entry. After that, picker updates the `deadline` of `e` to `e.deadline + 1.0/weight` and either performs a pop and push the entry back to the queue or key increase operation. 
- -Notes -- If all endpoints have the same `load_balancing_weight`, `EDF` picker degenerates to `round_robin` picker. The order of picked subschannel is purely decided by `order_offset`. It's easier to reason and consistent with envoy. -- Endpoints do not have `load_balancing_weight` is assigned to 1 (the smallest possible weight). This is to be consistent with the [behavior of envoy on missing weight assignment](https://github.com/envoyproxy/envoy/blob/5d95032baa803f853e9120048b56c8be3dab4b0d/source/common/upstream/upstream_impl.cc#L359) - - -#### Subchannel connectivity management -`edf_weighted_round_robin` proactively monitors the connectivity of each subchannel. `edf_weighted_round_robin` always tries to keep one connection open to each address in the address list at all times. When `edf_weighted_round_robin` is first instantiated, it immediately tries to connect to all addresses, and whenever a subchannel becomes disconnected, it immediately tries to reconnect. - -#### Service Config -The service config for the `edf_weighted_round_robin` LB policy is an empty proto message -``` -{ - load_balancing_config: {edf_weighted_round_robin: {}} -} -``` - -### Handling per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response - -This part extends the behavior of EDS section in [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Instead of discarding the per endpoint `load_balancing_weight`, we want to add it to per-address attibute and pass it along to `lb_policy`. - -#### `lb_policy` for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` -When the `lb_policy` field in CDS response is `ROUND_ROBIN`, we use `edf_weighted_round_robin` as the `lb_policy`. - -As of today, we only accept `ROUND_ROBIN` as `lb_policy` in CDS response per [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). 
Therefore, `edf_weighted_round_robin` will always be used. - -#### On update of `ClusterLoadAssignment` -When an EDS update is received, an update will be sent to the `lb_policy`. The `lb_policy` will create a new picker. This is slightly Different from [Envoy](https://github.com/envoyproxy/envoy/blob/51551ae944c642e6fc61563cbea8653087e70f1f/source/common/upstream/load_balancer_impl.cc#L733-L737). We'd like to udpate EDF priority queue so that new weights applied immediately even endpoints list is not changed. - -#### NOTE -- `edf_weighted_round_robin` should always be updated to the lastest `ClusterLoadAssignment`. It's xDS server's responsibility to maintain consistency. - -## Rationale - -Several applications can be built upon this feature, e.g. utilization load balancing, blackhole erroring endpoints, load testing,... etc. - -The reason to refresh EDF picker even there is only weight change on some endpoints which is different from envoy is because we'd like real time traffic shift for use cases like load testing, blackhole erroring endpoints. - -The reasons to introduce a new algorithm instead of re-using the same algorithm of `weighted_target` policy are -- [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) maintains FIFO order for endpoints with same weight which is easier to reason. -- We want to be consistent with the behavior of Envoy on how lb_policy picks the backend for sending the request. However, there is a difference between `edf_weighted_round_robin` and `edf_scheduler` of Envoy. `edf_weighted_round_robin` actively monitors the connectivity of each subchannel but `edf_scheduler` of Envoy does not. - -## Implementation - -N/A - -## Open issues (if applicable) -* How to desync to avoid all clients do synchronized pick? i.e. xDS server sends out a new list of endpoints, then `edf_weighted_round_robin` rebuilds the picker. 
All clients will start sending requests to endpoints in the same order, so instead of spreading load, clients start slamming endpoints one by one. To fix the issue, we can randomize the starting index in simpler round robin, but can we do the same thing in `edf_weighted_round_robin`? diff --git a/A34-wrsq-weighted-round-robin.md b/A34-wrsq-weighted-round-robin.md new file mode 100644 index 000000000..40f8a4994 --- /dev/null +++ b/A34-wrsq-weighted-round-robin.md @@ -0,0 +1,93 @@ +`wrsq_weighted_round_robin` lb_policy for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response +---- +* Author(s): Yi-Shu Tai (echo80313@gmail.com) +* Approver: markdroth +* Status: In Review +* Implemented in: N/A +* Last updated: 2021-02-08 +* Discussion at: https://groups.google.com/g/grpc-io/c/j76bnPgpHYo + +## Abstract +This proposal is for carrying per endpoint weight in address attribute from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) and introducing `wrsq_weighted_round_robin` policy based on weighted random selection queue (WRSQ) for taking advantage of the information which per endpoint weight provides. + +This proposal is based on [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). + +## Background +[A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md) describes resolver/LB architecture and xDS client behavior. This proposal specifically extends the behavior of EDS section in [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). 
We pass the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of each [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) to `lb_policy` by carrying the per-endpoint `load_balancing_weight` in a per-address attribute. `lb_policy` can use the information provided by the per-endpoint `load_balancing_weight` for better load balancing.
+
+To best utilize the information, we also propose a new `lb_policy`, `wrsq_weighted_round_robin`, which works on `LbEndpoint`s within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116).
+
+This proposal has two parts. The first part is the new `lb_policy`, `wrsq_weighted_round_robin`. The second part discusses how we handle the per-endpoint `load_balancing_weight` from the `ClusterLoadAssignment` response.
+
+### Related Proposals:
+* [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md).
+
+## Proposal
+The proposal is to carry the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of each [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from the [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and to introduce a new `wrsq_weighted_round_robin` policy whose picker is based on the weighted random selection queue (WRSQ) algorithm.
Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality. + +### Overview of `wrsq_weighted_round_robin` policy +`wrsq_weighted_round_robin` distribute traffic to each endpoint in a way that each endpoint will get fraction traffic equal to the weight associated with the endpoint divided by the sum of the weight of all endpoints. The core of `wrsq_weighted_round_robin` is weighted random selection queue (WRSQ) algorithm. + +WRSQ scheduler keeps a FIFO queue for each unique weight among all endpoints inserted and adds the endpoints to their respective queue based on weight. A queue weight, the endpoint weight times number of endpoints in the queue, is assigned to all the queues. + +#### Pick operation +Pick operation consists of 3 steps: +1. Select a queue +2. Pop the endpoint at the front of the selected queue, and the endpoint is returned to caller +3. Push the selected endpoint to the rear of the queue. + +A queue `q_i` is selected by the probability `(queue i's weight) / (sum of all queue weight)`. `Pick` returns the endpoint at the front of the picked queue, then pushes the endpoint to the rear of the queue. + +To pick a queue efficiently, we can pre-compute a prefix-sum array. The 1st element of array is `w_1` * `t_1`, the 2nd is `w_1` * `t1` + `w_2` * `t2`, ... and the nth is `w_1` * `t1` + `w_2` * `t2` + ... + `w_n` * `t_n`. 
To select the queue with the intended probability is equivalent to generating a random nonnegative integer `x` in [0, the last element of prefix-sum array] and finding the first element of the prefix-sum array which is larger than `x`. Say the element is index `i`, the queue `i` is picked. The time complexity of picking a queue is `O(log n)` (n is the number of queues) because prefix-sum array is sorted so binary search can be used. After queue is picked, popping endpoint from the queue is `O(1)`. + +#### Correctness +Assume that there are `w_1`, `w_2`,... `w_n` unique weight among endpoints and there are `t_1` endpoints with weight `w_1`, `t_2` endpoints with weight `w_2`,..., `t_n` endpoints with weight `w_n`. By the definition, WRSQ constructs `q_1` for `w_1`, `q_2` for `w_2`, ... , `q_n` for `w_n`. The weight of `q_i` is `w_i` times `t_i`. The expected time of a endpoint with weight `w_i` being picked among `m` pick operations is `m` times the probability of `q_i` being picked ( (`w_i` * `t_i`) / (`w_1` * `t1` + `w_2` * `t2` + ... + `w_n` * `t_n`) ) times (`1/t_i`) which is equal to `m`(`w_i` / (`w_1` * `t1` + `w_2` * `t2` + ... + `w_n` * `t_n`) ). + +#### Building picker +There are 2 parts in building the picker. +1. Construct queues for each unique weight and push all endpoints to the corresponding queue +2. Build prefix-sum array of queue weight + +Constructing queues is linear to the number of endpoints and building prefix-sum array is linear to number of queues. + +To avoid first pick determinism issue, need to shuffle the order of endpoints before pushing into the queue. + +#### Weight of each endpoint +`wrsq_weighted_round_robin` needs extra information `weight` of each endpoint. We pass weight to `lb_policy` as per address attribute. + + +#### Subchannel connectivity management +`wrsq_weighted_round_robin` proactively monitors the connectivity of each subchannel. 
`wrsq_weighted_round_robin` always tries to keep one connection open to each address in the address list at all times. When `wrsq_weighted_round_robin` is first instantiated, it immediately tries to connect to all addresses, and whenever a subchannel becomes disconnected, it immediately tries to reconnect. + +#### Service Config +The service config for the `wrsq_weighted_round_robin` LB policy is an empty proto message +``` +{ + load_balancing_config: {wrsq_weighted_round_robin: {}} +} +``` + +### Handling per endpoint `load_balancing_weight` from `ClusterLoadAssignment` response +This part extends the behavior of EDS section in [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Instead of discarding the per endpoint `load_balancing_weight`, we want to add it to per-address attribute and pass it along to `lb_policy`. + +#### `lb_policy` for per endpoint `load_balancing_weight` from `ClusterLoadAssignment` +When the `lb_policy` field in CDS response is `ROUND_ROBIN`, we use `wrsq_weighted_round_robin` as the `lb_policy`. + +We only accept `ROUND_ROBIN` as `lb_policy` in CDS response per [A27: xDS-Based Global Load Balancing](https://github.com/grpc/proposal/blob/master/A27-xds-global-load-balancing.md). Therefore, `wrsq_weighted_round_robin` will always be used. + +#### On update of `ClusterLoadAssignment` +When an EDS update is received, an update will be sent to the `lb_policy`. The `lb_policy` will create a new picker. Only positive integer will be accepted as valid endpoint weight, otherwise will be assigned lowest valid weight `1`. + +#### NOTE +- `wrsq_weighted_round_robin` should always be updated to the latest `ClusterLoadAssignment`. It's xDS server's responsibility to maintain consistency. + +## Rationale +The alternative algorithm for weighted round robin is earliest deadline first scheduling algorithm [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). 
The reasons we pick WRSQ are +1. Consistent with Envoy [issue 14597](https://github.com/envoyproxy/envoy/issues/14597) +2. It's easier to avoid all clients do synchronized pick the same endpoint especially the first pick. + +## Implementation + +N/A + +## Open issues (if applicable) From 45169f729a4ab290e2edd63be2d246bc631b43d0 Mon Sep 17 00:00:00 2001 From: Yi-Shu Tai Date: Wed, 10 Feb 2021 20:36:35 -0800 Subject: [PATCH 11/11] fix wording --- A34-wrsq-weighted-round-robin.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/A34-wrsq-weighted-round-robin.md b/A34-wrsq-weighted-round-robin.md index 40f8a4994..e07c79c2e 100644 --- a/A34-wrsq-weighted-round-robin.md +++ b/A34-wrsq-weighted-round-robin.md @@ -4,7 +4,7 @@ * Approver: markdroth * Status: In Review * Implemented in: N/A -* Last updated: 2021-02-08 +* Last updated: 2021-02-10 * Discussion at: https://groups.google.com/g/grpc-io/c/j76bnPgpHYo ## Abstract @@ -28,17 +28,17 @@ The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy ### Overview of `wrsq_weighted_round_robin` policy `wrsq_weighted_round_robin` distribute traffic to each endpoint in a way that each endpoint will get fraction traffic equal to the weight associated with the endpoint divided by the sum of the weight of all endpoints. The core of `wrsq_weighted_round_robin` is weighted random selection queue (WRSQ) algorithm. -WRSQ scheduler keeps a FIFO queue for each unique weight among all endpoints inserted and adds the endpoints to their respective queue based on weight. A queue weight, the endpoint weight times number of endpoints in the queue, is assigned to all the queues. +WRSQ scheduler keeps a FIFO queue for each unique weight among all endpoint weights. A queue weight, the endpoint weight times number of endpoints in the queue, is assigned to each of the queues. #### Pick operation Pick operation consists of 3 steps: -1. Select a queue +1. 
Select a queue with the probability based on the queue weight. 2. Pop the endpoint at the front of the selected queue, and the endpoint is returned to caller 3. Push the selected endpoint to the rear of the queue. -A queue `q_i` is selected by the probability `(queue i's weight) / (sum of all queue weight)`. `Pick` returns the endpoint at the front of the picked queue, then pushes the endpoint to the rear of the queue. +A queue `q_i` is selected by the probability `(queue i's weight) / (sum of all queue weight)`. `Pick` returns the endpoint at the front of the selected queue, then pushes the endpoint to the rear of the queue. -To pick a queue efficiently, we can pre-compute a prefix-sum array. The 1st element of array is `w_1` * `t_1`, the 2nd is `w_1` * `t1` + `w_2` * `t2`, ... and the nth is `w_1` * `t1` + `w_2` * `t2` + ... + `w_n` * `t_n`. To select the queue with the intended probability is equivalent to generating a random nonnegative integer `x` in [0, the last element of prefix-sum array] and finding the first element of the prefix-sum array which is larger than `x`. Say the element is index `i`, the queue `i` is picked. The time complexity of picking a queue is `O(log n)` (n is the number of queues) because prefix-sum array is sorted so binary search can be used. After queue is picked, popping endpoint from the queue is `O(1)`. +To select a queue efficiently, we can pre-compute a prefix-sum array. The 1st element of array is `w_1` * `t_1`, the 2nd is `w_1` * `t1` + `w_2` * `t2`, ... and the nth is `w_1` * `t1` + `w_2` * `t2` + ... + `w_n` * `t_n`. To select the queue with the intended probability is equivalent to generating a random nonnegative integer `x` in [0, the last element of prefix-sum array] and finding the first element of the prefix-sum array which is larger than `x`. Say the element is index `i`, the queue `i` is picked. 
The time complexity of picking a queue is `O(log n)` (n is the number of queues) because prefix-sum array is sorted so binary search can be used. After queue is picked, popping endpoint from the queue is `O(1)`. #### Correctness Assume that there are `w_1`, `w_2`,... `w_n` unique weight among endpoints and there are `t_1` endpoints with weight `w_1`, `t_2` endpoints with weight `w_2`,..., `t_n` endpoints with weight `w_n`. By the definition, WRSQ constructs `q_1` for `w_1`, `q_2` for `w_2`, ... , `q_n` for `w_n`. The weight of `q_i` is `w_i` times `t_i`. The expected time of a endpoint with weight `w_i` being picked among `m` pick operations is `m` times the probability of `q_i` being picked ( (`w_i` * `t_i`) / (`w_1` * `t1` + `w_2` * `t2` + ... + `w_n` * `t_n`) ) times (`1/t_i`) which is equal to `m`(`w_i` / (`w_1` * `t1` + `w_2` * `t2` + ... + `w_n` * `t_n`) ). @@ -48,9 +48,9 @@ There are 2 parts in building the picker. 1. Construct queues for each unique weight and push all endpoints to the corresponding queue 2. Build prefix-sum array of queue weight -Constructing queues is linear to the number of endpoints and building prefix-sum array is linear to number of queues. +Time complexity of constructing queues is linear to the number of endpoints and building prefix-sum array is linear to number of queues. -To avoid first pick determinism issue, need to shuffle the order of endpoints before pushing into the queue. +To avoid all clients pick the same endpoint synchronously, we need to randomly shuffle the order of endpoints before pushing into the queue. #### Weight of each endpoint `wrsq_weighted_round_robin` needs extra information `weight` of each endpoint. We pass weight to `lb_policy` as per address attribute. @@ -84,7 +84,7 @@ When an EDS update is received, an update will be sent to the `lb_policy`. 
The ` ## Rationale The alternative algorithm for weighted round robin is earliest deadline first scheduling algorithm [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). The reasons we pick WRSQ are 1. Consistent with Envoy [issue 14597](https://github.com/envoyproxy/envoy/issues/14597) -2. It's easier to avoid all clients do synchronized pick the same endpoint especially the first pick. +2. It's easier to avoid all clients perform synchronized pick the same endpoint especially the first pick. ## Implementation
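
The picker-building and pick steps described in this proposal can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the gRPC implementation: the class name `WrsqPicker`, the `(address, weight)` input format, and the injectable `rng` parameter are all hypothetical. The fallback of invalid weights to `1` follows the rule stated in the `ClusterLoadAssignment` update section.

```python
import bisect
import random
from collections import deque


class WrsqPicker:
    """Illustrative WRSQ picker (hypothetical names, not a gRPC API)."""

    def __init__(self, endpoints, rng=None):
        # endpoints: iterable of (address, weight) pairs.
        self._rng = rng or random.Random()
        shuffled = list(endpoints)
        if not shuffled:
            raise ValueError("at least one endpoint is required")
        # Shuffle first so different clients do not share the same pick order.
        self._rng.shuffle(shuffled)

        # One FIFO queue per unique weight.
        by_weight = {}
        for address, weight in shuffled:
            # Only positive integers are valid; anything else becomes 1.
            if not isinstance(weight, int) or weight < 1:
                weight = 1
            by_weight.setdefault(weight, deque()).append(address)

        # Prefix sums of queue weights (endpoint weight * queue length).
        self._queues = []
        self._prefix = []
        total = 0
        for weight, queue in by_weight.items():
            total += weight * len(queue)
            self._queues.append(queue)
            self._prefix.append(total)

    def pick(self):
        # Draw x uniformly from the half-open range [0, total).
        x = self._rng.randrange(self._prefix[-1])
        # Binary search for the first prefix sum strictly larger than x.
        i = bisect.bisect_right(self._prefix, x)
        queue = self._queues[i]
        # Pop from the front, return it, and push it to the rear.
        address = queue.popleft()
        queue.append(address)
        return address
```

For example, with endpoints `("a", 1)`, `("b", 1)` and `("c", 2)` there are two queues, each with queue weight 2, so over many calls to `pick()` the expected fractions are 1/4 for `a`, 1/4 for `b` and 1/2 for `c`. Because `a` and `b` rotate within one FIFO queue, their pick counts never differ by more than one.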