Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A34 weighted_round_robin lb_policy for per endpoint weight from ClusterLoadAssignment response #202

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

yishuT
Copy link

@yishuT yishuT commented Aug 17, 2020

No description provided.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Aug 17, 2020

CLA Check
The committers are authorized under a signed CLA.

@markdroth
Copy link
Member

Please fix the commit to include your user ID, as per the EasyCLA check above.

@yishuT
Copy link
Author

yishuT commented Sep 7, 2020

Hey, would like to hear an update even you think this proposal is far from the roadmap. I personally have some cycles in coming weeks to make this proposal aligned with grpc roadmap.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long to get to this!

Please let me know if you have any questions about any of this.

A34-weighted-round-robin.md Outdated Show resolved Hide resolved

The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and implement a `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality.

### Overview of `weighted_round_robin` policy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to call this policy something other than "weighted_round_robin", since we are planning to eventually provide a policy with that name that works slightly differently: instead of getting the endpoint weights along with the addresses, the policy will automatically compute the endpoint weights based on utilization data reported by the endpoints.

I can't offhand come up with a really ideal suggestion for a better name here. I can think of the following options, none of which seem quite ideal:

  • address_weighted_round_robin
  • injected_weighted_round_robin
  • weighted_round_robin_with_injected_weights

I'm open to other options here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using "utilization_weighted_round_robin" for the other policy? That policy could potentially be implemented as a wrapper of this policy, FWIW.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting idea -- I hadn't thought of wrapping one with the other.

In order to do this, we'd have to use the same WRR algorithm in both cases. @menghanl, you've looked into how our internal WRR algorithm works. How does that compare to the EDF algorithm proposed here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping one with another means the wrapper needs to convert the loads into weights, and send the new weights along with the addresses to the child whenever the load changes?

I would think if we make an EDF scheduler that's capable of handling both situations, the two balancers can reuse this scheduler implementation.

The WRR algorithm is similar to the internal one. I will double check the details.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no preference toward the name. Per discussion here, I would leave it as weighted_round_robin and other policy can build upon it. Or maybe weighted_round_robin_by_edf to indicate underlying impl?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@menghanl, did you have a chance to look up how the algorithm compares to the internal WRR algorithm?

I guess for now, let's call this edf_weighted_round_robin.

A34-weighted-round-robin.md Outdated Show resolved Hide resolved
A34-weighted-round-robin.md Outdated Show resolved Hide resolved
A34-weighted-round-robin.md Outdated Show resolved Hide resolved
A34-weighted-round-robin.md Outdated Show resolved Hide resolved
A34-weighted-round-robin.md Outdated Show resolved Hide resolved
A34-weighted-round-robin.md Outdated Show resolved Hide resolved
A34-weighted-round-robin.md Outdated Show resolved Hide resolved

The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and implement a `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality.

### Overview of `weighted_round_robin` policy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using "utilization_weighted_round_robin" for the other policy? That policy could potentially be implemented as a wrapper of this policy, FWIW.

Comment on lines 37 to 38
// secondary key for the priority queue. Used as a tiebreaker for same deadline to maintain FIFO order.
uint64 order_offset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How this works needs to be documented. Is the smaller value preferred? What if there is a tie here? Is that illegal? Do we validate that?

```

- On each call to the `Pick`, EDF picker picks the entry `e` on the top of the queue, returns the subchannel associated with the entry. After that, picker updates the `deadline` of `e` to `e.deadline + 1/weight` and either performs a pop and push the entry back to the queue or key increase operation.
- `weighted_round_robin` updates the entries in EDF priority queue on any change of subchannel `ConnnectiveState` or endpoint `load_balancing_weight`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be ConnectivityState?


- On each call to the `Pick`, EDF picker picks the entry `e` on the top of the queue, returns the subchannel associated with the entry. After that, picker updates the `deadline` of `e` to `e.deadline + 1/weight` and either performs a pop and push the entry back to the queue or key increase operation.
- `weighted_round_robin` updates the entries in EDF priority queue on any change of subchannel `ConnnectiveState` or endpoint `load_balancing_weight`.
- If all endpoints have the same `load_balancing_weight`, `EDF` picker degenerates to `round_robin` picker. It's easier to reason and consistent with envoy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use the order_offset to determine the order in this case?

A34-weighted-round-robin.md Outdated Show resolved Hide resolved
@dfawley
Copy link
Member

dfawley commented Sep 8, 2020

cc @menghanl

A34-weighted-round-robin.md Outdated Show resolved Hide resolved

The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and implement a `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality.

### Overview of `weighted_round_robin` policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section needs to specify:

  • format of balancing policy config (will always be empty strings for for this policy, because there's no policy level config)
  • addresses (with weights in address.attributes)

/*
`Pick` picks the EdfEntry on top of the queue, returns the underlying SubChannel and updates deadline of the EdfEntry.
*/
PickResult Pick(PickArgs args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pick would be only public API of picker. The other "update"s should be internal to the balancing policy.


### On new `ClusterLoadAssignment`

#### endpoints have weight change
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to add more details of how the change will be handled. E.g., only the weight is updated, but not the deadline, so the entry at top of the queue is still at the top?

Similar for the following two updates. This helps to keep the implementations consistent cross languages.

@yishuT
Copy link
Author

yishuT commented Sep 14, 2020

Thanks for all the feedback. I will post another update to address all of them next couple of days. Please feel free to leave any comments in the meantime.

@yishuT
Copy link
Author

yishuT commented Oct 14, 2020

soft ping. any udpates?

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on this!

Please let me know if you have any questions. Thanks!

A34-weighted-round-robin.md Outdated Show resolved Hide resolved

The proposal is to carry [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to `lb_policy` and implement a `weighted_round_robin` policy based on Earliest deadline first scheduling algorithm picker [EDF](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling). Each endpoint will get fraction equal to the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) for the [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) divided by the sum of the `load_balancing_weight` of all `LbEndpoint` within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116) of traffic routed to the locality.

### Overview of `weighted_round_robin` policy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@menghanl, did you have a chance to look up how the algorithm compares to the internal WRR algorithm?

I guess for now, let's call this edf_weighted_round_robin.

uint64 order_offset;
// `load_balancing_weight` of this endpoint from the latest `ClusterLoadAssignment`.

// `load_balancing_weight` of this endpoint from address attribute of this endpoint.
double weight;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a double instead of a uint32_t? The endpoint weight in the xDS API is an integer, and integers are usually a lot less error-prone to work with, because there's no rounding error.

Same question for the deadline field. Is there a way we can represent is as an integer instead of a double? For example, can we simply multiply by 1000 and say that we are storing milli-units?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I make weight double because of deadline is double. I am not sure we can use uint32 for deadline. On updating deadline of an entry after a pick, one way to implement (or at least envoy does)

new_deadline = deadline + 1/weight

if we want to use uint32, we likely need to

new_deadline = deadline + 1000/weight

or similar. This will need the assumption that per endpoint weight is never larger than 1000.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the deadlines and weights are all multiplied by 1000, then there's no requirement that endpoint weight is never larger than 1000. We are doing exactly the same calculation as we would be doing with doubles, except that we're storing milliunits instead of whole units. The only thing we're losing is the ability to store more 3 decimal places of precision, and that seems like it shouldn't really affect the algorithm in a material way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, sry, I am lost. how to handle the inverse of weight 1/weight, anything larger than the numerator is rounded to 0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sorry, let me try this again.

The idea here is that you can do the actual division using a double, just don't use doubles for storage. I think we don't actually need to store the weight as a double at all; it comes as an integer in xDS, and we can keep it that way. We can then do the calculation as a double but immediately round the result for storage.

I think this can be done using something like this:

int32_t weight = <weight from xDS>;
int32_t deadline += std::lround(1000.0 / weight);

Then we continue to use the lowest deadline value as the sorting key.

Does that seem reasonable, or is there something I'm missing about this algorithm? The latter is quite possible, since I haven't grokked it fully -- for example, if deadline is constantly increasing, how does it avoid wrap-around? Is it depending on the double representation to avoid that?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll know in the next few weeks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to make a more rigorous simulation over the weekend. Hopefully something we can just check in and use for other LBs. Main focus is on what the hosts in the pool experience w.r.t. syncing behavior. The perf numbers are more favoriable to IWRR, so we'll just need to see if there's a catch or unwanted change in behavior.

Stay tuned.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Howdy folks! Happy New Year

@tonya11en do we have a final decision that envoy will use iwrr to replace edf?

Copy link

@tonya11en tonya11en Jan 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we'll be replacing EDF with WRSQ (in review envoyproxy/envoy#14681) for weighted RR among other things. More details on where it'll be used in envoyproxy/envoy#14597.

Weighted Least Request will still use the EDF scheduler.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks

A34-weighted-round-robin.md Outdated Show resolved Hide resolved
A34-weighted-round-robin.md Outdated Show resolved Hide resolved
A34-weighted-round-robin.md Outdated Show resolved Hide resolved
A34-weighted-round-robin.md Outdated Show resolved Hide resolved
N/A

## Open issues (if applicable)
* How to desync to avoid all clients do synchronized pick?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this is referring to. Can you explain the problem here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. Please let me know it's still not clear

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I understand the problem. I see two possible solutions here:

  • The simple approach is, as you suggest, randomize the starting point in the list whenever we create a new picker, just like round_robin does. This will avoid having all clients hammer the same endpoints at the same time (and also, keep in mind that the control plane could actually send different results to different clients -- each client could see different endpoints or the same endpoints in a different order or with different weights). But it will disrupt the expected scheduling whenever the picker changes.

  • A more complicated solution would be to maintain the current scheduler state in the LB policy and use a mutex to synchronize access to it between the picker and the LB policy. This is more complicated to implement, and the synchronization imposes a performance penalty (because you have to acquire the mutex for every pick), but it should work.

@yishuT
Copy link
Author

yishuT commented May 14, 2022

@markdroth I want to revisit this proposal. Does gRPC community have any plan for per endpoint weight LB policy? If there is no other active proposal or plan, I can resume the work this month

@markdroth
Copy link
Member

We don't have any other plan to do this, so if it's still something you'd like, please feel free to continue moving this design forward.

@s-matyukevich
Copy link
Contributor

I am confused about the status of this proposal. It is not yet merged, but looks like we are already using endpoint weights, at least in grpc-go.

  • Here EDS response is parsed and weights are stored.
  • Next we use the weight property to configure weighted round robin LB in the EDS (clusterresolver) load balancer link
  • We also use those weights in the ringhash EDS implementation link

@yishuT
Copy link
Author

yishuT commented Jan 30, 2023

Hey @s-matyukevich

afaik, weights are stored but grpc client does not have weighted round robin implemented for every language. The proposal is to have a unified protocol for implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants