-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New implementation of roundrobin and pickfirst #1506
Conversation
c248e34
to
6a94105
Compare
8906133
to
74a3567
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. It's a big PR, so it was easy to put off. :)
balancer/balancer.go
Outdated
@@ -182,6 +182,10 @@ type Picker interface { | |||
// the connectivity states. | |||
// | |||
// It also generates and updates the Picker used by gRPC to pick SubConns for RPCs. | |||
// | |||
// HandleSubConnectionStateChange, HandleResolvedAddrs and Close are guaranteed | |||
// to be called sequentially by the same goroutine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"sequentially" means "in order". I think we want "synchronously from the same goroutine" instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
balancer/balancer.go
Outdated
@@ -196,6 +200,7 @@ type Balancer interface { | |||
// An empty address slice and a non-nil error will be passed if the resolver returns | |||
// non-nil error to gRPC. | |||
HandleResolvedAddrs([]resolver.Address, error) | |||
// Close closes the balancer. | |||
// Close closes the balancer. Balancer is expected to call RemoveSubConn for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change: "The balancer is not required to call ClientConn.RemoveSubConn for its existing SubConns."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
balancer/roundrobin/roundrobin.go
Outdated
* | ||
*/ | ||
|
||
// Package roundrobin defines a roundrobin balancer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document how to use this or when/by whom it should be used?
Should this package register itself with grpc when imported, instead of exporting NewBuilder()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments updated, PTAL.
Most users don't need to call NewBuilder
. But if they want to have a custom balancer on top of roundrobin, they can call this function to create a builder. (I did this in new grpclb at first, but removed it later).
But there's already another way to get a roundrobin builder: balancer.Get("roundrobin")
. I will unexport this function.
balancer/roundrobin/roundrobin.go
Outdated
type roundrobinBuilder struct{} | ||
|
||
func (*roundrobinBuilder) Build(cc balancer.ClientConn, opt balancer.BuildOptions) balancer.Balancer { | ||
b := &roundrobinBalancer{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: return directly instead of creating b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
balancer/roundrobin/roundrobin.go
Outdated
return | ||
} | ||
grpclog.Infoln("roundrobinBalancer: got new resolved addresses: ", addrs) | ||
// addrsSet is the set converted from addrs, it's used to quick lookup for an address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to quickly lookup an address" or "used for quick lookup of an address".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
resolver/resolver.go
Outdated
// Unregister removes the resolver builder with the given scheme from the | ||
// resolver map. | ||
// This function is for testing only. | ||
func Unregister(scheme string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnregisterForTesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
clientconn.go
Outdated
@@ -162,6 +163,14 @@ func WithBalancer(b Balancer) DialOption { | |||
} | |||
} | |||
|
|||
// WithBalancerBuilder is for testing only and should be removed. | |||
// TODO(bar) remove this or change the comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someone once told me not to put TODOs in docstrings... Seems like a reasonable policy since it's for code maintainers and not users? Maybe move this inside the function or above the docstring comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
) | ||
if cc.balancer == nil { | ||
func (cc *ClientConn) getTransport(ctx context.Context, failfast bool) (transport.ClientTransport, func(balancer.DoneInfo), error) { | ||
if cc.balancerWrapper == nil { | ||
// If balancer is nil, there should be only one addrConn available. | ||
cc.mu.RLock() | ||
if cc.conns == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(cc.conns) == 0 instead? Then you don't need the "if ac == nil" below.
Why does this return toRPCErr(ErrClientConnClosing) but we return errConnClosing below? We need to clean up this error stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc.conns
is set to nil in cc.Close
, so the error returned when cc.conns == nil
is different from the error returned when ac == nil
...
Added a TODO for error cleanup.
resolver/passthrough/passthrough.go
Outdated
* | ||
*/ | ||
|
||
// Package passthrough implements a pass-through resolver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer about "for grpc internal use only"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
resolver_conn_wrapper.go
Outdated
|
||
rb := resolver.Get(scheme) | ||
if rb == nil { | ||
// TODO(bar) return error when DNS becomes the default (implemeneted and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
todos, comments and prints remove mutex from roundrobin move todos and comments fixes in b wrapper split tuple to scstate tuple and resolver tuple add default select for done blocking picker add blockingpick test remove picker version cleanup in grpc files cleanup and comments in r and b wrapper
74a3567
to
c76f122
Compare
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
CLAs look good, thanks! |
cbb4d08
to
b2fe11a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the review! All done. PTAL.
balancer/balancer.go
Outdated
@@ -182,6 +182,10 @@ type Picker interface { | |||
// the connectivity states. | |||
// | |||
// It also generates and updates the Picker used by gRPC to pick SubConns for RPCs. | |||
// | |||
// HandleSubConnectionStateChange, HandleResolvedAddrs and Close are guaranteed | |||
// to be called sequentially by the same goroutine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
balancer/balancer.go
Outdated
@@ -196,6 +200,7 @@ type Balancer interface { | |||
// An empty address slice and a non-nil error will be passed if the resolver returns | |||
// non-nil error to gRPC. | |||
HandleResolvedAddrs([]resolver.Address, error) | |||
// Close closes the balancer. | |||
// Close closes the balancer. Balancer is expected to call RemoveSubConn for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
balancer/roundrobin/roundrobin.go
Outdated
* | ||
*/ | ||
|
||
// Package roundrobin defines a roundrobin balancer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments updated, PTAL.
Most users don't need to call NewBuilder
. But if they want to have a custom balancer on top of roundrobin, they can call this function to create a builder. (I did this in new grpclb at first, but removed it later).
But there's already another way to get a roundrobin builder: balancer.Get("roundrobin")
. I will unexport this function.
balancer/roundrobin/roundrobin.go
Outdated
type roundrobinBuilder struct{} | ||
|
||
func (*roundrobinBuilder) Build(cc balancer.ClientConn, opt balancer.BuildOptions) balancer.Balancer { | ||
b := &roundrobinBalancer{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
balancer/roundrobin/roundrobin.go
Outdated
return | ||
} | ||
grpclog.Infoln("roundrobinBalancer: got new resolved addresses: ", addrs) | ||
// addrsSet is the set converted from addrs, it's used to quick lookup for an address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pickfirst.go
Outdated
sc balancer.SubConn | ||
} | ||
|
||
func (p *picker) Pick(ctx context.Context, opts balancer.PickOptions) (conn balancer.SubConn, put func(balancer.DoneInfo), err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Names removed.
resolver/passthrough/passthrough.go
Outdated
* | ||
*/ | ||
|
||
// Package passthrough implements a pass-through resolver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
resolver/resolver.go
Outdated
// Unregister removes the resolver builder with the given scheme from the | ||
// resolver map. | ||
// This function is for testing only. | ||
func Unregister(scheme string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
resolver_conn_wrapper.go
Outdated
|
||
rb := resolver.Get(scheme) | ||
if rb == nil { | ||
// TODO(bar) return error when DNS becomes the default (implemeneted and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
stream.go
Outdated
for { | ||
t, put, err = cc.getTransport(ctx, gopts) | ||
t, put, err = cc.getTransport(ctx, c.failFast) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's done.
- balancer subdir - conn wrapper - picker - resolver - put -> done and make grpclb blocking dial
b2fe11a
to
fd903ae
Compare
if !ok && failfast { | ||
return nil, nil, Errorf(codes.Unavailable, "there is no connection available") | ||
} | ||
if s, ok := bw.connSt[sc]; failfast && (!ok || s.s != connectivity.Ready) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this mostly because the zero value isn't ideal (i.e. an Invalid
or Unknown
). Otherwise, the code ends up simpler because there are fewer conditions involved.
balancer/roundrobin/roundrobin.go
Outdated
// subConns is the snapshot of the roundrobin balancer when this picker was | ||
// created. The slice is immutable. Each Get() will do a round robin | ||
// selection from it and return the selected SubConn. | ||
size int // size if the size of subConns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"is"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Package roundrobin defines a roundrobin balancer. Roundrobin balancer is | ||
// installed as one of the default balancers in gRPC, users don't need to | ||
// explicitly install this balancer. | ||
package roundrobin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/roundrobin// for package variables/functions/types where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type balancer
will conflict with the imported balancer
package. I renamed them to rrBuilder
and rrBalancer
...
stream.go
Outdated
for { | ||
t, put, err = cc.getTransport(ctx, gopts) | ||
t, put, err = cc.getTransport(ctx, c.failFast) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you did there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 🎆 🍾
This causes a regression in CockroachDB - we're seeing a number of flaky tests. I haven't yet determined the exact bug, but The symptom appears to be a hanging streaming RPC, though I continue to investigate. My understanding is that this new implementation was meant to be opt-in, but there's definitely a behaviour change here. EDIT: to reproduce in CockroachDB run |
It seems RPCs are The related change in this PR is this line, which actually fixed a broken behavior that failfast didn't fail properly before. Please try to make all the RPCs non-failfast by using |
Note that if you're doing unary RPCs, then this could be dangerous right now if they are not idempotent because of #1532. I revived my branch that fixes this problem and hope to have it checked in this week. EDIT: added "if they are not idempotent" |
Help me understand the change that was made here. Previously every call to Now, help me understand what would happen in the following (racy) situation:
Is that right? I think that's probably the issue we're seeing in cockroach - multiple nodes are started in non-deterministic order by the test harness and all attempt to talk to each other, but now that GRPC tanks failfast RPCs even before the connection is ever established (after this change) we're seeing those races trigger start-up failures. Would it be possible for the transport to remain in Please correct me if I've misunderstood the behaviour. |
What you observed is correct. The first failfast RPC could fail because the connection is not ready yet. I will also make a change to skip the |
if len(targetSplitted) >= 2 { | ||
scheme = targetSplitted[0] | ||
} | ||
grpclog.Infof("dialing to target with scheme: %q", scheme) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new log line means that all existing code has grpc emit logs when it dials, and along with line 56 all existing code that dials with a host or ip and no specified scheme gets a second log line about the lack of resolver for ""
scheme.
This is a problem because grpc is used in CLI clients and having grpc log these internal, unimportant details makes the UX of these CLIs confusing; random logs are showing up, suggesting that there's connection errors when there aren't.
It would be good if these two log lines could be removed, or reduced to a debug level.
ClientConn
New implementations:
Fixes #1504