-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client: backoff before reconnecting if an HTTP2 server preface was not received #1648
Conversation
9a60fe7
to
31151d2
Compare
clientconn.go
Outdated
@@ -114,6 +115,16 @@ func UseCompressor(name string) CallOption { | |||
}) | |||
} | |||
|
|||
// WithWaitForServerSettings makes the Dial wait until the client receives initial settings (server preface) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
With...Settings blocks until the initial settings frame is received from the
server before assigning RPCs to the connection.
Also, how about WithWaitForHandshake
? I think I'll be renaming the ServerOption
to HandshakeTimeout
to match Java/C.
clientconn.go
Outdated
// ready is closed and becomes nil when a new transport is up or failed | ||
// due to timeout. | ||
ready chan struct{} | ||
transport transport.ClientTransport | ||
|
||
// The reason this addrConn is torn down. | ||
tearDownErr error | ||
|
||
retries int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please give this a more specific name; I'm implementing a feature called "retry" too. 😄
clientconn.go
Outdated
// backoffDeadline is the time until which resetTransport needs to | ||
// wait before increasing retries count. | ||
backoffDeadline time.Time | ||
// connectDeadline is the time by which dial and tls handshake should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and also the server preface, right?
How about "connectDeadline is the time by which all connection negotiation must complete"?
clientconn.go
Outdated
// The created transport must receive initial settings frame from the server. | ||
// In case that doesnt happen, transportMonitor will kill the newly created | ||
// transport after connectDeadline has expired. | ||
// In case there was an error on the tranpsort before the settings frame was |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*transport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
clientconn.go
Outdated
// transport after connectDeadline has expired. | ||
// In case there was an error on the tranpsort before the settings frame was | ||
// received, resetTransport resumes connecting to backends after the one that | ||
// is previously connected to. In case end of the list is reached, resetTransport |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is->was
clientconn.go
Outdated
newTransport, err := transport.NewClientTransport(connectCtx, ac.cc.ctx, sinfo, copts, func() { | ||
close(done) | ||
ac.mu.Lock() | ||
if !ac.backoffDeadline.IsZero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be faster (and is simpler, code-wise) to just clear everything unconditionally.
clientconn.go
Outdated
ac.mu.Unlock() | ||
// Block until we receive a goaway or an error occurs. | ||
select { | ||
case <-t.GoAway(): | ||
case <-t.Error(): | ||
case <-cdeadline: | ||
timer.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timer fired; stopping it is unnecessary.
clientconn.go
Outdated
timer = nil | ||
// No server preface received until deadline. | ||
// Kill the connection. | ||
grpclog.Errorf("grpc: addrConn.transportMonitor didn't get server preface after waiting. Closing the new transport now.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning?
resolver/manual/manual.go
Outdated
bootstrapAddrs []resolver.Address | ||
} | ||
|
||
// BootstrapWithAddrs adds resolved addresses to the resolver so that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"InitialAddrs"?
vet.sh
Outdated
@@ -64,7 +64,7 @@ trap cleanup EXIT | |||
git ls-files "*.go" | xargs sed -i 's:"golang.org/x/net/context":"context":' | |||
set +o pipefail | |||
# TODO: Stop filtering pb.go files once golang/protobuf#214 is fixed. | |||
go tool vet -all . 2>&1 | grep -vF '.pb.go:' | tee /dev/stderr | (! read) | |||
go tool vet -all . 2>&1 | grep -vE 'cancel (function|var)' | grep -vF '.pb.go:' | tee /dev/stderr | (! read) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are you filtering out with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a successful dial we can't call cancel on the context used for it's dialing because of Go 1.6 issue. But Go 1.9 vet catches that and throws an error with a message that this grep captures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to do this only for the file it was in, IIRC. Can you make this grep for 'clientconn.go:.*cancel ........
?
4edbaca
to
21b0f5e
Compare
21b0f5e
to
f1f3cf6
Compare
clientconn.go
Outdated
// It returns true if a connection was established. | ||
func (ac *addrConn) createTransport(connectRetryNum int, backoffDeadline, connectDeadline time.Time, addrs []resolver.Address, prevAddr resolver.Address, copts transport.ConnectOptions) (bool, error) { | ||
startFrom := 0 | ||
for idx, addr := range addrs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put this inside a if prevAddr != resolver.Address{}
for clarity?
Could prevAddr
be an int
instead to avoid the need for this search? If the address list is changed during a connection attempt, which is pretty unlikely, we could just restart from the top of the list and reset the backoff - what do you think?
// ac.tearDown(...) has been invoked. | ||
ac.mu.Unlock() | ||
return errConnClosing | ||
if ac.state != connectivity.Shutdown { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be TransientFailure
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the state was Shutdown
we don't want to put it to TransientFailure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. If we are here the only possibilities are Connecting
or Shutdown
.
clientconn.go
Outdated
case <-done: | ||
case <-connectCtx.Done(): | ||
// Didn't receive server preface, must kill this new | ||
// transport now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Unwrap please.
clientconn.go
Outdated
case <-connectCtx.Done(): | ||
// Didn't receive server preface, must kill this new | ||
// transport now. | ||
grpclog.Warningf("grpc: addrConn.createTransport will close the newly established transport since no server preface was received.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about: grpc: addrConn.createTransport failed to receive server preface before deadline
?
clientconn.go
Outdated
// transport now. | ||
grpclog.Warningf("grpc: addrConn.createTransport will close the newly established transport since no server preface was received.") | ||
newTr.Close() | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not return false
?
clientconn.go
Outdated
ac.cc.handleSubConnStateChange(ac.acbw, ac.state) | ||
if ac.transport != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete these lines; ac.transport
must be nil
.
select { | ||
case <-sent: | ||
default: | ||
t.Fatalf("Dial returned before server settings were sent") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this still a little racy? We write the settings immediately after closing sent. I think we should sleep for a little bit after closing sent, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dial waits until the settings are sent and the channel is closed before they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see.. So it is still possible it returned before the settings were sent in this case, because we close sent before sending, but it's unlikely because of the sleep before sending.
vet.sh
Outdated
@@ -64,7 +64,7 @@ trap cleanup EXIT | |||
git ls-files "*.go" | xargs sed -i 's:"golang.org/x/net/context":"context":' | |||
set +o pipefail | |||
# TODO: Stop filtering pb.go files once golang/protobuf#214 is fixed. | |||
go tool vet -all . 2>&1 | grep -vF '.pb.go:' | tee /dev/stderr | (! read) | |||
go tool vet -all . 2>&1 | grep -vE 'cancel (function|var)' | grep -vF '.pb.go:' | tee /dev/stderr | (! read) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to do this only for the file it was in, IIRC. Can you make this grep for 'clientconn.go:.*cancel ........
?
c77d7fc
to
57890cb
Compare
57890cb
to
c8bfe9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 nit and 1 minor cleanup, otherwise LGTM.
// ac.tearDown(...) has been invoked. | ||
ac.mu.Unlock() | ||
return errConnClosing | ||
if ac.state != connectivity.Shutdown { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. If we are here the only possibilities are Connecting
or Shutdown
.
select { | ||
case <-sent: | ||
default: | ||
t.Fatalf("Dial returned before server settings were sent") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see.. So it is still possible it returned before the settings were sent in this case, because we close sent before sending, but it's unlikely because of the sleep before sending.
clientconn.go
Outdated
state connectivity.State | ||
mu sync.Mutex | ||
curAddr resolver.Address | ||
reconnectIdx int // The index in adder list to start reconnecting from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "addrs"
clientconn.go
Outdated
case <-cdeadline: | ||
var isConnected bool | ||
ac.mu.Lock() | ||
if ac.backoffDeadline.IsZero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isConnected := ac.backoffDeadline.IsZero() // Client received server preface.
When security is disabled, not waiting for the HTTP/2 handshake can lead to DoS-style behavior. For details, see: grpc/grpc-go#954. This requirement will incur an extra half-RTT latency before the first RPC can be sent under plaintext, but this is negligible and unencrypted connections are rarer than secure ones. Under TLS, the server will effectively send its part of the HTTP/2 handshake along with its final TLS "server finished" message, which the client must wait for before transmitting any data securely. This means virtually no extra latency is incurred by this requirement. Go had attempted to separate "connection ready" with "connection successful" (Issue: grpc/grpc-go#1444 PR: grpc/grpc-go#1648). However, this is confusing to users and introduces an arbitrary distinction between these two events. It has led to several bugs in our reconnection logic (e.g.s grpc/grpc-go#2380, grpc/grpc-go#2391, grpc/grpc-go#2392), due to the complexity, and it makes custom transports (grpc/proposal#103) more difficult for users to implement. We are aware of some use cases (in particular, https://github.com/soheilhy/cmux) expecting the behavior of transmitting an RPC before the HTTP/2 handshake is completed. Before making behavior changes to implement this, we will reach out to our users to the best of our abilities.
fixes #1444