internal: fix GO_AWAY deadlock #2391
test/end2end_test.go
s1.Stop()

// Wait for client to close.
time.Sleep(100 * time.Millisecond)
Would this work instead: <-stream.Context().Done()?
By stream, do you mean the one from stream, err := client.FullDuplexCall(ctx)?
If so, it does not appear to ever finish.
Edit: is there something you're supposed to do to the stream, other than server.Stop(), to cause it to unblock?
A deadlock can occur when a GO_AWAY is followed by a connection closure. This happens because onClose needlessly closes the current ac.transport: if a GO_AWAY has already occurred and the transport has already been reset, then the later closure (of the original address) sets ac.transport, which is now healthy, to nil. The problem manifests as picker_wrapper spinning forever, trying to use a READY connection whose ac.transport is nil.
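The sequence described above can be modeled in a few lines. This is a simplified sketch, not the actual grpc-go code or the patch in this PR: the transport and addrConn types, both onClose variants, and simulate are hypothetical stand-ins, and the identity check in onCloseGuarded is only one assumed way to avoid clearing a transport that belongs to a newer connection.

```go
package main

import (
	"fmt"
	"sync"
)

// transport is a hypothetical stand-in for a client transport.
type transport struct{ addr string }

// addrConn is a simplified, hypothetical model of the ac described above.
type addrConn struct {
	mu        sync.Mutex
	transport *transport
}

// onCloseBuggy models the problematic behavior: any closure clears
// ac.transport, even when the closed transport is no longer the current one.
func (ac *addrConn) onCloseBuggy(closed *transport) {
	ac.mu.Lock()
	defer ac.mu.Unlock()
	ac.transport = nil
}

// onCloseGuarded clears ac.transport only if the closed transport is still
// the current one (an assumed fix, for illustration only).
func (ac *addrConn) onCloseGuarded(closed *transport) {
	ac.mu.Lock()
	defer ac.mu.Unlock()
	if ac.transport == closed {
		ac.transport = nil
	}
}

// simulate replays the sequence: a GO_AWAY resets the transport to a new
// connection, then the original connection closes. It reports whether the
// healthy transport survives.
func simulate(onClose func(*addrConn, *transport)) bool {
	oldT := &transport{addr: "server1"}
	newT := &transport{addr: "server2"}
	ac := &addrConn{transport: oldT}

	// GO_AWAY handled: the transport is reset to the new connection.
	ac.mu.Lock()
	ac.transport = newT
	ac.mu.Unlock()

	// The original connection is closed afterwards.
	onClose(ac, oldT)

	ac.mu.Lock()
	defer ac.mu.Unlock()
	return ac.transport != nil
}

func main() {
	fmt.Println("buggy onClose keeps healthy transport:", simulate((*addrConn).onCloseBuggy))
	fmt.Println("guarded onClose keeps healthy transport:", simulate((*addrConn).onCloseGuarded))
}
```

With the unguarded variant, the connection can still be reported as READY while its transport is nil, which is the state that picker_wrapper spins on.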
PTAL
test/end2end_test.go
@@ -7055,7 +7056,8 @@ func TestGoAwayThenClose(t *testing.T) {

 	client := testpb.NewTestServiceClient(cc)

-	// Should go on connection 1.
+	// Should go on connection 1. We use a long-lived RPC because it will cause GracefulStop to send GO_AWAY, but the
+	// connection doesn't get closed until the server stops and the client receives.
 	stream, err := client.FullDuplexCall(ctx)
 	if err != nil {
 		t.Fatalf("UnaryCall(_) = _, %v; want _, nil", err)
*FullDuplexCall
When security is disabled, not waiting for the HTTP/2 handshake can lead to DoS-style behavior. For details, see grpc/grpc-go#954. This requirement will incur an extra half-RTT of latency before the first RPC can be sent under plaintext, but this is negligible, and unencrypted connections are rarer than secure ones. Under TLS, the server will effectively send its part of the HTTP/2 handshake along with its final TLS "server finished" message, which the client must wait for before transmitting any data securely. This means virtually no extra latency is incurred by this requirement.

Go had attempted to separate "connection ready" from "connection successful" (issue: grpc/grpc-go#1444, PR: grpc/grpc-go#1648). However, this is confusing to users and introduces an arbitrary distinction between these two events. Because of the added complexity, it has led to several bugs in our reconnection logic (e.g. grpc/grpc-go#2380, grpc/grpc-go#2391, grpc/grpc-go#2392), and it makes custom transports (grpc/proposal#103) more difficult for users to implement.

We are aware of some use cases (in particular, https://github.com/soheilhy/cmux) that expect an RPC to be transmitted before the HTTP/2 handshake is completed. Before making behavior changes to implement this, we will reach out to our users to the best of our abilities.