client: fix race between transport draining and new RPCs #2919

Merged · 2 commits · Jul 22, 2019

Conversation

dfawley (Member) commented Jul 19, 2019

Fixes #2767

Before these fixes, it was possible to see errors on new RPCs after a
connection began draining and before a new connection was established. There is
an inherent race between choosing a SubConn and attempting to create a stream
on it. We should be able to avoid application-visible RPC errors due to this
with transparent retry. However, several bugs were preventing this from
working correctly (a reproduction sketch follows the list):

  1. Non-wait-for-ready RPCs were skipping transparent retry, though the retry
    design calls for retrying them.

  2. The transport closed itself (and would consequently error new RPCs) before
    notifying the SubConn that it was draining.

  3. The SubConn wasn't synchronously updating itself once it was notified about
    the closing or draining state.

  4. The SubConn would go into the TRANSIENT_FAILURE state instantaneously,
    causing RPCs to fail instead of queue.

@dfawley dfawley added this to the 1.23 Release milestone Jul 19, 2019
@dfawley dfawley requested a review from menghanl July 19, 2019 22:16
Review thread on test/goaway_test.go (outdated, resolved)
gyuho (Contributor) commented Jul 23, 2019

@dfawley Can we confirm the 1.23 release date? https://github.com/grpc/grpc-go/milestone/21 says August 13. etcd is using 1.22, and we would like to use 1.23 with this fix for our 3.4 release. Thanks!

/cc @jpbetz

menghanl (Contributor) commented

@gyuho The milestone has the right date. Release 1.23 is scheduled for August 13.

gyuho (Contributor) commented Jul 23, 2019

@menghanl Thanks for the confirmation!

The lock bot locked this conversation as resolved and limited it to collaborators on Jan 24, 2020.
Successfully merging this pull request may close these issues.

Unavailable error when MaxConnectionAge and MaxConnectionAgeGrace is enabled on the server