Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
util/tracing: fix crash in StreamClientInterceptor
Before this patch, our client-side tracing interceptor for streaming rpc calls was exposed to gRPC bugs being currently fixed in github.com/grpc/grpc-go/pull/5323. This had to do with calls to clientStream.Context() panicking with an NPE under certain races with RPCs failing. We've recently gotten two crashes seemingly because of this. It's unclear why this hasn't shown up before, as nothing seems new (either on our side or on the grpc side). In 22.2 we do use more streaming RPCs than before (for example for span configs), so maybe that does it. This patch eliminates the problem by eliminating the problematic call into ClientStream.Context(). The background is that our interceptors needs to watch for ctx cancelation and consider the RPC done at that point. But, there was no reason for that call; we can more simply use the RPC caller's ctx for the purposes of figuring out if the caller cancels the RPC. In fact, calling ClientStream.Context() is bad for other reasons, beyond exposing us to the bug: 1) ClientStream.Context() pins the RPC attempt to a lower-level connection, and inhibits gRPC's ability to sometimes transparently retry failed calls. In fact, there's a comment on ClientStream.Context() that tells you not to call it before using the stream through other methods like Recv(), which imply that the RPC is already "pinned" and transparent retries are no longer possible anyway. We were breaking this. 2) One of the grpc-go maintainers suggested that, due to the bugs reference above, this call could actually sometimes give us "the wrong context", although how wrong exactly is not clear to me (i.e. could we have gotten a ctx that doesn't inherit from the caller's ctx? Or a ctx that's canceled independently from the caller?) Release note: A rare crash indicating a nil-pointer deference in google.golang.org/grpc/internal/transport.(*Stream).Context(...) was fixed.
- Loading branch information