-
Notifications
You must be signed in to change notification settings - Fork 349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gRPC handler timeout #5295
Comments
So what I'm doing is that I have a Nestjs server using the graphql-mesh generated clients to call a gRPC endpoint.. Nestjs --> graphql-mesh client -1-> gRPC server. What I'm seeing is a |
Maybe you can share a reproduction on CodeSandbox or StackBlitz. |
@ardatan yeah I'm really trying to do that but the error seems to happen after some period of time (probably based upon some condition that is unknown to me now). I'm trying to lock down how to even reproduce the problem consistently in the first place. Am I right that the |
We use the official gRPC client for gRPC requests; |
@ardatan Here's what I've found... So what happens (not exactly sure when) is that the underlying http2 session gets cancelled at some point. Once that happens that connection/session is never going to work again (it's canceled) and a new connection/session would need to be established for the client in order to make new calls. Annoyingly this condition isn't a fail-fast type of deal either, so the So it looks like something is amiss in either how this package is using the grpc-js client (e.g. shouldn't memoize, not responding to a channel state change like CANCELLED appropriately, etc) or there's something internally off in the client itself. On the latter, I made sure I'm at the latest version of the grpc-js client (which as of now is 1.8.13). I'd love to be able to reproduce this consistently but don't have a ton of time to dig into this right now. Thoughts on this? |
So in the interim, I've changed the NestJS service class (which uses the api) to not be a singleton. In NestJS parlance that means changing the service to be REQUEST scoped rather than DEFAULT. I'm going to test that out now but my guess is that it'll work fine but maybe a little less efficiently. |
So even creating a new wrapper class on each request didn't help as the |
I see. Still I need a reproduction in order to help you :) |
@ardatan Can you check out the log I posted here to see what you can make of it? In particular it was unexpected to me that there were two client creations calls during the Edit (explanation of multiple Clients)So the graphql-mesh implementation creates a ServiceClient for each That makes sense to me.. maybe it's a problem with the 'retry' logic in the |
@ardatan So I was able to get to the bottom (theoretically) about the issue. It seems like we have an unstable server environment that's going up and down. My currently hypothesis is that mesh does not seem to be setting the 'deadline' in the
So I think this is ultimately the issue I'm seeing. |
I can send up a PR but I figured it out.. essentially gRPC's client constructor takes an additional argument which specifies a When you call the sdk you do so by passing a second object with a deadline present (a future Date).. e.g.
|
@ardatan So enabling deadlines definitely helped with individual request stability. So each request has a defined timeout and won't stay open for minutes (mine were staying open for 15 minutes). So even with that in place, the http2 session and connections do not clean themselves up when the deadline is hit. For connection cleanup to happen (more frequently), I've enabled grpc keepalives. The following patch has both things in there:
These are both simple patches and are not the ideal implementation for both a GraphiQL/web client vs using the typescript generated client. The KeepAlive and Deadline settings would ideally be configurable via the |
Currently when I'm using the generated SDK I set a timeout of 4000, so 4 seconds. What is happening though is that the connection seems to go into la la land for a full 10 minutes. The response from the sdk call doesn't come back until then.
I know this isn't enough to troubleshoot and I'm willing to step through it myself but when I tried this I had no idea which library was responsible for actually making the gRPC call (not when generating the graphql schema but when using it).
Anyone point me in the right direction?
The text was updated successfully, but these errors were encountered: