Ping from client breaks connection with "too many pings" message. Possibly related to bug in gRPC v1.8.2 #2444
Comments
So I profiled the http2 client to see what is going on. The server is dropping the connection with a "too many pings" message. What keepalive value (that is, what interval between pings) is acceptable to prevent the connection from being dropped? At the moment I'm using 5,000 ms, and that seems to be too short. The full trace can be found here: https://gist.github.com/ospaarmann/c592fa984f775810b2698bf0b4d82228
I increased the time between pings to 120,000 ms (2 full minutes). The connection still drops. This seems like a bug to me, since I cannot ping the server. Since a keepalive interval of 5,000 ms causes a connection drop after 20,000 ms, I assume that Dgraph drops the connection on the fourth ping. So it can handle pings, it just tells you to get lost after number 4 😆
Hey, why don't you try doing it with a raw HTTP client...
Thank you for your reply @kdsgambhir! But I don't really see your point. I am writing a gRPC client, and gRPC supports pings / keepalive. 👉 https://godoc.org/google.golang.org/grpc/keepalive Using a different client to check over a different protocol whether the server is alive is a worst-case workaround for me. It would be much more complex on my side: I would need a special background process and would basically be writing a server monitor for something that the protocol itself supports. It would also kind of defeat the purpose. I think there is a bug here, more specifically in gRPC itself. It was addressed in gRPC 1.8.4, and another potential fix for this issue was introduced in gRPC 1.10.1. See this issue: grpc/grpc-node#138 Dgraph still uses gRPC 1.8.2. See: dgraph/contrib/scripts/install.sh Line 16 in 616e1c1
I am more than happy to submit a PR to upgrade gRPC to 1.10.1 and solve this issue. Thanks! ❤️
Sure, I'd be happy to accept a PR to upgrade to the latest gRPC. Though I'd ask you to test that Dgraph works fine after the upgrade.
Sure @manishrjain. Any idea when v1.0.6 will be released on Docker Hub so that I can use it for testing? Thanks!
This week is what I'm aiming for. Just ironing out some last outstanding bugs.
Just tested it with v1.0.6. The issue is still present. What I find a bit strange: I could only find a specific version of gRPC in the travis setup.sh. As far as I can see, the rest of the code uses the latest package. But I am also new to Go, so it is possible that I am mistaken. What do you think @manishrjain? I am happy to look into this and prepare a PR to fix it.
I've upgraded gRPC to the latest release (see contrib/release.sh, gRPC version 1.13.0). If that still doesn't solve your issue, I'd recommend using the HTTP endpoints, which are designed specifically for languages that don't work well with gRPC.
It seems to me that there is an option to solve this, according to this code. /** How many misbehaving pings the server can bear before sending goaway and
Closing due to no activity.
Hey,
I am running
The connection of the Elixir client that I am currently developing (ExDgraph) kept breaking every ~20 seconds. While investigating, I found that modifying the http2 connect options of gun (the http2 client that the grpc client uses for the connection) gets rid of the issue. More specifically, setting the `keepalive` value to `infinity` disables the ping that gun sends to Dgraph to keep the connection alive. The default value was 5,000 ms, which caused a connection break every 20,000 ms. Decreasing the `keepalive` value also decreased the time between connection crashes in the same ratio, so this is a strong correlation for me.
The downside of this "fix" is that it disables pings entirely, so I might not know when a server is gone until the write buffers are full.
I suspect that this is a bug either in Dgraph and how it handles the ping message or in gun. I also opened an issue over in the gun repo.
The gun docs state about the ping:
Any help with fixing this issue is greatly appreciated. If you need more information let me know.
Thanks