Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ping from client breaks connection with "too many pings" message. Possibly related to bug in gRPC v1.8.2 #2444

Closed
ospaarmann opened this issue Jun 15, 2018 · 11 comments
Labels
kind/question Something requiring a response.

Comments

@ospaarmann
Copy link
Contributor

ospaarmann commented Jun 15, 2018

Hey,

I am running

  • Dgraph v1.0.5
  • on MacOS 10.13.5, 16GB RAM
  • with the default docker-compose file

The connection of the Elixir client that I am developing at the moment (ExDgraph) kept breaking every ~ 20 seconds. Investigating this issue I found that modifying the http2 connect options of gun (the http2 client the grpc client is using for the connection) would get rid of this issue. More specifically setting the keepalive value to infinity and thereby disabling the ping that gun sends to Dgraph to keep the connection alive. The default value was 5,000 ms which caused a connection break every 20,000ms. Decreasing the keepalive value also decreased the time between connection crashes in the same ratio so this is a strong correlation for me.

The downside of this "fix" is that this disables pings entirely and so I might not know when a server is gone until the write buffers are full.

I suspect that this is a bug either in Dgraph and how it handles the ping message or in gun. I also opened an issue over in the gun repo.

The gun docs state about the ping:

Time between pings in milliseconds. Since the HTTP protocol has no standardized way to ping the server, Gun will simply send an empty line when the connection is idle. Gun only makes a best effort here as servers usually have configurable limits to drop idle connections. Use infinity to disable.

Any help with fixing this issue is greatly appreciated. If you need more information let me know.
Thanks

@ospaarmann
Copy link
Contributor Author

ospaarmann commented Jun 15, 2018

So I profiled the http2 client to see what is going on. Server is dropping the connection with {goaway,0,enhance_your_calm,<<"too_many_pings">>}.

What keepalive value / what time frame between pings is acceptable to prevent the connection from being dropped? At the moment I'm using 5,000ms and that seems to be too little.

Full trace can be found here: https://gist.github.com/ospaarmann/c592fa984f775810b2698bf0b4d82228

@ospaarmann
Copy link
Contributor Author

ospaarmann commented Jun 21, 2018

I increased the time between pings to 120,000ms (2 full minutes). The connection still drops. This seems like a bug to me since I cannot ping the server.

Since a keepalive value between the pings of 5,000ms causes a connection drop after 20,000ms I assume that Dgraph drops the connection on the fourth ping. So it can handle pings, it just tells you to get lost after number 4 😆

@ghost
Copy link

ghost commented Jun 21, 2018

Hey, why dont you try doing it with a Raw HTTP client...
https://docs.dgraph.io/clients/#raw-http

@ospaarmann
Copy link
Contributor Author

ospaarmann commented Jun 22, 2018

Thank you for your reply @kdsgambhir! But I don't really see your point. I am writing a gRPC client. gRPC supports pings / keepalive.

👉https://godoc.org/google.golang.org/grpc/keepalive

Using a different client to check over a different protocol if the server is alive is a worst case workaround for me. It would be much more complex on my side. I would need a special background process and would basically write a server monitor for something that is supported by the protocol itself. And it would also kind of defeat the purpose.

I think there is a bug here. More specifically in gRPC itself.

It was addressed in gRPC 1.8.4. And another potential fix for this issue was introduced in gRPC 1.10.1. See this issue: grpc/grpc-node#138

Dgraph still uses gRPC 1.8.2. See:

git checkout v1.8.2

I am more than happy to submit a PR to upgrade gRPC to 1.10.1 and solve this issue.

Thanks! ❤️

@ospaarmann ospaarmann changed the title Ping from client breaks connection Ping from client breaks connection with "too many pings" message. Jun 22, 2018
@ospaarmann ospaarmann changed the title Ping from client breaks connection with "too many pings" message. Ping from client breaks connection with "too many pings" message. Possibly related to gRPC bug in v1.8.2 Jun 22, 2018
@ospaarmann ospaarmann changed the title Ping from client breaks connection with "too many pings" message. Possibly related to gRPC bug in v1.8.2 Ping from client breaks connection with "too many pings" message. Possibly related to bug in gRPC v1.8.2 Jun 22, 2018
@manishrjain
Copy link
Contributor

Sure, I'd be happy to accept a PR to upgrade to latest Grpc. Though, I'd ask you to test that Dgraph works fine after the upgrade.

@ospaarmann
Copy link
Contributor Author

Sure @manishrjain. Any idea when v1.0.6 will be released on Docker Hub so that I can use it for testing? Thanks!

@manishrjain
Copy link
Contributor

This week is what I'm aiming for. Just ironing out some last outstanding bugs.

@ospaarmann
Copy link
Contributor Author

Just tested it with v1.0.6. Issue is still present.

What I find a bit strange: I could only find a specific version of grpc in the travis setup.sh. As far as I can see the rest of the code uses the latest package. But I am also new to Go so it is possible that I am mistaken. What do you think @manishrjain ?

I am happy to look into this and prepare a PR to fix it.

@manishrjain
Copy link
Contributor

I've upgraded Grpc to the latest release (see contrib/release.sh, Grpc version 1.13.0). If that still doesn't solve your issue, I'd recommend you to use the HTTP endpoints, which are designed specifically for languages which don't work well with Grpc.

@shaneakr shaneakr added the kind/bug Something is broken. label Jun 29, 2018
@MichelDiz
Copy link
Contributor

It seems to me that there is an option to solve this, according to this code.
GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA
https://github.com/grpc/grpc/blob/e121fef19efdac51c20a4d0dab352b7ada387c72/include/grpc/impl/codegen/grpc_types.h#L210

/** How many misbehaving pings the server can bear before sending goaway and
closing the transport? (0 indicates that the server can bear an infinite
number of misbehaving pings) */

@manishrjain manishrjain added kind/question Something requiring a response. and removed kind/bug Something is broken. labels Jul 6, 2018
@manishrjain
Copy link
Contributor

Closing due to no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/question Something requiring a response.
Development

No branches or pull requests

4 participants