Hanging server issue via gRPC #2231

Sometimes my node will fail without any error and become unresponsive via gRPC. Restarting seems to fix it. I have 3 replica nodes with one "zero" instance. When I query via Ratel over HTTP, the same query succeeds.

Logs here:
https://drive.google.com/file/d/1lYDTgMyATApiSVRV8rRsDwRLYNNg9tCW/view?usp=sharing
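One way to make this kind of hang observable is to issue queries over gRPC with an explicit deadline, so an unresponsive server fails fast with DEADLINE_EXCEEDED instead of blocking the caller indefinitely. A minimal sketch, assuming a recent pydgraph client; the address and query below are illustrative, not taken from this report:

```python
import grpc
import pydgraph

# Illustrative gRPC address for the Dgraph server; adjust for your deployment.
stub = pydgraph.DgraphClientStub("localhost:9080")
client = pydgraph.DgraphClient(stub)

query = "{ nodes(func: has(name), first: 1) { uid } }"  # illustrative query

try:
    # The timeout is forwarded to the underlying gRPC call, so a hung server
    # surfaces as DEADLINE_EXCEEDED rather than an indefinitely blocked client.
    resp = client.txn(read_only=True).query(query, timeout=10)
    print(resp.json)
except grpc.RpcError as err:
    print("gRPC query failed:", err.code(), err.details())
finally:
    stub.close()
```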
Comments
Observing the logs, I see a large number of pending Raft entries when a server restarts. Can you please upgrade to
Are all the restarts manual? I have a hunch that since a snapshot isn't happening, memory grows until the container gets killed by Docker and restarted.
@pawanrawal Will do. I mistakenly used dgraph:latest and Docker was caching the 1.0.3 image. I am using the
We are using the following command to load the RDF file:
AHHH!!!
I assume this is good news?
A snapshot not occurring can happen if there are pending transactions which were neither committed nor aborted by the client. We added a mechanism on the server to abort such old pending transactions in
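For reference, the client-side pattern that avoids leaving transactions pending is to always commit or discard them. A minimal sketch, assuming a recent pydgraph client; the address and data are illustrative:

```python
import pydgraph

stub = pydgraph.DgraphClientStub("localhost:9080")  # illustrative address
client = pydgraph.DgraphClient(stub)

txn = client.txn()
try:
    # A mutation starts a transaction that the server tracks as pending
    # until the client either commits or aborts it.
    txn.mutate(set_nquads='_:node <name> "example" .')
    txn.commit()
finally:
    # discard() is a no-op after a successful commit, but it guarantees the
    # transaction is aborted if commit() was never reached or raised.
    txn.discard()

stub.close()
```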
Gotcha, makes sense.
When upgrading to 1.0.4, I am seeing that a lot.
@pawanrawal I rebuilt my graph and finally got the snapshotting to work. I waited a while and made several queries to the "leader" before starting the second and third replica servers, and now it is copying snapshots between servers. My RDF file is close to 800MB. Is there a way to tell when a server is ready to be copied from? It seems that if I start the other replicas too early, it doesn't copy the snapshot over at all and instead creates a new
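The thread doesn't name an official readiness signal, but one crude workaround is to poll the leader's HTTP health endpoint before starting the other replicas. A minimal sketch, assuming the default HTTP port (8080) and treating a 200 response from /health as a proxy for readiness:

```python
import time
import urllib.request

def wait_until_healthy(url="http://localhost:8080/health", timeout_s=300):
    """Poll the server's HTTP health endpoint until it responds with 200."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # server not reachable yet; keep polling
        time.sleep(2)
    return False

if wait_until_healthy():
    print("leader answers /health; starting the other replicas now")
```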
That sounds like a bug; I would expect it to copy the snapshots over to the follower if it's already the leader. Feel free to file another issue mentioning this and we will get it checked and resolved.
Filed another issue: #2236. Any progress on this issue? I upgraded our server to 64GB to try to prevent OOM from the snapshots not copying, in case this happens again.
The snapshotting issue shouldn't occur in master. Try with
The slowness could be due to Docker trying to use swap. Do you have any limits on the memory used by the container in your
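To check whether the container actually has a memory or swap limit configured, one option is the Docker SDK for Python. A minimal sketch, assuming the SDK is installed and the container is named dgraph-server (the name is illustrative):

```python
import docker

client = docker.from_env()
container = client.containers.get("dgraph-server")  # illustrative container name
host_config = container.attrs["HostConfig"]

mem_limit = host_config.get("Memory", 0)       # 0 means no memory limit set
swap_limit = host_config.get("MemorySwap", 0)  # -1 means unlimited swap

print("memory limit (bytes):", mem_limit or "none")
print("memory+swap limit (bytes):", swap_limit)
```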
What is weird here is that the applied watermark is less than the txn watermark, which shouldn't happen. I couldn't reproduce it with the directory you shared. If it still happens for you, can you share the steps to reproduce, or directories from other servers as well?
@emhagman /ping |
@pawanrawal I am getting this issue again, even after the graphs initially synced properly via snapshots. I have more data from a new instance if you want to look at it. Example logs:
I upped our memory to 64GB and the server has now reached 46GB. At some point it's going to be unusable... I am now starting to get write timeouts, or writes that just take forever.
Closing due to no activity. Feel free to create a new issue if you find a related bug.