Dgraph cluster stuck #2054
Comments
Hey @Levatius I am investigating this. Do you get any errors from your queries or mutations while this is happening or they don't return any results? |
They don’t return any results; the queries just run forever.
Thanks,
Vlad
|
Hi @pawanrawal - We still have this issue, where Zero works fine and then all of a sudden it's trying to do this and gets completely stuck. We can't read from or write to the server when it gets to this stage.
|
I am going to work on this today. Is it possible for you to move to v1.0.3 if not done already? |
Yep, this is now happening with 1.0.3. Let us know if you need any more
information to pinpoint this.
This is running in a clustered setup on Kubernetes with 3 servers and 1
zero.
Thanks,
Vlad
|
Did you start Zero with the replicas flag? Or are all the Dgraph servers serving different groups? And from what I understand, once a predicate move fails you are not able to run any queries/mutations on any of the Dgraph servers? Is there anything unusual in the Dgraph server logs? |
No replicas - all servers run different groups. Nothing weird in the server
logs, they all just stop and don't do anything any more.
Thanks,
Vlad
|
This is a bit weird. I simulated conflict error but the subsequent predicate move after that went smoothly. Do you have any logs like the below on Dgraph servers when the predicate move is happening?
|
One more thing, can you please share the output of the |
Hi, sorry for the late reply. The server is stuck again now (in the 3 node Kubernetes setup - see #2172). Output from state below:
|
Cross posting from issue #2172
Since you have three servers all serving different groups, if one of them goes down then all mutations which touch the predicates on that server will get stuck. What we have to see is why the server could not become a leader after a restart. You could also mitigate this problem by having replicas.
Did the cluster get stuck when one of the servers was down? |
Hi,
Yes, it’s to do with the error/crash raised in the other issue. When one
server crashes, it doesn’t become a leader when Kubernetes respawns the
pod. We’ll create a new issue to track this, but the error and behaviour are
already explained in the “Unbalanced load” issue.
Thanks,
Vlad
|
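The failure mode discussed above (a group whose only server is down, or has restarted without becoming leader) can be checked mechanically from Zero's state output. The following is a minimal Go sketch, not Dgraph's own code: the JSON field names (`groups`, `members`, `addr`, `leader`) are assumptions about the shape of that output, the addresses are invented, and `sampleState` is made-up data rather than output from the cluster in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// member, group and clusterState mirror only the fields this sketch needs.
// The field names are assumptions about the shape of Zero's state output.
type member struct {
	Addr   string `json:"addr"`
	Leader bool   `json:"leader"`
}

type group struct {
	Members map[string]member `json:"members"`
}

type clusterState struct {
	Groups map[string]group `json:"groups"`
}

// leaderlessGroups returns the sorted IDs of groups in which no member
// is marked as leader.
func leaderlessGroups(raw []byte) ([]string, error) {
	var st clusterState
	if err := json.Unmarshal(raw, &st); err != nil {
		return nil, err
	}
	var out []string
	for id, g := range st.Groups {
		hasLeader := false
		for _, m := range g.Members {
			if m.Leader {
				hasLeader = true
				break
			}
		}
		if !hasLeader {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out, nil
}

// sampleState is hypothetical data: three groups with one server each,
// where the server in group 2 has come back without becoming leader.
const sampleState = `{
  "groups": {
    "1": {"members": {"1": {"addr": "server-0:7080", "leader": true}}},
    "2": {"members": {"2": {"addr": "server-1:7080", "leader": false}}},
    "3": {"members": {"3": {"addr": "server-2:7080", "leader": true}}}
  }
}`

func main() {
	ids, err := leaderlessGroups([]byte(sampleState))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("groups without a leader:", ids)
}
```

With no replicas, any group reported here would have no server able to accept mutations on its predicates, which matches the stuck behaviour described in this thread.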
I am keen to solve this issue, it would be great if you guys could share logs from a recent run. |
Raised a follow-up issue (#2254), probably can close this one and move to the new one. |
Hi there,
I have a question regarding my dgraph cluster (v1.0.2) seeming to freeze (it is unable to process any more operations: queries, mutations, commits).
In my cluster, I have three server instances running off a single zero. This is the log from one of the servers; however, similar logs can be found on the other two as well:
The zero logs:
It seems that the zero is having trouble performing operations between servers while there are transactions in progress on the servers?