raft: state.commit is out of range #5664
Comments
Seems like a bad assumption we made in the raft library. |
Can reordering the raft messages in a test reproduce this? |
@siddontang Probably. You need a full raft restart + receive an out of order message from the previous connection (or the sender holds it for a really long time). I do not expect this to happen a lot in practice. But, yes, we need to fix this. |
I have a node that won't start because of this error. I believe the other members have a bad commit, so the node fails with this error on recovery. Any suggestions on how to recover this node and/or the cluster? |
@marclennox Can you please provide the full startup log? |
|
@marclennox this looks like a different problem from the one in the issue (although it panics on the same path). I'll see if I can reproduce this behavior. In the meantime, I think the easiest fix (but not 100% sure this will work) is to delete the broken node's etcd data directory (back up the directory somewhere first just in case) so that the node can rebuild its raft state on joining the cluster. /cc @xiang90 |
Thanks @heyitsanthony. I've already tried deleting the data directory, same error when it tries to rebuild its raft state. |
@marclennox From the log, it looks like the raft node lost its previous state somehow (snapshot file is broken/missing?). So your cluster is still running? The easiest way to recover is to treat the node as a failed one: remove the bad member using the etcd member API, then add it back. |
Thanks @xiang90 Now I'm getting the following error
|
@marclennox What is the version of your etcdserver? Have you cleaned up the data-dir entirely before rejoining? |
I'm running it using the published docker image |
Oh I see, so this is now version 3.0.0. |
I'll revert to 2.3.7 and see how that goes. |
@marclennox OK. Thanks! |
Yep, that fixed it. Thanks @xiang90 and @heyitsanthony for helping me work through the problem. :) Sorry for the noise. |
@heyitsanthony How did you solve the problem you mentioned? |
@tbchj if the raft state is corrupted for a single node, then disaster recovery is usually the only way out of it, if possible. |
@heyitsanthony sorry, busy. |
Hi. Does anybody know the root cause of this problem, e.g. "panic: bda4ffc1bc48207d state.commit 472372997 is out of range [472308405, 472310039]"? And in which version has it been fixed? |
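For readers trying to interpret that panic text, here is a minimal sketch of the kind of range check the message implies. It is not the raft library's actual code: `checkCommit` is a hypothetical helper, and reading the two bounds as the log's committed index and its last index is an assumption based only on the wording of the message.

```go
package main

import "fmt"

// checkCommit is a hypothetical helper (not part of the raft library).
// It reports an error when a persisted commit index falls outside the
// range the local log can account for. Treating the bounds as
// [committed, lastIndex] is an assumption drawn from the panic text.
func checkCommit(commit, committed, lastIndex uint64) error {
	if commit < committed || commit > lastIndex {
		return fmt.Errorf("state.commit %d is out of range [%d, %d]",
			commit, committed, lastIndex)
	}
	return nil
}

func main() {
	// Values copied from the panic message quoted in this thread.
	if err := checkCommit(472372997, 472308405, 472310039); err != nil {
		fmt.Println(err)
	}
}
```

With the numbers from the report, the stored commit index is larger than the upper bound, i.e. the saved state claims more committed entries than the local log holds. That would fit the earlier suggestion that the node lost part of its previous state. |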
I did as you said: removed the broken files under wal, deleted the latest snap file, then restarted etcd. That solved my problem, thank you. |
via local-tester with reordering: