raft: state.commit is out of range #5664

heyitsanthony · 2016-06-14T02:55:00Z

via local-tester with reordering:

2016-06-13 17:47:28.006281 I | etcdmain: etcd Version: 3.0.0-beta.0+git
2016-06-13 17:47:28.006332 I | etcdmain: Git SHA: 65e19a1
2016-06-13 17:47:28.006337 I | etcdmain: Go Version: go1.6
2016-06-13 17:47:28.006348 I | etcdmain: Go OS/Arch: linux/amd64
2016-06-13 17:47:28.006353 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
2016-06-13 17:47:28.006359 W | etcdmain: no data-dir provided, using default data-dir ./infra1.etcd
2016-06-13 17:47:28.006391 N | etcdmain: the server is already initialized as member before, starting as etcd member... 
2016-06-13 17:47:28.006451 I | etcdmain: listening for peers on http://127.0.0.1:12380
2016-06-13 17:47:28.006474 I | etcdmain: listening for client requests on 127.0.0.1:11119 
2016-06-13 17:47:28.013070 I | etcdserver: recovered store from snapshot at index 38626 
2016-06-13 17:47:28.013081 I | etcdserver: name = infra1
2016-06-13 17:47:28.013087 I | etcdserver: data dir = infra1.etcd 
2016-06-13 17:47:28.013093 I | etcdserver: member dir = infra1.etcd/member
2016-06-13 17:47:28.013098 I | etcdserver: heartbeat = 100ms
2016-06-13 17:47:28.013104 I | etcdserver: election = 1000ms
2016-06-13 17:47:28.013109 I | etcdserver: snapshot count = 1000
2016-06-13 17:47:28.013118 I | etcdserver: advertise client URLs = http://127.0.0.1:2379
2016-06-13 17:47:28.065753 I | etcdserver: restarting member 5da0b1f0ade347d1 in cluster ea3db81f3897e3ad at commit index 38040
2016-06-13 17:47:28.065854 C | raft: 5da0b1f0ade347d1 state.commit 38040 is out of range [38626, 38626]
panic: 
5da0b1f0ade347d1 state.commit 38040 is out of range [38626, 38626]
goroutine 1 [running]:
panic(0xcd2fa0, 0xc8205ff170) 
        /usr/lib/go/src/runtime/panic.go:464 +0x3e6
github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc82010d100, 0x119e200, 0x2b, 0xc8204a5d40, 0x4, 0x4)
        /home/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:73 +0x191
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raft).loadState(0xc820204340, 0x67a, 0xd2643f51f16cc22b, 0x9498, 0x0, 0x0, 0x0)
        /home/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:942 +0x2a2
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.newRaft(0xc8201f7b30, 0x451b20)
        /home/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:225 +0x8ff
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.RestartNode(0xc8201f7b30, 0x0, 0x0)
        /home/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/node.go:212 +0x45
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.restartNode(0xc8202163c0, 0xc820208990, 0x29, 0xc8201f7f68, 0x0, 0x0, 0x0, 0x0)
        /home/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/raft.go:361 +0x7c7
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.NewServer(0xc8202163c0, 0x0, 0x0, 0x0)
        /home/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:350 +0x3cf2
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcd(0xc820160800, 0x0, 0x0, 0x0)
        /home/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain/etcd.go:366 +0x23ea
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2()
        /home/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain/etcd.go:116 +0x213d
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain.Main()
        /home/anthony/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain/main.go:36 +0x21e
main.main()
        /home/anthony/go/src/github.com/coreos/etcd/cmd/main.go:28 +0x14
Terminating etcd1


xiang90 · 2016-06-15T01:10:10Z

Seems like a bad assumption we made in raft library.

siddontang · 2016-06-15T01:44:45Z

Can reordering the raft messages in a test reproduce this?
@xiang90

xiang90 · 2016-06-15T01:46:56Z

@siddontang Probably. You need a full raft restart + receive an out of order message from the previous connection (or the sender holds it for a really long time). I do not expect this to happen a lot in practice. But, yes, we need to fix this.
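
For readers following along, the panic message comes from a range check performed when a node reloads its persisted state on restart. Below is a minimal sketch of that check, assuming the two bounds in the message are the log's committed index and its last index. `checkCommit` and its signature are hypothetical and are not the raft library's API; only the message format is copied from the log above.

```go
package main

import "fmt"

// checkCommit is a hypothetical stand-in for the restart-time range check.
// It returns an error (where etcd panics) if the persisted commit index
// falls outside [committed, lastIndex] of the recovered log.
func checkCommit(id, commit, committed, lastIndex uint64) error {
	if commit < committed || commit > lastIndex {
		return fmt.Errorf("%x state.commit %d is out of range [%d, %d]",
			id, commit, committed, lastIndex)
	}
	return nil
}

func main() {
	// Numbers from the log above: the persisted commit (38040) is behind
	// the snapshot the log was recovered from (38626).
	fmt.Println(checkCommit(0x5da0b1f0ade347d1, 38040, 38626, 38626))

	// A commit index that matches the recovered log passes the check.
	fmt.Println(checkCommit(0x5da0b1f0ade347d1, 38626, 38626, 38626))
}
```

In the log above, the snapshot was recovered at index 38626 while the persisted commit index was still 38040, so the commit lies below the lower bound.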

marclennox · 2016-06-30T23:32:22Z

I have a node that won't start because of this error. I believe the other members have a bad commit, so the node fails with this error on recovery.

Any suggestions on how to recover this node and/or the cluster?

xiang90 · 2016-06-30T23:44:35Z

@marclennox Can you please provide the full startup log?

marclennox · 2016-07-01T00:09:58Z

Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.989579 I | flags: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=https://jrprd-db01.justreply.co:2379
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.989739 I | flags: recognized and used environment variable ETCD_CERT_FILE=/parasite-config/conf/etcd/client-cert.pem
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.989765 I | flags: recognized and used environment variable ETCD_CLIENT_CERT_AUTH=1
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.989788 I | flags: recognized and used environment variable ETCD_DATA_DIR=/parasite-data/etcd
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.989818 I | flags: recognized and used environment variable ETCD_ELECTION_TIMEOUT=5000
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.989839 I | flags: recognized and used environment variable ETCD_HEARTBEAT_INTERVAL=1000
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.989864 I | flags: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=https://jrprd-db01.justreply.co:2380
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.989973 I | flags: recognized and used environment variable ETCD_INITIAL_CLUSTER=jrprd-db01=https://jrprd-db01.justreply.co:2380,jrprd-web01=https://jrprd-web01.justreply.co:2380,jrstg-db01=https://jrstg-db01.justreply.co:2380,jrstg-web01=https://jrstg-web01.justreply.co:2380,jrprd-ops01=https://jrprd-ops01.justreply.co:2380
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.989996 I | flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_STATE=existing
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990013 I | flags: recognized and used environment variable ETCD_KEY_FILE=/parasite-config/conf/etcd/client-key.pem
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990044 I | flags: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990062 I | flags: recognized and used environment variable ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990084 I | flags: recognized and used environment variable ETCD_NAME=jrprd-db01
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990138 I | flags: recognized and used environment variable ETCD_PEER_CERT_FILE=/parasite-config/conf/etcd/peer-cert.pem
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990176 I | flags: recognized and used environment variable ETCD_PEER_CLIENT_CERT_AUTH=1
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990195 I | flags: recognized and used environment variable ETCD_PEER_KEY_FILE=/parasite-config/conf/etcd/peer-key.pem
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990209 I | flags: recognized and used environment variable ETCD_PEER_TRUSTED_CA_FILE=/parasite-config/conf/etcd/peer-ca.pem
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990241 I | flags: recognized and used environment variable ETCD_TRUSTED_CA_FILE=/parasite-config/conf/etcd/client-ca.pem
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990407 I | etcdmain: etcd Version: 3.0.0
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990425 I | etcdmain: Git SHA: 6f48bda
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990436 I | etcdmain: Go Version: go1.6.2
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990447 I | etcdmain: Go OS/Arch: linux/amd64
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990459 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990560 N | etcdmain: the server is already initialized as member before, starting as etcd member...
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.990609 I | etcdmain: peerTLS: cert = /parasite-config/conf/etcd/peer-cert.pem, key = /parasite-config/conf/etcd/peer-key.pem, ca = , trusted-ca = /parasite-config/conf/etcd/peer-ca.pem, client-cert-auth = true
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.992868 I | etcdmain: listening for peers on https://0.0.0.0:2380
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.992935 I | etcdmain: clientTLS: cert = /parasite-config/conf/etcd/client-cert.pem, key = /parasite-config/conf/etcd/client-key.pem, ca = , trusted-ca = /parasite-config/conf/etcd/client-ca.pem, client-cert-auth = true
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.995093 I | etcdmain: listening for client requests on 0.0.0.0:2379
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.998423 I | etcdserver: name = jrprd-db01
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.998480 I | etcdserver: data dir = /parasite-data/etcd
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.998498 I | etcdserver: member dir = /parasite-data/etcd/member
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.998510 I | etcdserver: heartbeat = 1000ms
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.998521 I | etcdserver: election = 5000ms
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.998533 I | etcdserver: snapshot count = 10000
Jun 30 23:32:15 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:15.998549 I | etcdserver: advertise client URLs = https://jrprd-db01.justreply.co:2379
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:16.000775 I | etcdserver: restarting member 39246a319e218d4a in cluster 7b622c05bd899518 at commit index 51644801
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: 2016-06-30 23:32:16.001104 C | raft: 39246a319e218d4a state.commit 51644801 is out of range [0, 0]
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: panic: 39246a319e218d4a state.commit 51644801 is out of range [0, 0]
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: goroutine 1 [running]:
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: panic(0xd44e00, 0xc82012e430)
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /usr/local/go/src/runtime/panic.go:481 +0x3e6
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc8201ba380, 0x1235f80, 0x2b, 0xc820136600, 0x4, 0x4)
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x191
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raft).loadState(0xc8204ec0d0, 0x856, 0x0, 0x3140981, 0x0, 0x0, 0x0)
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:942 +0x2a2
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.newRaft(0xc820157a88, 0x0)
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:225 +0x8ff
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.RestartNode(0xc820157a88, 0x0, 0x0)
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/node.go:213 +0x45
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.restartNode(0xc820299680, 0x0, 0x7f22acf5b028, 0xc82019e730, 0x0, 0x0, 0xc82001611a, 0x26)
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/raft.go:369 +0x7c7
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.NewServer(0xc820299680, 0x0, 0x7f22acf5b028, 0xc82019e730)
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:353 +0x411d
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcd(0xc8201cc400, 0x0, 0x0, 0x0)
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain/etcd.go:366 +0x23ea
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2()
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain/etcd.go:116 +0x213d
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain.Main()
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain/main.go:36 +0x21e
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: main.main()
Jun 30 23:32:16 jrprd-db01.justreply.co docker[25549]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/main.go:28 +0x14

heyitsanthony · 2016-07-01T01:45:21Z

@marclennox this looks like a different problem from the one in the issue (although it panics on the same path). I'll see if I can reproduce this behavior.

In the meantime, I think the easiest fix (but not 100% sure this will work) is to delete the broken node's etcd data directory (back up the directory somewhere first just in case) so that the node can rebuild its raft state on joining the cluster. /cc @xiang90

marclennox · 2016-07-01T02:12:36Z

Thanks @heyitsanthony. I've already tried deleting the data directory, same error when it tries to rebuild its raft state.

xiang90 · 2016-07-01T02:44:54Z

@marclennox From the log, it looks like the raft node lost its previous state somehow (is the snapshot file broken or missing?).

So your cluster is still running? The easiest way to recover is to treat the node as a failed one. You can then remove the bad member using the etcd member API and add it back.

Check: https://github.com/coreos/etcd/blob/release-2.3/Documentation/runtime-configuration.md#replace-a-failed-machine

marclennox · 2016-07-01T03:18:07Z

Thanks @xiang90

Now I'm getting the following error

Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.322319 I | raft: db49ed2084c203b4 [commit: 0, lastindex: 0, lastterm: 0] starts to restore snapshot [index: 51674804, term: 2134]
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.322375 I | raft: log [committed=0, applied=0, unstable.offset=1, len(unstable.Entries)=0] starts to restore snapshot [index: 51674804, term: 2134]
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.322399 I | raft: db49ed2084c203b4 restored progress of 39246a319e218d4a [next = 51674805, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.322423 I | raft: db49ed2084c203b4 restored progress of 4bb64a6466927376 [next = 51674805, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.322438 I | raft: db49ed2084c203b4 restored progress of 77d8cbcb900dd306 [next = 51674805, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.322452 I | raft: db49ed2084c203b4 restored progress of aee0eec5f4946784 [next = 51674805, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.322466 I | raft: db49ed2084c203b4 restored progress of f48c6b505d0dc072 [next = 51674805, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.322899 I | raft: db49ed2084c203b4 [commit: 51674804] restored snapshot [index: 51674804, term: 2134]
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.341853 I | etcdserver: applying snapshot at index 0...
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.347874 C | etcdserver: get database snapshot file path error: snap: snapshot file doesn't exist
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: 2016-07-01 03:16:03.347889 I | etcdserver: finished applying incoming snapshot at index 0
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: panic: get database snapshot file path error: snap: snapshot file doesn't exist
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: goroutine 192 [running]:
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: panic(0xd44e00, 0xc822ece9b0)
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: /usr/local/go/src/runtime/panic.go:481 +0x3e6
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc820135980, 0x125b5c0, 0x29, 0xc8204d16b8, 0x1, 0x1)
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x191
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applySnapshot(0xc820365200, 0xc8203e2d80, 0xc8216f1ce0)
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:650 +0x5a1
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).applyAll(0xc820365200, 0xc8203e2d80, 0xc8216f1ce0)
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:611 +0x60
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*EtcdServer).run.func2(0x7fe98a741590, 0xc8203e2d40)
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:592 +0x32
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule.(*fifo).run(0xc8203dfc20)
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule/schedule.go:160 +0x323
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: created by github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule.NewFIFOScheduler
Jul 01 03:16:03 jrprd-db01.justreply.co docker[46726]: /home/gyuho/go/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule/schedule.go:71 +0x27d

xiang90 · 2016-07-01T03:19:38Z

@marclennox What is the version of your etcdserver? Have you cleaned up the data-dir entirely before rejoining?

marclennox · 2016-07-01T03:21:40Z

I'm running it using the published docker image quay.io/coreos/etcd:latest, and yes definitely cleaned out data directory.

marclennox · 2016-07-01T03:22:23Z

Oh I see, so this is now version 3.0.0.

marclennox · 2016-07-01T03:22:50Z

I'll revert to 2.3.7 and see how that goes.

xiang90 · 2016-07-01T03:26:06Z

@marclennox OK. Thanks!

marclennox · 2016-07-01T03:28:55Z

Yep, that fixed it. Thanks @xiang90 and @heyitsanthony for helping me work through the problem. :) Sorry for the noise.

tbchj · 2017-03-17T06:02:07Z

@heyitsanthony How did you solve the problem you described?
Clean the data directory? Is that the only way?
My cluster has only one node. What can I do? Clean all the data?
I don't think that is the best way to solve this.

heyitsanthony · 2017-03-17T18:35:48Z

@tbchj if the raft state is corrupted for a single node, then disaster recovery is usually the only way out of it, if possible.

tbchj · 2017-04-14T02:22:53Z

@heyitsanthony Sorry, I've been busy.
I solved the problem I mentioned, but not by cleaning all the data (though cleaning all the data may also work).
In the data directory, I removed the broken files in both wal and snap.
After several retries, I found that the wal and snap were probably inconsistent with each other. The newest wal entries had been lost, so I deleted the newest snap file, expecting the entries to be written from the wal to a snapshot again.
It worked, and etcd started.

cwx559275 · 2018-06-05T07:17:22Z

Hi. Does anybody know the root cause of a panic like "panic: bda4ffc1bc48207d state.commit 472372997 is out of range [472308405, 472310039]"? And in which version has it been fixed?
Please let me know.

Queetinliu · 2021-12-20T07:56:54Z

@heyitsanthony sorry, busy. i solved the problem i said . but not the way clean all data. maybe clean all data also works. in the data directory, i remove the broken file both wal and snap. after several times retry. i found it maybe was both wal and snap are inconsistent. the newest wal message lose, so i delete the newest snap file, and i think the message will write from wal to snap again. and it works. the etcd start work.

I did what you said: removed the broken files under wal, deleted the latest snap file, and restarted etcd. That solved my problem, thank you.

heyitsanthony added the type/bug label Jun 14, 2016

xiang90 added the area/raft label Jun 15, 2016

xiang90 mentioned this issue Jun 16, 2016

etcdserver: save state before save snapshot #5690

Merged

xiang90 closed this as completed in #5690 Jun 16, 2016

feikesteenbergen mentioned this issue Aug 5, 2016

Use etcd version 3 instead of 2 zalando-stups/stups-etcd-cluster#29

Closed

gyuho mentioned this issue Mar 10, 2017

etcd doesn't start after crash #7477

Closed

wojtek-t mentioned this issue Nov 29, 2017

Commit is out of range #8935

Closed
