
Add cluster member fail when restore etcd data #7615
Closed · luweijie007 opened this issue Mar 28, 2017 · 18 comments

luweijie007 (Author) commented Mar 28, 2017

I used etcdctl backup to restore my etcd data:
1> etcdctl backup --data-dir /opt/dzhyun/etcd-cluster-1/data/ --wal-dir /opt/dzhyun/etcd-cluster-1/data/ --backup-dir /home/wwf/etcd_back/ --backup-wal-dir home/wwf/etcd_back/

2> etcd -data-dir=/home/wwf/etcd_back/ -force-new-cluster --name infra0 --initial-advertise-peer-urls http://10.15.209.165:2480 --listen-peer-urls http://10.15.209.165:2480 --listen-client-urls http://10.15.209.165:2479,http://127.0.0.1:2479 --advertise-client-urls http://10.15.209.165:2479 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://10.15.209.165:2480 --initial-cluster-state new

I cannot find any data in this newly restored etcd server. Why?
I guessed that to read the data from this restored etcd, I might need to create 2 new etcd members and form a three-node cluster. So I ran 2 new etcd servers on other machines, as follows:
//on 10.15.107.143:
./etcd --initial-advertise-peer-urls http://10.15.107.143:2380 --listen-peer-urls http://10.15.107.143:2380 --listen-client-urls http://10.15.107.143:2379,http://127.0.0.1:2379 --advertise-client-urls http://10.15.107.143:2379
//on 10.15.107.141:
./etcd --name infra1 --initial-advertise-peer-urls http://10.15.107.141:2381 --listen-peer-urls http://10.15.107.141:2381 --listen-client-urls http://10.15.107.141:2379,http://127.0.0.1:2379 --advertise-client-urls http://10.15.107.141:2379

On 10.15.209.165 I tried to add these 2 new etcd servers as cluster members:
[root@10 member]# etcdctl --endpoint 10.15.209.165:2479 member add infra0 http://10.15.107.141:2379
Added member named infra0 with ID 3d6bf7d7459a39cb to cluster

ETCD_NAME="infra0"
ETCD_INITIAL_CLUSTER="infra0=http://10.15.209.165:2480,infra0=http://10.15.107.141:2379"
ETCD_INITIAL_CLUSTER_STATE="existing"

But adding the other one failed:

[root@10 member]# etcdctl --endpoint 10.15.209.165:2479 member add infra1 http://10.15.107.143:2379
client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://10.15.209.165:2479 has no leader

The etcd server on 10.15.209.165 printed logs as follows:
2017-03-28 14:44:06.063336 I | raft: 30c2969a5d0e09f0 is starting a new election at term 297
2017-03-28 14:44:06.063395 I | raft: 30c2969a5d0e09f0 became candidate at term 298
2017-03-28 14:44:06.063416 I | raft: 30c2969a5d0e09f0 received MsgVoteResp from 30c2969a5d0e09f0 at term 298
2017-03-28 14:44:06.063437 I | raft: 30c2969a5d0e09f0 [logterm: 2, index: 21] sent MsgVote request to 3d6bf7d7459a39cb at term 298
2017-03-28 14:44:07.763282 I | raft: 30c2969a5d0e09f0 is starting a new election at term 298
2017-03-28 14:44:07.763341 I | raft: 30c2969a5d0e09f0 became candidate at term 299
2017-03-28 14:44:07.763362 I | raft: 30c2969a5d0e09f0 received MsgVoteResp from 30c2969a5d0e09f0 at term 299
2017-03-28 14:44:07.763381 I | raft: 30c2969a5d0e09f0 [logterm: 2, index: 21] sent MsgVote request to 3d6bf7d7459a39cb at term 299
2017-03-28 14:44:09.063349 I | raft: 30c2969a5d0e09f0 is starting a new election at term 299
2017-03-28 14:44:09.063412 I | raft: 30c2969a5d0e09f0 became candidate at term 300
2017-03-28 14:44:09.063434 I | raft: 30c2969a5d0e09f0 received MsgVoteResp from 30c2969a5d0e09f0 at term 300
2017-03-28 14:44:09.063452 I | raft: 30c2969a5d0e09f0 [logterm: 2, index: 21] sent MsgVote request to 3d6bf7d7459a39cb at term 300
2017-03-28 14:44:09.970830 W | rafthttp: health check for peer 3d6bf7d7459a39cb could not connect: json: cannot unmarshal number into Go value of type probing.Health
2017-03-28 14:44:10.363266 I | raft: 30c2969a5d0e09f0 is starting a new election at term 300
2017-03-28 14:44:10.363310 I | raft: 30c2969a5d0e09f0 became candidate at term 301
2017-03-28 14:44:10.363330 I | raft: 30c2969a5d0e09f0 received MsgVoteResp from 30c2969a5d0e09f0 at term 301
2017-03-28 14:44:10.363350 I | raft: 30c2969a5d0e09f0 [logterm: 2, index: 21] sent MsgVote request to 3d6bf7d7459a39cb at term 301
2017-03-28 14:44:12.063308 I | raft: 30c2969a5d0e09f0 is starting a new election at term 301
2017-03-28 14:44:12.063382 I | raft: 30c2969a5d0e09f0 became candidate at term 302
2017-03-28 14:44:12.063405 I | raft: 30c2969a5d0e09f0 received MsgVoteResp from 30c2969a5d0e09f0 at term 302
2017-03-28 14:44:12.063426 I | raft: 30c2969a5d0e09f0 [logterm: 2, index: 21] sent MsgVote request to 3d6bf7d7459a39cb at term 302
2017-03-28 14:44:13.963319 I | raft: 30c2969a5d0e09f0 is starting a new election at term 302
2017-03-28 14:44:13.963374 I | raft: 30c2969a5d0e09f0 became candidate at term 303
2017-03-28 14:44:13.963412 I | raft: 30c2969a5d0e09f0 received MsgVoteResp from 30c2969a5d0e09f0 at term 303
2017-03-28 14:44:13.963433 I | raft: 30c2969a5d0e09f0 [logterm: 2, index: 21] sent MsgVote request to 3d6bf7d7459a39cb at term 303
2017-03-28 14:44:14.971181 W | rafthttp: health check for peer 3d6bf7d7459a39cb could not connect: json: cannot unmarshal number into Go value of type probing.Health
// ... the same messages repeat ...

Can someone tell me how to do an etcd restore? I have read:
https://github.com/coreos/etcd/blob/40ae83beab6ecc55ed64825bac59db21a7e0c2c2/Documentation/op-guide/recovery.md
and
https://github.com/coreos/etcd/blob/40ae83beab6ecc55ed64825bac59db21a7e0c2c2/Documentation/v2/admin_guide.md#disaster-recovery

And my final question: I have an etcd cluster that holds both v2 and v3 data. How can I restore this cluster's data?

fanminshi (Member) commented:

Taking a look.

fanminshi (Member) commented Mar 28, 2017

"I cannot find any data in this newly restored etcd server. Why?"

From your command:
etcdctl backup --data-dir /opt/dzhyun/etcd-cluster-1/data/ --wal-dir /opt/dzhyun/etcd-cluster-1/data/ --backup-dir /home/wwf/etcd_back/ --backup-wal-dir home/wwf/etcd_back/

It seems to me that your WAL files are inside your data dir, so there is no need to specify the --wal-dir flag.

Try:
etcdctl backup --data-dir /opt/dzhyun/etcd-cluster-1/data/ --backup-dir /home/wwf/etcd_back/

Then starting etcd with the new backup dir should work.
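
For example, something like this, reusing the flags from your first post (a sketch, not verified here):

# start a single-member cluster from the backup dir
# (flags taken from the original report in this thread)
etcd --data-dir /home/wwf/etcd_back/ --force-new-cluster \
  --name infra0 \
  --initial-advertise-peer-urls http://10.15.209.165:2480 \
  --listen-peer-urls http://10.15.209.165:2480 \
  --listen-client-urls http://10.15.209.165:2479,http://127.0.0.1:2479 \
  --advertise-client-urls http://10.15.209.165:2479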

luweijie007 (Author) commented:

@fanminshi thanks for your suggestion!
I tried:
etcdctl backup --data-dir /opt/dzhyun/etcd-cluster-1/data/ --backup-dir /home/wwf/etcd_back/
and then:
[screenshot of the etcd startup error]
It still doesn't work, and I have checked that the file /home/wwf/etcd_back/member/snap/db does not exist.

fanminshi (Member) commented Mar 29, 2017

I was able to reproduce the same error (/home/wwf/etcd_back/member/snap/db does not exist).

Setup:
etcd Version: 3.2.0+git
Git SHA: 123b258
Go Version: go1.8
Go OS/Arch: darwin/amd64

Steps:

$ bin/etcd --snapshot-count 5
...
2017-03-29 15:59:28.008316 I | etcdserver: start to snapshot (applied: 6, lastsnap: 0)
2017-03-29 15:59:28.027225 I | etcdserver: saved snapshot at index 6
2017-03-29 15:59:28.027254 I | etcdserver: compacted raft log at 1
...

// another window
// trigger etcd to snapshot
$ ETCDCTL_API=2 bin/etcdctl set foo bar1
bar1
$ ETCDCTL_API=2 bin/etcdctl set foo bar2
bar2
$ ETCDCTL_API=2 bin/etcdctl set foo bar3
bar3
$ ETCDCTL_API=2 bin/etcdctl set foo bar4
bar4
$ ETCDCTL_API=2 bin/etcdctl set foo bar5
bar5
$ ETCDCTL_API=2 bin/etcdctl set foo bar6
bar6

$ tree default.etcd/
default.etcd/
└── member
    ├── snap
    │   ├── 0000000000000002-0000000000000006.snap
    │   └── db
    └── wal
        └── 0000000000000000-0000000000000000.wal

// backup
$ ETCDCTL_API=2 bin/etcdctl backup --data-dir default.etcd/ --backup-dir backup/
// backup doesn't contain a db file
$ tree backup
backup
└── member
    ├── snap
    │   └── 0000000000000002-0000000000000006.snap
    └── wal
        └── 0000000000000000-0000000000000000.wal

// kill the old etcd process
// start new one with backup
$ bin/etcd -data-dir backup -force-new-cluster
2017-03-29 16:06:53.180176 I | etcdserver: recovered store from snapshot at index 6
2017-03-29 16:06:53.180185 I | etcdserver: name = default
2017-03-29 16:06:53.180198 I | etcdserver: force new cluster
2017-03-29 16:06:53.180200 I | etcdserver: data dir = backup
2017-03-29 16:06:53.180203 I | etcdserver: member dir = backup/member
2017-03-29 16:06:53.180208 I | etcdserver: heartbeat = 100ms
2017-03-29 16:06:53.180210 I | etcdserver: election = 1000ms
2017-03-29 16:06:53.180212 I | etcdserver: snapshot count = 100000
2017-03-29 16:06:53.180217 I | etcdserver: advertise client URLs = http://localhost:2379
2017-03-29 16:06:53.237776 I | etcdserver: forcing restart of member 5b1c4f256b01 in cluster 5b1c4f256b02 at commit index 12
2017-03-29 16:06:53.237886 I | raft: 5b1c4f256b01 became follower at term 2
2017-03-29 16:06:53.237915 I | raft: newRaft 5b1c4f256b01 [peers: [8e9e05c52164694d], term: 2, commit: 12, applied: 6, lastindex: 12, lastterm: 2]
2017-03-29 16:06:53.238150 I | etcdserver/api: enabled capabilities for version 3.2
2017-03-29 16:06:53.238175 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster 5b1c4f256b02 from store
2017-03-29 16:06:53.238183 I | etcdserver/membership: set the cluster version to 3.2 from store
2017-03-29 16:06:53.243627 C | etcdmain: database file (backup/member/snap/db) of the backend is missing

The issue is that the db file is not present in the backup folder; etcd fails at this check: https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L391

@luweijie007 I am investigating this issue. I'll let you know my progress.

heyitsanthony (Contributor) commented:

3.1 expects a db file since restoring from backup is expected to come from an etcdctl snapshot restore. The simplest workaround would probably be to add an empty db file when creating the backup with etcdctl backup.
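
A minimal sketch of that workaround, assuming the db path shown in the startup error above (backup/member/snap/db):

# hypothetical sketch: create an empty backend db file where the
# 3.1+ server expects to find it
mkdir -p backup/member/snap
touch backup/member/snap/db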

fanminshi (Member) commented:

@heyitsanthony I was also able to get around this issue by copying the db file from the original data-dir to the backup-dir after running etcdctl backup. However, I wasn't sure if that's correct. It also seems to me that etcdctl snapshot save only saves v3 key-value pairs, not v2 key-value pairs.
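
The copy step was roughly this (a sketch using the paths from this thread, with the db location implied by the startup error; see the caveat below):

# copy the backend db from the original data dir into the backup
cp /opt/dzhyun/etcd-cluster-1/data/member/snap/db /home/wwf/etcd_back/member/snap/db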

heyitsanthony (Contributor) commented:

@fanminshi there is already an open issue on this subject, #7002. I don't think copying the db file is safe when doing a v2 restore, because the WAL's membership data will not match the membership data in the db.

fanminshi (Member) commented:

@heyitsanthony agreed.

heyitsanthony (Contributor) commented:

@luweijie007 is the cluster only storing v3 keys? If so, that would explain why the data isn't showing up after etcdctl backup / restore. Try etcdctl snapshot save and restore; there's an example in the etcd3 recovery guide.
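
The v3 flow looks roughly like this (a sketch based on the recovery guide linked earlier; the member name and URLs are placeholders, not values from this thread):

# take a v3 snapshot from a live member
ETCDCTL_API=3 etcdctl --endpoints http://127.0.0.1:2379 snapshot save snapshot.db

# restore it into a fresh data dir for a new single-member cluster
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir m1.etcd \
  --name m1 \
  --initial-cluster m1=http://host1:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://host1:2380

# then start etcd from the restored data dir
etcd --name m1 --data-dir m1.etcd \
  --initial-advertise-peer-urls http://host1:2380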

fanminshi (Member) commented Mar 29, 2017

@luweijie007

After backing up with
etcdctl backup --data-dir /opt/dzhyun/etcd-cluster-1/data/ --backup-dir /home/wwf/etcd_back/

create an empty db file with
touch /home/wwf/etcd_back/db

Then starting etcd with the new backup dir should work; I tested that with a cluster storing both v2 and v3 keys.

edit: this doesn't work as intended. see #7615 (comment) below.

luweijie007 (Author) commented Mar 30, 2017

@heyitsanthony @fanminshi
The cluster I want to restore keeps both v2 and v3 keys. I did the restore following this doc:
https://github.com/coreos/etcd/blob/40ae83beab6ecc55ed64825bac59db21a7e0c2c2/Documentation/v2/admin_guide.md#disaster-recovery
OK, I will try your way, @fanminshi.
Thanks!

heyitsanthony (Contributor) commented:

@luweijie007 it's not possible to restore both v2 and v3 keys; hence issue #7002. etcdctl backup will only save v2 keys.

luweijie007 (Author) commented:

@heyitsanthony, you mean that there is no way to restore a cluster that keeps both v2 and v3 keys?
But from this comment from @fanminshi:
[screenshot of fanminshi's workaround comment above]
I thought it could restore both v2 and v3 keys.
Given issue #7002, maybe I cannot do this.

heyitsanthony (Contributor) commented:

@luweijie007 that's about storing keys, not retrieving old keys. The v3 keys are held in the db; creating an empty db file won't restore them.

luweijie007 (Author) commented:

OK, I tried @fanminshi's instructions. The new cluster can get the v2 keys, but only a few of the v3 keys, not all of them.
So the result is that a cluster which stores both v2 and v3 keys has no way to restore all of its keys at this time?
@fanminshi, so can I store both v2 and v3 keys on the same etcd cluster?

heyitsanthony (Contributor) commented:

@luweijie007 what is happening in this case is that the backed-up WAL still has some v3 proposals in it. However, there's no guarantee it will have all the v3 keys, since the WAL is periodically pruned and the v3 keys are saved into the db. It's not a reliable backup method for v3 keys.

It's possible to store both v2 and v3 keys in an etcd cluster, but there's no official way to restore both into a new cluster.

luweijie007 (Author) commented Mar 30, 2017

@heyitsanthony @fanminshi
OK, I understand the restore situation from your comments.
Thanks a lot!
