etcdctl member list ==> Failed to get leader: client: etcd cluster is unavailable or misconfigured #7171

hdfeng265 · 2017-01-17T16:52:31Z

etcdctl member list ==> Failed to get leader: client: etcd cluster is unavailable or misconfigured

version: etcdctl version 2.3.7
OS: CentOS 7
The cluster has two nodes: centos1 and centos2.
It was bootstrapped with the static mechanism.

centos1 is the master. It starts normally but is unhealthy: it can't be accessed via etcdctl, and every operation fails.
Logs on centos1:
1月 18 00:35:31 centos1 etcd[7426]: ce2a822cea30bfca is starting a new election at term 3895
1月 18 00:35:31 centos1 etcd[7426]: ce2a822cea30bfca became candidate at term 3896
1月 18 00:35:31 centos1 etcd[7426]: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 3896
1月 18 00:35:31 centos1 etcd[7426]: ce2a822cea30bfca [logterm: 69, index: 552812] sent vote request to 4d38d14ebef23f13 at term 3896
1月 18 00:35:32 centos1 etcd[7426]: publish error: etcdserver: request timed out
1月 18 00:35:32 centos1 etcd[7426]: ce2a822cea30bfca is starting a new election at term 3896
1月 18 00:35:32 centos1 etcd[7426]: ce2a822cea30bfca became candidate at term 3897
1月 18 00:35:32 centos1 etcd[7426]: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 3897
1月 18 00:35:32 centos1 etcd[7426]: ce2a822cea30bfca [logterm: 69, index: 552812] sent vote request to 4d38d14ebef23f13 at term 3897
1月 18 00:35:33 centos1 etcd[7426]: ce2a822cea30bfca is starting a new election at term 3897
1月 18 00:35:33 centos1 etcd[7426]: ce2a822cea30bfca became candidate at term 3898

centos2 was added to the cluster but can't start.
Logs on centos2:
1月 18 00:06:37 centos2 etcd[6230]: resolving centos1:2380 to 192.168.126.128:2380
1月 18 00:06:37 centos2 etcd[6230]: resolving centos1:2380 to 192.168.126.128:2380
1月 18 00:06:37 centos2 etcd[6230]: stopping listening for client requests on https://centos2:2379
1月 18 00:06:37 centos2 etcd[6230]: stopping listening for peers on https://centos2:2380
1月 18 00:06:37 centos2 etcd[6230]: error validating peerURLs {ClusterID:7e27652122e8b2ae Members:[&{ID:4d38d14ebef23f13 RaftAttributes:RaftAttributes:{PeerURLs:[https://centos2:2380]} Attributes:{Name: ClientURLs:[]}} RaftAttributes:{PeerURLs:[http://centos1:2380]} Attributes:{Name:centos1 ClientURLs:[https://centos1:2379]}}] RemovedMemberIDs:[]}: unmatched member
1月 18 00:06:37 centos2 systemd[1]: etcd.service: main process exited, code=exited, status=1/FAILURE
1月 18 00:06:37 centos2 systemd[1]: Failed to start Etcd Server.
-- Subject: Unit etcd.service has failed

http://centos1:2380 is wrong; the correct URL is https://centos1:2380.

It seems that centos1 is waiting for centos2 to vote, but centos2 can't start because of centos1's wrong config, and centos1 can't be accessed because there is no leader.

Is this right, and how can I fix the problem? Thank you.
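A minimal sketch of the mismatch described above, for illustration only. It compares the peer URLs recorded in the existing cluster (taken from the log line on centos2) with the peer URLs the new member is assumed to be configured with. The `configured` list is an assumption based on the statement that https://centos1:2380 is the correct URL, and `unmatched_peer_urls` is a hypothetical helper, not etcd's own validation code.

```python
def unmatched_peer_urls(recorded, configured):
    """Return the peer URLs that appear on only one side of the comparison."""
    recorded_set, configured_set = set(recorded), set(configured)
    only_recorded = sorted(recorded_set - configured_set)
    only_configured = sorted(configured_set - recorded_set)
    return only_recorded, only_configured


# Peer URLs as recorded by the existing cluster (from the centos2 log above).
recorded = ["https://centos2:2380", "http://centos1:2380"]
# Peer URLs assumed to be in centos2's local configuration (hypothetical).
configured = ["https://centos2:2380", "https://centos1:2380"]

only_recorded, only_configured = unmatched_peer_urls(recorded, configured)
print("only in cluster record:", only_recorded)
print("only in local config:", only_configured)
```

Under these assumptions the only difference is the scheme of the centos1 peer URL (http versus https), which is consistent with the "unmatched member" error.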

gyuho · 2017-01-17T17:10:54Z

You should not add one member to a 1-node cluster: the cluster then needs two votes for quorum, so it immediately loses quorum and therefore its leader.

Also see #6114 and #5940.
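The quorum arithmetic behind this can be sketched as follows. This is a minimal illustration of majority quorum (more than half of the members), not code from etcd; the function names `quorum` and `fault_tolerance` are hypothetical.

```python
def quorum(members: int) -> int:
    """Votes needed for a majority in a cluster of the given size."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can be down while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 4):
    print(f"members={n} quorum={quorum(n)} tolerated failures={fault_tolerance(n)}")
```

Going from one member to two raises the quorum from one vote to two while tolerating zero failures, so if the second member never starts, the first member alone cannot elect a leader or commit a membership change.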

hdfeng265 · 2017-01-18T02:10:35Z

@gyuho I have read #6114 and #5940 and understand the concept of quorum, but I still don't know how to resolve this problem.
I tried to remove the member from centos1 to reduce the quorum size, but the command can't be executed.

Error logs:

1月 18 10:21:42 centos1 etcd[3929]: error removing member 4d38d14ebef23f13 (context deadline exceeded)
1月 18 10:21:42 centos1 etcd[3929]: got unexpected response error (context deadline exceeded)

cluster-health

member 4d38d14ebef23f13 is unreachable: no available published client urls
member ce2a822cea30bfca is unhealthy: got unhealthy result from https://centos1:2379
cluster is unhealthy

centos1 became inaccessible after I added centos2.

From #6114:

Additionally, that new member is risky because it may turn out to be misconfigured or incapable of joining the cluster. In that case, there's no way to recover quorum because the cluster has two members down and two members up, but needs three votes to change membership to undo the botched membership addition. etcd will by default (as of last week) reject member add attempts that could take down the cluster in this manner

Does this mean it can't be fixed?

gyuho · 2017-01-18T02:43:57Z

@hdfeng265 If you added a wrong endpoint to a 1-node cluster, there's no way to fix it right now.
We are planning to implement some protection in the future. Subscribe to #6420 to track progress.

hdfeng265 · 2017-01-18T03:59:09Z

Disaster Recovery
https://coreos.com/etcd/docs/3.0.15/v2/admin_guide.html#disaster-recovery

372046933 · 2018-07-13T06:13:13Z

The recovery procedure is rather simple. If you accidentally add a new member to a single-member cluster, just start the original etcd with ETCD_FORCE_NEW_CLUSTER="true".

gyuho closed this as completed Jan 18, 2017
