etcdctl member list ==> Failed to get leader: client: etcd cluster is unavailable or misconfigured #7171

Closed
hdfeng265 opened this issue Jan 17, 2017 · 5 comments

@hdfeng265

etcdctl member list ==> Failed to get leader: client: etcd cluster is unavailable or misconfigured

version: etcdctl version 2.3.7
OS: CentOS 7
The cluster has two nodes: centos1 and centos2.
The cluster was bootstrapped with the static mechanism.

centos1 is the master. It starts normally but is unhealthy: it can't be accessed via etcdctl, and every operation fails.
Logs on centos1:
Jan 18 00:35:31 centos1 etcd[7426]: ce2a822cea30bfca is starting a new election at term 3895
Jan 18 00:35:31 centos1 etcd[7426]: ce2a822cea30bfca became candidate at term 3896
Jan 18 00:35:31 centos1 etcd[7426]: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 3896
Jan 18 00:35:31 centos1 etcd[7426]: ce2a822cea30bfca [logterm: 69, index: 552812] sent vote request to 4d38d14ebef23f13 at term 3896
Jan 18 00:35:32 centos1 etcd[7426]: publish error: etcdserver: request timed out
Jan 18 00:35:32 centos1 etcd[7426]: ce2a822cea30bfca is starting a new election at term 3896
Jan 18 00:35:32 centos1 etcd[7426]: ce2a822cea30bfca became candidate at term 3897
Jan 18 00:35:32 centos1 etcd[7426]: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 3897
Jan 18 00:35:32 centos1 etcd[7426]: ce2a822cea30bfca [logterm: 69, index: 552812] sent vote request to 4d38d14ebef23f13 at term 3897
Jan 18 00:35:33 centos1 etcd[7426]: ce2a822cea30bfca is starting a new election at term 3897
Jan 18 00:35:33 centos1 etcd[7426]: ce2a822cea30bfca became candidate at term 3898

centos2 was added to the cluster but can't start.
Logs on centos2:
Jan 18 00:06:37 centos2 etcd[6230]: resolving centos1:2380 to 192.168.126.128:2380
Jan 18 00:06:37 centos2 etcd[6230]: resolving centos1:2380 to 192.168.126.128:2380
Jan 18 00:06:37 centos2 etcd[6230]: stopping listening for client requests on https://centos2:2379
Jan 18 00:06:37 centos2 etcd[6230]: stopping listening for peers on https://centos2:2380
Jan 18 00:06:37 centos2 etcd[6230]: error validating peerURLs {ClusterID:7e27652122e8b2ae Members:[&{ID:4d38d14ebef23f13 RaftAttributes:RaftAttributes:{PeerURLs:[https://centos2:2380]} Attributes:{Name: ClientURLs:[]}} RaftAttributes:{PeerURLs:[http://centos1:2380]} Attributes:{Name:centos1 ClientURLs:[https://centos1:2379]}}] RemovedMemberIDs:[]}: unmatched member
Jan 18 00:06:37 centos2 systemd[1]: etcd.service: main process exited, code=exited, status=1/FAILURE
Jan 18 00:06:37 centos2 systemd[1]: Failed to start Etcd Server.
-- Subject: Unit etcd.service has failed

The peer URL http://centos1:2380 is wrong; the correct one is https://centos1:2380.

It seems that centos1 is waiting for centos2 to vote, but centos2 can't start because of centos1's wrong config, and centos1 can't be accessed because there is no leader.

Is this right, and how can I fix this problem? Thank you.
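For readers following along, the mismatch described above can be illustrated with a small sketch. It is not etcd's validation code, only a comparison of the peer URLs the existing cluster has recorded against the ones the joining node expects. The `registered` values come from the centos2 log; the `advertised` values are an assumption about what centos2 was configured with, and `mismatched_peer_urls` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def mismatched_peer_urls(registered, advertised):
    """Return (name, registered_url, advertised_url) for members whose URLs differ."""
    mismatches = []
    for name, adv_url in advertised.items():
        reg_url = registered.get(name)
        if reg_url is None:
            continue  # member unknown to the existing cluster; not checked here
        # urlsplit compares scheme, host, and port, so http vs https counts as a mismatch
        if urlsplit(reg_url) != urlsplit(adv_url):
            mismatches.append((name, reg_url, adv_url))
    return mismatches


# Peer URLs recorded by the existing cluster, as shown in the centos2 log.
registered = {
    "centos1": "http://centos1:2380",
    "centos2": "https://centos2:2380",
}
# Peer URLs centos2 is assumed to expect (assumption, not shown in the issue).
advertised = {
    "centos1": "https://centos1:2380",
    "centos2": "https://centos2:2380",
}

for name, reg_url, adv_url in mismatched_peer_urls(registered, advertised):
    print(f"{name}: cluster has {reg_url}, joining node expects {adv_url}")
```

Under these assumptions only centos1 is reported, which is consistent with the "unmatched member" error in the log.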

@gyuho
Contributor

gyuho commented Jan 17, 2017

You should not add 1 member to a 1-node cluster: the cluster immediately loses its quorum, and so loses its leader.

Also see #6114 and #5940.
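The quorum arithmetic behind this can be sketched in a few lines. This is only the usual majority formula (more than half of the voting members), not code taken from etcd; the function name `quorum` is illustrative.

```python
def quorum(members: int) -> int:
    """Votes needed for a majority among `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


for n in (1, 2, 3):
    tolerated = n - quorum(n)
    print(f"{n} member(s): quorum {quorum(n)}, tolerates {tolerated} failure(s)")
```

With 1 member the quorum is 1. Once a second member is registered the quorum becomes 2, so a single running node can no longer elect a leader until the second member actually joins.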

@hdfeng265
Author

hdfeng265 commented Jan 18, 2017

@gyuho I have read #6114 and #5940, and I understand the concept of quorum,
but I still don't know how to resolve this problem.
I want to remove the member from centos1 to decrease the quorum, but the command can't be executed.

error logs

Jan 18 10:21:42 centos1 etcd[3929]: error removing member 4d38d14ebef23f13 (context deadline exceeded)
Jan 18 10:21:42 centos1 etcd[3929]: got unexpected response error (context deadline exceeded)

cluster-health

member 4d38d14ebef23f13 is unreachable: no available published client urls
member ce2a822cea30bfca is unhealthy: got unhealthy result from https://centos1:2379
cluster is unhealthy

centos1 became inaccessible after I added centos2.

from #6114

Additionally, that new member is risky because it may turn out to be misconfigured or incapable of joining the cluster. In that case, there's no way to recover quorum because the cluster has two members down and two members up, but needs three votes to change membership to undo the botched membership addition. etcd will by default (as of last week) reject member add attempts that could take down the cluster in this manner

Does this mean it can't be fixed?
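The scenario quoted from #6114 can be checked with a simplified model. The sketch below assumes the newly added member counts toward the cluster size but is not running yet; `keeps_quorum_after_add` is a hypothetical helper, and etcd's own reconfiguration check may differ in detail.

```python
def keeps_quorum_after_add(running: int, down: int) -> bool:
    """True if the running members still form a majority after one
    not-yet-started member is added (simplified model, not etcd's check)."""
    total_after_add = running + down + 1
    needed = total_after_add // 2 + 1
    return running >= needed


# The case quoted from #6114: a 3-member cluster with one member down.
# After the add there are 2 up and 2 down, but 3 votes are needed.
print(keeps_quorum_after_add(running=2, down=1))

# The case in this issue: a healthy 1-node cluster gaining a second member.
print(keeps_quorum_after_add(running=1, down=0))

# A healthy 3-member cluster can absorb one unstarted member.
print(keeps_quorum_after_add(running=3, down=0))
```

In the first two cases the running members fall short of the new quorum, so a membership change such as removing the bad member cannot be committed either. That matches the "context deadline exceeded" errors above.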

@gyuho
Contributor

gyuho commented Jan 18, 2017

@hdfeng265 If you added a wrong endpoint to a 1-node cluster, there's no way to fix it right now.
We are planning to implement some protection in the future. Subscribe to #6420 to track progress.

@gyuho gyuho closed this as completed Jan 18, 2017
@372046933

The recovery procedure is rather simple. If you accidentally add a new member to a single-member cluster, just start the original etcd with ETCD_FORCE_NEW_CLUSTER="true".
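A minimal sketch of that step follows. It only builds the command line and environment for a one-off start of the original member, so it can be inspected before anything is run. The data directory path and the helper name `force_new_cluster_launch` are assumptions for illustration; `ETCD_FORCE_NEW_CLUSTER` is the environment form of etcd's `--force-new-cluster` flag.

```python
import os


def force_new_cluster_launch(data_dir, base_env=None):
    """Build argv and environment for starting etcd once with
    ETCD_FORCE_NEW_CLUSTER set. Nothing is executed here."""
    env = dict(os.environ if base_env is None else base_env)
    env["ETCD_FORCE_NEW_CLUSTER"] = "true"
    argv = ["etcd", "--data-dir", data_dir]
    return argv, env


# Hypothetical data directory; use the one the original member already has.
argv, env = force_new_cluster_launch("/var/lib/etcd/default.etcd", base_env={})
print(argv)
print("ETCD_FORCE_NEW_CLUSTER =", env["ETCD_FORCE_NEW_CLUSTER"])

# To actually start etcd with these settings (not run in this sketch):
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```

It is probably wise to back up the data directory first, and to remove the setting again once the member is healthy so later restarts do not reset the membership.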
