
kubeadm join control-plane node times out (etcd timeout) #1712

Closed
chrischdi opened this issue Aug 7, 2019 · 29 comments
Labels: area/etcd, kind/bug, priority/awaiting-more-evidence

Comments

@chrischdi
Member

chrischdi commented Aug 7, 2019

What keywords did you search in kubeadm issues before filing this one?

etcd join timeout
kubeadm join timeout

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:20:51Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version): v1.15.2

  • Cloud provider or hardware configuration: Openstack

  • OS (e.g. from /etc/os-release): Container Linux by CoreOS 2135.5.0 (Rhyolite)

  • Kernel (e.g. uname -a): Linux os1pi019-kube-master01 4.19.50-coreos-r1 #1 SMP Mon Jul 1 19:07:03 -00 2019 x86_64 Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz GenuineIntel GNU/Linux

  • Others:

What happened?

kubeadm join was invoked and failed.
The etcd container only started up 7 seconds after kubeadm timed out and exited with failure.
See the following logs (these include kubeadm logs and timestamps for pod-manifest starts):

09:30:27 kubeadm service starts
09:30:27 kubeadm[2025]: [preflight] Reading configuration from the cluster...                                                                                                                                                                                                                                                                                   
09:30:27 kubeadm[2025]: [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'                                                                                                                                                                                                                            
09:30:27 kubeadm[2025]: [control-plane] Using manifest folder "/etc/kubernetes/manifests"                                                                                                                                                                                                                                                                       
09:30:27 kubeadm[2025]: [control-plane] Creating static Pod manifest for "kube-apiserver"                                                                                                                                                                                                                                                                       
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-policy" to "kube-apiserver"                                                                                                                                                                                                                                                          
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "policy-controller" to "kube-apiserver"                                                                                                                                                                                                                                                     
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-log" to "kube-apiserver"                                                                                                                                                                                                                                                             
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "scheduler-policy" to "kube-scheduler"                                                                                                                                                                                                                                                      
09:30:27 kubeadm[2025]: [control-plane] Creating static Pod manifest for "kube-controller-manager"                                                                                                                                                                                                                                                              
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-policy" to "kube-apiserver"                                                                                                                                                                                                                                                          
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "policy-controller" to "kube-apiserver"                                                                                                                                                                                                                                                     
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-log" to "kube-apiserver"                                                                                                                                                                                                                                                             
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "scheduler-policy" to "kube-scheduler"                                                                                                                                                                                                                                                      
09:30:27 kubeadm[2025]: [control-plane] Creating static Pod manifest for "kube-scheduler"                                                                                                                                                                                                                                                                       
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-policy" to "kube-apiserver"                                                                                                                                                                                                                                                          
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "policy-controller" to "kube-apiserver"                                                                                                                                                                                                                                                     
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-log" to "kube-apiserver"                                                                                                                                                                                                                                                             
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "scheduler-policy" to "kube-scheduler"                                                                                                                                                                                                                                                      
09:30:27 kubeadm[2025]: [check-etcd] Checking that the etcd cluster is healthy                                                                                                                                                                                                                                                                                  
09:30:27 kubeadm[2025]: [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace                                                                                                                                                                                                         
09:30:27 kubeadm[2025]: [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"                                                                                                                                                                                                                                                    
09:30:27 kubeadm[2025]: [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"                                                                                                                                                                                                                                
09:30:27 kubeadm[2025]: [kubelet-start] Activating the kubelet service                                                                                                                                                                                                                                                                                          
09:30:27 kubeadm[2025]: [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...                                                                                                                                                                                                                                                                 
09:30:29 kubeadm[2025]: [etcd] Announced new etcd member joining to the existing etcd cluster                                                                                                                                                                                                                                                                   
09:30:29 kubeadm[2025]: [etcd] Wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"                                                                                                                                                                                                                                       
09:30:29 kubeadm[2025]: [etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s                                                                                                                                                                                                                                                     
09:30:38 etcd pause shim create
09:30:30 kube-scheduler pause shim create
09:30:30 kube-controller-manager pause shim create
09:30:34 kube-scheduler shim create
09:30:35 kube-scheduler first logs
09:30:36 kube-apiserver pause shim create
09:31:07 kubeadm[2025]: [kubelet-check] Initial timeout of 40s passed.                                                                                                                                                                                                                                                                                          
09:31:25 kube-controller-manager shim create
09:31:25 kube-controller-manager first logs
09:31:43 kube-apiserver shim create
09:31:44 kubeadm[2025]: error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available                                                                                                                                                                                     
09:31:44 systemd[1]: kubeadm.service: Main process exited, code=exited, status=1/FAILURE                                                                                                                                                                                                                                                                        
09:31:44 kube-apiserver first logs
09:31:51 etcd shim create
09:31:52.081609 etcd first logs

The timeout we hit here is this one, which uses hardcoded values (8 retries × 5 seconds = 40s).
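For reference, the health check that kubeadm keeps retrying can also be run by hand to see how long etcd actually takes to report healthy. This is only a rough sketch, assuming the default kubeadm PKI paths under /etc/kubernetes/pki/etcd and an etcdctl binary available on the node:

# Query etcd endpoint health directly, using the client certificates kubeadm
# generates by default (paths are an assumption; adjust for your setup).
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health

# List the current members to confirm the new member was actually announced.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  member list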

What you expected to happen?

The etcd member joins the existing etcd cluster and kubeadm succeeds.

How to reproduce it (as minimally and precisely as possible)?

Hard to say.
Try lots of kubeadm joins of control-plane nodes.

Anything else we need to know?

In kubeadm init there is a similar-looking parameter called TimeoutForControlPlane, which defaults to 4 minutes and is used here to wait for the API server.

This looks similar to me because both the problem described here and the code in the kubeadm init phase wait for a specific pod started by the kubelet via a static pod manifest.

I see three options:

  • increase the hardcoded values
  • use the same parameter as already used during init (TimeoutForControlPlane), which would mean no change to the kubeadm specs (see the config sketch after this list)
  • add an additional parameter to the kubeadm spec
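For illustration, the second option would amount to honouring the field users can already set today. This is a sketch only: the duration value is an arbitrary example, and at the moment this field only governs how long kubeadm init waits for the API server, not the etcd join wait.

# Hypothetical usage sketch: timeoutForControlPlane already exists in v1beta2
# under the apiServer section of ClusterConfiguration.
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  timeoutForControlPlane: 8m0s
EOF
kubeadm init --config kubeadm-config.yaml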
@neolit123 added the area/etcd, kind/bug, and priority/important-longterm labels on Aug 7, 2019
@neolit123 added this to the v1.16 milestone on Aug 7, 2019
@timothysc
Member

@chrischdi Several questions

  • Do you have the docker logs detailing what happened?
  • Did the etcd container start?
  • What version of docker are you running?
  • Does this happen consistently or intermittently?

@chrischdi
Member Author

chrischdi commented Aug 7, 2019

@chrischdi Several questions

  • Do you have the docker logs detailing what happened?

Not anymore, but we are running some builds every night and I will catch the logs on the next occurrence.

  • Did the etcd container start?

Yes, the etcd container was running from Docker's perspective. The Kubernetes cluster has already been deleted, so I don't know its exact state. I will also try to get more information on the next occurrence.

  • What version of docker are you running?

We've got the CoreOS built-in Docker version, which is 18.06.3.

  • Does this happen consistently or intermittently?

I currently cannot reproduce it reliably.

I hope to have it occur again so I can get all the details and more information.

@timothysc added the priority/awaiting-more-evidence label and removed the priority/important-longterm label on Aug 7, 2019
@neolit123
Member

@chrischdi are you joining concurrently btw?

@ereslibre
Contributor

/assign

@chrischdi
Member Author

@chrischdi are you joining concurrently btw?

No, only one control-plane node or worker node at a time, i.e. sequentially.

@neolit123 changed the title from "kubeadm join control-plane node times out" to "kubeadm join control-plane node times out (etcd timeout)" on Aug 7, 2019
@neolit123 added the priority/important-soon label and removed the priority/awaiting-more-evidence label on Aug 7, 2019
@neolit123
Member

neolit123 commented Aug 7, 2019

the same report here:
kubernetes/website#15637

i changed the priority and we possibly need to increase the timeout and backport to 1.15.

@neolit123 reopened this on Aug 7, 2019
@chrischdi
Member Author

the same report here:
kubernetes/website#15637

i changed the priority and we possibly need to increase the timeout and backport to 1.15.

Let me know if I can help on this :-)

@sunvk

sunvk commented Aug 8, 2019

Any ETA for this? I am currently blocked with my multi master setup.

@neolit123
Member

neolit123 commented Aug 20, 2019

@chrischdi @sunvk are you reproducing this consistently?
i tried today running inside VMs and i couldn't.

also our CI is consistently green and we are not seeing the same timeouts.
(consistently, minus some other aspects)

09:31:44 kubeadm[2025]: error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available

40 seconds should be more than enough for the etcd cluster to report healthy endpoints.
what are you seeing with kubeadm ... --v=2?
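i.e. something along these lines (endpoint, token, and hashes are placeholders for your own values):

# Re-run the join with increased verbosity to see where it stalls.
kubeadm join LOAD_BALANCER_DNS:6443 \
  --control-plane \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --certificate-key <key> \
  --v=2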

@neolit123
Member

neolit123 commented Aug 20, 2019

in terms of making this user controllable we have a field in v1beta2 and v1beta1
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2

called timeoutForControlPlane, but it's under the apiServer config.
it feels to me that etcd timeouts need a new field, except that we cannot add this field or backport it to v1beta1 and v1beta2, so it has to wait for v1beta3.

the alternative is to just increase the hardcoded timeouts, but this ticket needs more evidence that it's a consistent bug.

@sbueringer
Member

@chrischdi is currently on leave but he can provide some more details of our problems next Tuesday. AFAIK we ended up patching the hard-coded timeout because we couldn't get our nightly installs consistently green without patching it.

@neolit123
Member

please do.
my only explanation here would be slow hardware or networking and i would like to get the exact causes.

@chrischdi
Member Author

I've got some more data :-)

Maybe the timeout does not need to be increased in our case. We had problems with our load balancers becoming active and routing traffic to the still-offline API server, which caused the timeouts here.

I will need to retest with our improved load balancer setup (activating backends only after kubeadm init went through) to see if we are still hitting this issue.
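As a quick pre-join sanity check, something like the following can confirm the load balancer is actually routing to a healthy API server (the endpoint name is a placeholder):

# Probe the API server through the load balancer before running kubeadm join.
# -k skips TLS verification; /healthz is typically readable without credentials.
curl -k https://LOAD_BALANCER_DNS:6443/healthz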

@ereslibre
Contributor

We had problems with our load balancers becoming active and routing traffic to the still-offline API server, which caused the timeouts here.

Thank you for the feedback.

I will need to retest with our improved load balancer setup (activating backends only after kubeadm init went through) to see if we are still hitting this issue.

+1

@neolit123 added the priority/awaiting-more-evidence label and removed the priority/important-soon label on Aug 29, 2019
@neolit123
Member

adding back "awaiting more evidence"

@neolit123 modified the milestones: v1.16 → v1.17 on Sep 10, 2019
@chrischdi
Member Author

As of now I'm not able to reproduce the problem anymore in our deployment pipelines using upstream kubeadm v1.15.3 and v1.16.0.
@neolit123 I propose we close this one?

@ereslibre
Contributor

Let's close this issue and reopen if we see it bubbling up again. Thank you for your feedback @chrischdi.

/close

@k8s-ci-robot
Contributor

@ereslibre: Closing this issue.

In response to this:

Let's close this issue and reopen if we see it bubbling up again. Thank you for your feedback @chrischdi.

/close


@christian-2

I am facing the same issue when attempting to add a second master to a v1.15.2 cluster with kubeadm join .... An etcd snapshot taken on the first master is 19M in size. kubelet.service and the static pods (etcd, kube-apiserver, kube-controller-manager, kube-scheduler) came up on the second master, and an etcd snapshot taken on the second master is also 19M in size. (So maybe the error indicates a non-event?)
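For anyone wanting to compare state the same way, a snapshot can be taken and inspected roughly like this (the cert paths assume the default kubeadm layout and are an assumption on my part):

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  snapshot save /tmp/etcd-snapshot.db

# Report size, revision, and key count for comparison between the two masters.
ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-snapshot.db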

@neolit123
Member

neolit123 commented Nov 4, 2019

hi, are you also getting:

error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available

?

@christian-2

hi, are you also getting:

error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available

?

@neolit123 Yes, exactly.

@neolit123
Member

could it be that the retry of ~12 seconds between joining the second and third etcd member is not enough in your case?

@christian-2

could it be that the retry of ~12 seconds between joining the second and third etcd member is not enough in your case?

There is currently no third etcd member in my setup. The issue already occurs when I try adding a second master node (with stacked control plane nodes) to an existing single-master cluster.

Where do the 12 seconds that you cite come from? Is this a configurable timeout that I could increase?

@neolit123
Member

neolit123 commented Nov 5, 2019

you can try building kubeadm from source:

cd kubernetes
git checkout v1.15.2

<apply patch>

make all WHAT=cmd/kubeadm

the timeout is here:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/etcd/etcd.go#L41-L48

but i don't think this will solve the problem. seems to me something else is at play.

do you have the option to try 1.16.2?

@christian-2

@neolit123 Thx for the exact pointer into source code.

Yes, the option exists. I anticipate upgrading the cluster to v1.16.2 and then adding a third master/etcd. (Other tasks first on my list, though.)

@rmja

rmja commented May 1, 2020

I believe I have found the cause of this.

When observing /etc/kubernetes/manifests/etcd.yaml on the backup master that is trying to join, you will see that it advertises on a different IP range than the primary master.

To avoid this, you must specify the advertise address manually when joining:

kubeadm join .. --control-plane --apiserver-advertise-address <ip>

Where <ip> is an address in the same subnet as the control plane.
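One quick way to verify what the new member is advertising (and compare it against the first master) is to look at the flags in the generated manifest, for example:

# Compare the advertised URLs on both control-plane nodes; they should live in
# a subnet the existing etcd members can actually reach.
grep -E 'advertise-client-urls|initial-advertise-peer-urls' /etc/kubernetes/manifests/etcd.yaml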

@chanhz

chanhz commented Dec 14, 2020

I believe I have found the cause of this.

When observing /etc/kubernetes/manifests/etcd.yaml on the backup master that is trying to join, you will see that it advertises on a different IP range than the primary master.

To avoid this, you must specify the advertise address manually when joining:

kubeadm join .. --control-plane --apiserver-advertise-address <ip>

Where <ip> is an address in the same subnet as the control plane.

It works! You saved my life! Thanks so much.

@APoniatowski

I can confirm this.

I created a cluster on CentOS 8 Stream (fully updated as of today, including k8s), and when I added worker nodes they were added almost instantly. But adding another master (via load balancer):

sudo kubeadm join {LOADBALANCER DNS}:6443 --token {TOKEN} --discovery-token-ca-cert-hash [HASH] --control-plane --certificate-key [HASH]

this took anywhere between one and two hours (I started it at 5pm CET, checked back at 9pm CET, and saw it was fine/up and running).

So adding another control-plane/master completely blocks the cluster off for a few hours.

@bmmpp

bmmpp commented May 23, 2022

I also encountered this problem on 1.24. How did you solve it? Thanks.
