DNS Container Fails - Kops 1.6 #2529
We're seeing the same issue.
Having the same issue with
It appears to be related to this issue?
Having the same problem with kops 1.6.0-beta.1, Kubernetes 1.6.2 and Calico; deleting the pods does not help.
@georgebuckerfield Logs of
I'm having the same issue. I'm using Version 1.6.0-beta.1 (git-77f222d) of kops. My reason for using this version was to get Kubernetes 1.6.2 working. Is it possible to have kops 1.5.3 (brew install) create a v1.6.2 Kubernetes cluster?
@deleonjavier kops 1.5.3 does not support Kubernetes 1.6.x. Master includes an updated version of Calico, which may help with this problem.
@chrislovecnm Actually, I just used kops (Version 1.5.3) to create a Kubernetes cluster (v1.6.2). It looks like the broken parameter for me was using
@georgebuckerfield I have done that a few times; it was the first thing I tried, and it still didn't work. I did not delete the cluster, and after 52 automatic restarts of the kube-dns service it managed to recover on its own. I could not troubleshoot the networking, though. I hope we get some feedback on this.
Can we confirm this with the 1.6.0 kops release? Calico has been upgraded. |
@chrislovecnm I'm seeing this right now as well, with 1.6.0. I upgraded a 1.5.7 cluster to 1.6.2 with flannel, and the DNS pods for the new DNS ReplicaSet are stuck in ContainerCreating. Trying to get more info.
I just attempted a new cluster build with 1.6.0/Calico and am seeing these errors:
I'm also getting the same error. Restarting kube-dns does not alleviate the issue. |
Seeing the same issue here with the kops 1.6.0 release and Calico on a fresh cluster, but no issues at all when using Canal. Restarting kube-dns after ensuring Calico is running on the nodes doesn't seem to have an effect. Configured with
Some hopefully useful logs:
OK, found a surprisingly simple workaround (on a fresh cluster, kops 1.6.0 w/ Calico). It seems like what happened was the original
The YAML for the job was pulled from here:
After that, kube-dns started running automatically:
A note here: my master nodes consistently start before any of my regular nodes. Here's the log line from the original
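Based on the partly truncated description above, the workaround appears to be recreating the configure-calico Job once the cluster is up, then checking that kube-dns leaves ContainerCreating. A minimal sketch of that sequence follows, not the commenter's exact steps: it assumes the Job is named `configure-calico` in `kube-system` and that its YAML (whose source link is not preserved here) was saved locally as `configure-calico.yaml`; both names are assumptions. It only builds and optionally runs standard `kubectl` commands, so `kubectl` must be on the PATH for a real run.

```python
import subprocess
from typing import List

# Assumptions (not confirmed by the thread): the Job's name, its namespace,
# and the local path where its manifest was saved.
JOB_NAME = "configure-calico"
NAMESPACE = "kube-system"
MANIFEST = "configure-calico.yaml"


def rerun_commands(job: str = JOB_NAME, namespace: str = NAMESPACE,
                   manifest: str = MANIFEST) -> List[List[str]]:
    """Build the kubectl invocations that recreate the Job and check kube-dns."""
    return [
        # Remove the existing Job so a fresh one can be created under the same name.
        ["kubectl", "delete", "job", job, "-n", namespace, "--ignore-not-found"],
        # Recreate it from the saved manifest.
        ["kubectl", "apply", "-f", manifest],
        # List the kube-dns pods to see whether they start running.
        ["kubectl", "get", "pods", "-n", namespace, "-l", "k8s-app=kube-dns"],
    ]


def rerun(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in rerun_commands():
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    rerun(dry_run=True)
```

Running with `dry_run=True` first lets you confirm the Job name and manifest path match your cluster before anything is deleted.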
@chrislovecnm I have tried all kops 1.6 (alpha 1 and 2, beta 1) releases with all Kubernetes 1.6 (1.6.0, 1.6.1, 1.6.2) releases.
I've also experienced this issue: kops 1.6 & k8s 1.6.3.
cc @caseydavenport, @shadoi. Casey, any ideas on how we can diagnose what is going on?
Yeah, definitely a configure-calico problem. The logs from the failed Pod would hopefully indicate what went wrong. That said, the latest release of Calico doesn't require that Job - we should update the manifest to remove it and use the CALICO_IPV4POOL_CIDR configuration option in the DaemonSet instead. From the latest upstream manifest:
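The change Casey describes moves pool creation out of the Job and into the calico-node DaemonSet via `CALICO_IPV4POOL_CIDR`. As a rough sketch only (the upstream manifest itself is not reproduced here), the hypothetical helper `set_pool_cidr` below takes a DaemonSet manifest already parsed into a Python dict and sets that variable on the container assumed to be named `calico-node`. The CIDR is validated with the standard `ipaddress` module; the value you pass should be the pod network CIDR your cluster actually uses, and the ones in the tests are placeholders.

```python
import ipaddress
from typing import Any, Dict

POOL_VAR = "CALICO_IPV4POOL_CIDR"


def set_pool_cidr(daemonset: Dict[str, Any], cidr: str,
                  container: str = "calico-node") -> Dict[str, Any]:
    """Set CALICO_IPV4POOL_CIDR on one container of a parsed DaemonSet, in place.

    The default container name is an assumption. Raises ValueError if the
    CIDR is malformed or the container is missing.
    """
    # strict=True rejects a CIDR with host bits set, such as 100.96.0.1/11.
    value = str(ipaddress.IPv4Network(cidr, strict=True))

    for c in daemonset["spec"]["template"]["spec"]["containers"]:
        if c.get("name") != container:
            continue
        env = c.setdefault("env", [])
        for var in env:
            if var.get("name") == POOL_VAR:
                var["value"] = value  # overwrite an existing setting
                break
        else:
            # No existing entry: append one.
            env.append({"name": POOL_VAR, "value": value})
        return daemonset
    raise ValueError(f"container {container!r} not found in DaemonSet")
```

Editing the parsed manifest rather than the raw YAML text avoids fragile string substitution and makes it easy to check that the variable is set exactly once.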
I'll open up a PR that implements the change @caseydavenport is describing. |
I'm assuming it's a race condition: the calico-node DaemonSet gets scheduled before some part of the kops bootstrap process has put the service accounts in place. I'm not sure whether that step happens after things start getting scheduled. Maybe the master nodes could be cordoned until these steps have been verified.
…nt first. This fixes the behaviour described in kubernetes#2529, which was fixed by kubernetes#2590, by avoiding the configure-calico job altogether.
I currently have an issue after creating the cluster with kops 1.6 alpha2 (git-d57ceda) and Kubernetes 1.6.2. The cluster is created successfully, but the DNS containers fail to start; they stay in the "Creating Container" state or fail with an rpc error. I am using Calico for my networking. Here is the command I am using to create the cluster:
When I describe the pods in kube-system, I get these errors:
"message":"cannot join network of a non running container:
and
network: No configured Calico pools
I tried the latest kops release as well, and the same applies.
Thanks a lot for your support.
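The two messages fit the diagnosis reached in the comments: Calico has no IP pool, so the pod's network setup fails, and containers that need to join that pod's network then fail too. Purely as a triage aid, the hypothetical helper `triage` below scans `kubectl describe pod` output for these two signatures. The patterns come only from the errors quoted in this issue, and the hints are hedged guesses, not an exhaustive or authoritative mapping.

```python
from typing import Dict, List

# Signatures taken from the errors quoted in this issue; the hints are
# illustrative guesses, not authoritative diagnoses.
SIGNATURES: Dict[str, str] = {
    "No configured Calico pools":
        "Calico has no IP pool configured; check that the pool-creating step "
        "(the configure-calico Job, or CALICO_IPV4POOL_CIDR) actually ran",
    "cannot join network of a non running container":
        "the pod's sandbox container is not running, likely a knock-on "
        "effect of a failed network setup",
}


def triage(describe_output: str) -> List[str]:
    """Return a hint for each known signature found in the given text."""
    return [hint for pattern, hint in SIGNATURES.items()
            if pattern in describe_output]
```

Feeding it the text of `kubectl describe pod -n kube-system <kube-dns pod>` would return one hint per matching message, in the order listed in `SIGNATURES`.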