DNS Container Fails - Kops 1.6 #2529

Closed
odehsemreen opened this issue May 8, 2017 · 20 comments · Fixed by #2590

Comments

@odehsemreen

I currently have an issue after creating a cluster with kops 1.6 alpha2 git-d57ceda and Kubernetes 1.6.2. The cluster is created successfully, but the DNS containers fail to start: they stay in the "Creating Container" state or fail with an rpc error. I am using Calico for my networking. Here is the command I am using to create the cluster.

kops create cluster \
    --channel alpha \
    --node-count 3 \
    --zones eu-west-1a,eu-west-1b,eu-west-1c \
    --master-zones eu-west-1a,eu-west-1b,eu-west-1c \
    --dns-zone cluster.k8s.domain.com \
    --node-size c3.large \
    --master-size c3.large \
    --topology private \
    --networking calico \
    --ssh-public-key ~/.ssh/id_rsa \
    --vpc=vpc-123456 \
    --bastion \
    cluster.k8s.domain.com

When I describe the pods in kube-system, I see errors like "message":"cannot join network of a non running container" and "network: No configured Calico pools".
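
For reference, here is how I pull those errors (standard kubectl; the pod name below is a placeholder):

$ kubectl get pods --namespace=kube-system
$ kubectl describe pod --namespace=kube-system kube-dns-<pod-id>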

I tried the latest kops release as well, and the same applies.

Thanks a lot for your support.

@igorcanadi
Contributor

We're seeing the same issue.

@georgebuckerfield
Contributor

georgebuckerfield commented May 11, 2017

Having the same issue with 1.6.3. I was able to work around it by deleting the failed pod and letting the ReplicaSet re-launch it.

Pod Failures in kube-system
NAME
configure-calico-wpskv
kube-dns-autoscaler-387649234-x9d1d

Validation Failed
Ready Master(s) 1 out of 1.
Ready Node(s) 0 out of 2.

your nodes are NOT ready kubernetes.example.com

$ kubectl get deployment --all-namespaces
NAMESPACE     NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   calico-policy-controller   1         1         1            1           1m
kube-system   dns-controller             1         1         1            1           1m
kube-system   kube-dns                   1         1         1            0           1m
kube-system   kube-dns-autoscaler        1         1         1            0           1m

$ kubectl delete pod --namespace=kube-system kube-dns-autoscaler-387649234-x9d1d
pod "kube-dns-autoscaler-387649234-x9d1d" deleted

$ kubectl get deployments --all-namespaces
NAMESPACE     NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   calico-policy-controller   1         1         1            1           14m
kube-system   dns-controller             1         1         1            1           14m
kube-system   kube-dns                   2         2         2            2           14m
kube-system   kube-dns-autoscaler        1         1         1            1           14m

It appears to be related to this issue?

@a-chernykh
Contributor

Having the same problem with kops 1.6.0-beta.1, Kubernetes 1.6.2, and Calico; deleting pods does not help.

@dolftax
Contributor

dolftax commented May 14, 2017

@georgebuckerfield Could you post the logs of configure-calico-wpskv and kube-dns-autoscaler-387649234-x9d1d?
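
For example (standard kubectl, using the pod names above):

$ kubectl logs --namespace=kube-system configure-calico-wpskv
$ kubectl logs --namespace=kube-system kube-dns-autoscaler-387649234-x9d1d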

@deleonjavier

I'm having the same issue. I'm using version 1.6.0-beta.1 (git-77f222d) of kops. My reason for using this version was to get Kubernetes 1.6.2 working. Is it possible to have kops 1.5.3 (brew install) create a v1.6.2 Kubernetes cluster?

@chrislovecnm
Contributor

@deleonjavier kops 1.5.3 does not support kubernetes 1.6.x. Master includes an updated version of Calico, which may help with this problem.

@deleonjavier

deleonjavier commented May 15, 2017

@chrislovecnm Actually, I just used kops 1.5.3 to create a Kubernetes v1.6.2 cluster. It looks like the broken parameter for me was --networking cni. I was using that setting so that I could switch to Weave in the future while staying on kubenet networking until then.

@odehsemreen
Author

@georgebuckerfield I have done that a few times; it was the first thing I tried, and it still didn't work. I did not delete the cluster, and after 52 automatic restarts of the kube-dns service it managed to recover! I could not troubleshoot the networking, though. I hope we get feedback on this.

@chrislovecnm
Contributor

Can we confirm this with the 1.6.0 kops release? Calico has been upgraded.

@mikesplain
Contributor

@chrislovecnm I'm seeing this right now as well, with 1.6.0. I upgraded a 1.5.7 cluster to 1.6.2 with flannel, and I'm seeing DNS pods stuck in ContainerCreating for the new DNS ReplicaSet. Trying to get more info.

@cbuckley01

I just attempted a new cluster build with 1.6.0/Calico and am seeing these errors.

@ottoyiu
Contributor

ottoyiu commented May 17, 2017

I'm also getting the same error. Restarting kube-dns does not alleviate the issue.

@willtrking

willtrking commented May 18, 2017

Seeing the same issue here with the kops 1.6.0 release with Calico on a fresh cluster; no issues at all when using Canal, however. Restarting kube-dns after ensuring Calico is running on the nodes doesn't seem to have an effect.

Configured with

kops create cluster \
    --admin-access XXX.XXX.XX.X/32 \
    --node-count 4 \
    --encrypt-etcd-storage \
    --zones us-west-2a,us-west-2b,us-west-2c \
    --master-zones us-west-2a,us-west-2b,us-west-2c \
    --node-size m4.xlarge \
    --master-size m4.large \
    --topology private \
    --networking calico \
    --ssh-public-key=XXXXX.pub \
    --bastion \
    ${NAME}

Some hopefully useful logs:

  14m		13m		6	default-scheduler							Warning		FailedScheduling	no nodes available to schedule pods
  13m		13m		2	default-scheduler							Warning		FailedScheduling	No nodes are available that match all of the following predicates:: PodToleratesNodeTaints (2).
  13m		13m		2	default-scheduler							Warning		FailedScheduling	No nodes are available that match all of the following predicates:: PodToleratesNodeTaints (3).
  12m		12m		1	default-scheduler							Normal		Scheduled		Successfully assigned kube-dns-1321724180-1w9ds to ip-172-20-123-54.us-west-2.compute.internal
  11m		11m		1	kubelet, ip-172-20-123-54.us-west-2.compute.internal			Warning		FailedSync		Error syncing pod, skipping: rpc error: code = 2 desc = Error: No such container: c5511c8981cb4613b6ee8a9879163f5887e9f0aa4dbbf13408430f0bdbcc435f
  9m		9m		1	kubelet, ip-172-20-123-54.us-west-2.compute.internal			Warning		FailedSync		Error syncing pod, skipping: rpc error: code = 2 desc = Error: No such container: db3d2d9632d5072126215d0770c3695f46266fd01640b863a1019d69b46e5e35
  7m		7m		1	kubelet, ip-172-20-123-54.us-west-2.compute.internal			Warning		FailedSync		Error syncing pod, skipping: failed to "KillPodSandbox" for "fd7933fd-3b5a-11e7-ba84-06eb9ab37f5e" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"kube-dns-1321724180-1w9ds_kube-system\" network: CNI failed to retrieve network namespace path: Error: No such container: ae7f9802e57a5311a147aeb4cd847f1304572ec0772d456a1928aaddfd7fbf5e"

  7m	7m	1	kubelet, ip-172-20-123-54.us-west-2.compute.internal		Warning	FailedSync	Error syncing pod, skipping: rpc error: code = 2 desc = Error: No such container: 0257fbc853134d66dddd19080deabb70a9bde5f36ab1bc5c7dd61f81a605052d
  6m	6m	1	kubelet, ip-172-20-123-54.us-west-2.compute.internal		Warning	FailedSync	Error syncing pod, skipping: rpc error: code = 2 desc = Error: No such container: a77d4afbaf03a376b43b1c50c7d8e74486a6b63efc2dd107e68e6792dc50417a
  5m	5m	1	kubelet, ip-172-20-123-54.us-west-2.compute.internal		Warning	FailedSync	Error syncing pod, skipping: rpc error: code = 2 desc = Error: No such container: ecf421f2311dcd2023ace6eb8eda1d7582f1fbe43e47fa6c6b31c1c417c7243e
  4m	4m	1	kubelet, ip-172-20-123-54.us-west-2.compute.internal		Warning	FailedSync	Error syncing pod, skipping: rpc error: code = 2 desc = Error: No such container: ef0e93f29f39163838d34684bc1b070add7846cf85a86e744c5542439e671f27
  3m	3m	1	kubelet, ip-172-20-123-54.us-west-2.compute.internal		Warning	FailedSync	Error syncing pod, skipping: failed to "KillPodSandbox" for "fd7933fd-3b5a-11e7-ba84-06eb9ab37f5e" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"kube-dns-1321724180-1w9ds_kube-system\" network: CNI failed to retrieve network namespace path: Error: No such container: 092978763e32577f6a5570fb1e25ba59cddf7a3f087455012d0aaee6a03e7778"

  2m	29s	4	kubelet, ip-172-20-123-54.us-west-2.compute.internal		Warning	FailedSync	(events with common reason combined)
  12m	4s	196	kubelet, ip-172-20-123-54.us-west-2.compute.internal		Warning	FailedSync	Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-1321724180-1w9ds_kube-system(fd7933fd-3b5a-11e7-ba84-06eb9ab37f5e)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-1321724180-1w9ds_kube-system(fd7933fd-3b5a-11e7-ba84-06eb9ab37f5e)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-1321724180-1w9ds_kube-system\" network: No configured Calico pools"

  11m	3s	200	kubelet, ip-172-20-123-54.us-west-2.compute.internal		Normal	SandboxChanged	Pod sandbox changed, it will be killed and re-created.

@willtrking

willtrking commented May 18, 2017

OK, I found a surprisingly simple workaround (on a fresh cluster, kops 1.6.0 with Calico).

It seems the original configure-calico job didn't succeed, so I just ran that job again under a new name. To do so, I found the relevant kops YAML, renamed the job, and ran kubectl create -f on that YAML file.

Williams-MBP-2:kops willtrking$ kubectl get jobs --namespace=kube-system
NAME                     DESIRED   SUCCESSFUL   AGE
configure-calico         1         0            4m
configure-calico-again   1         1            1m

The YAML for the job was pulled from here:

https://github.com/kubernetes/kops/blob/master/upup/models/cloudup/resources/addons/networking.projectcalico.org/k8s-1.6.yaml.template#L222
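
In other words, something like this (a sketch: the local file name is made up, and the job was renamed by hand inside the downloaded manifest before creating it):

$ kubectl create -f configure-calico-again.yaml
$ kubectl get jobs --namespace=kube-system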

After which kube-dns started running automatically:

Williams-MBP-2:kops willtrking$ kubectl get pods --namespace=kube-system
NAME                                                                  READY     STATUS    RESTARTS   AGE
calico-node-4lhvm                                                     2/2       Running   0          3m
calico-node-5k6hj                                                     2/2       Running   0          2m
calico-node-d8sql                                                     2/2       Running   0          2m
calico-node-h13zw                                                     2/2       Running   0          2m
calico-node-qk30q                                                     2/2       Running   0          3m
calico-node-ttl7b                                                     2/2       Running   0          3m
calico-node-xgqb9                                                     2/2       Running   0          2m
calico-policy-controller-811246363-bp4r7                              1/1       Running   0          3m
dns-controller-116990191-cp8xt                                        1/1       Running   0          3m
etcd-server-events-ip-172-20-114-94.us-west-2.compute.internal        1/1       Running   0          2m
etcd-server-events-ip-172-20-46-208.us-west-2.compute.internal        1/1       Running   0          2m
etcd-server-events-ip-172-20-71-14.us-west-2.compute.internal         1/1       Running   0          3m
etcd-server-ip-172-20-114-94.us-west-2.compute.internal               1/1       Running   0          2m
etcd-server-ip-172-20-46-208.us-west-2.compute.internal               1/1       Running   0          2m
etcd-server-ip-172-20-71-14.us-west-2.compute.internal                1/1       Running   0          2m
kube-apiserver-ip-172-20-114-94.us-west-2.compute.internal            1/1       Running   0          3m
kube-apiserver-ip-172-20-46-208.us-west-2.compute.internal            1/1       Running   1          2m
kube-apiserver-ip-172-20-71-14.us-west-2.compute.internal             1/1       Running   0          3m
kube-controller-manager-ip-172-20-114-94.us-west-2.compute.internal   1/1       Running   0          2m
kube-controller-manager-ip-172-20-46-208.us-west-2.compute.internal   1/1       Running   0          2m
kube-controller-manager-ip-172-20-71-14.us-west-2.compute.internal    1/1       Running   0          3m
kube-dns-1321724180-7pm55                                             3/3       Running   0          3m
kube-dns-1321724180-8g01w                                             3/3       Running   0          16s
kube-dns-autoscaler-265231812-21kc7                                   1/1       Running   0          3m
kube-proxy-ip-172-20-111-46.us-west-2.compute.internal                1/1       Running   0          2m
kube-proxy-ip-172-20-114-94.us-west-2.compute.internal                1/1       Running   0          2m
kube-proxy-ip-172-20-37-76.us-west-2.compute.internal                 1/1       Running   0          1m
kube-proxy-ip-172-20-46-208.us-west-2.compute.internal                1/1       Running   0          2m
kube-proxy-ip-172-20-46-72.us-west-2.compute.internal                 1/1       Running   0          2m
kube-proxy-ip-172-20-71-14.us-west-2.compute.internal                 1/1       Running   0          2m
kube-proxy-ip-172-20-86-222.us-west-2.compute.internal                1/1       Running   0          1m
kube-scheduler-ip-172-20-114-94.us-west-2.compute.internal            1/1       Running   0          3m
kube-scheduler-ip-172-20-46-208.us-west-2.compute.internal            1/1       Running   0          2m
kube-scheduler-ip-172-20-71-14.us-west-2.compute.internal             1/1       Running   0          2m

And kops validate cluster is happy!

A note here: my master nodes are consistently starting before any of my regular nodes. Here's the log line from the original configure-calico run denoting the failure:

10m		10m		1	job-controller			Warning		FailedCreate	Error creating: pods "configure-calico-" is forbidden: service account kube-system/calico was not found, retry after the service account is created

@odehsemreen
Author

@chrislovecnm I have tried all kops 1.6 releases (alpha 1 and 2, beta 1) with all Kubernetes 1.6 releases (1.6.0, 1.6.1, 1.6.2).

@jhuntoo

jhuntoo commented May 18, 2017

I've also experienced this issue - kops 1.6 & k8s 1.6.3.

@chrislovecnm
Contributor

cc @caseydavenport, @shadoi

Casey, any ideas on how we can diagnose what is going on?

@caseydavenport
Member

network: No configured Calico pools

Yeah, definitely a configure-calico problem. The logs from the failed Pod would hopefully indicate what went wrong.

That said, the latest release of Calico doesn't require that Job - we should update the manifest to remove it and use the CALICO_IPV4POOL_CIDR configuration option in the DaemonSet instead.

From the latest upstream manifest:

            # Configure the IP Pool from which Pod IPs will be chosen.
            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/16"

@ozdanborne @heschlie

@ozdanborne
Contributor

I'll open up a PR that implements the change @caseydavenport is describing.

@blakebarnett

I'm assuming it's a race condition, with the calico-node DaemonSet getting scheduled before some part of the kops bootstrap process has put the service accounts in place? I'm not sure whether that's done after things would start getting scheduled. Maybe the master nodes could be cordoned until these steps have been verified first.
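
A quick way to check for that ordering (standard kubectl; the service-account and job names come from the FailedCreate event quoted above):

$ kubectl get serviceaccount calico --namespace=kube-system
$ kubectl describe job configure-calico --namespace=kube-system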

ottoyiu added a commit to ottoyiu/kops that referenced this issue May 26, 2017
…nt first

This fixes the behaviour described in kubernetes#2529, which was fixed by kubernetes#2590, by avoiding the configure-calico job altogether.