This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Digital Ocean Provisioning not working (provisioner unable to connect to internet) #527

Closed
tianhuil opened this issue Dec 25, 2017 · 28 comments

Comments

@tianhuil
Contributor

tianhuil commented Dec 25, 2017

Hi, I'm trying to use the new DO provisioner. I'm running into this problem:

  1. I have followed the instructions in the README.md. Yet when I run the test commands
kubectl create -f examples/pvc.yaml
kubectl create -f examples/pod-application.yaml

I get the error below, which suggests that the provisioner is not allowed to speak with DO. I know that the security token is working because, with the token loaded on my dev machine, I am able to run this command and view all my droplets:

$ curl -X GET "https://api.digitalocean.com/v2/droplets" -H "Authorization: Bearer $TOKEN" > /tmp/droplets.json

Error Message

I1225 01:10:29.946936       1 main.go:51] Provisioner external/digitalocean specified
I1225 01:10:29.947095       1 main.go:65] Building kube configs for running in cluster...
I1225 01:10:30.263427       1 controller.go:407] Starting provisioner controller 6644ad75-e910-11e7-a729-0a580af40110!
I1225 01:11:42.192272       1 controller.go:1080] scheduleOperation[lock-provision-default/pv1[9123185d-e910-11e7-901c-a2fa2817e02b]]
I1225 01:11:42.245201       1 controller.go:1080] scheduleOperation[lock-provision-default/pv1[9123185d-e910-11e7-901c-a2fa2817e02b]]
I1225 01:11:42.357717       1 leaderelection.go:156] attempting to acquire leader lease...
I1225 01:11:42.445066       1 leaderelection.go:178] successfully acquired lease to provision for pvc default/pv1
I1225 01:11:42.445604       1 controller.go:1080] scheduleOperation[provision-default/pv1[9123185d-e910-11e7-901c-a2fa2817e02b]]
I1225 01:11:45.269755       1 controller.go:1080] scheduleOperation[provision-default/pv1[9123185d-e910-11e7-901c-a2fa2817e02b]]
I1225 01:12:00.270052       1 controller.go:1080] scheduleOperation[provision-default/pv1[9123185d-e910-11e7-901c-a2fa2817e02b]]
E1225 01:12:12.544855       1 provision.go:145] Failed to create volume {Delete pvc-9123185d-e910-11e7-901c-a2fa2817e02b &PersistentVolumeClaim{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:pv1,GenerateName:,Namespace:default,SelfLink:/api/v1/namespaces/default/persistentvolumeclaims/pv1,UID:9123185d-e910-11e7-901c-a2fa2817e02b,ResourceVersion:863536,Generation:0,CreationTimestamp:2017-12-25 01:11:42 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1048576 0} {<nil>} 1Mi BinarySI},},},VolumeName:,Selector:nil,StorageClassName:*default,},Status:PersistentVolumeClaimStatus{Phase:Pending,AccessModes:[],Capacity:ResourceList{},Conditions:[],},} map[zone:nyc1]}, error: Post https://api.digitalocean.com/v2/volumes: dial tcp: i/o timeout
E1225 01:12:12.545468       1 controller.go:808] Failed to provision volume for claim "default/pv1" with StorageClass "default": Post https://api.digitalocean.com/v2/volumes: dial tcp: i/o timeout
E1225 01:12:12.545561       1 goroutinemap.go:165] Operation for "provision-default/pv1[9123185d-e910-11e7-901c-a2fa2817e02b]" failed. No retries permitted until 2017-12-25 01:12:13.045507602 +0000 UTC m=+105.497482293 (durationBeforeRetry 500ms). Error: Post https://api.digitalocean.com/v2/volumes: dial tcp: i/o timeout
I1225 01:12:14.145504       1 leaderelection.go:198] stopped trying to renew lease to provision for pvc default/pv1, task failed
I1225 01:12:15.270247       1 controller.go:1080] scheduleOperation[provision-default/pv1[9123185d-e910-11e7-901c-a2fa2817e02b]]
I1225 01:12:30.270540       1 controller.go:1080] scheduleOperation[provision-default/pv1[9123185d-e910-11e7-901c-a2fa2817e02b]]
I1225 01:12:45.439176       1 controller.go:1080] scheduleOperation[lock-provision-default/pv1[9123185d-e910-11e7-901c-a2fa2817e02b]]
E1225 01:12:45.439237       1 provision.go:145] Failed to create volume {Delete pvc-9123185d-e910-11e7-901c-a2fa2817e02b &PersistentVolumeClaim{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:pv1,GenerateName:,Namespace:default,SelfLink:/api/v1/namespaces/default/persistentvolumeclaims/pv1,UID:9123185d-e910-11e7-901c-a2fa2817e02b,ResourceVersion:863623,Generation:0,CreationTimestamp:2017-12-25 01:11:42 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{control-plane.alpha.kubernetes.io/leader: {"holderIdentity":"6644ad75-e910-11e7-a729-0a580af40110","leaseDurationSeconds":15,"acquireTime":"2017-12-25T01:11:42Z","renewTime":"2017-12-25T01:12:14Z","leaderTransitions":0},volume.beta.kubernetes.io/storage-provisioner: external/digitalocean,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1048576 0} {<nil>} 1Mi BinarySI},},},VolumeName:,Selector:nil,StorageClassName:*default,},Status:PersistentVolumeClaimStatus{Phase:Pending,AccessModes:[],Capacity:ResourceList{},Conditions:[],},} map[zone:nyc1]}, error: Post https://api.digitalocean.com/v2/volumes: dial tcp: i/o timeout
E1225 01:12:45.439404       1 controller.go:808] Failed to provision volume for claim "default/pv1" with StorageClass "default": Post https://api.digitalocean.com/v2/volumes: dial tcp: i/o timeout

DNS Error?:

It appears that this is a network issue. When I exec into the pod's shell, I'm unable to resolve either k8s-internal or external DNS queries:

$ kubectl -it exec --namespace=kube-system digitalocean-provisioner-6c7bbf4ccc-s2g8n -- /bin/sh
/ # nslookup kubernetes.default 100.64.0.10
Server:    100.64.0.10
Address 1: 100.64.0.10

nslookup: can't resolve 'kubernetes.default': Try again
/ # nslookup kubernetes.default.svc.cluster.local 100.64.0.10
Server:    100.64.0.10
Address 1: 100.64.0.10

nslookup: can't resolve 'kubernetes.default.svc.cluster.local': Try again
/ # cat /etc/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
/ # nslookup nodejs.org 8.8.8.8
Server:    8.8.8.8
Address 1: 8.8.8.8

nslookup: can't resolve 'nodejs.org': Try again
/ # nslookup kubernetes.default.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'kubernetes.default.svc.cluster.local': Try again

Any clue? Particularly @klausenbusk

@tianhuil tianhuil changed the title Digital Ocean Provisioning not working Digital Ocean Provisioning not working (provisioner unable to connect to internet) Dec 25, 2017
@klausenbusk
Contributor

Any clue? Particularly @klausenbusk

This sounds like a cluster issue.

Some questions:

  • Does networking work for regular pods?
  • How was the cluster created?
  • What do you use for pod networking? (flannel, calico, something else?)

@tianhuil
Contributor Author

tianhuil commented Dec 26, 2017

Thanks @klausenbusk. This was a DNS issue.

  1. In the provisioner pod, IP addresses work, but DNS does not.
  2. In the flannel pod, both IP addresses and DNS work.

The difference seems to be the nameservers:

  1. In the provisioner pod, only the k8s nameserver is there
  2. In the flannel pod, the DO public nameservers are there:
$ kubectl -it exec --namespace=kube-system digitalocean-provisioner-6c7bbf4ccc-s2g8n -- more /etc/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
$ kubectl -it exec --namespace=kube-system kube-flannel-ds-29cb5 -- more /etc/resolv.conf
Defaulting container name to kube-flannel.
Use 'kubectl describe pod/kube-flannel-ds-29cb5' to see all of the containers in this pod.
nameserver 67.207.67.2
nameserver 67.207.67.3

Once I manually added nameserver 67.207.67.2 to the provisioner's /etc/resolv.conf, DNS worked. To answer your remaining questions, the clusters were created on DO following these tutorials:

Any idea how to solve this more automatically @klausenbusk?
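For anyone following along, the manual workaround described above can be sketched as a one-liner. The pod name and resolver IP below are the ones from this thread; substitute your own:

```shell
# Append DigitalOcean's public resolver inside the running provisioner pod.
# Pod name and resolver IP are taken from this thread; adjust for your cluster.
# Note: this edit does not survive a pod restart.
kubectl -n kube-system exec digitalocean-provisioner-6c7bbf4ccc-s2g8n -- \
  sh -c 'echo "nameserver 67.207.67.2" >> /etc/resolv.conf'
```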

@klausenbusk
Contributor

Any idea how to solve this more automatically @klausenbusk?

Is kube-dns running? It could be this issue: kubernetes/kubeadm#587

@tianhuil
Contributor Author

tianhuil commented Dec 26, 2017

Apparently, the pod is supposed to inherit the node's nameservers (as the flannel pod does), but this is not happening in the provisioner. See this note. Nor does it seem to happen in the other pods I create.
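For what it's worth, pods only inherit the node's /etc/resolv.conf when their dnsPolicy is Default (or when they run with hostNetwork, as the flannel DaemonSet does); regular pods get ClusterFirst, which points them at kube-dns. A hedged sketch of forcing the provisioner onto the node's resolvers (deployment name assumed from this thread):

```shell
# Switch the provisioner's dnsPolicy from the ClusterFirst default to
# "Default", so its pods inherit the node's /etc/resolv.conf.
# This is a workaround, not a fix for the broken cluster DNS.
kubectl -n kube-system patch deployment digitalocean-provisioner --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/dnsPolicy","value":"Default"}]'
```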

@tianhuil
Contributor Author

I believe kube-dns is running:

$ kubectl get deployment --namespace=kube-system
NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
digitalocean-provisioner   2         2         2            2           1d
kube-dns                   1         1         1            1           8d
kubernetes-dashboard       1         1         1            1           8d
tiller-deploy              1         1         1            1           8d

@tianhuil
Contributor Author

tianhuil commented Dec 26, 2017

To document: appending the nameservers to /etc/resolv.conf appears to solve this problem. Closing issue.

@klausenbusk
Contributor

To document, appending to nameservers to the /etc/hosts appears to solve this problem. Closing issue.

Did you mean /etc/resolv.conf? That sounds like a workaround; you should fix the underlying issue.

@tianhuil
Contributor Author

Yes, sorry, /etc/resolv.conf. And I fixed the earlier comment to avoid future confusion. It's definitely a workaround. According to this, pods are supposed to inherit their node's nameservers. For some reason, that's not happening here. I'll have to continue investigating.

@klausenbusk
Contributor

klausenbusk commented Dec 26, 2017

According to this, pods are supposed to inherit their node's nameservers. For some reason, that's not happening here. I'll have to continue investigating.

The DNS pod is exposed as a Kubernetes Service with a static IP. Once assigned the kubelet passes DNS configured using the --cluster-dns=<dns-service-ip> flag to each container.

Probably set by /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, as the scripts you used rely on kubeadm. Note: removing --cluster-dns=<dns-service-ip> is also a workaround; you should fix the underlying issue, which is probably related to flannel/kube-dns/kube-proxy.

@tianhuil
Contributor Author

tianhuil commented Dec 27, 2017

Documenting in case people care: this was fixed by upgrading to flannel v0.9.1, which contains a fix for the DNS issue: flannel-io/flannel#872. Thanks to @klausenbusk for pointing out the solution!
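For later readers, the upgrade amounts to reapplying the flannel manifest at the fixed tag. A hedged sketch; the URL follows flannel's usual release layout, so verify it against your setup before applying:

```shell
# Reapply flannel at v0.9.1, which adds the FORWARD-chain iptables rules
# (flannel-io/flannel#872). Manifest URL assumed from flannel's release layout.
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
# Restart the flannel pods so they pick up the new image:
kubectl -n kube-system delete pod -l app=flannel
```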

@mattinsler

@tianhuil did you have to make changes to your kube-controller-manager to make this work properly?

@klausenbusk
Contributor

@mattinsler correct, please see #529 for the final solution. I'm happy to help if you have any issues.

@tianhuil
Contributor Author

tianhuil commented Jan 2, 2018

@mattinsler: to be specific:

  1. upgrading to flannel v0.9.1 fixed this specific DNS issue: flannel-io/flannel#872 (network/iptables: Add iptables rules to FORWARD chain)
  2. I also had to apply #529 (Digital Ocean external volume test failing) to finally get it working (because of a volume visibility issue)

@mattinsler

Ahh OK. I'm stuck on how to execute #529 -- I'm still new to k8s. If you wouldn't mind: after using kubeadm to get a working cluster, how would I add the correct config to the kube-controller-manager and restart it? And what exactly is the correct config? I've tried a bunch of things and different ways to update or restart kube-controller-manager, and I've ended up tearing down and re-creating the cluster each time because I get stuck.

@tianhuil
Contributor Author

tianhuil commented Jan 2, 2018 via email

@mattinsler

Sorry, I'm still really new. So I edited my /etc/kubernetes/manifests/kube-controller-manager.yaml file and the pod is now gone and has not come back again. How do I figure out what happened? How do I make it start up again?

@tianhuil
Contributor Author

tianhuil commented Jan 3, 2018

No worries -- edits to the file automatically propagate to the pod. If you made a mistake, I'm not sure exactly what happens, but it might kill the pod.

If you save a backup copy, will the pod re-appear? You might also try kubectl create -f /etc/kubernetes/manifests/kube-controller-manager.yaml or just restart the cluster.
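One caveat on that suggestion: kube-controller-manager is a static pod, so the kubelet (not the API server) owns it, and kubectl create on its manifest can behave surprisingly. A hedged sketch of how to poke it from the master instead, assuming the kubeadm default paths:

```shell
# Static pods are re-created by the kubelet whenever their manifest appears
# under /etc/kubernetes/manifests. Moving the file out and back forces a
# restart; kubelet logs explain any failure to come back up.
mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sleep 10   # give the kubelet time to tear the old pod down
mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/
journalctl -u kubelet --since "5 minutes ago" | tail -n 20
```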

@mattinsler

Hmm, OK. I tried to create it and the mounts are successful, but the logs say:

I0103 04:43:29.595193       1 controllermanager.go:108] Version: v1.7.12
stat /etc/kubernetes/controller-manager.conf: no such file or directory

Describing the pod shows:

    Mounts:
      /etc/kubernetes from k8s (ro)
      /etc/pki from pki (rw)
      /etc/ssl/certs from certs (rw)
      /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ from flexvolume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-smw5m (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  k8s:
    Type:  HostPath (bare host directory volume)
    Path:  /etc/kubernetes
  certs:
    Type:  HostPath (bare host directory volume)
    Path:  /etc/ssl/certs
  pki:
    Type:  HostPath (bare host directory volume)
    Path:  /etc/pki
  flexvolume:
    Type:  HostPath (bare host directory volume)
    Path:  /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  default-token-smw5m:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-smw5m
    Optional:    false

There is definitely a file at /etc/kubernetes/controller-manager.conf on the master. Maybe the pod needs to be explicitly run on the master? Is it easy to do that?

@tianhuil
Contributor Author

tianhuil commented Jan 3, 2018

What version of k8s are you using? I've been using 1.9.0. Are you on 1.7.12?

@mattinsler

mattinsler commented Jan 3, 2018 via email

@mattinsler

Got it working! I created a new cluster on 1.9 and made the changes to the kube-controller-manager.yaml file before adding any nodes. Things just worked that way.

@klausenbusk
Contributor

Got it working! I created a new cluster on 1.9 and made the changes to the kube-controller-manager.yaml file before adding any nodes. Things just worked that way.

FWIW, if /usr/libexec/kubernetes/kubelet-plugins/volume/exec exists, kubeadm automatically adds the required hostPath volume.

@mattinsler

Ahh, good to know!

@klausenbusk
Contributor

/area digitalocean

Just for the record.

@hcabnettek

@klausenbusk @tianhuil I think I'm having this same issue. I installed a fresh 1.10.2 cluster on a Digital Ocean droplet, following the tutorials, which eventually led me here. The provisioner pods don't have the nameserver records either; I had to manually add them to /etc/resolv.conf. Even after doing that, no volumes are ever created. When I run kubectl describe pvc/pv1 I see Failed to provision volume with StorageClass "default": invalid character 'U' looking for beginning of value. How can I fix this? I've verified that I am able to create a volume with doctl.
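A hedged guess at that error: invalid character 'U' looking for beginning of value is Go's JSON decoder choking on a plain-text response body, and a body starting with 'U' is often the literal string Unauthorized, which would point at the token not reaching the provisioner. Inspecting the raw response from your machine would confirm (TOKEN is your DO API token):

```shell
# Fetch the raw body the provisioner would try to parse; if the API returns
# plain text like "Unauthorized" instead of JSON, a Go client fails with
# "invalid character 'U' looking for beginning of value".
curl -s -H "Content-Type: application/json" \
     -H "Authorization: Bearer $TOKEN" \
     https://api.digitalocean.com/v2/volumes
```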

@klausenbusk
Contributor

I installed a fresh 1.10.2 cluster on a Digital Ocean droplet, following the tutorials...

Which tool did you use?

@hcabnettek

@klausenbusk I used kubeadm. Sorry I just saw your reply. I can't get past this error no matter what I try. =(

@klausenbusk
Contributor

@klausenbusk I used kubeadm. Sorry I just saw your reply. I can't get past this error no matter what I try. =(

Let's continue the discussion in #761.

prateekpandey14 added a commit to prateekpandey14/external-storage that referenced this issue Oct 5, 2018
The snapshot workflow is being changed to use the CAS templates.
(openebs/maya kubernetes-retired#602 kubernetes-retired#527)

Until it's updated to use the CAS templates way of creating the
snapshots, disabling them from CI.

Signed-off-by: prateekpandey14 <[email protected]>