Kubernetes not working after hard shutdown #404

Closed
philipn opened this issue Jul 26, 2016 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

philipn commented Jul 26, 2016

I just upgraded from minikube 0.6.0 to 0.7.0. I'm on OS X and originally installed using the OS X installation instructions; to upgrade, I re-ran the OS X curl install command. My minikube machine was running at the time I attempted the upgrade.

After upgrade, I see:

[philip@laptop k8]$ minikube version
minikube version: v0.7.0
[philip@laptop k8]$ minikube start
Starting local Kubernetes cluster...
Kubernetes is available at https://192.168.99.100:8443.
Kubectl is now configured to use the cluster.
[philip@laptop k8]$ minikube version
minikube version: v0.7.0
[philip@laptop k8]$ minikube status
Running
[philip@laptop k8]$ minikube get-k8s-versions
The following Kubernetes versions are available: 
    - v1.3.0
[philip@laptop k8]$ kubectl get pods
The connection to the server 192.168.99.100:8443 was refused - did you specify the right host or port?
[philip@laptop k8]$ kubectl config current-context
minikube
[philip@laptop k8]$ cat ~/.kube/config 
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /Users/philip/.minikube/ca.crt
    server: https://192.168.99.100:8443
  name: minikube
contexts:
- context:
    cluster: minikube
    user: minikube
  name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
  user:
    client-certificate: /Users/philip/.minikube/apiserver.crt
    client-key: /Users/philip/.minikube/apiserver.key
[philip@laptop k8]$ 
[philip@laptop k8]$ minikube ssh 
                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/
 _                 _   ____     _            _
| |__   ___   ___ | |_|___ \ __| | ___   ___| | _____ _ __
| '_ \ / _ \ / _ \| __| __) / _` |/ _ \ / __| |/ / _ \ '__|
| |_) | (_) | (_) | |_ / __/ (_| | (_) | (__|   <  __/ |
|_.__/ \___/ \___/ \__|_____\__,_|\___/ \___|_|\_\___|_|
Boot2Docker version 1.11.1, build master : 901340f - Fri Jul  1 22:52:19 UTC 2016
Docker version 1.11.1, build 5604cbe
docker@minikubeVM:~$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
docker@minikubeVM:~$ 
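
(A quick additional check, as a sketch only: in minikube 0.7.0 the Kubernetes control plane runs inside a single localkube process in the VM rather than as Docker containers, so an empty "docker ps" by itself is not conclusive. Assuming the localkube paths that appear later in this thread:)

docker@minikubeVM:~$ ps aux | grep localkube                  # is the localkube process running at all?
docker@minikubeVM:~$ tail /var/lib/localkube/localkube.err    # recent errors from the embedded components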

My workaround was to run "minikube delete; minikube start" and then re-create everything in my Kubernetes cluster.
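
For reference, the full recovery looked roughly like this (a minimal sketch; the manifests path is purely illustrative and not part of the original report):

[philip@laptop k8]$ minikube delete
[philip@laptop k8]$ minikube start
[philip@laptop k8]$ kubectl create -f ./k8s-manifests/    # hypothetical path: re-apply whatever was deployed before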

dlorenc (Contributor) commented Jul 27, 2016

Hmm, I just tried and wasn't able to repro this. Did you happen to run "minikube logs" before the delete?

philipn (Author) commented Jul 27, 2016

@dlorenc It may be unrelated to the upgrade procedure, as I just had this happen again with minikube.

Here is the output of minikube logs:

[philip@laptop shotwell]$ minikube logs
==> /var/lib/localkube/localkube.err <==
Starting etcd...

==> /var/lib/localkube/localkube.out <==
I0727 03:53:36.145632    1573 server.go:202] Using iptables Proxier.
I0727 03:53:36.145753    1573 server.go:215] Tearing down userspace rules.
E0727 03:53:36.268708    1573 reflector.go:205] pkg/proxy/config/api.go:30: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E0727 03:53:36.268844    1573 reflector.go:205] pkg/proxy/config/api.go:33: Failed to list *api.Endpoints: Get http://127.0.0.1:8080/api/v1/endpoints?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
2016-07-27 03:53:36.401476 I | etcdserver: recovered store from snapshot at index 20002
2016-07-27 03:53:36.401529 I | etcdserver: name = kubeetcd
2016-07-27 03:53:36.401540 I | etcdserver: data dir = /var/lib/localkube/etcd
2016-07-27 03:53:36.401552 I | etcdserver: member dir = /var/lib/localkube/etcd/member
2016-07-27 03:53:36.401562 I | etcdserver: heartbeat = 100ms
2016-07-27 03:53:36.401571 I | etcdserver: election = 1000ms
2016-07-27 03:53:36.401580 I | etcdserver: snapshot count = 10000
2016-07-27 03:53:36.401603 I | etcdserver: advertise client URLs = http://localhost:2379
2016-07-27 03:53:36.622914 C | etcdserver: read wal error (walpb: crc mismatch) and cannot be repaired

I had previously hard shut down my Mac, so my guess is that this caused the irreparable corruption. I wasn't expecting Kubernetes (via etcd) to be corrupted so easily, though. Is it expected that a single-node Kubernetes cluster (e.g. minikube) cannot withstand a hard shutdown?

This may be related to etcd-io/etcd#5857 and etcd-io/etcd#5862, but I don't know enough about etcd to say -- if anyone knows better, please chime in!
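
A less destructive workaround than a full "minikube delete" might be to wipe only the corrupted etcd data directory inside the VM (a sketch only, assuming the /var/lib/localkube/etcd path from the log above; everything stored in the cluster is still lost, but the VM and its cached images survive):

[philip@laptop k8]$ minikube ssh
docker@minikubeVM:~$ sudo rm -rf /var/lib/localkube/etcd    # drop the snapshot/WAL that fails the crc check
docker@minikubeVM:~$ exit
[philip@laptop k8]$ minikube stop
[philip@laptop k8]$ minikube start                          # localkube should re-create an empty etcd store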

philipn changed the title from "Kubernetes not working after upgrade to 0.7.0" to "Kubernetes not working after hard shutdown" on Jul 27, 2016
dlorenc added the kind/bug label on Aug 11, 2016
zonarbot commented

I am able to reproduce this by having a misconfigured third-party resource in my Kubernetes cluster. My only solution so far has been to delete and reinstall minikube (see the cleanup sketch after the logs below).

from minikube logs:

2016-10-31 20:23:15.174411 I | etcdserver: recovered store from snapshot at index 590059
2016-10-31 20:23:15.174457 I | etcdserver: name = dns
2016-10-31 20:23:15.174466 I | etcdserver: data dir = /var/lib/localkube/dns
2016-10-31 20:23:15.174475 I | etcdserver: member dir = /var/lib/localkube/dns/member
2016-10-31 20:23:15.174481 I | etcdserver: heartbeat = 100ms
2016-10-31 20:23:15.174487 I | etcdserver: election = 1000ms
2016-10-31 20:23:15.174493 I | etcdserver: snapshot count = 10000
2016-10-31 20:23:15.174506 I | etcdserver: advertise client URLs = http://localhost:49090
panic: runtime error: index out of range

goroutine 191 [running]:
panic(0x30e4aa0, 0xc420010110)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master.(*Master).InstallThirdPartyResource(0xc420077800, 0xc421d1fe00, 0xc422143d00, 0x0)
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master/master.go:694 +0xa86
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master.(*ThirdPartyController).SyncOneResource(0xc421e65f20, 0xc421d1fe00, 0x1, 0x1)
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master/thirdparty_controller.go:90 +0x79
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master.(*ThirdPartyController).syncResourceList(0xc421e65f20, 0x5243b40, 0xc4222cf680, 0x0, 0x5243b40)
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master/thirdparty_controller.go:119 +0x1b7
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master.(*ThirdPartyController).SyncResources(0xc421e65f20, 0xc421f1d678, 0x406c64)
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master/thirdparty_controller.go:101 +0x88
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master.ExtensionsRESTStorageProvider.v1beta1Storage.func1()
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master/storage_extensions.go:82 +0x30
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc42050b050)
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:84 +0x19
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc42050b050, 0x2540be400, 0x0, 0x30312d3631303201, 0xc420053200)
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:85 +0xad
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/util/wait.Until(0xc42050b050, 0x2540be400, 0xc420053200)
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:47 +0x4d
k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/util/wait.Forever(0xc42050b050, 0x2540be400)
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:39 +0x41
created by k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master.ExtensionsRESTStorageProvider.v1beta1Storage
/var/lib/jenkins/go2/src/k8s.io/minikube/_gopath/src/k8s.io/minikube/vendor/k8s.io/kubernetes/pkg/master/storage_extensions.go:85 +0x1dd5

==> /var/lib/localkube/localkube.out <==
Starting etcd...
Starting apiserver...
Starting controller-manager...
Starting scheduler...
Starting kubelet...
Starting proxy...
Starting dns...

minikube version
minikube version: v0.12.0
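
If the apiserver stays up long enough to answer requests, a possible cleanup before resorting to delete/reinstall (a sketch only; "thirdpartyresources" is the resource kind used by Kubernetes clusters of this era, and <name> stands in for whatever misconfigured resource was registered):

$ kubectl get thirdpartyresources             # list registered third-party resources
$ kubectl delete thirdpartyresource <name>    # remove the misconfigured one
$ minikube stop && minikube start             # restart so the apiserver no longer panics on it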

r2d4 (Contributor) commented Oct 31, 2016

Possibly related to #660

cuiyz commented Dec 2, 2016

I got the same issue, but I did not hard shut down my OS. I'm running on OS X; after I start the cluster with "minikube start", I get this:

$ minikube logs
==> /var/lib/localkube/localkube.err <==
I1202 01:38:21.528534 2941 server.go:203] Using iptables Proxier.
W1202 01:38:21.529000 2941 server.go:426] Failed to retrieve node info: Get http://127.0.0.1:8080/api/v1/nodes/minikube: dial tcp 127.0.0.1:8080: getsockopt: connection refused
W1202 01:38:21.529063 2941 proxier.go:226] invalid nodeIP, initialize kube-proxy with 127.0.0.1 as nodeIP
I1202 01:38:21.529076 2941 server.go:215] Tearing down userspace rules.
E1202 01:38:21.550651 2941 reflector.go:203] pkg/proxy/config/api.go:30: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
2016-12-02 01:38:21.550707 I | etcdserver: name = kubeetcd
2016-12-02 01:38:21.550726 I | etcdserver: data dir = /var/lib/localkube/etcd
2016-12-02 01:38:21.550733 I | etcdserver: member dir = /var/lib/localkube/etcd/member
2016-12-02 01:38:21.550739 I | etcdserver: heartbeat = 100ms
2016-12-02 01:38:21.550744 I | etcdserver: election = 1000ms
2016-12-02 01:38:21.550749 I | etcdserver: snapshot count = 10000
2016-12-02 01:38:21.550761 I | etcdserver: advertise client URLs = http://localhost:2379
2016-12-02 01:38:21.550769 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2016-12-02 01:38:21.550782 I | etcdserver: initial cluster = kubeetcd=http://localhost:2380
E1202 01:38:21.555261 2941 reflector.go:203] pkg/proxy/config/api.go:33: Failed to list *api.Endpoints: Get http://127.0.0.1:8080/api/v1/endpoints?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
2016-12-02 01:38:21.556434 I | etcdserver: starting member 37807cb0bf7500f6 in cluster 2c833ae9c7555b5e
2016-12-02 01:38:21.556483 I | raft: 37807cb0bf7500f6 became follower at term 0
2016-12-02 01:38:21.556494 I | raft: newRaft 37807cb0bf7500f6 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016-12-02 01:38:21.556500 I | raft: 37807cb0bf7500f6 became follower at term 1
2016-12-02 01:38:21.560478 I | etcdserver: starting server... [version: 3.0.6, cluster version: to_be_decided]
2016-12-02 01:38:21.564860 I | membership: added member 37807cb0bf7500f6 [http://localhost:2380] to cluster 2c833ae9c7555b5e
E1202 01:38:21.564900 2941 server.go:75] unable to register configz: register config "componentconfig" twice
E1202 01:38:21.565649 2941 controllermanager.go:125] unable to register configz: register config "componentconfig" twice
E1202 01:38:21.567589 2941 leaderelection.go:252] error retrieving endpoint: Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-scheduler: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567639 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:414: Failed to list *extensions.ReplicaSet: Get http://127.0.0.1:8080/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567667 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:409: Failed to list *api.ReplicationController: Get http://127.0.0.1:8080/api/v1/replicationcontrollers?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567692 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:404: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567715 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:399: Failed to list *api.PersistentVolumeClaim: Get http://127.0.0.1:8080/api/v1/persistentvolumeclaims?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567739 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:398: Failed to list *api.PersistentVolume: Get http://127.0.0.1:8080/api/v1/persistentvolumes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567770 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:394: Failed to list *api.Node: Get http://127.0.0.1:8080/api/v1/nodes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567798 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:391: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?fieldSelector=spec.nodeName%21%3D%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567825 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:388: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?fieldSelector=spec.nodeName%3D%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567851 2941 leaderelection.go:252] error retrieving endpoint: Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.567869 2941 server.go:294] unable to register configz: register config "componentconfig" twice
W1202 01:38:21.567897 2941 server.go:549] Could not load kubeconfig file /var/lib/kubelet/kubeconfig: stat /var/lib/kubelet/kubeconfig: no such file or directory. Using default client config instead.
I1202 01:38:21.570777 2941 genericapiserver.go:629] Will report 10.0.2.15 as public IP address.
W1202 01:38:21.581167 2941 cacher.go:469] Terminating all watchers from cacher *api.ResourceQuota
I1202 01:38:21.581821 2941 conntrack.go:40] Setting nf_conntrack_max to 131072
I1202 01:38:21.582400 2941 conntrack.go:57] Setting conntrack hashsize to 32768
I1202 01:38:21.582753 2941 conntrack.go:62] Setting nf_conntrack_tcp_timeout_established to 86400
W1202 01:38:21.584116 2941 cacher.go:469] Terminating all watchers from cacher *api.PodTemplate
W1202 01:38:21.584387 2941 cacher.go:469] Terminating all watchers from cacher *api.LimitRange
E1202 01:38:21.584948 2941 event.go:208] Unable to write event: 'Post http://127.0.0.1:8080/api/v1/namespaces/default/events: dial tcp 127.0.0.1:8080: getsockopt: connection refused' (may retry after sleeping)
E1202 01:38:21.585101 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list *api.ResourceQuota: Get http://127.0.0.1:8080/api/v1/resourcequotas?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.585172 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/admission/storageclass/default/admission.go:74: Failed to list *storage.StorageClass: Get http://127.0.0.1:8080/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.585243 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list *api.Secret: Get http://127.0.0.1:8080/api/v1/secrets?fieldSelector=type%3Dkubernetes.io%2Fservice-account-token&resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.585683 2941 reflector.go:214] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list *api.ServiceAccount: Get http://127.0.0.1:8080/api/v1/serviceaccounts?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1202 01:38:21.585761 2941 reflector.go:203] k8s.io/minikube/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list *api.LimitRange: Get http://127.0.0.1:8080/api/v1/limitranges?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
....
....

cuiyz commented Dec 2, 2016

@philipn It seems my API server on :8080 isn't starting correctly, like yours on :8443. Did you make any progress on this?

philipn (Author) commented Dec 2, 2016

@cuiyz It hasn't happened to me since my earlier reports, and I'm still on the same minikube version as when I reported it. I wasn't able to reproduce similar corruption in repeated hard (partial and full) cluster shutdown tests of our production K8s cluster, so I didn't investigate further. I'll comment here again if I see this on a newer minikube release (I'm on v0.7.0 for parity with our production K8s).

r2d4 (Contributor) commented Dec 2, 2016

@philipn You can continue to use newer versions of minikube while pinning the Kubernetes version:

minikube config set kubernetes-version v1.4.5
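
For example, after pinning (a sketch of the usual flow; the subcommands below are standard minikube commands, shown here as an assumption about how the setting is typically verified and applied):

$ minikube config view                   # confirm kubernetes-version is set
$ minikube delete && minikube start      # recreate the cluster so the pinned version takes effect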

r2d4 (Contributor) commented May 31, 2017

This should be fixed in 0.19.1.

r2d4 closed this as completed on May 31, 2017