
Azure China Cloud deploy failure: etcd cluster is unavailable or misconfigured #1894

Closed
datamesh-oss opened this issue Dec 7, 2017 · 3 comments · Fixed by #1901

Comments

@datamesh-oss

Is this a request for help?:
Yes


Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE


What version of acs-engine?:
0.10.0


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.8.4

What happened:
The API server exits with the following log:

I1207 01:42:38.991032       1 reflector.go:240] Listing and watching *apps.ControllerRevision from storage/cacher.go:/controllerrevisions
E1207 01:42:38.991265       1 cacher.go:277] unexpected ListAndWatch error: storage/cacher.go:/controllerrevisions: Failed to list *apps.ControllerRevision: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused
I1207 01:42:39.078440       1 insecure_handler.go:118] Serving insecurely on 0.0.0.0:8080
I1207 01:42:39.080145       1 serve.go:85] Serving securely on 0.0.0.0:443
I1207 01:42:39.080296       1 crd_finalizer.go:242] Starting CRDFinalizer
I1207 01:42:39.080642       1 naming_controller.go:277] Starting NamingConditionController
I1207 01:42:39.080662       1 apiservice_controller.go:112] Starting APIServiceRegistrationController
I1207 01:42:39.080686       1 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
I1207 01:42:39.080997       1 reflector.go:202] Starting reflector *v1.Endpoints (10m0s) from k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73
I1207 01:42:39.081043       1 reflector.go:240] Listing and watching *v1.Endpoints from k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73
F1207 01:42:39.081139       1 controller.go:128] Unable to perform initial IP allocation check: unable to refresh the service IP block: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused
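
The connection refused on 127.0.0.1:2379 suggests etcd itself never came up on the master. A rough sketch of how to confirm this, assuming SSH access to the master node:

$ sudo systemctl status etcd                        # is the etcd unit running at all?
$ sudo journalctl -u etcd --no-pager | tail -n 50   # why did it fail to start (e.g. a download error)?
$ curl -s http://127.0.0.1:2379/health              # etcd health endpoint; no response means etcd is down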

What you expected to happen:
Deployment succeeds with no errors.

How to reproduce it (as minimally and precisely as possible):
Use the following template on Azure China Cloud.

{
  "apiVersion": "vlabs",
  "location": "chinanorth",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorVersion": "1.8.4"
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "just-test",
      "vmSize": "Standard_DS3",
      "storageProfile" : "ManagedDisks",
      "osDiskSizeGB": 200
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 2,
        "vmSize": "Standard_DS3",
        "storageProfile" : "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "osDiskSizeGB": 200
      }
    ],
    "linuxProfile": {
      "adminUsername": "user",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "xxxxx"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "xxxxx",
      "secret": "xxxx"
    }
  }
}
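
For reference, a rough sketch of the deployment steps, assuming the template above is saved as kubernetes.json and the resource group is named just-test-rg (both names are placeholders):

$ acs-engine generate kubernetes.json
$ az cloud set --name AzureChinaCloud
$ az group create --name just-test-rg --location chinanorth
$ az group deployment create \
    --resource-group just-test-rg \
    --template-file _output/just-test/azuredeploy.json \
    --parameters @_output/just-test/azuredeploy.parameters.json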

Anything else we need to know:
I've recently spent a lot of time experimenting on both Azure Public Cloud and Azure China Cloud. As far as I have tested, deployments on Azure Public Cloud work fine for all versions. On China Cloud, however, the latest working versions are v1.7.9 and v1.8.2. Versions 1.7.10, 1.8.3, and above hit deployment failures: mainly, the API server cannot be reached and the az group deployment create command hangs forever.

I know China Cloud is quite special and rather outdated, and can easily break when acs-engine is updated, but could you run more tests against the Azure sovereign clouds?

@andyzhangx
Contributor

@pengzhisun it's a bug: the etcdDownloadURLBase value in the mooncake template is empty.

@datamesh-oss as a workaround, could you fill in the etcdDownloadURLBase value yourself? Thanks.

vi _output/dnsPrefix/azuredeploy.parameters.json

    "etcdDownloadURLBase": {
      "value": "https://acs-mirror.azureedge.net/github-coreos"
    },
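
Then rerun the deployment so that the filled-in parameter is picked up (a sketch; substitute your own resource group and dnsPrefix for the placeholders):

$ az group deployment create \
    --resource-group <resource-group> \
    --template-file _output/<dnsPrefix>/azuredeploy.json \
    --parameters @_output/<dnsPrefix>/azuredeploy.parameters.json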

@pengzhisun
Member

@andyzhangx Thanks, I have fixed this bug.
@datamesh-oss you could also use release v0.9.4; it should work for Azure China Cloud.

@datamesh-oss
Author

@andyzhangx @pengzhisun Thank you! Adding the etcdDownloadURLBase did fix the deployment problem. The test cluster is up and running now (acs-engine v0.10.0, k8s v1.8.4):

$ kubectl get nodes
NAME                        STATUS    ROLES     AGE       VERSION
k8s-agentpool1-36380237-0   Ready     agent     3m        v1.8.4
k8s-agentpool1-36380237-1   Ready     agent     3m        v1.8.4
k8s-master-36380237-0       Ready     master    3m        v1.8.4

$ kubectl get pod -n kube-system
NAME                                            READY     STATUS    RESTARTS   AGE
heapster-cc6cdcb64-6z9lj                        2/2       Running   0          2m
kube-addon-manager-k8s-master-36380237-0        1/1       Running   0          3m
kube-apiserver-k8s-master-36380237-0            1/1       Running   0          2m
kube-controller-manager-k8s-master-36380237-0   1/1       Running   0          2m
kube-dns-v20-689456f759-q6zvr                   3/3       Running   0          3m
kube-dns-v20-689456f759-zhc5c                   3/3       Running   0          3m
kube-proxy-6dg7l                                1/1       Running   0          3m
kube-proxy-qrt9v                                1/1       Running   0          3m
kube-proxy-x67nx                                1/1       Running   0          3m
kube-scheduler-k8s-master-36380237-0            1/1       Running   0          2m
kubernetes-dashboard-c7c6d6c46-6ss4d            1/1       Running   1          3m
tiller-deploy-7b869f6865-f6gxc                  1/1       Running   0          3m

I will upgrade our working cluster later. Thanks again!
