
Not able to deploy K8s Cluster with ACS-Engine 14.6 version on azure #2591

Closed
rakeshkulkarni6 opened this issue Apr 4, 2018 · 19 comments

@rakeshkulkarni6

Is this a request for help?:


Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:

acs-engine v0.14.6

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
kubernetes 1.9.0

What happened:
I have deployed a K8s cluster using acs-engine v0.14.6. After the cluster deploys, I am not able to see any nodes listed. I have checked the Docker images and containers; no containers have been created.
Docker Images:
root@k8s-master-63864159-0:/var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/status# docker images
REPOSITORY                                TAG      IMAGE ID       CREATED        SIZE
k8s-gcrio.azureedge.net/hyperkube-amd64   v1.9.0   0e4e0ed658bb   3 months ago   618 MB
Docker PS:
root@k8s-master-63864159-0:/var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/status# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@k8s-master-63864159-0:/var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/status#
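
With no containers running at all, it is worth checking whether kubelet ever started on the node. A minimal check, assuming the standard acs-engine layout where kubelet runs as a systemd unit (the unit name here is an assumption):

# Check whether the kubelet unit is active and why it may have failed
sudo systemctl status kubelet
# Tail the most recent kubelet logs for startup errors
sudo journalctl -u kubelet --no-pager | tail -n 50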

What you expected to happen:
The Kubernetes cluster should get deployed with Kubernetes v1.9.0 using acs-engine v0.14.6.

How to reproduce it (as minimally and precisely as possible):
acs-engine generate
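
For completeness, the end-to-end flow being reproduced looks roughly like this (resource group, location, and apimodel file name are placeholders, not the exact values used; the az CLI is shown for illustration, the PowerShell equivalent being New-AzureRmResourceGroupDeployment):

acs-engine generate kubernetes.json
az group create --name <resource-group> --location <location>
az group deployment create \
    --resource-group <resource-group> \
    --template-file _output/<dnsPrefix>/azuredeploy.json \
    --parameters _output/<dnsPrefix>/azuredeploy.parameters.json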
Anything else we need to know:
Extension Status:
root@k8s-master-63864159-0:/var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/status# cat 0.status
[
  {
    "version": 1,
    "timestampUTC": "2018-04-04T15:15:45Z",
    "status": {
      "operation": "Enable",
      "status": "error",
      "formattedMessage": {
        "lang": "en",
        "message": "Enable failed: failed to execute command: command terminated with exit status=3\n[stdout]\n\n[stderr]\n"
      }
    }
  }
]
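
The extension status above only records the exit code; the provisioning script's own output ends up in the provisioning log on the node. Assuming the standard acs-engine layout, the last lines of that log usually show which step returned exit status 3:

# The custom-script output for the cluster bring-up
sudo tail -n 50 /var/log/azure/cluster-provision.log
# Other CSE handler logs live under the same directory
sudo ls -l /var/log/azure/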

I have tried many versions, from acs-engine v0.12.0 to v0.14.6, and none of them deploys a Kubernetes cluster.

@rakeshkulkarni6 rakeshkulkarni6 changed the title Not able K8s Cluster with ACS-Engine 14.6 version Not able to deploy K8s Cluster with ACS-Engine 14.6 version on azure Apr 4, 2018
@CecileRobertMichon
Contributor

Hi @rakeshkulkarni6 , what does your apimodel look like?

@rakeshkulkarni6
Author

rakeshkulkarni6 commented Apr 5, 2018

Hi, here is the apimodel for your reference:

{
"apiVersion": "vlabs",
"properties": {
"orchestratorProfile": {
"orchestratorType": "Kubernetes",
"orchestratorRelease": "1.9",
"orchestratorVersion": "1.9.6",
"kubernetesConfig": {
"kubernetesImageBase": "k8s-gcrio.azureedge.net/",
"clusterSubnet": "10.246.0.0/16",
"dnsServiceIP": "10.0.0.10",
"serviceCidr": "10.0.0.0/16",
"networkPolicy": "azure",
"maxPods": 30,
"dockerBridgeSubnet": "172.17.0.1/16",
"useInstanceMetadata": true,
"enableRbac": true,
"enableSecureKubelet": true,
"privateCluster": {
"enabled": false
},
"gchighthreshold": 85,
"gclowthreshold": 80,
"etcdVersion": "3.2.16",
"etcdDiskSizeGB": "130",
"addons": [
{
"name": "tiller",
"enabled": true,
"containers": [
{
"name": "tiller",
"cpuRequests": "50m",
"memoryRequests": "150Mi",
"cpuLimits": "50m",
"memoryLimits": "150Mi"
}
],
"config": {
"max-history": "0"
}
},
{
"name": "aci-connector",
"enabled": false,
"containers": [
{
"name": "aci-connector",
"cpuRequests": "50m",
"memoryRequests": "150Mi",
"cpuLimits": "50m",
"memoryLimits": "150Mi"
}
],
"config": {
"nodeName": "aci-connector",
"os": "Linux",
"region": "westus",
"taint": "azure.com/aci"
}
},
{
"name": "kubernetes-dashboard",
"enabled": true,
"containers": [
{
"name": "kubernetes-dashboard",
"cpuRequests": "300m",
"memoryRequests": "150Mi",
"cpuLimits": "300m",
"memoryLimits": "150Mi"
}
]
},
{
"name": "rescheduler",
"enabled": false,
"containers": [
{
"name": "rescheduler",
"cpuRequests": "10m",
"memoryRequests": "100Mi",
"cpuLimits": "10m",
"memoryLimits": "100Mi"
}
]
},
{
"name": "metrics-server",
"enabled": true,
"containers": [
{
"name": "metrics-server"
}
]
}
],
"kubeletConfig": {
"--address": "0.0.0.0",
"--allow-privileged": "true",
"--anonymous-auth": "false",
"--authorization-mode": "Webhook",
"--azure-container-registry-config": "/etc/kubernetes/azure.json",
"--cadvisor-port": "0",
"--cgroups-per-qos": "true",
"--client-ca-file": "/etc/kubernetes/certs/ca.crt",
"--cloud-config": "/etc/kubernetes/azure.json",
"--cloud-provider": "azure",
"--cluster-dns": "10.0.0.10",
"--cluster-domain": "cluster.local",
"--enforce-node-allocatable": "pods",
"--event-qps": "0",
"--eviction-hard": "memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%",
"--feature-gates": "",
"--image-gc-high-threshold": "85",
"--image-gc-low-threshold": "80",
"--keep-terminated-pod-volumes": "false",
"--kubeconfig": "/var/lib/kubelet/kubeconfig",
"--max-pods": "110",
"--network-plugin": "cni",
"--node-status-update-frequency": "10s",
"--non-masquerade-cidr": "10.246.0.0/16",
"--pod-infra-container-image": "k8s-gcrio.azureedge.net/pause-amd64:3.1",
"--pod-manifest-path": "/etc/kubernetes/manifests"
},
"controllerManagerConfig": {
"--allocate-node-cidrs": "false",
"--cloud-config": "/etc/kubernetes/azure.json",
"--cloud-provider": "azure",
"--cluster-cidr": "10.246.0.0/16",
"--cluster-name": "somadeleteacspoc",
"--cluster-signing-cert-file": "/etc/kubernetes/certs/ca.crt",
"--cluster-signing-key-file": "/etc/kubernetes/certs/ca.key",
"--feature-gates": "ServiceNodeExclusion=true",
"--kubeconfig": "/var/lib/kubelet/kubeconfig",
"--leader-elect": "true",
"--node-monitor-grace-period": "40s",
"--pod-eviction-timeout": "5m0s",
"--profiling": "false",
"--root-ca-file": "/etc/kubernetes/certs/ca.crt",
"--route-reconciliation-period": "10s",
"--service-account-private-key-file": "/etc/kubernetes/certs/apiserver.key",
"--terminated-pod-gc-threshold": "5000",
"--use-service-account-credentials": "true",
"--v": "2"
},
"cloudControllerManagerConfig": {
"--allocate-node-cidrs": "false",
"--cloud-config": "/etc/kubernetes/azure.json",
"--cloud-provider": "azure",
"--cluster-cidr": "10.246.0.0/16",
"--cluster-name": "somadeleteacspoc",
"--kubeconfig": "/var/lib/kubelet/kubeconfig",
"--leader-elect": "true",
"--route-reconciliation-period": "10s",
"--v": "2"
},
"apiServerConfig": {
"--admission-control": "NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota,DenyEscalatingExec,AlwaysPullImages",
"--advertise-address": "",
"--allow-privileged": "true",
"--anonymous-auth": "false",
"--audit-log-maxage": "30",
"--audit-log-maxbackup": "10",
"--audit-log-maxsize": "100",
"--audit-log-path": "/var/log/audit.log",
"--audit-policy-file": "/etc/kubernetes/manifests/audit-policy.yaml",
"--authorization-mode": "Node,RBAC",
"--bind-address": "0.0.0.0",
"--client-ca-file": "/etc/kubernetes/certs/ca.crt",
"--cloud-config": "/etc/kubernetes/azure.json",
"--cloud-provider": "azure",
"--etcd-cafile": "/etc/kubernetes/certs/ca.crt",
"--etcd-certfile": "/etc/kubernetes/certs/etcdclient.crt",
"--etcd-keyfile": "/etc/kubernetes/certs/etcdclient.key",
"--etcd-quorum-read": "true",
"--etcd-servers": "https://127.0.0.1:2379",
"--insecure-port": "8080",
"--kubelet-client-certificate": "/etc/kubernetes/certs/client.crt",
"--kubelet-client-key": "/etc/kubernetes/certs/client.key",
"--profiling": "false",
"--proxy-client-cert-file": "/etc/kubernetes/certs/proxy.crt",
"--proxy-client-key-file": "/etc/kubernetes/certs/proxy.key",
"--repair-malformed-updates": "false",
"--requestheader-allowed-names": "",
"--requestheader-client-ca-file": "/etc/kubernetes/certs/proxy-ca.crt",
"--requestheader-extra-headers-prefix": "X-Remote-Extra-",
"--requestheader-group-headers": "X-Remote-Group",
"--requestheader-username-headers": "X-Remote-User",
"--secure-port": "443",
"--service-account-key-file": "/etc/kubernetes/certs/apiserver.key",
"--service-account-lookup": "true",
"--service-cluster-ip-range": "10.0.0.0/16",
"--storage-backend": "etcd3",
"--tls-cert-file": "/etc/kubernetes/certs/apiserver.crt",
"--tls-private-key-file": "/etc/kubernetes/certs/apiserver.key",
"--v": "4"
},
"schedulerConfig": {
"--kubeconfig": "/var/lib/kubelet/kubeconfig",
"--leader-elect": "true",
"--profiling": "false",
"--v": "2"
}
}
},
"masterProfile": {
"count": 1,
"dnsPrefix": "somadeleteacspoc",
"vmSize": "Standard_E8s_v3",
"vnetSubnetID": "/subscriptions/XXXXXXXXXXXXXXXXXXXXXXXXXXXX/resourceGroups/somadelete3/providers/Microsoft.Network/virtualNetworks/somadeletevnet/subnets/default",
"firstConsecutiveStaticIP": "10.10.1.45",
"storageProfile": "ManagedDisks",
"oauthEnabled": false,
"preProvisionExtension": null,
"extensions": [],
"distro": "ubuntu",
"kubernetesConfig": {
"kubeletConfig": {
"--address": "0.0.0.0",
"--allow-privileged": "true",
"--anonymous-auth": "false",
"--authorization-mode": "Webhook",
"--azure-container-registry-config": "/etc/kubernetes/azure.json",
"--cadvisor-port": "0",
"--cgroups-per-qos": "true",
"--client-ca-file": "/etc/kubernetes/certs/ca.crt",
"--cloud-config": "/etc/kubernetes/azure.json",
"--cloud-provider": "azure",
"--cluster-dns": "10.0.0.10",
"--cluster-domain": "cluster.local",
"--enforce-node-allocatable": "pods",
"--event-qps": "0",
"--eviction-hard": "memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%",
"--feature-gates": "",
"--image-gc-high-threshold": "85",
"--image-gc-low-threshold": "80",
"--keep-terminated-pod-volumes": "false",
"--kubeconfig": "/var/lib/kubelet/kubeconfig",
"--max-pods": "110",
"--network-plugin": "cni",
"--node-status-update-frequency": "10s",
"--non-masquerade-cidr": "10.246.0.0/16",
"--pod-infra-container-image": "k8s-gcrio.azureedge.net/pause-amd64:3.1",
"--pod-manifest-path": "/etc/kubernetes/manifests"
}
}
},
"agentPoolProfiles": [
{
"name": "agentpool1",
"count": 1,
"vmSize": "Standard_E8s_v3",
"osType": "Linux",
"availabilityProfile": "AvailabilitySet",
"storageProfile": "ManagedDisks",
"vnetSubnetID": "/subscriptions/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/resourceGroups/somadelete3/providers/Microsoft.Network/virtualNetworks/somadeletevnet/subnets/default",
"distro": "ubuntu",
"kubernetesConfig": {
"kubeletConfig": {
"--address": "0.0.0.0",
"--allow-privileged": "true",
"--anonymous-auth": "false",
"--authorization-mode": "Webhook",
"--azure-container-registry-config": "/etc/kubernetes/azure.json",
"--cadvisor-port": "0",
"--cgroups-per-qos": "true",
"--client-ca-file": "/etc/kubernetes/certs/ca.crt",
"--cloud-config": "/etc/kubernetes/azure.json",
"--cloud-provider": "azure",
"--cluster-dns": "10.0.0.10",
"--cluster-domain": "cluster.local",
"--enforce-node-allocatable": "pods",
"--event-qps": "0",
"--eviction-hard": "memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%",
"--feature-gates": "Accelerators=true",
"--image-gc-high-threshold": "85",
"--image-gc-low-threshold": "80",
"--keep-terminated-pod-volumes": "false",
"--kubeconfig": "/var/lib/kubelet/kubeconfig",
"--max-pods": "110",
"--network-plugin": "cni",
"--node-status-update-frequency": "10s",
"--non-masquerade-cidr": "10.246.0.0/16",
"--pod-infra-container-image": "k8s-gcrio.azureedge.net/pause-amd64:3.1",
"--pod-manifest-path": "/etc/kubernetes/manifests"
}
},
"fqdn": "",
"preProvisionExtension": null,
"extensions": []
}
],
"linuxProfile": {
"adminUsername": "cloudinfraadmin",
"ssh": {
"publicKeys": [
{
"keyData": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
]
}
},
"servicePrincipalProfile": {
"clientId": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"secret": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
},
"certificateProfile": {
"caCertificate": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"caPrivateKey": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"apiServerCertificate": "XXXXXXXXXXXXXXXXXXXXXXXXXXX",
"clientCertificate": "XXXXXXXXXXXXXXXXXXXXXXX",
"clientPrivateKey": "XXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"kubeConfigCertificate": "XXXXXXXXXXXXXXXXXXXXXXXX",
"kubeConfigPrivateKey": "XXXXXXXXXXXXXXXXXXXXXX",
"etcdServerCertificate": "XXXXXXXXXXXXXXXX",
"etcdServerPrivateKey": "XXXXXXXXXXXXXXXXXXXXXXXXX",
"etcdClientCertificate": "XXXXXXXXXXXXX",
"etcdClientPrivateKey": "XXXXXXXXXXXXXXXX",
"etcdPeerCertificates": [
"XXXXXXXXXXXXXXXXXXX"
],
"etcdPeerPrivateKeys": [
""
]
}
}
}
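
As pasted, redacted apimodels like this can easily lose a quote or bracket, so a quick syntax check before running acs-engine generate can save a failed deployment (this assumes jq is installed; the file name is a placeholder):

jq . kubernetes.json > /dev/null && echo 'valid JSON'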

@rakeshkulkarni6
Author

Hi, I have tried the new acs-engine version 0.15.0 to deploy a Kubernetes cluster on Azure.
I generated the ARM templates using the acs-engine generate command and deployed them using PowerShell.

I am getting the error below:
New-AzureRmResourceGroupDeployment : 5:54:07 PM - Resource Microsoft.Compute/virtualMachines/extensions
'k8s-master-63864159-0/cse0' failed with message '{
"status": "Failed",
"error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "VMExtensionProvisioningError",
"message": "VM has reported a failure when processing extension 'cse0'. Error message: "Enable failed: failed to execute
command: command terminated with exit status=3\n[stdout]\n\n[stderr]\n"."
}
]
}
}'
At line:1 char:1

+ New-AzureRmResourceGroupDeployment -Name smatestpoc -ResourceGroupNam ...
    + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet

New-AzureRmResourceGroupDeployment : 5:54:07 PM - VM has reported a failure when processing extension 'cse0'. Error message:
"Enable failed: failed to execute command: command terminated with exit status=3
[stdout]
[stderr]
".
At line:1 char:1

+ New-AzureRmResourceGroupDeployment -Name smatestpoc -ResourceGroupNam ...
    + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet

@CecileRobertMichon
Contributor

@rakeshkulkarni6 please remove all secrets/keys from the apimodel you shared

@CecileRobertMichon
Contributor

@rakeshkulkarni6 can you please try deploying with k8s 1.9.6? There are known upstream bugs in 1.9.0

@rakeshkulkarni6
Author

Hi @CecileRobertMichon,

It is showing the same error again; please find the error details below.
It seems the extension is not able to run the provisioning script that deploys the Kubernetes cluster.

New-AzureRmResourceGroupDeployment : 1:24:27 PM - Resource Microsoft.Compute/virtualMachines/extensions
'k8s-master-63864159-0/cse0' failed with message '{
"status": "Failed",
"error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "VMExtensionProvisioningError",
"message": "VM has reported a failure when processing extension 'cse0'. Error message: "Enable failed: failed to execute
command: command terminated with exit status=3\n[stdout]\n\n[stderr]\n"."
}
]
}
}'
At line:1 char:1

+ New-AzureRmResourceGroupDeployment -Name smatestpoc -ResourceGroupNam ...
    + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet

New-AzureRmResourceGroupDeployment : 1:24:27 PM - VM has reported a failure when processing extension 'cse0'. Error message:
"Enable failed: failed to execute command: command terminated with exit status=3
[stdout]
[stderr]
".
At line:1 char:1

+ New-AzureRmResourceGroupDeployment -Name smatestpoc -ResourceGroupNam ...
    + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet

New-AzureRmResourceGroupDeployment : 1:24:27 PM - Template output evaluation skipped: at least one resource deployment operation
failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.
At line:1 char:1

+ New-AzureRmResourceGroupDeployment -Name smatestpoc -ResourceGroupNam ...
    + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet


@rakeshkulkarni6
Author

I have used vlabs as the apiVersion and kept the DNS prefix empty in the agentPoolProfile. I am still getting the above error while deploying. Can you please help? I need to set up a production cluster using acs-engine.

@rakidu

rakidu commented Apr 7, 2018

I got a similar error when deploying a cluster using acs-engine version 0.14.6.
"error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "VMExtensionProvisioningError",
"message": "VM has reported a failure when processing extension 'cse0'. Error message: "Enable failed: failed to execute

I deleted the deployment and redeployed the cluster into a new resource group with Kubernetes version 1.9.5.

It got deployed successfully and is working fine without any errors.

@CecileRobertMichon
Contributor

@rakeshkulkarni6 @rakidu I am trying to repro this error. It looks like 0.14.6 might have introduced a regression causing transient deployment errors (possibly a race condition). In the meantime, if you retry you might get lucky and get a working cluster, @rakeshkulkarni6. If you still have a cluster that failed with this error, can you please share the content of /var/log/azure/cluster-provision.log on your first master?

@bobjac

bobjac commented Apr 12, 2018

I am seeing a similar issue with version 0.15.1 of acs-engine and version 1.8.9 of Kubernetes. I am trying to deploy a cluster into an existing vNet. The vNet has multiple subnets, but I get the same error whether I deploy the master & agents to the same subnet or split them into different subnets.

@rncwnd79

rncwnd79 commented Apr 13, 2018

Same here with 0.15.2, Kubernetes 1.9.6, and 3 masters, 3 agents.
In the Azure portal I get the already-documented error message for all 3 masters (cse0, cse1, cse2).

The last lines from /var/log/azure/cluster-provision.log:
++ /usr/local/bin/kubectl get nodes
++ grep Ready
++ wc -l
+ nodes=5
+ '[' 5 -eq 6 ']'
+ sleep 1
+ '[' 1 -ne 0 ']'
+ echo 'still waiting for active nodes after 1800 seconds'
still waiting for active nodes after 1800 seconds
+ exit 3
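
For context, that trace corresponds to a readiness loop of roughly this shape (a sketch reconstructed from the trace above, not the verbatim provision script; the variable names and iteration bound are assumptions):

for i in {1..1800}; do
    # Count nodes reporting Ready; succeed once all 6 (3 masters + 3 agents) are up
    nodes=$(/usr/local/bin/kubectl get nodes | grep Ready | wc -l)
    [ "$nodes" -eq 6 ] && break
    sleep 1
done
if [ "$nodes" -ne 6 ]; then
    echo 'still waiting for active nodes after 1800 seconds'
    exit 3
fi

In other words, one of the six VMs never registered as Ready within the timeout, and the whole deployment was failed with exit code 3.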

@CecileRobertMichon I have now tried many times hoping to get the cluster deployed once, but had no luck...

@htuomola

htuomola commented Apr 16, 2018

I also got this: acs-engine 0.15.2, Kubernetes 1.9.6, with 1 master and 4 agents.

cluster-provision.log

Kubernetes master is running at http://localhost:8080

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
+ '[' 1 = 0 ']'
+ sleep 1
+ for i in '{1..600}'
+ /usr/local/bin/kubectl cluster-info
Kubernetes master is running at http://localhost:8080

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
+ '[' 1 = 0 ']'
+ sleep 1
+ '[' 1 -ne 0 ']'
+ echo 'k8s cluster is not healthy after 600 seconds'
k8s cluster is not healthy after 600 seconds
+ exit 3

Interestingly, if I run kubectl cluster-info manually, it hangs after printing the master status (which is shown with the FQDN, unlike the log above, which shows localhost).
With kubectl cluster-info dump I only get:
Unable to connect to the server: dial tcp <master public IP>:443: i/o timeout
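
One way to narrow that down is to compare the local insecure endpoint on the master with the public secure one, since the provision log talks to localhost:8080 while cluster-info dump goes out through the load balancer (the FQDN is a placeholder):

# On the master: the insecure port the provision script uses
/usr/local/bin/kubectl --server=http://localhost:8080 get nodes
# From outside: probe the secure port behind the public IP
curl -k --max-time 10 https://<master-fqdn>:443/healthz

If the first succeeds and the second times out, the apiserver is up but unreachable through the LB rule / NSG, which matches the i/o timeout above.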

Edit: in my case, it was resolved by deleting the cluster and re-creating.

@htuomola

Err, I'll take that (edit) back - it deployed the second time, but without heapster, dashboard, kube-dns, and tiller.

> kubectl cluster-info
Kubernetes master is running at https://<dns>

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

@CecileRobertMichon
Contributor

v0.16.0 will be released this week with a number of improvements to address the above deployment errors including #2641, #2625, #2639, #2650 and #2666. cc: @jackfrancis

@lehtiton

Is there any configuration with which it is possible to generate a Kubernetes cluster at the moment? I have tried quite a few orchestratorRelease & orchestratorVersion combinations and am getting the same error with all of them: "VM has reported a failure when processing extension 'cse0'."

Here is my latest trial:
{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.10",
      "orchestratorVersion": "1.10.0",
      "kubernetesConfig": {
        "privateCluster": {
          "enabled": true
        }
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "acsengine01",
      "vmSize": "Standard_D2_v2",
      "vnetSubnetId": "",
      "firstConsecutiveStaticIP": "",
      "vnetCidr": ""
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 3,
        "vmSize": "Standard_D2_v2",
        "availabilityProfile": "AvailabilitySet",
        "vnetSubnetId": ""
      }
    ],
    "linuxProfile": {
      "adminUsername": "",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    }
  }
}

@CecileRobertMichon
Contributor

@lehtiton this isn't a bug with a specific configuration but rather transient VM provisioning errors, which result in one or more of the nodes not being ready within a certain amount of time. The improvements I mentioned above aim to catch those errors and add retries and timeouts to better handle infrastructure flakiness. If you are seeing 100% failures, please send me the content of /var/log/azure/cluster-provision.log and the output of kubectl get nodes so I can help you debug. Also, please try to deploy a cluster without a custom vnet to make sure this isn't related to your network config.
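
If it helps, the diagnostics requested above can be pulled off the first master in one go (the username and FQDN are placeholders for your own values):

ssh <adminUsername>@<dnsPrefix>.<location>.cloudapp.azure.com \
    'sudo cat /var/log/azure/cluster-provision.log; kubectl get nodes -o wide' > cluster-debug.log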

@lehtiton

lehtiton commented Apr 18, 2018

@CecileRobertMichon thanks for your help. I finally managed to get rid of this issue by changing something in the configs (I guess). I have tried so many times with so many different configurations that I cannot keep track of them any more. However, I still faced a couple of issues that I commented on in another issue, #2476, in case you have any hints on how to work through those. I also shared my configurations there.

@ankitsingh11

@CecileRobertMichon I am trying to create a cluster using acs-engine v0.16.0 in China East 2, which is a new region, but there are a lot of errors while running the custom script extension on the master VM. I have changed a lot of configuration options and the registry URL, and I am able to fetch the images on the master VM, but the provisioning still fails with exit code 30.

+ echo 'k8s cluster is not healthy after 600 seconds'
k8s cluster is not healthy after 600 seconds
+ exit 30

Please help

@CecileRobertMichon
Contributor

CecileRobertMichon commented Sep 26, 2018

@ankitsingh11 please refer to https://github.com/Azure/acs-engine/blob/master/docs/kubernetes/troubleshooting.md#vmextensionprovisioningerror-or-vmextensionprovisioningtimeout for a guide on troubleshooting these issues and open a new issue if you still need help. I will close this one for now as it is outdated.

Any reason why you are using acs-engine v0.16.0? The latest versions contain a lot of improvements for vm extensions.
