Recently we discovered a few ACS Kubernetes clusters that stopped responding after their master VMs were rebooted. All ACS RPv1 clusters are affected, as are ACS RPv2 clusters deployed after Fri Oct 20 06:49:20 PDT 2017.
Upon investigation, we found that this was due to etcd not restarting.
As a fix, we set the etcd restart policy to 'always' so that etcd comes back up after a master VM reboots. The fix is complete and is being rolled out to RPv2 regions.
To fix this issue manually, please run the following commands on every master node in your cluster:
- sudo /bin/sed -i 's/Restart=on-abnormal/Restart=always/g' /lib/systemd/system/etcd.service
- sudo systemctl daemon-reload
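The two commands above can be wrapped so that the change is checked after it is applied. The following is only a minimal sketch, not an official ACS tool: `patch_etcd_unit` is a hypothetical helper name, and it assumes GNU `sed` (for `-i`) and that the unit file contains a plain `Restart=` line, as the manual fix implies.

```shell
#!/bin/sh
# Sketch only: patch_etcd_unit is a hypothetical helper, not part of ACS.
# It applies the same substitution as the manual fix and then verifies it.
patch_etcd_unit() {
    unit_file="$1"

    # Refuse to continue if the unit file is missing.
    if [ ! -f "$unit_file" ]; then
        echo "unit file not found: $unit_file" >&2
        return 1
    fi

    # Same substitution as the manual fix; a no-op if already applied.
    sed -i 's/Restart=on-abnormal/Restart=always/g' "$unit_file"

    # Succeed only if the unit file now contains Restart=always.
    grep -q '^Restart=always$' "$unit_file"
}

# Intended use on a master node (run as root), followed by a reload:
#   patch_etcd_unit /lib/systemd/system/etcd.service && systemctl daemon-reload
```

Because the substitution only matches `Restart=on-abnormal`, running the helper a second time leaves the file unchanged. After the reload, `systemctl show etcd -p Restart` should report the policy systemd has loaded for the unit.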
List of all ACS RPv1 regions:
- australiasoutheast
- northeurope
- brazilsouth
- australiaeast
- japaneast
- northcentralus
- westus
- eastasia
- eastus2
- southcentralus
- southeastasia
- eastus
- westeurope
- centralus
List of RPv2 regions:
- UK West
- UK South
- West Central US
- West US 2
- Canada East
- Canada Central
- West India
- South India
- Central India
- Japan West