This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Do not check for nodes Ready in CSE on masters #2650

Merged · 8 commits, Apr 12, 2018

Conversation

@jackfrancis (Member) commented Apr 10, 2018

What this PR does / why we need it:

Remove the master nodes' "check for all nodes Ready" from CSE. The objective of this change is to fail faster when any agent node does not have a healthy kubelet after provisioning. We introduce a new kubelet status check on all nodes, which will fail CSE if kubelet is not healthy.

By waiting n minutes on the master nodes for CSE to return status (until the nodes Ready check times out), we add unnecessary delay to the eventual non-zero ARM deployment result.

This PR also includes a new, automatic "vanilla kubernetes" E2E test (default config; a single, small agent pool) as a tactic for speeding up the debug process for problematic code changes and for adding standardized coverage of the acs-engine kubernetes default config.
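For illustration, a minimal sketch of what the per-node kubelet check amounts to, reusing the retrycmd_if_failure helper already present in the provision scripts (the wrapper function name is hypothetical; this is not the verbatim CSE code):

# hypothetical wrapper around the per-node kubelet health check added by this PR
ensure_kubelet_healthy() {
    # reload systemd units and restart kubelet, retrying on transient failures
    retrycmd_if_failure 100 1 10 systemctl daemon-reload
    retrycmd_if_failure 100 1 10 systemctl restart kubelet
    # capture kubelet status for debugging; per this PR, an unhealthy kubelet fails CSE on this node
    retrycmd_if_failure 10 1 3 systemctl status kubelet --no-pager -l > /var/log/azure/kubelet-status.log
}
ensure_kubelet_healthy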

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (ie. deploy with previous version, upgrade with this branch)

Release note:

Do not check for nodes Ready in CSE on masters

@EPinci (Contributor) commented Apr 10, 2018

Per @jackfrancis' request, this is a test that demonstrates the current all-nodes check is flaky in upgrade/scale situations.

  • Deploy a kubernetes cluster with at least 2 nodes in a single nodepool
    In my case: kubernetes v1.9.3 with 3 masters and 3 nodes in a single nodepool
    The cluster needs to have useInstanceMetadata set to false for the autoscaler to work
  • Deploy the official kubernetes autoscaler v1.2 to the cluster
    The autoscaler needs to be configured to allow the nodepool to shrink to 1
    Since the cluster has no load, the nodepool will be reduced to a single node
  • Use ACS-Engine to upgrade the cluster
    In my case: to v1.9.6
    At the end of the redeployment of the first VM, the script gets stuck and eventually fails with the log below.
    Note that the reported number of Ready nodes is 4 (all of them) but the check is testing against 6 (the original size of the cluster).
+ for i in '{1..1800}'
++ wc -l
++ grep Ready
++ /usr/local/bin/kubectl get nodes
+ nodes=4
+ '[' 4 -eq 6 ']'
+ sleep 1
+ '[' 1 -ne 0 ']'
+ echo 'still waiting for active nodes after 1800 seconds'
still waiting for active nodes after 1800 seconds
+ exit 3

This will fail the entire upgrade process.
This will also impact #2637.
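For reference, an approximate reconstruction of the wait loop that produces the trace above (the hard-coded 6 stands in for the node count baked into the script when the template was generated; this is a sketch, not the verbatim CSE code):

# wait up to 1800 seconds for the expected number of nodes to report Ready
for i in {1..1800}; do
    nodes=$(/usr/local/bin/kubectl get nodes 2>/dev/null | grep Ready | wc -l)
    [ "$nodes" -eq 6 ] && break
    sleep 1
done
if [ "$nodes" -ne 6 ]; then
    echo "still waiting for active nodes after 1800 seconds"
    exit 3
fi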

@jackfrancis (Member Author)

Thanks for adding that @EPinci. If you're able to walk through the above scenario using this PR and see a different (successful) outcome, that will provide evidence that we should prioritize this work.

@EPinci (Contributor) commented Apr 10, 2018

Yup, I'll run through it on my test cluster and report back.

On second thought, however, instead of skipping the node check entirely, could it be beneficial to just check the local node's readiness?
Something like:

nodeready=$(${KUBECTL} get nodes 2>/dev/null | grep "$HOSTNAME" | grep 'Ready' | wc -l)
if [ "$nodeready" -eq 1 ]; then
  : # local node has registered with the apiserver and reports Ready
fi

@jackfrancis (Member Author) commented Apr 10, 2018

@EPinci That would largely amount to the same thing. The ARM deployment will return CSE success based on the set of all CSE executions having a zero exit code. There is a difference between "waiting a certain amount of time on each master node for all nodes to be ready" and "waiting a certain amount of time on each node (masters + agents) for itself to check in as ready with the control plane", but that difference is not meaningful from the perspective of "validate that all the nodes are Ready".

(I'm also glossing over the fact that we don't currently set up kubectl w/ cluster context on the agent nodes, so that alternate check would involve introducing that.)
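For illustration, roughly what the per-node "wait for myself to report Ready" variant could look like; it assumes kubectl is configured with cluster credentials on the node (which, as noted above, is currently only true on masters), and the function name is hypothetical:

local_node_ready_check() {
    # wait up to 10 minutes for this node to register as Ready with the apiserver
    for i in {1..600}; do
        # grep -w avoids counting a NotReady status as Ready
        if ${KUBECTL} get nodes 2>/dev/null | grep "$(hostname)" | grep -qw Ready; then
            return 0
        fi
        sleep 1
    done
    echo "node $(hostname) did not report Ready within 600 seconds"
    return 1
}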

Generally speaking, what we're checking for now is "up to a certain period of time after provisioning, do we see a functional k8s cluster w/ the full set of Ready nodes participating behind the apiserver at least once?" Admittedly, checking something like that is arguably a k8s anti-pattern, as clusters are designed to be 100% functional with node participation fluidity; but doing that check exposes valuable behavioral data about the real-world effects of our post-provisioning, app-layer configuration implementation. We want 100% end state outcome achievement for known-working cluster configurations, and so in order to know and protect against novel edge cases, we need such a check to be operable over time.

However, it is also arguable that such a check does not need to concern a user's experience building a new cluster. Should we rather return control to the user when their cluster has a functioning apiserver and nodes that are able to schedule workloads? In addition to giving users a few minutes of their time back (especially for clusters with high node counts), it is arguably more k8s idiomatic to do so.

However (part deux), I would argue that we don't want to make that change until the project has confidence that CI/CD is continually doing the comprehensive end state outcome validation for all code changes going forward. Without that check, the resilience of our cluster provision scripts will decay and leave users with working clusters, but with intermittent non-working nodes (which they may be paying for if we successfully provision a VM but don't successfully join it to the cluster).

@EPinci (Contributor) commented Apr 10, 2018

I ran this PR through a test. It knocked out my test cluster and I could not understand why.
I redeployed my original setup (3 masters, 3 nodes, kube v1.9.3) and manually destroyed two nodes.
The upgrade ran for master0 and master1 but immediately failed for master2 with the following:

INFO[1873] Starting ARM Deployment (master-18-04-10T23.13.13-781252198). This will take some time...
INFO[1919] Finished ARM Deployment (master-18-04-10T23.13.13-781252198). Error: resources.DeploymentsClient#CreateOrUpdate: Failure sending request: StatusCode=200 -- Original Error: Long running operation terminated with status 'Failed': Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details."
INFO[1920] Error creating upgraded master VM: k8s-master-93283987-2
FATA[1920] Error upgrading cluster: resources.DeploymentsClient#CreateOrUpdate: Failure sending request: StatusCode=200 -- Original Error: Long running operation terminated with status 'Failed': Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details."

The deployment failure is "subnet full", which is weird because I have a dedicated /24 subnet for three masters with ipAddressCount set to 20.
I checked the subnet and it has, in fact, filled up because master0 and master1 have now allocated 111 addresses each. I take it something else is going really wrong...

@jackfrancis jackfrancis changed the title WIP: CSE all nodes Ready check is optional CSE all nodes Ready check is opt-in (KubernetesDebug) Apr 10, 2018
@jackfrancis jackfrancis changed the title CSE all nodes Ready check is opt-in (KubernetesDebug) Do not check for nodes Ready in CSE on masters Apr 12, 2018
@@ -233,7 +233,7 @@ func (cli *CLIProvisioner) FetchProvisioningMetrics(path string, cfg *config.Con
agentFiles := []string{"/var/log/azure/cluster-provision.log", "/var/log/cloud-init.log",
"/var/log/cloud-init-output.log", "/var/log/syslog", "/var/log/azure/custom-script/handler.log",
"/opt/m", "/opt/azure/containers/kubelet.sh", "/opt/azure/containers/provision.sh",
"/opt/azure/provision-ps.log", "/var/log/azure/dnsdump.pcap"}
@jackfrancis (Member Author) commented on the diff above:
/var/log/azure/dnsdump.pcap is no longer a file generated by provision scripts

retrycmd_if_failure 100 1 10 systemctl daemon-reload && systemctl restart kubelet
fi
retrycmd_if_failure 100 1 10 systemctl daemon-reload && systemctl restart kubelet
retrycmd_if_failure 10 1 3 systemctl status kubelet --no-pager -l > /var/log/azure/kubelet-status.log
A contributor commented on the diff above:

We should consider doing this for other daemons as well
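As a rough sketch of that suggestion, the same status-capture pattern could be reused for other units (docker and etcd are shown purely as examples; the unit names would need to match what actually runs on the node):

# hypothetical extension of the kubelet status capture to other daemons
retrycmd_if_failure 10 1 3 systemctl status docker --no-pager -l > /var/log/azure/docker-status.log
retrycmd_if_failure 10 1 3 systemctl status etcd --no-pager -l > /var/log/azure/etcd-status.log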

@CecileRobertMichon (Contributor) left a comment:

lgtm
