
Replace liveness and readiness with initialization node controller #1186

Merged: 5 commits merged into aws:main on Jan 25, 2022

Conversation

Contributor

@suket22 suket22 commented Jan 19, 2022

1. Issue, if available:
#1135

2. Description of changes:
Previously, we would only delete worker nodes if the kubelet had a status of NodeStatusNeverUpdated for more than 15 minutes, which meant the kubelet had never connected to the API server at all.

We now delete worker nodes only if the kubelet is in NotReady status for more than 15 minutes after startup. To detect that a node is still starting up, we look for a taint we apply during node object creation; once that taint has been removed, we consider startup complete and no longer evaluate the node for deletion. We may re-introduce something like a liveness controller in the future if necessary, but we want to be careful there in order to maintain the static stability of the cluster.
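
For illustration, here is a minimal sketch of that reconcile flow, assuming a 15-minute timeout and a startup taint applied at node object creation. The helper names (`hasStartupTaint`, `nodeIsReady`) and the taint key are hypothetical and may differ from the actual implementation.

```go
// Hypothetical sketch of the initialization check; not the exact Karpenter code.
package initialization

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

const InitializationTimeout = 15 * time.Minute

type Controller struct {
	kubeClient client.Client
}

func (c *Controller) reconcileInitialization(ctx context.Context, n *v1.Node) (reconcile.Result, error) {
	// The startup taint is removed once the node initializes successfully;
	// after that, this controller never re-evaluates the node for deletion.
	if !hasStartupTaint(n) || nodeIsReady(n) {
		return reconcile.Result{}, nil
	}
	// Still starting up and NotReady: requeue until the timeout has elapsed.
	// (The real code uses an injectable clock for testability.)
	if age := time.Since(n.GetCreationTimestamp().Time); age < InitializationTimeout {
		return reconcile.Result{RequeueAfter: InitializationTimeout - age}, nil
	}
	// Never became Ready within the timeout: delete the node object, which
	// triggers termination of the underlying instance.
	return reconcile.Result{}, c.kubeClient.Delete(ctx, n)
}

func hasStartupTaint(n *v1.Node) bool {
	for _, t := range n.Spec.Taints {
		if t.Key == "karpenter.sh/not-ready" { // hypothetical taint key
			return true
		}
	}
	return false
}

func nodeIsReady(n *v1.Node) bool {
	for _, cond := range n.Status.Conditions {
		if cond.Type == v1.NodeReady {
			return cond.Status == v1.ConditionTrue
		}
	}
	return false
}
```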

3. How was this change tested?
Case 1 - I removed the CNI policy from the node role. Nodes were stuck in NotReady, and I could see them being terminated after 15 minutes.

2022-01-19T17:51:22.341Z	INFO	controller.provisioning	Batched 4 pods in 1.101064515s	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:51:22.551Z	INFO	controller.provisioning	Computed packing of 1 node(s) for 4 pod(s) with instance type option(s) [c1.xlarge c4.2xlarge c3.2xlarge c6i.2xlarge c5ad.2xlarge c5d.2xlarge c5a.2xlarge c5.2xlarge c5n.2xlarge m3.2xlarge m5dn.2xlarge m5a.2xlarge t3.2xlarge m6i.2xlarge m5ad.2xlarge t3a.2xlarge m5n.2xlarge m4.2xlarge m5zn.2xlarge m5d.2xlarge]	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:51:24.811Z	INFO	controller.provisioning	Launched instance: i-01de791cd273012e1, hostname: ip-192-168-187-149.us-west-2.compute.internal, type: t3a.2xlarge, zone: us-west-2b, capacityType: on-demand	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:51:24.852Z	INFO	controller.provisioning	Bound 4 pod(s) to node ip-192-168-187-149.us-west-2.compute.internal	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:51:24.852Z	INFO	controller.provisioning	Waiting for unschedulable pods	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:21.288Z	INFO	controller.provisioning	Batched 4 pods in 1.050060518s	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:21.650Z	INFO	controller.provisioning	Computed packing of 1 node(s) for 4 pod(s) with instance type option(s) [c1.xlarge c4.2xlarge c3.2xlarge c5ad.2xlarge c5d.2xlarge c5a.2xlarge c5.2xlarge c6i.2xlarge c5n.2xlarge m3.2xlarge t3a.2xlarge m5zn.2xlarge t3.2xlarge m5.2xlarge m5a.2xlarge m5dn.2xlarge m6i.2xlarge m4.2xlarge m5ad.2xlarge m6a.2xlarge]	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:23.796Z	INFO	controller.provisioning	Launched instance: i-03c6fb55b68bb907f, hostname: ip-192-168-157-226.us-west-2.compute.internal, type: t3a.2xlarge, zone: us-west-2b, capacityType: on-demand	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:23.859Z	INFO	controller.provisioning	Bound 4 pod(s) to node ip-192-168-157-226.us-west-2.compute.internal	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:23.859Z	INFO	controller.provisioning	Waiting for unschedulable pods	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:11.328Z	INFO	controller.provisioning	Batched 4 pods in 1.09442776s	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:11.719Z	INFO	controller.provisioning	Computed packing of 1 node(s) for 4 pod(s) with instance type option(s) [c1.xlarge c3.2xlarge c4.2xlarge c5d.2xlarge c5ad.2xlarge c5.2xlarge c5a.2xlarge c6i.2xlarge c5n.2xlarge m3.2xlarge m5zn.2xlarge m5.2xlarge m5dn.2xlarge m6a.2xlarge m5n.2xlarge m5ad.2xlarge m4.2xlarge m6i.2xlarge m5a.2xlarge t3.2xlarge]	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:14.331Z	INFO	controller.provisioning	Launched instance: i-0b4949f10b9937bf9, hostname: ip-192-168-136-179.us-west-2.compute.internal, type: c5a.2xlarge, zone: us-west-2b, capacityType: on-demand	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:14.377Z	INFO	controller.provisioning	Bound 4 pod(s) to node ip-192-168-136-179.us-west-2.compute.internal	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:14.377Z	INFO	controller.provisioning	Waiting for unschedulable pods	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:06:24.001Z	INFO	controller.node	Triggering termination for node that failed to transition to ready	{"commit": "ee0d0b5", "node": "ip-192-168-187-149.us-west-2.compute.internal"}
2022-01-19T18:06:24.035Z	INFO	controller.termination	Cordoned node	{"commit": "ee0d0b5", "node": "ip-192-168-187-149.us-west-2.compute.internal"}
2022-01-19T18:06:24.256Z	INFO	controller.termination	Deleted node	{"commit": "ee0d0b5", "node": "ip-192-168-187-149.us-west-2.compute.internal"}

Case 2 - I'm still trying to replicate the case where a node comes up healthy but then goes NotReady. I'll probably use a network partition to reproduce this scenario, but this is still WIP.

4. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

I don't think this impacts our docs, but I'm happy to add a callout somewhere if needed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify

netlify bot commented Jan 19, 2022

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: 382440e

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61e8a48a02c93b00075484ef

Contributor

@ellistarn ellistarn left a comment


Nice!

@suket22 suket22 changed the title Replace liveness and readiness with startup node controller Replace liveness and readiness with initialization node controller Jan 19, 2022
}

if !node.IsReady(n) {
	if age := injectabletime.Now().Sub(n.GetCreationTimestamp().Time); age < InitializationTimeout {
Contributor

you could shorten with:

if age := injectabletime.Since(n.GetCreationTimestamp().Time); age < InitializationTimeout {
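
For context, a hypothetical sketch of what such an injectable time helper could look like (the actual injectabletime package in this repo may be organized differently):

```go
// Hypothetical sketch of an injectable clock helper; not the repo's actual code.
package injectabletime

import "time"

// Now is a package-level variable so tests can substitute a fixed clock.
var Now = time.Now

// Since mirrors time.Since but reads the injectable Now, enabling the
// shorter call suggested above.
func Since(t time.Time) time.Duration {
	return Now().Sub(t)
}
```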

if age := injectabletime.Now().Sub(n.GetCreationTimestamp().Time); age < InitializationTimeout {
	return reconcile.Result{RequeueAfter: InitializationTimeout - age}, nil
}
logging.FromContext(ctx).Infof("Triggering termination for node that failed to become ready")
Contributor

I think it would be helpful to log the node name here, wdyt?

Contributor

This is already included in our logger. All reconcilers include the resource name in their logger at the top level, which helps avoid redundant statements.
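
For illustration, a hypothetical sketch of how a reconciler's logger can be scoped to the resource name once, so individual statements don't need to repeat it (assumes knative.dev/pkg/logging and controller-runtime; Karpenter's actual wiring may differ):

```go
// Hypothetical sketch; not Karpenter's exact logger setup.
package node

import (
	"context"

	"knative.dev/pkg/logging"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

type Controller struct{}

func (c *Controller) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	// Attach the node name to the logger once, at the top of the reconcile loop.
	ctx = logging.WithLogger(ctx, logging.FromContext(ctx).With("node", req.Name))

	// Any later statement picks the name up automatically from the context.
	logging.FromContext(ctx).Infof("Triggering termination for node that failed to become ready")
	return reconcile.Result{}, nil
}
```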

@ellistarn ellistarn merged commit 2346ed5 into aws:main Jan 25, 2022