
Replace liveness and readiness with initialization node controller #1186

Merged: 5 commits merged into aws:main on Jan 25, 2022

Conversation

Contributor

@suket22 suket22 commented Jan 19, 2022

1. Issue, if available:
#1135

2. Description of changes:
Previously, we would only delete worker nodes if the kubelet had a status of NodeStatusNeverUpdated for more than 15 minutes, which meant the kubelet had never connected to the API server at all.

We now delete worker nodes only if the kubelet is in NotReady status for more than 15 minutes after startup. To detect that a node is still starting up, we look for a taint we apply during node object creation; once that taint has been removed, we consider startup complete and no longer evaluate the node for deletion. We may re-introduce something like a liveness controller in the future if necessary, but we want to be careful there in order to maintain the static stability of the cluster.
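
For illustration, here is a minimal sketch of that reconcile flow, assuming a 15-minute timeout and a startup taint applied at node object creation. The helper names (`hasStartupTaint`, `nodeIsReady`) and the taint key are hypothetical and may differ from the actual implementation.

```go
// Hypothetical sketch of the initialization check; not the exact Karpenter code.
package initialization

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

const InitializationTimeout = 15 * time.Minute

type Controller struct {
	kubeClient client.Client
}

func (c *Controller) reconcileInitialization(ctx context.Context, n *v1.Node) (reconcile.Result, error) {
	// The startup taint is removed once the node initializes successfully;
	// after that, this controller never re-evaluates the node for deletion.
	if !hasStartupTaint(n) || nodeIsReady(n) {
		return reconcile.Result{}, nil
	}
	// Still starting up and NotReady: requeue until the timeout has elapsed.
	// (The real code uses an injectable clock for testability.)
	if age := time.Since(n.GetCreationTimestamp().Time); age < InitializationTimeout {
		return reconcile.Result{RequeueAfter: InitializationTimeout - age}, nil
	}
	// Never became Ready within the timeout: delete the node object, which
	// triggers termination of the underlying instance.
	return reconcile.Result{}, c.kubeClient.Delete(ctx, n)
}

func hasStartupTaint(n *v1.Node) bool {
	for _, t := range n.Spec.Taints {
		if t.Key == "karpenter.sh/not-ready" { // hypothetical taint key
			return true
		}
	}
	return false
}

func nodeIsReady(n *v1.Node) bool {
	for _, cond := range n.Status.Conditions {
		if cond.Type == v1.NodeReady {
			return cond.Status == v1.ConditionTrue
		}
	}
	return false
}
```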

3. How was this change tested?
Case 1 - I removed the CNI policy from the node role. Nodes were stuck in NotReady, and I could see them being terminated after 15 minutes.

2022-01-19T17:51:22.341Z	INFO	controller.provisioning	Batched 4 pods in 1.101064515s	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:51:22.551Z	INFO	controller.provisioning	Computed packing of 1 node(s) for 4 pod(s) with instance type option(s) [c1.xlarge c4.2xlarge c3.2xlarge c6i.2xlarge c5ad.2xlarge c5d.2xlarge c5a.2xlarge c5.2xlarge c5n.2xlarge m3.2xlarge m5dn.2xlarge m5a.2xlarge t3.2xlarge m6i.2xlarge m5ad.2xlarge t3a.2xlarge m5n.2xlarge m4.2xlarge m5zn.2xlarge m5d.2xlarge]	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:51:24.811Z	INFO	controller.provisioning	Launched instance: i-01de791cd273012e1, hostname: ip-192-168-187-149.us-west-2.compute.internal, type: t3a.2xlarge, zone: us-west-2b, capacityType: on-demand	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:51:24.852Z	INFO	controller.provisioning	Bound 4 pod(s) to node ip-192-168-187-149.us-west-2.compute.internal	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:51:24.852Z	INFO	controller.provisioning	Waiting for unschedulable pods	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:21.288Z	INFO	controller.provisioning	Batched 4 pods in 1.050060518s	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:21.650Z	INFO	controller.provisioning	Computed packing of 1 node(s) for 4 pod(s) with instance type option(s) [c1.xlarge c4.2xlarge c3.2xlarge c5ad.2xlarge c5d.2xlarge c5a.2xlarge c5.2xlarge c6i.2xlarge c5n.2xlarge m3.2xlarge t3a.2xlarge m5zn.2xlarge t3.2xlarge m5.2xlarge m5a.2xlarge m5dn.2xlarge m6i.2xlarge m4.2xlarge m5ad.2xlarge m6a.2xlarge]	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:23.796Z	INFO	controller.provisioning	Launched instance: i-03c6fb55b68bb907f, hostname: ip-192-168-157-226.us-west-2.compute.internal, type: t3a.2xlarge, zone: us-west-2b, capacityType: on-demand	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:23.859Z	INFO	controller.provisioning	Bound 4 pod(s) to node ip-192-168-157-226.us-west-2.compute.internal	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T17:57:23.859Z	INFO	controller.provisioning	Waiting for unschedulable pods	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:11.328Z	INFO	controller.provisioning	Batched 4 pods in 1.09442776s	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:11.719Z	INFO	controller.provisioning	Computed packing of 1 node(s) for 4 pod(s) with instance type option(s) [c1.xlarge c3.2xlarge c4.2xlarge c5d.2xlarge c5ad.2xlarge c5.2xlarge c5a.2xlarge c6i.2xlarge c5n.2xlarge m3.2xlarge m5zn.2xlarge m5.2xlarge m5dn.2xlarge m6a.2xlarge m5n.2xlarge m5ad.2xlarge m4.2xlarge m6i.2xlarge m5a.2xlarge t3.2xlarge]	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:14.331Z	INFO	controller.provisioning	Launched instance: i-0b4949f10b9937bf9, hostname: ip-192-168-136-179.us-west-2.compute.internal, type: c5a.2xlarge, zone: us-west-2b, capacityType: on-demand	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:14.377Z	INFO	controller.provisioning	Bound 4 pod(s) to node ip-192-168-136-179.us-west-2.compute.internal	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:03:14.377Z	INFO	controller.provisioning	Waiting for unschedulable pods	{"commit": "ee0d0b5", "provisioner": "default"}
2022-01-19T18:06:24.001Z	INFO	controller.node	Triggering termination for node that failed to transition to ready	{"commit": "ee0d0b5", "node": "ip-192-168-187-149.us-west-2.compute.internal"}
2022-01-19T18:06:24.035Z	INFO	controller.termination	Cordoned node	{"commit": "ee0d0b5", "node": "ip-192-168-187-149.us-west-2.compute.internal"}
2022-01-19T18:06:24.256Z	INFO	controller.termination	Deleted node	{"commit": "ee0d0b5", "node": "ip-192-168-187-149.us-west-2.compute.internal"}

Case 2 - I'm still trying to replicate the case where a node comes up healthy but then goes NotReady. I'll probably use a network partition to reproduce this scenario, but this is still WIP.

4. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

I don't think this impacts our docs, but I'm happy to add a callout somewhere if needed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify

netlify bot commented Jan 19, 2022

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: 382440e

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61e8a48a02c93b00075484ef

Contributor

@ellistarn ellistarn left a comment


Nice!

@suket22 suket22 changed the title Replace liveness and readiness with startup node controller Replace liveness and readiness with initialization node controller Jan 19, 2022
}

if !node.IsReady(n) {
	if age := injectabletime.Now().Sub(n.GetCreationTimestamp().Time); age < InitializationTimeout {
Contributor

you could shorten with:

if age := injectabletime.Since(n.GetCreationTimestamp().Time); age < InitializationTimeout {
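
For context, a hypothetical sketch of what such an injectable time helper could look like (the actual injectabletime package in this repo may be organized differently):

```go
// Hypothetical sketch of an injectable clock helper; not the repo's actual code.
package injectabletime

import "time"

// Now is a package-level variable so tests can substitute a fixed clock.
var Now = time.Now

// Since mirrors time.Since but reads the injectable Now, enabling the
// shorter call suggested above.
func Since(t time.Time) time.Duration {
	return Now().Sub(t)
}
```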

if age := injectabletime.Now().Sub(n.GetCreationTimestamp().Time); age < InitializationTimeout {
	return reconcile.Result{RequeueAfter: InitializationTimeout - age}, nil
}
logging.FromContext(ctx).Infof("Triggering termination for node that failed to become ready")
Contributor

I think it would be helpful to log the node name here, wdyt?

Contributor

This is already included in our logger. All reconcilers include the resource name in their logger at the top level, which helps avoid redundant statements.
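
For illustration, a hypothetical sketch of how a reconciler's logger can be scoped to the resource name once, so individual statements don't need to repeat it (assumes knative.dev/pkg/logging and controller-runtime; Karpenter's actual wiring may differ):

```go
// Hypothetical sketch; not Karpenter's exact logger setup.
package node

import (
	"context"

	"knative.dev/pkg/logging"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

type Controller struct{}

func (c *Controller) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	// Attach the node name to the logger once, at the top of the reconcile loop.
	ctx = logging.WithLogger(ctx, logging.FromContext(ctx).With("node", req.Name))

	// Any later statement picks the name up automatically from the context.
	logging.FromContext(ctx).Infof("Triggering termination for node that failed to become ready")
	return reconcile.Result{}, nil
}
```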

@ellistarn ellistarn merged commit 2346ed5 into aws:main Jan 25, 2022