Fix emptiness reconciler when ttl not expired #1262

Merged
merged 3 commits into from
Feb 2, 2022

Contributor

@suket22 suket22 commented Feb 2, 2022

1. Issue, if available:
#993

2. Description of changes:

  • If the TTL has not expired, requeue the node (RequeueAfter) for the time remaining until it does.

3. How was this change tested?

  • Added a unit test.
  • Tested this manually as well.

Before this change, the node took a long time to be cleaned up (about 4 minutes):

2022-02-02T00:47:51.832Z	INFO	controller.node	Added TTL to empty node	{"commit": "4829931", "node": "ip-192-168-191-121.us-west-2.compute.internal"}
2022-02-02T00:47:54.791Z	INFO	controller.node	Removed emptiness TTL from node	{"commit": "4829931", "node": "ip-192-168-191-121.us-west-2.compute.internal"}
2022-02-02T00:48:11.846Z	INFO	controller.node	Added TTL to empty node	{"commit": "4829931", "node": "ip-192-168-191-121.us-west-2.compute.internal"}
2022-02-02T00:52:13.964Z	INFO	controller.node	Triggering termination after 30s for empty node	{"commit": "4829931", "node": "ip-192-168-191-121.us-west-2.compute.internal"}
2022-02-02T00:52:13.996Z	INFO	controller.termination	Cordoned node	{"commit": "4829931", "node": "ip-192-168-191-121.us-west-2.compute.internal"}
2022-02-02T00:52:14.219Z	INFO	controller.termination	Deleted node	{"commit": "4829931", "node": "ip-192-168-191-121.us-west-2.compute.internal"}

With this change, a quick empty -> full -> empty cycle behaves as expected: termination is triggered 30 seconds after the node last became empty.

2022-02-02T16:51:16.166Z	INFO	controller.provisioning	Waiting for unschedulable pods	{"commit": "69bb318", "provisioner": "default"}
2022-02-02T16:52:52.765Z	INFO	controller.node	Added TTL to empty node	{"commit": "69bb318", "node": "ip-192-168-179-90.us-west-2.compute.internal"}
2022-02-02T16:52:56.132Z	INFO	controller.node	Removed emptiness TTL from node	{"commit": "69bb318", "node": "ip-192-168-179-90.us-west-2.compute.internal"}
2022-02-02T16:53:09.093Z	INFO	controller.node	Added TTL to empty node	{"commit": "69bb318", "node": "ip-192-168-179-90.us-west-2.compute.internal"}
2022-02-02T16:53:39.001Z	INFO	controller.node	Triggering termination after 30s for empty node	{"commit": "69bb318", "node": "ip-192-168-179-90.us-west-2.compute.internal"}
2022-02-02T16:53:39.040Z	INFO	controller.termination	Cordoned node	{"commit": "69bb318", "node": "ip-192-168-179-90.us-west-2.compute.internal"}
2022-02-02T16:53:39.272Z	INFO	controller.termination	Deleted node	{"commit": "69bb318", "node": "ip-192-168-179-90.us-west-2.compute.internal"}
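
The difference between the two runs can be checked by subtracting the last "Added TTL to empty node" timestamp from the "Triggering termination" timestamp in each log excerpt. The sketch below does that arithmetic with Go's standard `time` package; the `elapsed` helper is a name introduced here for illustration, and the timestamps are copied from the logs above.

```go
package main

import (
	"fmt"
	"time"
)

// elapsed returns the time between two RFC 3339 log timestamps.
func elapsed(start, end string) (time.Duration, error) {
	s, err := time.Parse(time.RFC3339Nano, start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse(time.RFC3339Nano, end)
	if err != nil {
		return 0, err
	}
	return e.Sub(s), nil
}

func main() {
	// Commit 4829931: last "Added TTL" to "Triggering termination".
	before, err := elapsed("2022-02-02T00:48:11.846Z", "2022-02-02T00:52:13.964Z")
	if err != nil {
		panic(err)
	}
	// Commit 69bb318: the same two events with the fix applied.
	after, err := elapsed("2022-02-02T16:53:09.093Z", "2022-02-02T16:53:39.001Z")
	if err != nil {
		panic(err)
	}
	fmt.Println("before the fix:", before) // before the fix: 4m2.118s
	fmt.Println("after the fix:", after)   // after the fix: 29.908s
}
```

With a 30s TTL, the run on the fixed commit triggers termination almost exactly at expiry, while the earlier run overshoots by about three and a half minutes.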

I also ran a basic emptiness check to verify that there are no regressions:

2022-02-02T16:50:11.094Z	INFO	controller.node	Added TTL to empty node	{"commit": "69bb318", "node": "ip-192-168-171-248.us-west-2.compute.internal"}
2022-02-02T16:50:41.001Z	INFO	controller.node	Triggering termination after 30s for empty node	{"commit": "69bb318", "node": "ip-192-168-171-248.us-west-2.compute.internal"}
2022-02-02T16:50:41.033Z	INFO	controller.termination	Cordoned node	{"commit": "69bb318", "node": "ip-192-168-171-248.us-west-2.compute.internal"}
2022-02-02T16:50:41.254Z	INFO	controller.termination	Deleted node	{"commit": "69bb318", "node": "ip-192-168-171-248.us-west-2.compute.internal"}

4. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify

netlify bot commented Feb 2, 2022

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: 9ad026b

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/61faf397dfdac900079c27df

@ellistarn
Contributor

Thoughts on solving #1103 at the same time? I think it touches the same code.

@suket22
Contributor Author

suket22 commented Feb 2, 2022

Thoughts on solving #1103 at the same time, since I think it touches the same code.

Our node controller watches for changes to the provisioner, so if, say, TTLSecondsAfterEmpty is modified, a reconcile event should be fired for each node, as per this.

This worked as expected. I don't think anything is needed for #1103.

2022-02-02T18:00:38.954Z	INFO	controller.node	Added TTL to empty node	{"commit": "69bb318", "node": "ip-192-168-131-59.us-west-2.compute.internal"}
2022-02-02T18:01:49.209Z	INFO	controller.node	Triggering termination after 30s for empty node	{"commit": "69bb318", "node": "ip-192-168-131-59.us-west-2.compute.internal"}
2022-02-02T18:01:49.241Z	INFO	controller.termination	Cordoned node	{"commit": "69bb318", "node": "ip-192-168-131-59.us-west-2.compute.internal"}
2022-02-02T18:01:49.492Z	INFO	controller.provisioning	Waiting for unschedulable pods	{"commit": "69bb318", "provisioner": "default"}
2022-02-02T18:01:49.492Z	INFO	controller.termination	Deleted node	{"commit": "69bb318", "node": "ip-192-168-131-59.us-west-2.compute.internal"}

I think what might have happened in #1103 is that if you modified TTLSecondsAfterEmpty before the new TTLSecondsAfterEmpty had elapsed, we would miss a reconcile because of the bug that this PR fixes.
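
The provisioner watch described in this comment amounts to a fan-out: one provisioner change becomes one reconcile request per node that the provisioner owns. The sketch below models only that mapping with plain Go types. The `node` struct and `requestsForProvisioner` function are hypothetical and are not taken from the Karpenter code base. In controller-runtime this kind of fan-out is commonly wired up with a watch plus a mapping function such as `handler.EnqueueRequestsFromMapFunc`; this thread does not show whether Karpenter does exactly that.

```go
package main

import "fmt"

// node is a hypothetical, minimal view of a node: its name and the name of
// the provisioner that owns it.
type node struct {
	Name        string
	Provisioner string
}

// requestsForProvisioner returns the names of the nodes that should be
// reconciled again when the named provisioner changes.
func requestsForProvisioner(provisioner string, nodes []node) []string {
	var requests []string
	for _, n := range nodes {
		if n.Provisioner == provisioner {
			requests = append(requests, n.Name)
		}
	}
	return requests
}

func main() {
	nodes := []node{
		{Name: "node-a", Provisioner: "default"},
		{Name: "node-b", Provisioner: "other"},
		{Name: "node-c", Provisioner: "default"},
	}
	// A change to the "default" provisioner (for example a new
	// TTLSecondsAfterEmpty) should trigger a reconcile of node-a and node-c.
	fmt.Println(requestsForProvisioner("default", nodes)) // [node-a node-c]
}
```

Each of those reconciles then re-evaluates the node against the updated TTL, which only helps if the reconcile also requeues itself when the TTL has not yet expired.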

Contributor

@njtran njtran left a comment

Looks good. Thanks for the change, just a couple comments.

pkg/controllers/node/suite_test.go (4 review threads, 3 marked outdated)
@suket22 suket22 merged commit 6d861af into aws:main Feb 2, 2022
@suket22 suket22 deleted the emptinessRequeueFix branch February 2, 2022 21:18
3 participants