Set RequeueAfter so liveness subreconciler fires when no more node events #779

Merged: 2 commits merged on Nov 2, 2021


@suket22 (Contributor) commented on Nov 2, 2021

1. Issue, if available:
N/A

2. Description of changes:
This sets RequeueAfter on the liveness reconciler's result if less than LivenessTimeout has elapsed since the node was created. This ensures that even if no further node events arrive, the worker node is still considered for termination later.

Without this, Karpenter can enter a runaway scaling loop in which it keeps launching new nodes and pods indefinitely. The pods get stuck in the Terminating state, and the liveness reconciler never terminates the old NotReady worker nodes: no new node events are generated, so the reconciler is never triggered.
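
To illustrate why the missing requeue leaves the node alive, here is a hypothetical, heavily simplified simulation of an event-driven controller (not Karpenter's code): a NotReady node produces a single event at creation (t=0), and the reconciler runs only when an event or a requeue is pending. The 15-minute timeout and the `simulate` helper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// livenessTimeout is an assumed value for illustration.
const livenessTimeout = 15 * time.Minute

// simulate is a hypothetical model: the node never becomes Ready and emits
// exactly one event, at creation (t=0). Each queue entry is a time at which
// a reconcile runs. It returns when the node was terminated, and whether it was.
func simulate(useRequeue bool) (time.Duration, bool) {
	queue := []time.Duration{0} // the single node-creation event
	for len(queue) > 0 {
		now := queue[0]
		queue = queue[1:]
		age := now // the node was created at t=0
		if age >= livenessTimeout {
			return now, true // liveness check fails: terminate the NotReady node
		}
		if useRequeue {
			// Schedule another reconcile for when the timeout expires.
			queue = append(queue, now+(livenessTimeout-age))
		}
		// Without a requeue, nothing else ever wakes the reconciler.
	}
	return 0, false
}

func main() {
	at, ok := simulate(false)
	fmt.Println(at, ok) // 0s false
	at, ok = simulate(true)
	fmt.Println(at, ok) // 15m0s true
}
```

In the first run the queue drains after the creation event and the node is never reconsidered; in the second, the requeue alone is enough to reach the timeout.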

How to reproduce the issue:
Configure Karpenter to launch worker nodes in a subnet that has no network connectivity. For example, my subnet had a bad route table (no gateways).

After this change, I verified that the nodes in NotReady status were terminated after 15 minutes.

3. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify bot commented on Nov 2, 2021

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: 32350e6

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6181c82be5a6880007fceabd

@ellistarn (Contributor) left a comment


Thanks for fixing the buggy snippet I messaged you :D

@suket22 merged commit 6468992 into aws:main on Nov 2, 2021