Karpenter can't delete node if Kubelet crashes #6090

Closed
tomjohnburton opened this issue Apr 24, 2024 · 1 comment
Labels
bug Something isn't working needs-triage Issues that need to be triaged

Comments

@tomjohnburton

Description

Observed Behavior:
We had a pod that, as soon as it launched, consumed all of the CPU and memory on the node it was running on, because no resource limits were set. The node went into the NotReady state and its pods failed to terminate. The Karpenter logs showed nothing, as if nothing had happened, and Karpenter made no attempt to recover the node.

Expected Behavior:
Karpenter should intervene in this case and try to spin up a new node.

Reproduction Steps (Please include YAML):
Create a pod without resource limits that immediately consumes all of the node's memory and CPU. The kubelet should crash, and Karpenter won't register that there is an issue.
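
A minimal sketch of such a pod, for illustration only. This is not the original workload: the pod name, the `polinux/stress` image, and the worker counts and sizes are assumptions, and they would need tuning to exceed the capacity of the target node. The key point is that the container declares no `resources` block, so nothing caps its CPU or memory use.

```yaml
# Hypothetical reproduction manifest; not the reporter's actual pod.
apiVersion: v1
kind: Pod
metadata:
  name: resource-hog            # assumed name
spec:
  restartPolicy: Always
  containers:
  - name: stress
    image: polinux/stress       # assumed image that ships the `stress` tool
    command: ["stress"]
    # Spin CPU workers and allocate memory right at startup.
    # Counts and sizes are guesses; raise them above the node's capacity.
    args: ["--cpu", "8", "--vm", "4", "--vm-bytes", "4G", "--vm-hang", "0"]
    # No resources.requests or resources.limits are set on purpose,
    # so the container can starve the kubelet of CPU and memory.
```

Applying this with `kubectl apply -f` and then watching `kubectl get nodes -w` should show whether the node moves to NotReady, assuming the stress parameters are large enough for the instance type.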

Versions:

  • Chart Version: 0.36.0
  • Kubernetes Version (kubectl version): 1.29
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@tomjohnburton tomjohnburton added bug Something isn't working needs-triage Issues that need to be triaged labels Apr 24, 2024
@tzneal
Contributor

tzneal commented Apr 24, 2024

Closing as a duplicate of the kubernetes-sigs/karpenter#750 issue.

@tzneal tzneal closed this as completed Apr 24, 2024
2 participants