Drift replacement stuck due to "Cannot disrupt NodeClaim" #1684
Comments
This issue is currently awaiting triage. If Karpenter contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I've done some more digging on this and can add some additional information. The node and the nodeclaim both have deletionTimestamp set, which means they're just waiting for their finalizers to be removed before terminating. Looking at the code, the node finalizer ensures that all nodeclaims related to the node are deleted before it finishes, by calling cloudProvider.Delete for the nodeclaim; the node then finalizes and terminates, and the nodeclaims finalize and terminate after it.

I can't see any evidence that the deletion request ever made it to the cloud (I ran an Athena search over CloudTrail for all non-read-only EC2 requests in the region; no calls to TerminateInstances or similar were made), so the problem seems to be somewhere in between. The only thing I can see that would cause the cloud termination never to be reached, without any errors being logged, is if one of the two actions that cause a reconcile requeue never finishes: namely the drain, or the volume attachment tidy-up. Actually, it can't be the drain, because a failed drain causes a node event to be published.
Are you using the AWS provider? Karpenter will first create a new nodeclaim to ensure the pods can be scheduled. Could you check your controller logs to see whether the provider is trying to create a nodeclaim?
It is the volume attachments: if I list them, the node still has volume attachments. However, the persistent volume associated with the attachment is in Released status (the PVC, and the pod that was associated with the claim, no longer exist, presumably having been removed during the drain). I know that the fix is to ignore released attachments in filterVolumeAttachments; I'm just struggling to create a test case that fails without the fix and passes with the fix!
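To make the described state concrete, here is a purely hypothetical illustration (all names invented, not real cluster output) of a VolumeAttachment that is still attached while the PersistentVolume it points at has already been Released:

```yaml
# Hypothetical illustration only: the VolumeAttachment is still attached, but the
# PersistentVolume is already Released because the PVC and pod were removed
# during the drain, so waiting on this attachment blocks node termination.
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-0123456789abcdef
spec:
  attacher: ebs.csi.aws.com
  nodeName: ip-10-0-1-23.example.internal
  source:
    persistentVolumeName: pvc-11111111-2222-3333-4444-555555555555
status:
  attached: true
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-11111111-2222-3333-4444-555555555555
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc1234567890def
status:
  phase: Released   # the claim is gone, but the attachment above still exists
```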
An easy way to validate that this was the problem (in hindsight, obviously!) is that
We are seeing similar behavior with Karpenter v1: nodeclaims stuck in a drifted state without ever being disrupted. In my case, I have a node with no volumeattachments.

My node says that disruption is blocked due to a pending pod, but I have no pending pods in my cluster, and the node in question has a taint that allows only a single do-not-disrupt pod to schedule there as a test case.

I am using terminationGracePeriod on this nodeclaim, and I expect that disruption via drift should make the node unschedulable and create a new nodeclaim for pods to reschedule on.
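For context, a minimal sketch of the kind of test pod described above (the taint key and names here are hypothetical; `karpenter.sh/do-not-disrupt` is the standard Karpenter annotation for opting a pod out of voluntary disruption):

```yaml
# Sketch of the test setup: a single pod annotated do-not-disrupt, tolerating a
# hypothetical taint that keeps everything else off the node under test.
apiVersion: v1
kind: Pod
metadata:
  name: do-not-disrupt-test
  annotations:
    karpenter.sh/do-not-disrupt: "true"
spec:
  tolerations:
    - key: example.com/drift-test   # hypothetical taint applied to the node
      operator: Exists
      effect: NoSchedule
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
```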
Opened up a new issue here since my problem looks to be unrelated to the issue with volumeattachments, even though it results in similar behavior.
Thanks to @AndrewSirenko for providing some valuable insight in #1700 by suggesting that maybe the CSI drivers weren't handling volume detachment correctly. I had also missed that only the EBS CSI volumes were affected; the EFS volumes were being handled fine.

I decided to check the EBS controller logs during node termination, only to discover that I no longer had an EBS controller on the node, because the EBS node daemonset didn't tolerate the termination taints. Once I changed the tolerations so that the EBS node controller remained alive during termination, the volumes could be cleaned up appropriately, and the drift replacement now works perfectly again.

It looks like #1294 was released with 1.0.1 and was a breaking change for us due to our incorrect EBS CSI configuration, which we'd previously got away with!
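For anyone hitting the same thing, a minimal sketch of the kind of toleration change described above, applied to the EBS CSI node DaemonSet's pod template. The specific Karpenter taint key is an assumption here; verify it against the taint actually applied to your draining nodes, since it changed between pre-v1 and v1.

```yaml
# Sketch only: keep the EBS CSI node plugin running while Karpenter drains and
# terminates the node, so volume attachments can be cleaned up.
spec:
  template:
    spec:
      tolerations:
        # Option 1: tolerate all taints (simplest, keeps the plugin everywhere)
        - operator: Exists
        # Option 2: tolerate just the Karpenter disruption taint
        # (key assumed; check the taint on your draining nodes)
        # - key: karpenter.sh/disrupted
        #   operator: Exists
        #   effect: NoSchedule
```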
Closing this now |
Glad you root caused this @willthames, and thanks for sharing this tricky failure mode. I'll make sure we over at EBS CSI Driver add this to some kind of Karpenter + EBS CSI FAQ/troubleshooting guide. Just curious, but what version of the EBS CSI Driver were you running? v1.29.0 added a check for the Karpenter taint.
I've just checked the running version in the as-yet-unfixed cluster: it's v1.34.0, so the version shouldn't be a problem (we have a GitHub Action that regularly checks our Helm charts and bumps them, so we're rarely too far off the leading edge). I'll validate that the correct taints are being applied and watched for when I apply the AMI bump to our remaining cluster.
@AndrewSirenko I've raised kubernetes-sigs/aws-ebs-csi-driver#2158 now - it seems that the taint has changed with v1.
Description
Observed Behavior:
Two nodes were replaced during drift replacement; the next one seems stuck with "Cannot disrupt NodeClaim".
There is nothing in Karpenter's logs to explain this. We saw similar behaviour during the Karpenter 1.0.1 upgrade but put that down to API version mismatches; we don't seem to have any such mismatches this time.
Expected Behavior:
All nodes get replaced during drift replacement
Reproduction Steps (Please include YAML):
Versions:
- Kubernetes Version (`kubectl version`):