Only wait for volume attachments for drainable nodes #1700
Conversation
This solves two problems identified so far:
* waiting for a volume attachment associated with a persistent volume claim that has been released
* waiting for a volume attachment associated with a persistent volume claim that has been assigned to a new pod that can't start until the association is removed

Effectively this changes the logic from finding all the PVCs associated with non-drainable pods and ignoring those, to finding all the PVCs associated with drainable pods and only blocking on those. As such, some of the previous tests are now redundant, because they check that we wait for volume attachments that aren't tied to a PVC at all.
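To make the direction concrete, here is a minimal Go sketch of the proposed filtering, assuming hypothetical helper inputs (a precomputed PV-to-claim map and the list of drainable pods). It is illustrative only, not the actual Karpenter implementation:

```go
// Sketch only: not the actual Karpenter code. Returns the VolumeAttachments
// that should block termination: those whose PV backs a PVC used by a pod
// that still needs to be drained.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
)

func blockingAttachments(
	attachments []storagev1.VolumeAttachment,
	drainablePods []corev1.Pod,
	pvToClaim map[string]string, // PV name -> "namespace/claimName", assumed precomputed
) []storagev1.VolumeAttachment {
	// Collect the claims referenced by pods we still need to drain.
	drainableClaims := map[string]bool{}
	for _, pod := range drainablePods {
		for _, vol := range pod.Spec.Volumes {
			if vol.PersistentVolumeClaim != nil {
				drainableClaims[pod.Namespace+"/"+vol.PersistentVolumeClaim.ClaimName] = true
			}
		}
	}

	// Only block on attachments tied to one of those claims; everything else
	// (released PVs, claims already handed to pods on other nodes) no longer
	// delays termination.
	var blocking []storagev1.VolumeAttachment
	for _, va := range attachments {
		pv := va.Spec.Source.PersistentVolumeName
		if pv == nil {
			continue
		}
		if claim, ok := pvToClaim[*pv]; ok && drainableClaims[claim] {
			blocking = append(blocking, va)
		}
	}
	return blocking
}
```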
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: willthames. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @willthames. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Note that this is a much more impactful alternative to #1699. There may be cases that I have not considered. #1699 is a safer PR, but it doesn't protect against a second problem I found, where a pod with a PVC associated with a volume attachment got moved to a new node (statefulset-like behaviour) and the new pod couldn't start because its persistent volume was still attached to the old node.
Hey @willthames, thanks for looking into this; I'm sorry that you're running into this drift replacement issue. I'll take a deeper look at the PR on Tuesday, but I see one main issue with the following statement:
The issue is that Pod resources can get cleaned up in Kubernetes BEFORE the associated volumes are unpublished from the node (unmounted and/or detached). This is because the pod lifecycle is not affected by the lifecycle of the PVCs it is associated with. Therefore, if you block node termination by filtering in attachments of drainable pods instead of filtering out those of non-drainable pods, you are not guaranteed to block node termination until all volumes are detached (because the pod resources might be deleted by Kubernetes before the associated VolumeAttachment resources are dealt with). You can run into a race condition where the pod object is already gone, so nothing blocks termination even though the volume is still attached to the node.
Or even worse, the Node resource could be deleted before kubelet confirms the volume is unmounted (which leads to a 6+ minute delay on stateful workload migration). Nonetheless, VolumeAttachment resources should not linger once a PV is released, so I'm curious about that case. Is there anything special about these volumes? This could be a bug in the Attach/Detach controller rather than in Karpenter. You can see the "Preventing 6+ minute delays from StatefulSet Disruption" Request for Comment for more information on why this blocking until volumes are detached happens. I'm working on an up-to-date/cleaned-up version of this document, which should be up at the end of the month.
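For context on why pod-based filtering can miss lingering attachments, here is a hedged client-go sketch (the clientset wiring is assumed) showing that VolumeAttachment objects are keyed by node rather than by pod, so they can outlive the pods whose PVCs produced them:

```go
// Sketch, assuming standard client-go wiring. VolumeAttachments are
// cluster-scoped and keyed by node name, not by pod, so they can still
// exist after the pod object that used the PVC has been deleted.
package sketch

import (
	"context"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func attachmentsOnNode(ctx context.Context, client kubernetes.Interface, nodeName string) ([]storagev1.VolumeAttachment, error) {
	list, err := client.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	var onNode []storagev1.VolumeAttachment
	for _, va := range list.Items {
		if va.Spec.NodeName == nodeName {
			onNode = append(onNode, va)
		}
	}
	return onNode, nil
}
```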
Hi @AndrewSirenko, thanks for your detailed response; I definitely appreciate the extra context! The scenario you describe is the opposite problem to the one I have (nodes disappearing before the volume attachment is removed, causing problems, rather than nodes not being terminated because of the volume attachment), so there's definitely a conflict somewhere.

I'm only having this problem with one file system type as far as I can tell (EBS), but also with only one pod type (couchbase, which is managed as pods by the couchbase controller rather than by a higher-level abstraction such as a statefulset). But the couchbase pods are long gone by the time the cleanup needs to happen. (Initially I thought this was a problem for EFS, where we have a lot more applications using it, but it's EBS, and this is the only application using it. EBS is also a lot less tolerant of being mounted on multiple pods!)

This problem persists for days; the only way I can get karpenter to complete drift termination is by deleting the affected volume attachments (which is not an approach that is at all satisfying). I'll definitely check for any outstanding EBS controller bugs.

The alternative PR to this one (#1699) doesn't block on unattached PVCs, but it does not clean up if the volume is then mounted onto another pod.
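For completeness, here is a sketch of the manual workaround mentioned above, deleting a lingering VolumeAttachment so termination can proceed; the names are placeholders, and since this bypasses the normal detach flow it should only be a last resort:

```go
// Sketch of the manual workaround only; equivalent to
// `kubectl delete volumeattachment <name>`. This bypasses the normal
// detach flow, so confirm the volume is no longer in use first.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func deleteStaleAttachment(ctx context.Context, client kubernetes.Interface, name string) error {
	return client.StorageV1().VolumeAttachments().Delete(ctx, name, metav1.DeleteOptions{})
}
```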
This PR is no longer required (I'll add an explanation in #1684).
Fixes #1684
Description
This solves two problems identified so far:
* waiting for a volume attachment associated with a persistent volume claim that has been released
* waiting for a volume attachment associated with a persistent volume claim that has been assigned to a new pod that can't start until the association is removed

Effectively this changes the logic from finding all the PVCs associated with non-drainable pods and ignoring those, to finding all the PVCs associated with drainable pods and only blocking on those.

As such, some of the previous tests are now redundant, because they check that we wait for volume attachments that aren't tied to a PVC at all.
How was this change tested?
make test
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.