Volume can't be detached from deleted Node/VM #221
I have now deleted the PVC and it disappeared from the CNS UI in vSphere. The controller still tries to find the deleted node and detach the volume:
I then removed the finalizer from the PV, but now the controller doesn't find the PVC anymore:
Where is this information still stored? Can I clean this up somehow?
Every two minutes the kube-controller-manager logs this:
However, the logs stopped after the kube-controller-manager was restarted. vsphere-csi-controller still logs the same errors.
@yvespp Can you delete the volumeattachment for this volume?
Didn't know about volumeattachments, thanks! I deleted the volumeattachment of the volume.
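For reference, a minimal sketch of that cleanup, assuming the deleted node is named kubedev-worker-2bcf0b684616 as in the original report; the VolumeAttachment name is cluster-specific and shown here as a placeholder:

```sh
# List all VolumeAttachments with the node and PV each one is bound to.
kubectl get volumeattachments \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PV:.spec.source.persistentVolumeName,ATTACHED:.status.attached

# Delete the VolumeAttachment that still references the deleted node.
# csi-<hash> is a placeholder; use the name printed by the command above.
kubectl delete volumeattachment csi-<hash>
```

If the delete hangs because the external-attacher can never complete the detach, the object's finalizer may also need to be removed (see the finalizer sketch further down).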
Summary so far:
@yvespp are your workloads deployed as StatefulSets? When you delete a VM, Kubernetes will not attempt to delete the Pod. If the Pod was part of a StatefulSet, you have to run some manual steps to have it rescheduled on another node. This is being tracked upstream in kubernetes/kubernetes#65392. From your original description:
The logs you have pasted are actually external-attacher logs. Since the detach can't succeed (the node doesn't exist), you need to manually delete the volumeattachments, as you did in #221 (comment).
Yes, it's a StatefulSet. I did drain the node before it was shut down and deleted, but somehow the pod was not terminated before the node was shut down, and the volume was never released. On the Kubernetes side, deleting the volumeattachment helped.
Would it help if I delete the …? Maybe I should just wait till all …
Can you try following these steps, once the node is in the … state?
This should create another Pod, which will be in the … state.
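For context, a typical manual recovery sequence for a StatefulSet Pod that is stuck because its node went away, along the lines of the kubernetes/kubernetes#65392 discussion, looks roughly like this; the Pod and VolumeAttachment names are placeholders:

```sh
# Force-delete the Pod that is stuck in Terminating/Unknown on the gone node,
# so the StatefulSet controller recreates it on another node.
kubectl delete pod my-statefulset-0 --force --grace-period=0

# Remove the stale VolumeAttachment so the volume can be attached to the new node.
kubectl delete volumeattachment csi-<hash>
```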
The node is never in the … state.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
I believe this is related to kubernetes-csi/external-attacher#215
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@yvespp any update from VMware in the ticket you opened with them? We have the same issue: the disk/VMDK itself is somehow stuck in an unrecoverable state in vSphere. We cannot even attach the backing VMDK manually to any VM in vSphere after it got into this state.
@marratj in our case the problem was that Changed Block Tracking (CBT) had been enabled on the disk by our backup tool in the background, but CBT was not enabled on the VM, so the disk could not be attached again.
@yvespp thanks a lot! That was exactly it in our case as well.
Still ran into this on Kubernetes v1.21.3. The node was deleted, but the VolumeAttachment was still around, referencing the old node name. We had to remove the finalizers on each stale resource so it could be deleted and the pods could start up.
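A minimal sketch of that finalizer cleanup, with placeholder object names; clearing finalizers bypasses the normal detach path, so it should only be done once the old node/VM is really gone:

```sh
# Drop the finalizer on the stale VolumeAttachment so the API server can remove it.
kubectl patch volumeattachment csi-<hash> --type=merge -p '{"metadata":{"finalizers":null}}'

# If the PV is also stuck (e.g. in Terminating), clear its finalizers as well.
kubectl patch pv pvc-<uid> --type=merge -p '{"metadata":{"finalizers":null}}'
```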
Hello @mitchellmaler, since this issue is closed, I'd suggest you take a look and comment here: #359.
@tgelter thanks!
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
After draining and deleting a Node, a Pod is stuck in the ContainerCreating phase because its volume can't be attached. Event: Unable to attach or mount volumes.
Logs of the controller show that it can't detach the volume from the deleted VM (kubedev-worker-2bcf0b684616). There is no Node, CSINode or VM with that name anymore.
In the details of that PVC in the vSphere Container Volumes view there is no VM shown.
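One way to confirm that state, assuming the node name from the logs (kubedev-worker-2bcf0b684616):

```sh
# Both lookups should return NotFound once the VM has been deleted.
kubectl get node kubedev-worker-2bcf0b684616
kubectl get csinode kubedev-worker-2bcf0b684616

# Yet a VolumeAttachment may still reference that node name.
kubectl get volumeattachments
```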
What you expected to happen:
The volume can be attached.
How to reproduce it (as minimally and precisely as possible):
Not sure; maybe the app didn't shut down correctly and the volume couldn't be detached before the node and VM were deleted.
Anything else we need to know?:
Where is the state stored and can the PV be detached manually somehow?
I tried to recreate the PVC to reattach the same PV, and to restart the controller pod, but it didn't help.
Is there something that I can do during the VM drain/shutdown to avoid this?
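Regarding that last question, one possible mitigation (a sketch, not a verified procedure) is to wait until the CSI driver has detached all volumes from the node before shutting down or deleting the VM; the node name is taken from the report and the timing is arbitrary:

```sh
# Evict the workloads first.
kubectl drain kubedev-worker-2bcf0b684616 --ignore-daemonsets --delete-emptydir-data

# Wait until no VolumeAttachment references this node anymore
# before shutting down / deleting the VM.
while kubectl get volumeattachments -o jsonpath='{.items[*].spec.nodeName}' \
    | grep -q kubedev-worker-2bcf0b684616; do
  sleep 10
done
```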
Environment:
- Kernel (uname -a): 5.3.0-51-generic