VolumeAttachment not marked as detached causes problems when the Node is deleted. #215
Comments
cc @jsafrane
/kind bug
The volume might already be attaching; the A/D controller does not know about that. It's not possible to distinguish an attach in progress from a completed detach just from VolumeAttachment.status.attached = false. IMO, the A/D controller (or the CSI plugin) should see that the Node does not exist and delete the VolumeAttachment.
Do you see a problem if we mark the VolumeAttachment as detached while it's still attaching? How is this different from the normal case when we are first attaching a volume and its VolumeAttachment.status.attached = false?
From the external attacher POV, it should be safe. In the worst case, when the regular sync and the ListVolumes sync race, it marks a just-attached volume as detached. This causes the VolumeAttachment to be synced again, ControllerPublish will be called, and it will fix the VolumeAttachment status. Still, something in the A/D controller / CSI plugin must check that the destination node is gone, delete the VolumeAttachment, and update the ASW.
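For illustration, here is a minimal client-go sketch of the check described in the comments above: if the node named in VolumeAttachment.spec.nodeName no longer exists, delete the VolumeAttachment. The function and variable names are placeholders, not the actual external-attacher code, and error handling is reduced to the essentials.

```go
package sketch

import (
	"context"
	"fmt"

	storagev1 "k8s.io/api/storage/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteAttachmentIfNodeGone removes a VolumeAttachment whose target node no
// longer exists. It is a sketch of the behavior discussed above, not the
// external-attacher implementation.
func deleteAttachmentIfNodeGone(ctx context.Context, cs kubernetes.Interface, va *storagev1.VolumeAttachment) error {
	_, err := cs.CoreV1().Nodes().Get(ctx, va.Spec.NodeName, metav1.GetOptions{})
	if err == nil {
		return nil // node still exists, nothing to clean up
	}
	if !apierrors.IsNotFound(err) {
		return fmt.Errorf("checking node %q: %w", va.Spec.NodeName, err)
	}
	// The node is gone, so the attachment can never be reconciled; remove it.
	return cs.StorageV1().VolumeAttachments().Delete(ctx, va.Name, metav1.DeleteOptions{})
}
```

Deleting the object still leaves open who calls ControllerUnpublish and updates the ASW, which is exactly the point raised above.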
When a node is deleted, the combination of pods getting deleted and the A/D controller's VerifyVolumesAreAttached causes the detach.
I forgot to mention in the initial comment: the Pod does get deleted, which should trigger detach. However, the A/D controller doesn't detach because it still sees the volume mounted in the ASW and says it's not safe to detach. This also starts the 6-minute force-detach timer. It depends on VerifyVolumesAreAttached to override the ASW.
It starts making sense now. I'm wondering whether it can break anything in the A/D controller. If the A/D controller removes the volume from the ASW, Detach may never be called and the VolumeAttachment could be leaked.
Looks like VolumesAreAttached calls DeleteVolumeNode, which completely removes it from the ASW, so you're right, Detach won't get called and the VolumeAttachment will be leaked. One possibility is to not completely remove it from the ASW but mark it with some special state. Another possibility is to add VolumeAttachment GC to the A/D controller: kubernetes/kubernetes#77324
Another thought: should VolumeAttachment have an ownerRef to the Node object, so that when the Node object gets deleted, we automatically trigger Detach?
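As a sketch of that ownerReference idea (an assumption, not something the external-attacher does today): the Node could be set as the owner of the VolumeAttachment, so that deleting the Node lets the garbage collector delete the attachment object.

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setNodeOwner makes the Node the owner of the VolumeAttachment, so Kubernetes
// garbage collection deletes the attachment when the Node object is deleted.
func setNodeOwner(va *storagev1.VolumeAttachment, node *corev1.Node) {
	va.OwnerReferences = []metav1.OwnerReference{{
		APIVersion: "v1",
		Kind:       "Node",
		Name:       node.Name,
		UID:        node.UID, // GC matches on UID, so a recreated node with the same name does not count
	}}
}
```

Note this only deletes the API object; the detach itself would still have to be driven by whatever processes that deletion (e.g. a finalizer), so it does not answer the ASW question by itself.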
I'd prefer to have a proper fix instead of piling up hacks in the A/D controller & external-attacher. Say the Node object is deleted: how probable is it that the deleted node comes back, and how quickly? If someone accidentally deletes a Node object (keeping kubelet running), how quickly is it re-created?
VerifyVolumesAreAttached is still needed for other scenarios where the node still exists but the volume got detached out of band, or where the node gets recreated with the same name. The other thing I realized is that this problem only occurs if the Node is ungracefully deleted (without a drain). If someone deletes the Node object, kubelet only recreates it if it restarts. I think we should still let Pod GC invoke detach. Right now it is guarded by the existence of the node, which is guarded by this check. I'm not sure what problems may happen if we delete the node from the cache while we think pods are still attached to it. The other problem is that
I discussed a bit with @saad-ali and he has concerns about using the volumeattachment.status.attached field to indicate that a volume is detached, because it could also mean an attach or detach is in progress. If we want to fix this, we may need to think about adding a new field that can actually indicate something is detached with nothing in progress. We should probably also revisit the logic in csi.VolumesAreAttached that uses the status.attached field, and revisit other cases where we return attached = false. That causes the volume to be removed from the ASW. Could that be problematic if it's actually still attached?
The only signal that a volume is fully detached is that the VolumeAttachment object is deleted. We could add a "detached and can't attach because the node is gone" state to the VolumeAttachment status.
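The following is a purely hypothetical sketch (these fields are not part of the storage.k8s.io/v1 API) of what a more explicit state, as suggested above, might look like:

```go
package sketch

// HypotheticalVolumeAttachmentStatus illustrates the idea of a status that can
// distinguish "fully detached" from "attach/detach in progress". None of this
// exists in the real VolumeAttachmentStatus, which only has Attached,
// AttachmentMetadata, AttachError, and DetachError.
type HypotheticalVolumeAttachmentStatus struct {
	Attached bool
	// State could take values such as "AttachInProgress", "DetachInProgress",
	// "Detached", or "DetachedNodeGone".
	State string
}
```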
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle frozen
Any news on this? It is really annoying, as there are so many situations in which nodes get deleted:
@msau42, isn't this issue resolved by kubernetes/kubernetes#96617 (which is the fix for kubernetes/kubernetes#77324)?
Still ran into this running Kubernetes v1.21.3. The node was deleted and the VolumeAttachment was still around, specifying the old node name.
Those scenarios are graceful scenarios, where we highly recommend that a drain is done first. Draining before deleting the node should solve at least the graceful case, although there is a slight race: the drain process also needs to fully wait for volumes to be completely unmounted. For the ungraceful case, the 6-minute force detach should kick in.
@msau42 this happens every now and then in our clusters, and 6 minutes is a long delay in a world with 99.95% and higher uptime SLAs. Also note that we already drain the nodes, and still it can happen that a node terminates immediately without a proper drain.
Facing the same issue in k8s 1.21.5.
Facing the same issue with the latest EKS and the EBS CSI controller.
@jsafrane, what do you think about this issue? My organization can help with the implementation.
Facing the issue with Kubernetes v1.24 as well.
Still hitting this with EKS v1.26.
Still hitting this with EKS v1.29.
Need to wait about 6 minutes for the volume to attach.
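As a stop-gap for operators hitting the 6-minute wait, here is a hedged out-of-band cleanup sketch (assumed names, not an official tool): list all VolumeAttachments and delete the ones whose spec.nodeName points at a Node that no longer exists.

```go
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupStaleAttachments deletes VolumeAttachments that reference nodes which
// no longer exist. Deleting the object only sets a deletionTimestamp; the
// external-attacher finalizer still gets a chance to process the detach.
func cleanupStaleAttachments(ctx context.Context, cs kubernetes.Interface) error {
	vas, err := cs.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, va := range vas.Items {
		if _, err := cs.CoreV1().Nodes().Get(ctx, va.Spec.NodeName, metav1.GetOptions{}); !apierrors.IsNotFound(err) {
			continue // node exists (or the lookup failed); leave the attachment alone
		}
		if err := cs.StorageV1().VolumeAttachments().Delete(ctx, va.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}
```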
Maybe we can learn from this: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/release-1.29/cmd/hooks/prestop.go
Does anyone have experience with this, or know of any ongoing efforts to address it? AWS EKS 1.26, 1.27, and most likely 1.28 are affected as well.
We are still running AWS EKS 1.28, and this has been a problem for a long time. We are also running Karpenter, which kills off nodes all the time as scheduled. From time to time we see the prometheus-pushgateway node getting killed. I'm unsure whether this issue is strictly related to Kubernetes or whether the solution is to be found inside the AWS ebs-csi-driver. We have had this issue since before 1.21.
We encountered this issue during an in-place upgrade of our AWS EKS cluster from v1.29 to v1.30. The problem manifested when using HashiCorp Vault (v1.18.1), deployed as a Deployment with EBS volumes for persistent storage. Our EBS CSI driver version is aws-ebs-csi-driver:v1.37.0.
Reproduction Steps
Impact
In #184, we had decided that instead of marking the VolumeAttachment as detached, we would just requeue the volume to have the workqueue process it again.
However, this doesn't work in the case where the Node is deleted. In that scenario:
What should happen is:
I'm not sure of the best way to fix step 2). Some suggestions I have, in order of preference (a sketch of the first one follows at the end):
markAsDetached if csiAttach failed on the force sync.
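For context, a rough sketch of what that first suggestion could look like in client-go terms; markAsDetached here is illustrative and not the actual external-attacher helper.

```go
package sketch

import (
	"context"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// markAsDetached clears status.attached on a VolumeAttachment so the A/D
// controller no longer treats the volume as attached to the (deleted) node.
// This is a sketch of the suggestion above, not the real helper.
func markAsDetached(ctx context.Context, cs kubernetes.Interface, va *storagev1.VolumeAttachment) (*storagev1.VolumeAttachment, error) {
	clone := va.DeepCopy()
	clone.Status.Attached = false
	clone.Status.AttachError = nil
	return cs.StorageV1().VolumeAttachments().UpdateStatus(ctx, clone, metav1.UpdateOptions{})
}
```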