VolumeAttachment not marked as detached causes problems when the Node is deleted. #215
Comments
cc @jsafrane
/kind bug
The volume might already be attaching; the A/D controller does not know about that. It's not possible to distinguish an attach in progress from a completed detach just from VolumeAttachment.status.attached = false. IMO, the A/D controller (or the CSI plugin) should see that the Node does not exist and delete the VolumeAttachment.
Do you see a problem if we mark the VolumeAttachment as detached while it's still attaching? How is this different from the normal case when we are first attaching a volume and its VolumeAttachment.status.attached = false?
From the external attacher POV, it should be safe. In the worst case, when the regular sync and the ListVolumes sync race, it marks a just-attached volume as detached. This causes the VolumeAttachment to be synced again, ControllerPublish will be called, and it will fix the VolumeAttachment status. Still, something in the A/D controller / CSI plugin must check that the destination node is gone, delete the VolumeAttachment, and update the ASW.
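For illustration, here is a minimal client-go sketch of the check described in the comments above: if the node named in VolumeAttachment.spec.nodeName no longer exists, delete the VolumeAttachment. The function and variable names are placeholders, not the actual external-attacher code, and error handling is reduced to the essentials.

```go
package sketch

import (
	"context"
	"fmt"

	storagev1 "k8s.io/api/storage/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteAttachmentIfNodeGone removes a VolumeAttachment whose target node no
// longer exists. It is a sketch of the behavior discussed above, not the
// external-attacher implementation.
func deleteAttachmentIfNodeGone(ctx context.Context, cs kubernetes.Interface, va *storagev1.VolumeAttachment) error {
	_, err := cs.CoreV1().Nodes().Get(ctx, va.Spec.NodeName, metav1.GetOptions{})
	if err == nil {
		return nil // node still exists, nothing to clean up
	}
	if !apierrors.IsNotFound(err) {
		return fmt.Errorf("checking node %q: %w", va.Spec.NodeName, err)
	}
	// The node is gone, so the attachment can never be reconciled; remove it.
	return cs.StorageV1().VolumeAttachments().Delete(ctx, va.Name, metav1.DeleteOptions{})
}
```

Deleting the object still leaves open who calls ControllerUnpublish and updates the ASW, which is exactly the point raised above.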
When a node is deleted, the combination of pods getting deleted and the A/D controller's VerifyVolumesAreAttached causes the detach.
I forgot to mention in the initial comment: the Pod does get deleted, which should trigger detach. However, the A/D controller doesn't detach because it still sees the volume mounted in the ASW and says it's not safe to detach. This also starts the 6-minute force-detach timer. It depends on VerifyVolumesAreAttached to override the ASW.
It starts making sense now. I'm wondering whether it can break anything in the A/D controller. If the A/D controller removes the volume from the ASW, Detach may never be called and the VolumeAttachment could be leaked.
Looks like VolumesAreAttached calls DeleteVolumeNode, which completely removes it from the ASW, so you're right, Detach won't get called and the VolumeAttachment will be leaked. One possibility is to not completely remove it from the ASW but mark it with some special state. Another possibility is to add VolumeAttachment GC to the A/D controller: kubernetes/kubernetes#77324
Another thought: should VolumeAttachment have an ownerRef to the Node object, so that when the Node object gets deleted, we automatically trigger Detach?
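As a sketch of that ownerReference idea (an assumption, not something the external-attacher does today): the Node could be set as the owner of the VolumeAttachment, so that deleting the Node lets the garbage collector delete the attachment object.

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setNodeOwner makes the Node the owner of the VolumeAttachment, so Kubernetes
// garbage collection deletes the attachment when the Node object is deleted.
func setNodeOwner(va *storagev1.VolumeAttachment, node *corev1.Node) {
	va.OwnerReferences = []metav1.OwnerReference{{
		APIVersion: "v1",
		Kind:       "Node",
		Name:       node.Name,
		UID:        node.UID, // GC matches on UID, so a recreated node with the same name does not count
	}}
}
```

Note this only deletes the API object; the detach itself would still have to be driven by whatever processes that deletion (e.g. a finalizer), so it does not answer the ASW question by itself.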
I'd prefer to have a proper fix instead of piling up hacks in the A/D controller & external-attacher. Say the Node object is deleted: how probable is it that the deleted node comes back, and how quickly? If someone accidentally deletes a Node object (keeping kubelet running), how quickly is it re-created?
VerifyVolumesAreAttached is still needed for other scenarios where the node still exists but the volume got detached out of band, or where the node gets recreated with the same name. The other thing I realized is that this problem only occurs if the Node is ungracefully deleted (without a drain). If someone deletes the Node object, kubelet only recreates it if it restarts. I think we should still let Pod GC invoke detach. Right now it is guarded by the existence of the node, which is guarded by this check. I'm not sure what problems may happen if we delete the node from the cache while we think pods are still attached to it. The other problem is that
I discussed a bit with @saad-ali and he has concerns about using the volumeattachment.status.attached field to indicate that a volume is detached, because it could also mean an attach or detach is in progress. If we want to fix this, we may need to think about adding a new field that can actually indicate something is detached with nothing in progress. We should probably also revisit the logic in csi.VolumesAreAttached that uses the status.attached field, and revisit other cases where we return attached = false. That causes the volume to be removed from the ASW. Could that be problematic if it's actually still attached?
The only signal that a volume is fully detached is that the VolumeAttachment object is deleted. We could add a "detached and can't attach because the node is gone" state to the VolumeAttachment status.
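The following is a purely hypothetical sketch (these fields are not part of the storage.k8s.io/v1 API) of what a more explicit state, as suggested above, might look like:

```go
package sketch

// HypotheticalVolumeAttachmentStatus illustrates the idea of a status that can
// distinguish "fully detached" from "attach/detach in progress". None of this
// exists in the real VolumeAttachmentStatus, which only has Attached,
// AttachmentMetadata, AttachError, and DetachError.
type HypotheticalVolumeAttachmentStatus struct {
	Attached bool
	// State could take values such as "AttachInProgress", "DetachInProgress",
	// "Detached", or "DetachedNodeGone".
	State string
}
```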
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle frozen
Any news on this? It is really annoying, as there are so many situations in which nodes get deleted:
@msau42, isn't this issue resolved by kubernetes/kubernetes#96617 (which is the fix for kubernetes/kubernetes#77324)?
Still ran into this running Kubernetes v1.21.3. The node was deleted and the VolumeAttachment was still around, specifying the old node name.
Those scenarios are graceful scenarios, where we highly recommend that a drain is done first. Draining before deleting the node should solve at least the graceful case, although there is a slight race: the drain process also needs to fully wait for volumes to be completely unmounted. For the ungraceful case, the 6-minute force detach should kick in.
@msau42 this happens every now and then in our clusters, and 6 minutes is a long delay in a world with 99.95% and higher uptime SLAs. Also note that we already drain the nodes, and still it can happen that a node terminates immediately without a proper drain.
Facing the same issue in k8s 1.21.5.
Facing the same issue with the latest EKS and the EBS CSI controller.
@jsafrane, what do you think about this issue? My organization can help with the implementation.
Facing the issue with Kubernetes v1.24 as well.
Still hitting this with EKS v1.26.
Still hitting this with EKS v1.29.
Need to wait about 6 minutes for the volume to attach.
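As a stop-gap for operators hitting the 6-minute wait, here is a hedged out-of-band cleanup sketch (assumed names, not an official tool): list all VolumeAttachments and delete the ones whose spec.nodeName points at a Node that no longer exists.

```go
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupStaleAttachments deletes VolumeAttachments that reference nodes which
// no longer exist. Deleting the object only sets a deletionTimestamp; the
// external-attacher finalizer still gets a chance to process the detach.
func cleanupStaleAttachments(ctx context.Context, cs kubernetes.Interface) error {
	vas, err := cs.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, va := range vas.Items {
		if _, err := cs.CoreV1().Nodes().Get(ctx, va.Spec.NodeName, metav1.GetOptions{}); !apierrors.IsNotFound(err) {
			continue // node exists (or the lookup failed); leave the attachment alone
		}
		if err := cs.StorageV1().VolumeAttachments().Delete(ctx, va.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}
```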
Maybe we can learn from this: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/release-1.29/cmd/hooks/prestop.go
Does anyone have experience with this, or know of any ongoing efforts to address it? AWS EKS 1.26, 1.27, and most likely 1.28 are affected as well.
We are still running AWS EKS 1.28, and this has been a problem for a long time. We are also running Karpenter, which kills off nodes all the time as scheduled. From time to time we see the prometheus-pushgateway node getting killed. I'm unsure whether this issue is strictly related to Kubernetes or whether the solution is to be found inside the AWS ebs-csi-driver. We have had this issue since before 1.21.
We encountered this issue during an in-place upgrade of our AWS EKS cluster from v1.29 to v1.30. The problem manifested when using HashiCorp Vault (v1.18.1), deployed as a Deployment with EBS volumes for persistent storage. Our EBS CSI driver version is aws-ebs-csi-driver:v1.37.0.
Reproduction Steps
Impact
In #184, we had decided that instead of marking the VolumeAttachment as detached, we would just requeue the volume to have the workqueue process it again.
However, this doesn't work in the case where the Node is deleted. In that scenario:
What should happen is:
I'm not sure of the best way to fix step 2). Some suggestions I have, in order of preference (a sketch of the first one follows at the end):
markAsDetached if csiAttach failed on the force sync.
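For context, a rough sketch of what that first suggestion could look like in client-go terms; markAsDetached here is illustrative and not the actual external-attacher helper.

```go
package sketch

import (
	"context"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// markAsDetached clears status.attached on a VolumeAttachment so the A/D
// controller no longer treats the volume as attached to the (deleted) node.
// This is a sketch of the suggestion above, not the real helper.
func markAsDetached(ctx context.Context, cs kubernetes.Interface, va *storagev1.VolumeAttachment) (*storagev1.VolumeAttachment, error) {
	clone := va.DeepCopy()
	clone.Status.Attached = false
	clone.Status.AttachError = nil
	return cs.StorageV1().VolumeAttachments().UpdateStatus(ctx, clone, metav1.UpdateOptions{})
}
```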