PVC attaching takes much time #1302
I believe this is expected behavior if you terminate your EC2 instance ungracefully (i.e. without safely evicting all of your pods first). The 6 minute wait before timing out is due to the default configuration of the Attach/Detach controller timers, more specifically ReconcilerMaxWaitForUnmountDuration, which is the maximum amount of time the attach/detach controller waits for a volume to be safely unmounted from a node. After 6 minutes, the controller assumes the node is unresponsive and attempts to detach the volume.
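A quick way to observe this window on a live cluster (generic kubectl commands, not taken from this report) is to watch the VolumeAttachment objects and the events of the pod that is waiting for the volume:

```bash
# Watch VolumeAttachment objects while the attach/detach controller waits
# (~6 minutes by default) for the old node to release the volume.
kubectl get volumeattachments -w

# The replacement pod typically reports a "Multi-Attach error" event until
# the controller finally detaches the volume from the unresponsive node.
kubectl describe pod <pending-pod>
```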
Is ReconcilerMaxWaitForUnmountDuration configurable? Is it part of the control plane?
@amitde69 ReconcilerMaxWaitForUnmountDuration is currently not configurable: kubernetes/kubernetes#105341.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
+1, this makes some apps unusable (e.g. the Strimzi Kafka operator) because of the frequent timeouts.
I'm still seeing this.
I'm still seeing this.
Experiencing the same behavior, and this is very concerning as this is a clear regression compared to previous versions of EKS. We could observe the issue with a pod scheduled with Karpenter. I want to make it clear that the node was not terminated ungracefully.
We're experiencing this on EKS 1.24 + EBS 1.12. We use node-termination-handler to handle the graceful draining of the nodes. We observed that the volume becomes detached from the old node in the AWS console, but the attach/detach controller doesn't detach the volume on the Kubernetes API. This seems to be an issue in the attach/detach controller, to be honest. I'll try to dig out what may be happening.
Same here. /remove-lifecycle stale
Unfortunately, we see this problem as well, and it has a severe negative impact on our workload. Has anyone come up with a solution or workaround yet?
/kind bug
@torredil @ConnorJC3 Comments and thumbs-ups deduplicated, there are now 12 people who found their way to this GitHub issue to report this exact same problem. And I suspect this will grow in number with EKS 1.22 ending support in June 2023 and the EBS CSI driver becoming mandatory in EKS 1.23. Can we please have an official word on this? Do you confirm the issue, or do you still think this is expected, perfectly normal behavior? Do you maybe lack the resources to look into this? Should we alert our AWS account reps to get more attention on this?
Hi all, I can't make any specific promises before I get a chance to look into it, but I will investigate this bug sometime during the next work week. In the meantime, can anyone affected here please clarify whether this issue is happening to them (only) when they terminate the EC2 instance behind a node without draining it first? If you can reproduce this under other, different scenarios, please provide information on how to do that. Thanks!
Thanks! Running a "kubectl drain <node>" on a node running a stateful workload is enough to reproduce this issue. The node doesn't have to be terminated ungracefully.
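For anyone following along, a minimal version of this reproduction (placeholder names; assumes a StatefulSet with a PVC is running on the node) could look like this:

```bash
# Evict the stateful pod by draining its node.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Watch the replacement pod: it often sits in ContainerCreating with
# attach/mount errors until the volume is detached from the old node.
kubectl get pods -w
kubectl get volumeattachments -w
```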
For me, killing a pod of a StatefulSet with a PVC attached is already enough to reproduce.
I have this issue every time we start the node replacement process when updating AWS launch templates. Killing pods from the StatefulSet is usually not enough, so we also kill the EBS CSI controller pods, and then the issue disappears.
Hi @ConnorJC3, anything else you need, just let us know!
Hi all, sorry about the somewhat late response. I have been trying to reproduce this locally via several methods. If anyone experiencing this issue in a scenario not related to an ungraceful shutdown (ungraceful shutdowns causing this is expected behavior and out of our control) could provide exact reproduction steps or driver logs, that would be very helpful.
I haven't got reproduction steps per se, but running StatefulSets with PVs/PVCs on Karpenter spot instances with a node lifespan TTL of 15 minutes is a nice way to force the issue to happen quickly, as nodes are constantly churning and re-attaching the PVs takes a while.
Using Spot instances on AWS: when an instance is terminated, the same issue occurs. EKS: 1.22
I don't know what changed exactly, but I cannot reproduce the issue after upgrading to the driver's latest version. Reverting to an older version like 1.12.1 brings the issue back. I will upgrade all production clusters to the latest version and keep you updated if the issue returns.
@maximethebault After upgrading your production clusters, are you still seeing the issue?
We want to verify whether VolumeAttachments are leaking, see kubernetes-sigs/aws-ebs-csi-driver#1302 (comment) Co-authored-by: Danail Branekov <[email protected]> Co-authored-by: Georgi Sabev <[email protected]>
I can confirm that this issue still exists with EKS 1.25 / 1.26 and the latest EBS driver version 1.18.0.
PVC attaching delay fixed by increasing the attacher and provisioner QPS and burst settings.
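For context, this kind of tuning is done by passing extra flags to the csi-attacher and csi-provisioner sidecars. A hedged sketch using the aws-ebs-csi-driver Helm chart (the values keys and flag names are assumptions that depend on the chart and sidecar versions, so check the chart's values.yaml and the sidecars' --help output):

```bash
# Sketch only: raise the Kubernetes API client rate limits of the attacher
# and provisioner sidecars. Keys/flags may differ between versions.
cat <<'EOF' > ebs-csi-values.yaml
sidecars:
  attacher:
    additionalArgs:
      - "--kube-api-qps=20"
      - "--kube-api-burst=100"
  provisioner:
    additionalArgs:
      - "--kube-api-qps=20"
      - "--kube-api-burst=100"
EOF

helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system --values ebs-csi-values.yaml
```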
Is this related? kubernetes-csi/external-attacher#215
In our tests, we were able to reproduce an issue where a volume remained attached to a node after the pod was deleted. This issue is relatively easy to reproduce (within a few hundred volume attachments) and would not clear itself without manual intervention. It's possible there are multiple issues which users are experiencing in this thread, however, this one was the most disruptive to our users. This reproduction is visible in the following places:
The following workarounds can be used to detach the volume from the node:
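One workaround commonly used for this symptom (my own sketch, not necessarily the exact list the commenter posted) is to remove the stale VolumeAttachment object so the attach/detach controller stops waiting on the old node:

```bash
# Find the VolumeAttachment still bound to the old node.
kubectl get volumeattachments

# Deleting it lets the controller attach the volume to the new node. Only do
# this once you are sure the volume is no longer mounted or attached on the
# old node (e.g. verified in the EC2 console), otherwise you risk data loss.
kubectl delete volumeattachment <csi-attachment-name>
```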
This issue is most likely to occur when a pod moves from one node to another, for example, when draining a node and during karpenter consolidation events. During investigation, we saw the following calls in CloudWatch to the API server from the kubelet:
What stood out here was the sha256 hash being requested from the API in place of the expected resource name. Delving into the kubelet logs, we can see logs similar to the following:
This is the point where the kubelet is trying to detach the volume and is failing because it cannot read the expected file on disk. PR kubernetes/kubernetes#116138 removes the code path that is being taken here. The invalid path was:
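For reference only (not the path from the original logs, and the exact layout varies with the Kubernetes version), the kubelet keeps CSI volume metadata in vol_data.json files in locations like the following; the failure described here surfaces when such a file cannot be read back:

```bash
# Per-pod CSI volume metadata (pod UID and PV name are placeholders):
ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/vol_data.json

# Staging/global mount metadata; on recent versions the directory name is a
# sha256 hash of the volume handle rather than a human-readable name:
ls /var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/<sha256-hash>/vol_data.json
```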
So the observed behaviour becomes:
The changes in the PR remove this erroneous code path. It appears the EKS-provided version of the kubelet is not patched with the PR commits that include the fix for this issue; otherwise, these log events would not occur. The PR states this fix has been applied to the upstream branches, which can be checked by looking at the specific patch releases:
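A generic way to check which kubelet build the nodes are actually running, so it can be compared against the upstream patch release that carries the fix:

```bash
# Print the kubelet version reported by each node.
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
```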
We have reproduced this behaviour in the following versions of the AWS kubelet, as recorded by the UserAgent string:
It looks like we will need AWS to rebuild the kubelet/AMIs with the required upstream patches to resolve this specific issue. To aid in reproduction, we have created a repository available here: https://github.com/martysweet/eks-ebs-node-detachment-reproduction, which provisions a bare-bones EKS cluster based off Terraform blueprints. The Python script can then be used to cycle the pod between two nodes. It will typically take less than 60 minutes before it becomes stuck on all of the above versions.
Edit: The other kubelet versions tested (using the default blueprint-sourced AL2 AMIs) have patch versions lower than the PR mentions (e.g. kubelet/v1.24.13 vs 1.24.14).
I can confirm the issue of long re-attachments for the addon v1.19.0-eksbuild.2 from Kubernetes version 1.23 upwards, and found kind of an easier workaround as I was impatiently updating a cluster: I deleted the offending resource manually. This bug is a real pain and in my opinion it should be a dealbreaker for everyone who tests EKS thoroughly.
I have the same issue with gp2 volumes in 1.24, 1.25, 1.26. The pod restart process takes from 3 to 20 minutes, even if the node hasn't changed. I opened a case and a support engineer from AWS suggested adding a configuration option. This option helped me and the pod started to run without significant delays, but only if the node hasn't been changed. The same problem happens when a node is terminated gracefully by Karpenter; as the AWS support engineer described, it is a race condition related to Karpenter. Issue: #1665 I tried to drain the node manually before deletion and then deleted it. This helped and there were no re-attaching issues for my pod from the StatefulSet. Probably this could be temporarily fixed by integrating node-termination-handler.
For whoever is still impacted by this behaviour: make sure to update to the latest EBS CSI driver version.
Problem still exists on EKS v1.27.4-eks-cedffd4, Karpenter v0.30.0, aws-ebs-csi-driver v1.23.0.
@TJCNI The recently introduced PreStop lifecycle hook feature aims to address delayed attachments as a result of the driver node pod getting terminated before workload volumes are cleanly unmounted during graceful termination events. The hook logic will only execute if a node is being drained. If you happen to be using spot instances along with Karpenter, you'll want to enable the Native Spot Termination Handling feature introduced in aws/karpenter-provider-aws#2546 (disabled by default):
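What enabling that typically looks like with the Karpenter Helm chart, as a hedged sketch (the values key is an assumption that depends on your Karpenter version, and the SQS interruption queue itself must already exist, e.g. from Karpenter's getting-started CloudFormation/Terraform):

```bash
# Sketch only: point Karpenter at the SQS queue receiving Spot interruption
# and instance state-change notifications. v0.30-era charts used
# settings.aws.interruptionQueueName; newer releases use settings.interruptionQueue.
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --set settings.aws.interruptionQueueName=<interruption-queue-name>
```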
For more details see Deprovisioning nodes in the Karpenter documentation. Otherwise, if your instances are indeed being gracefully terminated and you are still running into delayed attachment issues, I highly encourage you to create a new issue describing your workflow and attaching driver logs.
We have Karpenter termination handling set up and still experience this in EKS 1.27, ebs-csi-driver 1.23, Karpenter 0.30.0. Based on logs (and probably to no one's surprise), the driver is failing to detach the volume. I also see that the node in question becomes unresponsive.
@hobbsh EBS detaches rely on the EC2 instance's cooperation (if it is still online) - the OS must "give up" the volume, traditionally done on *nix OSes by unmounting it. Thus, if your node breaks in such a way that the volume isn't detached on the node side (for example, because the Kubelet is broken and never asks the EBS CSI Driver to unmount the volume) - but the node is still running, the EBS CSI Driver will get stuck indefinitely trying to detach. This is intentional design, as taking the volume away from a running node without its cooperation (what EBS calls a "force detach") can cause data loss. The EBS CSI Driver will never perform a force detach on your behalf because of the risks associated. The feature that @torredil mentions above is only applicable when nodes are gracefully terminated (i.e. drained before shutdown). If your nodes are not being gracefully terminated, you'll need to work with Karpenter to figure out why that is occurring.
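To illustrate the point about the OS having to give up the volume first, one can check from the node itself whether the device is still held (generic commands, not taken from this thread):

```bash
# On the affected node (via SSH or SSM): list block devices and check whether
# the EBS volume's filesystem is still mounted.
lsblk
findmnt | grep -i nvme
```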
Has anyone on recent K8s and CSI versions managed to solve it, and can you share how?
So how are unrecoverable instance failures supposed to be handled? Things like hardware failures are individually rare, but across a large fleet of nodes become a common enough occurrence. In these cases the volumes are never going to be gracefully unmounted, and we're left to either wait many minutes for some timeouts to be hit on the AWS side and the instance to stop, or to manually force-detach the volumes. Either way some data loss is inevitable, but requiring a human to intervene and force-detach the volumes is generally not what we want to be doing. And waiting for the timeouts to be hit usually means more downtime than we want (for the services we have not yet managed to make HA).
As a first step, ensure that Graceful Node Shutdown is enabled in your cluster by configuring the kubelet's shutdown grace period settings. Doing the above will allow the driver's preStop hook to execute even during unexpected shutdowns, and thus prevent the node pod from terminating for a period of 30 seconds while there are still volumes attached to the node. That said, there are rare corner cases where Kubelet's Node Shutdown Manager is unable to detect shutdown events and delay the shutdown for the configured grace period.
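For reference, Graceful Node Shutdown is driven by the shutdownGracePeriod and shutdownGracePeriodCriticalPods fields of the kubelet configuration. A sketch of what that can look like (how the kubelet config is delivered differs between AMIs and bootstrap mechanisms, so treat this as illustrative; on EKS AL2 AMIs the kubelet config typically lives at /etc/kubernetes/kubelet/kubelet-config.json):

```bash
# Sketch only: enable Graceful Node Shutdown in the kubelet configuration.
# Durations are examples; the config path differs per AMI/bootstrap method.
cat <<'EOF' >> /var/lib/kubelet/config.yaml
shutdownGracePeriod: 45s
shutdownGracePeriodCriticalPods: 15s
EOF
systemctl restart kubelet
```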
A different perspective on this: data loss / corruption of any kind is unacceptable. Delayed attachments are certainly highly undesirable and indicative of room for improvement, but potentially compromising a user's data is never the right tradeoff. Relevant to this discussion, SIG Storage is currently working on a feature that allows for disabling the 6 minute force detach timer in the A/D controller (doing so would be the equivalent of replacing 6 minutes with "infinity", requiring operator intervention by instructing Kubernetes that the node has really been shut down): kubernetes/kubernetes#120328 To the best of my knowledge, the issue of delayed attachments as a result of graceful termination workflows has been addressed. If anyone runs into issues using the latest version of the driver, please report it and be sure to include all relevant context / information that may be helpful in tracking down the root cause so that we can address that promptly for you. Thank you! /close
@torredil: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Duplicating case #1279, as we observe the same issue: detaching/deleting and attaching a PVC to a new node takes more than 6 minutes and is followed by a timeout.
What happened?
EBS CSI Controller logs
Failure to Delete EBS volume after Detach
What you expected to happen?
It should not take so much time for the PVC to detach and attach to a new k8s node. Compared to the default Kubernetes storage driver (gp2), where I do not see this issue, it takes a lot of time.
How to reproduce it (as minimally and precisely as possible)?
You just need to directly terminate the EC2 instance (k8s node) that the StatefulSet is running on.
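A minimal sketch of that reproduction (placeholder names; assumes the AWS CLI is configured with permission to terminate instances):

```bash
# Find the EC2 instance behind the node running the StatefulSet pod.
NODE=$(kubectl get pod <statefulset-pod> -o jsonpath='{.spec.nodeName}')
INSTANCE_ID=$(kubectl get node "$NODE" -o jsonpath='{.spec.providerID}' | awk -F/ '{print $NF}')

# Terminate the instance directly (ungraceful, no drain).
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"

# Watch how long the replacement pod waits for the volume to re-attach.
kubectl get pods -w
```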
Anything else we need to know?:
Environment
Kubernetes version (use kubectl version): 1.22