[Bug]: context deadline exceeded due to metrics-server pod not terminating on terraform destroy #353
Comments
Terraform failed to destroy metrics-server because of a dependency issue in the getting started example: the VPC resources were being deleted before the EKS cluster was completely deleted. #356 resolves this. Since it's merged, I'll close this ticket now.
Still getting a similar issue regarding this.
Hi @danvau7, thanks for following up on this. We're working on a list of known issues, including this one. We're also working on a fix for this problem in particular, but we don't have an estimate for when it will be pushed out. In the meantime, if you are trying to deploy the examples, you can avoid the issue by deploying the VPC separately from the rest of the example.
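A minimal sketch of that workaround, assuming the example's VPC lives in a module called `module.vpc` (the address is a placeholder and depends on the example you use):

```sh
# Create the VPC on its own first (module address is a placeholder)
terraform apply -target=module.vpc

# Then create the rest of the example (EKS cluster, addons, ...)
terraform apply
```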
Setting the following in the configuration helps. Also modifying the Helm configs helps: I changed the default timeout value from 1200 to 3600.
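A minimal sketch of that kind of timeout override, assuming the kubernetes-addons module exposes a `metrics_server_helm_config` map with a `timeout` key (input names vary by blueprint version, so treat these as placeholders rather than the exact configuration used above):

```hcl
module "eks_blueprints_kubernetes_addons" {
  source = "github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons"

  # Placeholder wiring to the cluster module
  eks_cluster_id = module.eks_blueprints.eks_cluster_id

  enable_metrics_server = true
  metrics_server_helm_config = {
    # Give Helm more time (in seconds) to install/uninstall cleanly
    timeout = 3600
  }
}
```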
Sorry for replying late to this. The ideal destroy order would be: the Kubernetes addons first, then the EKS cluster, and finally the VPC. You can use targeted destroys to control that order. Here's what I suggest checking next time you are facing this issue: are any nodes unhealthy, and are any pods stuck in Terminating?
If any of the above is true, check the terraform destroy order, and you may see that VPC resources were deleted before the addons, which leaves the nodes unhealthy and the overall cluster in a state where addons can't be deleted properly (this may be true not just for metrics-server, but for other addons too).
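A rough sketch of that targeted destroy order (the module addresses are placeholders; use whatever your configuration actually defines):

```sh
# 1) Remove the Kubernetes addons (Helm releases) while the nodes are still healthy
terraform destroy -target=module.eks_blueprints_kubernetes_addons

# 2) Remove the EKS cluster itself
terraform destroy -target=module.eks_blueprints

# 3) Remove everything that is left (VPC, subnets, NAT gateway, ...)
terraform destroy
```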
Sadly, I am also facing this issue:
I am using version 3.3.0 of the EKS Blueprints module.
The pods hang in the Terminating state. Here is the describe pod output of the nginx controller:
That's the output of kubectl get nodes:
After running the following command for the hanging pods, I was able to complete terraform destroy: kubectl delete pod --grace-period=0 --force --namespace nginx ingress-nginx-controller-559c9cc878-wgg42
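For anyone hitting the same thing, a generic version of that cleanup (standard kubectl; the pod and namespace names are whatever happens to be stuck in your cluster):

```sh
# Find pods stuck in Terminating across all namespaces
kubectl get pods --all-namespaces | grep Terminating

# Force-delete a stuck pod so terraform destroy can proceed
kubectl delete pod <pod-name> --namespace <namespace> --grace-period=0 --force
```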
@Zvikan do you have a hint on how I can fix this?
Hey @schwichti, I can see that your nodes are in the "NotReady" status; this may be a similar issue to the one I explained above. Did you try to do a cleanup/destroy process as documented in the examples?
@Zvikan I am not using any example, but created my own module based on EKS Blueprints: https://github.com/dspace-group/simphera-reference-architecture-aws/tree/feat_update_dependencies . In fact, I am able to complete the destroy when I delete the Helm charts manually (see above), so I do not really need that process.
I think you are right that VPC resources were deleted before the addons, but I cannot see why that is the case. In https://github.com/aws-ia/terraform-aws-eks-blueprints/pull/356/files you added an explicit dependency from the blueprints module to the VPC. I do not see why this is necessary, because there is an implicit dependency. It also appears that this explicit dependency was removed again from the examples.
@schwichti the implicit dependency only covers several of the VPC resources, like the subnet IDs. What can cause this domino effect may be your NAT gateway or other VPC-related resources, which leads the nodes into an unstable state and therefore TF not being able to clean up the addons properly. And we've removed the explicit dependency between the addons and/or accelerator (now known as EKS Blueprints) and the VPC (due to upstream changes in newer versions where we can't use that approach anymore).
We've been trying to keep a single TF apply and destroy without any issues, but we entered the dependency rabbit hole and faced several issues, and more. So how do we go from here? We've been thinking a lot and came to the decision that the best next step is to take a step back and suggest deploying the modules in the correct order via targeted applies and destroys.
@Zvikan do I understand you correctly that adding an explicit dependency to the VPC resources could solve the issue in my case (coming at the price of slowing things down)?
@schwichti Yes, by adding an explicit dependency in your case you control the deployment (apply/destroy) flow, achieving what I've described above.
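A minimal sketch of that explicit dependency in a root module; the module names and inputs are hypothetical, the only point here is the depends_on:

```hcl
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  # ... VPC inputs ...
}

module "eks_blueprints" {
  source = "github.com/aws-ia/terraform-aws-eks-blueprints"

  vpc_id             = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnets

  # Explicit ordering: the VPC is created before this module and,
  # more importantly, destroyed only after this module is fully gone.
  depends_on = [module.vpc]
}
```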
I do think a destroy that includes an EKS cluster should, without being told, automatically delete the add-ons first.
Hi, I started playing with eks-blueprints a couple of weeks ago and ran into problems similar to this one and #524, so I cleaned up manually. The main problem seems to be that Helm destruction is immediate and Terraform continues, which causes problems. The only solution I've seen so far is a fixed wait timeout. ;(
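One way to get such a fixed wait is a destroy-time sleep sitting between the addons and the VPC. This is only a sketch using the hashicorp/time provider, and the module addresses are placeholders:

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Destroyed after the addons but before the VPC, so destroy pauses here
# and gives Helm-managed pods time to actually terminate.
resource "time_sleep" "helm_cleanup_buffer" {
  destroy_duration = "10m"

  depends_on = [module.vpc]
}

# In the addons module call, add:
#   depends_on = [time_sleep.helm_cleanup_buffer]
# so the destroy order becomes: addons -> 10 minute wait -> VPC.
```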
Welcome to Amazon SSP EKS Accelerator!
Amazon EKS Accelerator Release version
3.5
What is your environment, configuration and the example used?
Terraform v1.0.10
on darwin_amd64
Using the getting started guide
What did you do and what did you see instead?
What did I want to do?
Tried to terraform destroy the cluster
What did I expect:
The cluster to be deprovisioned
What happened?
Terraform destroy failed with error:
context deadline exceeded
Exploring the cluster, the error took place because the metrics-server pod was stuck in the Terminating state.
Once the pod was deleted forcefully (kubectl delete pod metrics-server-694d47d564-4xv72 --grace-period=0 --force -n metrics-server) and the related namespace was deleted, Terraform was able to destroy the cluster.
Additional Information
No response