tf destroy fails to remove aws_auth: unauthorized #1162
Comments
I'm also getting this error... using a bare-bones install:
My terraform destroy stops with an Unauthorized error.
With TF_LOG=TRACE, I can see that:
Terraform version: v0.14.2. It is also strange that it is querying localhost. There also seems to be an order-of-operations issue here, since the cluster is gone but the TF state still shows a ConfigMap remaining. Workaround: removing the state manually (see the sketch below)
did the trick for now until the bug is resolved. |
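Roughly, the manual-state workaround looks like this; the state address is the one reported later in this thread, so adjust it to whatever `terraform state list` shows in your workspace:

```sh
# List what is left in state after the failed destroy, then drop the
# orphaned aws-auth config map entry so the next destroy can finish.
terraform state list | grep aws_auth
terraform state rm 'module.eks.kubernetes_config_map.aws_auth[0]'
terraform destroy
```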
Same error here, we need to change |
I experienced this same issue. After retrying the
In my case, I also noticed that Terraform is trying to connect to Kubernetes at localhost, while it should connect to EKS. |
@SirBarksALot Good that you noticed that. Your workaround with |
I am running into the same and related issues destroying this module with the terraform:light docker image. I believe this is related to #978 , but I have not found any workarounds that work for automation purposes. |
@TjeuKayim I am running into the same thing. I believe the comment from @SirBarksALot was really about the change inside the module. For your kubernetes_ingress resource, try adding a depends_on (a sketch follows below). |
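A minimal sketch of that dependency, assuming the ingress is defined alongside this module in the root configuration (the resource name and the ingress spec below are placeholders):

```hcl
resource "kubernetes_ingress" "example" {
  # Force Terraform to destroy this resource before it tears down the
  # EKS cluster (and the aws-auth config map) managed by the module.
  depends_on = [module.eks]

  metadata {
    name = "example"
  }

  spec {
    backend {
      service_name = "example"
      service_port = 80
    }
  }
}
```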
I have tried many things over the last week and I came to the conclusion that it is best to create the auth config map yourself, without the help of this module, while disabling the module's own aws-auth management (a sketch of that self-managed approach follows below). |
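A sketch of that self-managed approach; the `manage_aws_auth` input name and the role ARN below are assumptions, so check the module version you are on for the exact names:

```hcl
module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = "example"
  cluster_version = "1.19"
  subnets         = var.subnets
  vpc_id          = var.vpc_id

  # Assumed input: tell the module not to manage the aws-auth config map.
  manage_aws_auth = false
}

# Manage the config map yourself so its lifecycle is under your control.
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode([{
      rolearn  = var.node_role_arn # assumption: your worker node IAM role ARN
      username = "system:node:{{EC2PrivateDNSName}}"
      groups   = ["system:bootstrappers", "system:nodes"]
    }])
  }
}
```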
@TjeuKayim do not worry about the load balancer (and probably the security group) that (I assume) the ingress-nginx installation creates. If we solve the dependency/auth config map problem so that the helm/k8s resources are deleted before EKS, the LB and SG will be destroyed too. Just keep in mind the destruction of the LB and SG takes a few seconds, during which we should not destroy EKS. For that I have created a null_resource that awaits the LB (just have to copy it and revert it for destruction xD). |
Just wanted to add that our team is also experiencing this issue with the basic terraform example. It seems to be around a 40% failure rate on destroy where we get the "Unauthorized" error; the rest of the time it works perfectly. Unfortunately this makes a CI/CD process very difficult, so we are very eager to hear about any solutions. For those of you who are using
My concern is, of course: do you find it's cleaning up all the Terraform-created resources? I often had to go back into AWS manually unless I ran 'destroy' multiple times. Thanks! |
@JohnPolansky same here, sometimes it works and sometimes it doesn't. I have a feeling that if you create a cluster and immediately destroy it then it works; however, if you wait a bit the auth config stops working. I have read somewhere that this config map might have a timer? Do you experience the same thing, John? |
We can try to trace down what commit to the Terraform repository exactly caused the regression. I know for sure that v0.14.3 is affected and that v0.13.5 is not affected by this issue. I didn't test the versions in between. And @spaziran was using v0.14.2. Has anyone here experienced the issue with other Terraform versions? Are v0.14.{0,1} and v0.13.6 affected? |
@SirBarksALot I've seen the issue both on an example where I created the cluster and destroyed it within ~1 min, and where I created the cluster and destroyed it ~2 hours later; and I've seen it succeed in both cases. It's very weird. I did also read somewhere that the terraform auth is only good for 15 mins, but I don't think that applies here as my destroys fail after ~5-7 mins. @TjeuKayim My co-worker is on 0.14.4 and I'm on 0.14.3 and we've both been experiencing the "unauthorized/configmap/aws-auth" issue. We are both very eager to resolve this, so if you are looking for testers when the time comes, count us in. |
+1 on this issue. Myself and one other person both hit this issue using the latest version of TF |
@TjeuKayim preliminary testing shows that 0.13.6 and 0.14.0 are NOT affected. Therefore the issue appears to have been introduced in v0.14.1. |
@panaut0lordv @TjeuKayim on the 6th try with v0.14.0 I got this error; v0.13.6 seems fine so far (sample size 10). |
@MateuszMalkiewicz - I can confirm that my destroys are failing with "not authorized" within ~3-7 mins of starting them, so no, I wouldn't say it's a "longer period". As far as the versions go, it's very hard to be sure, because "sometimes" it will succeed; I've had as many as 5 in a row destroy perfectly. I've done the create/destroy actions right after each other, and also done them about 1 hr apart, and had the destroy fail. Hope this helps. |
It seems like there's a race condition where the cluster is destroyed before all of its dependencies. In our case it was things like kubernetes_namespaces, cluster role bindings, config_map, and the cluster_role that were getting "stranded", because the cluster itself was already gone. Maybe a conditional of some sort could be added to ensure all the cluster parts are destroyed before destroying the cluster itself? I have no idea how complicated that would be, sorry if I'm over-simplifying this. |
We've managed to figure this out. In Terraform 0.14+ the destroy command no longer refreshes the state of resources before generating the execution plan (like it did in 0.13.x). The solution is simply to run terraform refresh before terraform destroy (a sketch follows below). |
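A minimal sketch of that ordering, assuming the usual CLI workflow (add your own `-var-file` or backend flags as needed):

```sh
# Refresh first so the kubernetes provider picks up a fresh EKS token
# and endpoint instead of falling back to localhost, then destroy.
terraform refresh
terraform destroy
```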
This seems plausible, but are you able to reproduce the success of |
@clebio for now the answer is yes. On refresh we get a new token and everything's fine and dandy… Well, if you're good with refreshing first. I have yet to test on some pipeline-like scenario (init and destroy). |
So I thought I would chime in to say thanks to @MateuszMalkiewicz for the terraform refresh suggestion. Obviously this feels more like a workaround than a fix; it seems like Terraform should be handling this for us, but it is useful. Thanks to everyone participating in this. |
Doing |
Yes, seeing this same issue while running the destroy in CI/CD with Terraform 0.14.5; the workaround suggested above more or less solves the issue in the pipeline. |
I think we should put some note in the documentation stating that this part is extremely error prone. I am very thankful for the module you guys have created and continue supporting, as we have already made 40-50 EKS installations with it, but this part has continuously been the biggest issue for months/years (and it looks like it is simply a Terraform architecture problem). Managing it outside of Terraform is probably the most stable solution. |
This seems to be a Terraform (and not a module/provider) issue. I experience the same problem on Google / GKE. |
I have this issue with Terraform v0.14.7 |
I've had a 100% failure rate with Terraform 0.14.7. Every single time I destroy I get this issue. I found that running
ahead of the destroy works 100% of the time. The config map then just exists on the cluster, so when the cluster is destroyed, so is the config map. I would love to not have to remember to run this command. |
Terraform v0.14.6 is also affected by this issue |
Facing same issue in terraform version 0.14.4 |
I've seen this a lot in my work with the Kubernetes provider. The problem is that the data source containing the EKS credentials isn't being refreshed prior to destroy, so the Kubernetes provider falls back to default values (like localhost) when attempting to connect to the cluster. The fix for this has been merged upstream; it's available starting in Terraform 0.15-beta1. |
@dak1n1 huh, hope we finally get this fixed. EDIT: Terraform v0.14.7 |
@kaykhancheckpoint, as mentioned earlier by dak1n1 (see #1162 (comment)), you might want to try it. I have not tried it yet, but I hope it helps 😄 |
Indeed, terraform v0.15.0 does fix the issue of the provider using stale credentials, but the token can still expire in long-running workflows. For example, we have a CI/CD pipeline which only runs terraform apply/delete after someone clicks a trigger button (after they have reviewed the planned changes from terraform plan). If the review takes long, the token expires. Please note that the kubernetes provider documentation mentions using an exec plugin to fix such issues (a sketch follows after this comment):
But this requires having a full-blown aws cli available wherever you run terraform. As we do not want this in our CI pipeline (129MB +
UPDATE: moved my project to GitLab, so I updated the link |
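A rough sketch of that exec-based provider configuration, assuming the AWS CLI is on the PATH and that the data source and module output below exist in your configuration (names are placeholders, and the right `api_version` depends on your provider/CLI versions):

```hcl
data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  # Fetch a fresh token on every provider invocation instead of caching
  # one in state, so long-running plans/reviews do not hit token expiry.
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}
```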
You can also use the aws-iam-authenticator binary; it's probably a bit safer to have in CI than the full AWS CLI. Looks like the binary is about 39MB in size (a variant of the exec block using it is sketched below). https://github.com/kubernetes-sigs/aws-iam-authenticator/releases/tag/v0.5.2 |
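A sketch of the same exec block using aws-iam-authenticator instead of the AWS CLI, assuming the binary is on the PATH (the cluster name reference is a placeholder):

```hcl
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws-iam-authenticator"
    # "token -i <cluster-name>" prints a short-lived authentication token.
    args        = ["token", "-i", module.eks.cluster_id]
  }
}
```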
We use |
@dak1n1, good to know. I did not look into what the binary can do, but actually I am using their Go token package to generate and fetch the token. :) |
- Remove the retries for destroying components in eks
- Add terraform refresh for destroying eks, due to config_auth Unauthorized issue terraform-aws-modules/terraform-aws-eks#1162
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had recent activity since being marked as stale. |
Hi there, it seems the same problem exists on the latest terraform 1.0.9 (Terraform Cloud):
{"@Level":"error","@message":"Error: Unauthorized","@module":"terraform.ui","@timestamp":"2021-10-25T17:55:29.727927Z","diagnostic":{"severity":"error","summary":"Unauthorized","detail":""},"type":"diagnostic"}
{"@Level":"info","@message":"module.eks.module.eks.kubernetes_config_map.aws_auth[0]: Destruction errored after 0s","@module":"terraform.ui","@timestamp":"2021-10-25T17:55:29.079089Z","hook":{"resource":{"addr":"module.eks.module.eks.kubernetes_config_map.aws_auth[0]","module":"module.eks.module.eks","resource":"kubernetes_config_map.aws_auth[0]","implied_provider":"kubernetes","resource_type":"kubernetes_config_map","resource_name":"aws_auth","resource_key":0},"action":"delete","elapsed_seconds":0},"type":"apply_errored"}
In what plugin/terraform version will the fix be delivered? |
I am encountering the same problem, but not using the provided module. Rather, I am using a module I created myself, which has a few boolean flags to deploy certain Kubernetes resources with the Kubernetes Terraform provider. When the Kubernetes provider is used in the module, it relies upon this block:
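The original block is cut off above; a hypothetical reconstruction of what such a provider block typically looks like (data source names, the variable, and outputs are assumptions):

```hcl
data "aws_eks_cluster" "this" {
  name = var.cluster_name
}

data "aws_eks_cluster_auth" "this" {
  name = var.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  # The token comes from a data source, so it is only fresh when that data
  # source is refreshed; a plain `terraform destroy` on 0.14 skips that
  # refresh, which is where the Unauthorized / localhost errors come from.
  token                  = data.aws_eks_cluster_auth.this.token
}
```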
Several kubernetes resources are destroyed, however, I keep seeing these issues on two resources:
To be specific:
The first time the destroy fails, the EKS cluster is still there. The EKS cluster therefore only gets destroyed the second time I run terraform destroy. Any advice would be greatly appreciated. |
If you are creating any k8s resources outside of this module, you must put an explicit dependency on it via depends_on. |
@daroga0002 I already use the |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
I have issues
I'm submitting a...
What is the current behavior?
When I do a destroy operation, I receive
The only remaining piece of state is the aws_auth module:
Environment details