terraform refresh attempts to dial localhost (reopening with workaround) #1028
Comments
Thanks for the provided workaround. We are also hitting this bug from time to time. I tried the parallelism approach and did not see the "localhost issue" again. However, we ran into a different issue with this. I would love to know why this bug happens at all (and why it can be mitigated by reducing the number of Terraform threads). We are creating a kubeconfig file before we run Terraform. Here is our providers.tf:

```hcl
provider "kubernetes" {
  config_path = var.kubeconfig
}

provider "helm" {
  kubernetes {
    config_path = var.kubeconfig
  }
  version = ">= 1.2.1"
}
```
It happens to us too :(
to be specific, in my case it happens during
If I run the same command (
Interesting @igoooor, does it also try to connect to localhost in your case? We have a similar issue, like the one you describe, but in those cases we just get a "permission denied" message (no indication that it tries to connect to localhost). If we use
In my case I get the localhost error, yes, but only when refreshing.
If I replace my provider config and use variables (for
Yeah, this might be the case. In most of our cases we are not using Terraform data sources to fill in the access credentials, but we are still experiencing this bug. I am currently checking whether I get the same issue when not using a generated kubeconfig file, but instead passing the client_certificate, client_key, etc. directly via variables to the provider.
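For illustration, a minimal sketch of that variable-based setup might look like the following (the variable names are hypothetical, and it assumes the values are passed base64-encoded, as most managed Kubernetes APIs return them):

```hcl
variable "cluster_host" {
  type = string
}

variable "client_certificate" {
  type        = string
  description = "Base64-encoded client certificate"
}

variable "client_key" {
  type        = string
  description = "Base64-encoded client key"
}

variable "cluster_ca_certificate" {
  type        = string
  description = "Base64-encoded cluster CA certificate"
}

# Credentials are passed in directly instead of being read from a
# generated kubeconfig file on disk.
provider "kubernetes" {
  host                   = var.cluster_host
  client_certificate     = base64decode(var.client_certificate)
  client_key             = base64decode(var.client_key)
  cluster_ca_certificate = base64decode(var.cluster_ca_certificate)
}
```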
This only happens to me since I updated to Terraform 0.13 today.
I'm unable to reproduce this scenario. To me, import seems to work as expected. @igoooor Is the cluster referred to by your provider configuration already present before you run Terraform?
Also, everyone else, please post the versions of Terraform and the provider you used.
It is already present before starting the Terraform command.
Alright, thanks for clarifying that.
It works via kubeconfig and via parameters set for the provider. And again, it only fails during refresh, if I
It also happens when I'm using a resource instead of a data source.
For the record, neither workaround works when using the remote backend:
I am experiencing the same issue, @igoooor. The only difference is that I am using DigitalOcean instead of Google.
I ran into this issue as well. What appears to have happened in my case is that I had originally created a kubernetes_secret resource in a module's main.tf. Things changed and that resource was removed since it was no longer needed. When the refresh happened, I guess because the original resource definition no longer existed, Terraform ignored any configuration for the kubernetes provider and always tried to use localhost. Without looking at the code, I'd say that if there's a secret (or maybe another k8s resource) in the state file (we use remote state) but the definition for that resource has been removed, then this will happen (that's just a guess). Our fix is to simply remove that resource from the state manually and then clean up the resource by hand.
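As a concrete illustration of that fix (the resource address and secret name below are made up), the state surgery might look like:

```sh
# Drop the stale resource from state so Terraform stops trying to reach
# the cluster for it during refresh.
terraform state rm 'module.apps.kubernetes_secret.registry_credentials'

# Then clean up the actual Kubernetes object out-of-band.
kubectl delete secret registry-credentials --namespace apps
```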
Same problem here, any workarounds available?
@pduchnovsky on Terraform Cloud, you should still be able to use the above workarounds.
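For what it's worth, on Terraform Cloud the parallelism workaround can usually be applied by setting environment variables on the workspace rather than passing CLI flags; a sketch using Terraform's standard CLI environment variables:

```sh
# Set these as environment variables on the Terraform Cloud workspace so
# remote plans and applies run with reduced parallelism.
TF_CLI_ARGS_plan="-parallelism=1"
TF_CLI_ARGS_apply="-parallelism=1"
```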
To be honest, this workaround is not really acceptable. E.g. I am creating a single GKE cluster with two non-default node pools, one of which is GPU-enabled. I then deploy around 10 kubernetes_deployment(s), one of which takes on average 8 minutes to create (big images), and it would take AGES to deploy/update those if I set parallelism to 1. So for the time being I made a workaround: after the cluster is created I extract its IP and cert into variables and then use those as a reference. Of course, now I cannot change the cluster itself, but that's not something we do often. Looking forward to when PR #1078 is merged.
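A rough sketch of that pin-the-credentials approach, assuming GKE (all names and values are placeholders, not the commenter's actual config):

```hcl
# The cluster endpoint and CA are pinned in variables after the cluster
# exists, so the provider configuration never depends on values that are
# unknown during a refresh.
variable "gke_endpoint" {
  type    = string
  default = "203.0.113.10" # placeholder: endpoint copied from the created cluster
}

variable "gke_ca_certificate" {
  type    = string
  default = "LS0tLS1CRUdJTi..." # placeholder: base64-encoded CA from the cluster
}

# Short-lived access token for the current gcloud identity; this does not
# depend on the cluster resource.
data "google_client_config" "default" {}

provider "kubernetes" {
  host                   = "https://${var.gke_endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(var.gke_ca_certificate)
}
```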
Also ran into this. Was able to work around it
I am not sure if this issue is related, but I have figured out that if a cluster needs recreation, the outputs for the host/CA certificate/etc. will be empty. The empty values are then passed to the provider, which produces the "connection to localhost" error and hides the original fact that the cluster needs recreation. So you see the "localhost" error in the output, while the hidden underlying problem is that the cluster will be re-created (meaning there is obviously no host to connect to before it exists). See this issue for more details.
I am also experiencing the same issue as @ilya-git where the refresh/plan will fail if a cluster that is referenced in a dynamic provider configuration needs to be recreated.
This reliably produces an error message similar to the one in the initial comment on this issue, on the first attempt to refresh a resource using the kubernetes provider. Targeting the cluster in a "first pass" and then proceeding with the rest appears to be a viable workaround; i.e. hashicorp/terraform#4149 would appear to be a viable fix.
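For example, the two-pass targeting can look like this (the resource address is hypothetical):

```sh
# First pass: create or replace only the cluster, so its outputs are known.
terraform apply -target=google_container_cluster.primary

# Second pass: apply everything else; the provider can now reach the real
# endpoint instead of falling back to localhost.
terraform apply
```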
I hit the problem by destroying an AKS cluster. I'm passing the Kubernetes configuration from the state. Terraform v1.0.0.
@Krebsmar I've hit the same issue now with an AKS cluster. Did you manage to resolve this?
The workaround sadly does not work for me :/
Seeing this same issue with the EKS module: https://github.com/terraform-aws-modules/terraform-aws-eks. The initial apply works fine; subsequent changes to the cluster fail with Terraform attempting to connect to the k8s API on localhost. The parallelism workaround has no effect.
I've been using Kubernetes provider version 2.4.1 and none of the above solutions work for me. My configuration uses the gke_auth module to get the cluster configuration. Setting parallelism to 1, avoiding the kubeconfig, and moving to a lower version of the provider fixed the issue; I'm now using version 2.3.2.
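For context, a gke_auth-based provider configuration typically looks something like the sketch below (module source and argument names as I understand the terraform-google-modules auth submodule; this is not the commenter's exact config, so check the module docs for your version):

```hcl
module "gke_auth" {
  source       = "terraform-google-modules/kubernetes-engine/google//modules/auth"
  project_id   = var.project_id
  cluster_name = var.cluster_name
  location     = var.region
}

provider "kubernetes" {
  host  = module.gke_auth.host
  token = module.gke_auth.token
  # Depending on the module version, this output may already be decoded;
  # if not, wrap it in base64decode().
  cluster_ca_certificate = module.gke_auth.cluster_ca_certificate
}
```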
For EKS, you can try to use an
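One common approach for EKS (possibly what the preceding comment has in mind) is exec-based authentication, where the provider fetches a fresh token at run time instead of reading a kubeconfig; a hedged sketch with placeholder names:

```hcl
# Look up the cluster endpoint and CA; the cluster name comes from a
# variable rather than from the cluster resource itself.
data "aws_eks_cluster" "this" {
  name = var.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)

  # Fetch a short-lived token via the AWS CLI on every run.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
  }
}
```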
I have the same problem when trying to import an existing namespace from a cluster:
The provider is configured with the credentials from another resource in another module, and it works fine for the other modules and resources deployed. The namespace already contains installed software that I can't delete/redeploy now, as it's actively used by multiple agencies. And anyway, this action should not fail. This issue is approaching two years old, and no one has been able to track down the problem?
No idea what the issue is here, but heads up that
I am using GKE Autopilot. I have a single deployment of resources. The provider is generated by a module:

```hcl
provider "kubernetes" {
  host                   = module.kubernetes.provider_config.host
  token                  = module.kubernetes.provider_config.token
  cluster_ca_certificate = module.kubernetes.provider_config.cluster_ca_certificate
}
```

versions:
Just want to state that under normal usage, I do not experience any issues with the provider setup being incorrect. I began having the dial-out issue as soon as I added a
Of the solutions proposed above, the only one I did not try was
Again, I am only experiencing this issue when adding the
This would be a major issue for teams running StatefulSet applications, as the
Just throwing another workaround into the mix for people who experience this issue. I couldn't work around the
What has worked for me is exporting
Just sharing another (very hacky) workaround... :)
Workaround: write a file with hard-coded cluster credentials for the kubernetes provider.
This comment suspects that the use of multiple clusters caused the "localhost" connection issue during TF refresh: hashicorp/terraform-provider-kubernetes#1028 (comment)

The problem: because we're creating the cluster AND its K8s resources in the same Terraform config, the K8s provider (e.g., during terraform destroy and the 2nd terraform apply) will connect to localhost. This comment provides a reasonable suspicion for the root cause: hashicorp/terraform-provider-kubernetes#1028 (comment)

The fix: we make Terraform rewrite the kubernetes_provider.tf file with hard-coded cluster credentials. I've used "MyDelimiterWordForMultiLineString" as the delimiter for the heredocs (multi-line strings). I avoided the conventional "EOT" (end of text) delimiter because it might show up in the cluster_ca_certificate value. This commit hopes to fix hashicorp/terraform-provider-kubernetes#1028.
Ran into the same problem. This has worked before without issue; I'm struggling to figure out where the problem is :/
I have the same problem with a DigitalOcean k8s cluster.
Interestingly, I do not have that problem if Terraform uses manifest files like this:
I've found the following works for me. My config looks something like the following.
Do we have any other workarounds? I'm running into this on version 2.23.0. Even if I specify the environment variable KUBE_CONFIG_PATH=path/to/kubeconfig (of course the path is just a dummy in this example), it still runs against localhost on the first try.
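For reference, this is the usual way that variable is used (the path is a placeholder), though as the comment above notes it does not help in every case:

```sh
# Point the provider at an explicit kubeconfig for this run only.
export KUBE_CONFIG_PATH="$HOME/.kube/config"
terraform plan -parallelism=1
```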
Just hit this bug working with a managed Kubernetes cluster on UpCloud. In this case, the upcloud CLI can provide a kubectl config, which is needed for both workarounds:

```sh
upctl kubernetes config "<uuid of cluster>" --write pilot_kubeconfig.yaml
```

Note that neither of the following is really a workaround - one can't get the connection information from Terraform, which is the real intent of the config using data sources, as commented out below.

Option one - changing the HCL:

```hcl
provider "kubernetes" {
  # Does not work due to bug:
  #
  # host                   = data.upcloud_kubernetes_cluster.pilot.host
  # client_certificate     = data.upcloud_kubernetes_cluster.pilot.client_certificate
  # client_key             = data.upcloud_kubernetes_cluster.pilot.client_key
  # cluster_ca_certificate = data.upcloud_kubernetes_cluster.pilot.cluster_ca_certificate
  config_path = "path/pilot_kubeconfig.yaml"
}
```

Option two - a proxy (needs the kubectl config for access):

```sh
# Either run the proxy on 8080 and redirect port 80 to it (pf rules below)...
kubectl proxy --port=8080
# ...or run it directly on port 80, which requires sudo.
sudo kubectl proxy --port=80
```

I'm on macOS - it might be possible to eschew sudo in favour of packet filter hackery, but I haven't tried, and anyway it would effectively allow anything to bind to port 80 (for Linux, see setcap). The config for pf is stored in /etc/pf.conf - I believe you can simply add rules to this file and reload pf for them to take effect.

```sh
# Not tested!
echo "rdr pass inet proto tcp from any to any port 80 -> 127.0.0.1 port 8080" | sudo tee /etc/pf.conf
sudo pfctl -F all -ef /etc/pf.conf
```
This approach fails when I want to update the GKE cluster resource (like
I stumbled on this issue today as well. Using the
This is a re-opening of #546
Occasionally, the kubernetes provider will start dialing localhost instead of the configured kubeconfig context.
In the instance of this problem that I ran into, the reason was multiple Terraform threads opening and writing the kubeconfig file without synchronization, which resulted in a corrupted kubeconfig file. This might have been related to the fact that my Terraform config included multiple clusters (using this approach).
Workaround
I was able to make this go away by setting:
-parallelism=1
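That is, passing the flag to each operation, for example:

```sh
terraform plan -parallelism=1
terraform apply -parallelism=1
```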