terraform refresh attempts to dial localhost (reopening with workaround) #1028

Open
konryd opened this issue Oct 6, 2020 · 50 comments

Comments

@konryd

konryd commented Oct 6, 2020

This is a re-opening of #546

Occasionally, the kubernetes provider will start dialing localhost instead of the configured kubeconfig context.

Error: Get http://localhost/api/v1/namespaces/prometheus: dial tcp 127.0.0.1:80: connect: connection refused
Error: Get http://localhost/api/v1/namespaces/debug: dial tcp 127.0.0.1:80: connect: connection refused

In the instance of this problem that I ran into, the reason was: multiple terraform threads opening and writing the kubeconfig file without synchronization, which resulted in a messed-up kubeconfig file. This might have been related to the fact that my terraform config included multiple clusters (using this approach)

Workaround

I was able to make this go away by setting: -parallelism=1

@konryd konryd added the bug label Oct 6, 2020
@thirdeyenick

Thanks for the provided workaround. We are also hitting this bug from time to time. I tried the parallelism approach and did not see the 'localhost issue' again. However, we ran into a different issue with this.

I would love to know the reason why this bug happens at all (and why it can be mitigated by reducing the terraform threads). We are creating a kubeconfig file before we run terraform apply and are passing the path to it as a variable in our terraform modules. The kubernetes provider then just uses this path via var.kubeconfig. Still, from time to time it happens that the k8s provider wants to connect to localhost, although our file exists and the content is valid.

Here is our providers.tf:

provider "kubernetes" {
  config_path = var.kubeconfig
}

provider "helm" {
  kubernetes {
    config_path = var.kubeconfig
  }
  version = ">= 1.2.1"
}

@igoooor

igoooor commented Oct 27, 2020

it happens to us too :(

@igoooor

igoooor commented Oct 27, 2020

to be specific, in my case it happens during Refreshing state and my provider looks like this

provider "kubernetes" {
  load_config_file = false

  host                   = "https://${data.google_container_cluster.this.0.endpoint}"
  client_certificate     = data.google_container_cluster.this.0.master_auth.0.client_certificate
  client_key             = data.google_container_cluster.this.0.master_auth.0.client_key
  cluster_ca_certificate = data.google_container_cluster.this.0.master_auth.0.cluster_ca_certificate
}

If I run the same command (apply or destroy) with -refresh=false then it works fine

-parallelism=1 is not helping for me, the error is happening constantly.

@thirdeyenick

Interesting @igoooor, does it also try to connect to localhost in your case? We have a similar issue to the one you describe, but in those cases we just get a 'permission denied' message (no indication that it tries to connect to localhost). If we use -refresh=false then everything works. I have the feeling that terraform uses an old client certificate which is not valid anymore (maybe cached in the state?).

@igoooor

igoooor commented Oct 27, 2020

in my case I get the localhost error yes, when refreshing only

Error: Get http://localhost/api/v1/namespaces/xxx: dial tcp 127.0.0.1:80: connect: connection refused

If I replace my provider config and use variables (for host, client_certificate, etc...) instead of data.google_container_cluster... then it also works at refresh time.
It seems like when refreshing the state, it does not load values from data.google_container_cluster....

@thirdeyenick

Yeah, this might be the case. In most of our cases we are not using terraform data sources to fill in the access credentials, but we are still experiencing this bug. I am currently checking if I get the same issue when not using a created kubeconfig file, but passing the client_certificate, client_key, etc instead directly via variables to the provider.

@igoooor

igoooor commented Oct 27, 2020

This has only been happening to me since I updated to terraform 13 today.
I stayed on terraform 12 until now because of some other stuff, and I never had this problem. Only now with the latest version.

@alexsomesan
Member

I'm unable to reproduce this scenario. To me import seems to work as expected.

@igoooor Is the cluster referred to by data.google_container_cluster.this in your case already present, or are you also creating the cluster in that same apply operation?

Also, everyone else, please post the versions of Terraform and provider you used.

@igoooor

igoooor commented Oct 28, 2020

It is already present, before starting the terraform command.

@alexsomesan
Member

Alright, thanks for clarifying that.
Have you tried to A-B test with providing the credentials from that same cluster via a kubeconfig file?

@igoooor

igoooor commented Oct 28, 2020

it works via kubeconfig and via parameters set for host, client_certificate, etc..
but it does not work when host, client_certificate, etc.. are set from the data.google_container_cluster.this

And again, it only fails during refresh, if I apply -refresh=false then it works.
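
A minimal sketch of the variable-driven provider setup described above, for anyone who wants to try the same A-B test. The variable names are hypothetical (they do not come from this thread), and the values would have to be supplied from outside the configuration, for example via -var or TF_VAR_ environment variables.

```hcl
# Hypothetical input variables; names are illustrative only.
variable "cluster_host" {
  type        = string
  description = "API server URL, including the https:// scheme"
}

variable "cluster_client_certificate" {
  type        = string
  description = "PEM-encoded client certificate"
}

variable "cluster_client_key" {
  type        = string
  description = "PEM-encoded client key"
}

variable "cluster_ca_certificate" {
  type        = string
  description = "PEM-encoded cluster CA certificate"
}

provider "kubernetes" {
  # On provider 1.x, load_config_file = false was also set here
  # (see the configuration earlier in this thread); that argument
  # no longer exists in provider 2.x.
  host                   = var.cluster_host
  client_certificate     = var.cluster_client_certificate
  client_key             = var.cluster_client_key
  cluster_ca_certificate = var.cluster_ca_certificate
}
```

Because the values are known before Terraform starts, the provider never depends on a data source or resource being evaluated during refresh. The trade-off is that the credentials are no longer derived from the cluster definition itself.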

@igoooor

igoooor commented Nov 5, 2020

it also happens when I'm using a resource instead of a data source.
Of course not the first time I apply when it creates the cluster, but afterwards if I apply again, it will try to refresh, and there it will fail as well with the same error

@sereinity

For information, neither workaround works when using the remote backend:

Error: Custom parallelism values are currently not supported

The "remote" backend does not support setting a custom parallelism value at
this time.
Error: Planning without refresh is currently not supported

Currently the "remote" backend will always do an in-memory refresh of the
Terraform state prior to generating the plan.

@sodre

sodre commented Dec 19, 2020

I am experiencing the same issue as @igoooor. The only difference is that I am using digitalocean instead of google.

@holleyism

I ran into this issue as well. What appears to have happened in my case is that I had originally created a kubernetes_secret resource in a module's main.tf. Things changed and that was removed since it was no longer needed. When the refresh happened, I guess because the original resource definition no longer existed, it ignored any configuration for the kubernetes provider and always tried to use localhost. Without looking at the code, I'd say if there's a secret (or maybe another k8s resource) in the state file (we use remote state), but the definition for that resource is removed, then this will happen (but that's just a guess).

Our fix is to simply remove that resource from the state manually, and then manually clean up the resource.

@pduchnovsky

Same problem here, are any workarounds available?
None of the above is applicable to Terraform Cloud, unfortunately.

@jacobwgillespie

@pduchnovsky on Terraform Cloud, you should still be able to use the above workarounds.

  1. To set -parallelism=1, you would add an environment variable named TFE_PARALLELISM and set it to 1. (see https://www.terraform.io/docs/cloud/workspaces/variables.html#special-environment-variables)
  2. All of the terraform state subcommands still work with Terraform Cloud, so if you wanted to manually delete a resource from the state for manual cleanup, you can run terraform state rm path.to.kubernetes_resource.name locally and it will update Cloud. Similarly terraform state pull and terraform state push also work, if you needed to pull the entire state file down from Terraform Cloud.
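
For teams that manage their Terraform Cloud workspaces as code, the first item could be expressed roughly as follows. This is only a sketch: it assumes the hashicorp/tfe provider is already authenticated, and var.workspace_id is a hypothetical input holding the ID of the affected workspace.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical: ID of the workspace that runs the Kubernetes configuration.
variable "workspace_id" {
  type = string
}

# Sets TFE_PARALLELISM=1 as an environment variable on the workspace,
# the Terraform Cloud equivalent of passing -parallelism=1.
resource "tfe_variable" "parallelism" {
  key          = "TFE_PARALLELISM"
  value        = "1"
  category     = "env"
  workspace_id = var.workspace_id
  description  = "Workaround for the kubernetes provider dialing localhost"
}
```

This would normally live in a separate configuration from the one it throttles. Setting the variable by hand in the workspace UI has the same effect.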

@pduchnovsky

pduchnovsky commented Mar 3, 2021

To be honest, the parallelism workaround is not really acceptable. For example, I am creating a single GKE cluster with two non-default node pools, one of which is GPU-enabled. I then deploy around 10 kubernetes_deployment resources, each of which takes 8 minutes on average to create (big images), so it would take AGES to deploy or update them with parallelism set to 1.
I 'could' use the older version of this provider, but it doesn't work with the taint "nvidia.com/gpu" since it has a dot and a slash in the name.

So for the time being I made a workaround: after the cluster is created I extract its IP and cert to variables and then use those as a reference. Of course now I cannot change the cluster itself, but that's not something we do often.

Looking forward to when PR #1078 is merged.

@RichiCoder1

Also ran into this. Was able to work around it with KUBECONFIG, but that has the exact issue stated above where I can no longer re-create my cluster as it depends on hard-coded/pre-existing variables instead of runtime variables.

@ilya-git

I am not sure if this issue is related, but I have figured out that if a cluster needs recreation, then the output of host/ca certificate/etc. would be empty. This would result in the empty values being passed to the provider which results in connection to localhost error that hides the original fact that cluster needs recreation.

So you see the "localhost" error in the output, while the original problem that is hidden is that cluster will be re-created (meaning there is no host to connect to obviously before it is created).

See this issue for more details.

@alex-shafer-ceshop

alex-shafer-ceshop commented Apr 5, 2021

I am also experiencing the same issue as @ilya-git where the refresh/plan will fail if a cluster that is referenced in a dynamic provider configuration needs to be recreated.

resource "aws_eks_cluster" "main" {
  # contains some update that requires a destroy/create rather than a modify
}

provider "kubernetes" {
  host = aws_eks_cluster.main.endpoint
  ...
}

This reliably produces an error message similar to the one in the initial comment on this issue, on the first attempt to refresh a resource using the kubernetes provider.

Targeting the cluster in a "first pass" and then proceeding with the rest appears to be a viable workaround; i.e. hashicorp/terraform#4149 would appear to be a viable fix.

@Krebsmar

Krebsmar commented Jul 13, 2021

I hit the problem by destroying an AKS Cluster. I'm passing the Kubernetes configuration from the state.

provider "kubernetes" {
  host                   = azurerm_kubernetes_cluster.aks.kube_config.0.host
  client_key             = base64decode(azurerm_kubernetes_cluster.aks.kube_config.0.client_key)
  client_certificate     = base64decode(azurerm_kubernetes_cluster.aks.kube_config.0.client_certificate)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.aks.kube_config.0.cluster_ca_certificate)
}

With that configuration, Terraform is able to create a namespace. If I initiate terraform destroy, I get

Error: Get http://localhost/api/v1/namespaces/xxx: dial tcp 127.0.0.1:80: connect: connection refused

when terraform is refreshing the namespace resource.

Terraform v1.0.0
on windows_amd64

  • provider registry.terraform.io/hashicorp/azurerm v2.67.0
  • provider registry.terraform.io/hashicorp/kubernetes v2.3.2

@ahilmathew

ahilmathew commented Jul 26, 2021

@Krebsmar I've hit the same issue now with AKS cluster. Did you manage to resolve this?

@schealex

The workaround does not work for me sadly :/

@joshuaganger

Seeing this same issue with the EKS module: https://github.com/terraform-aws-modules/terraform-aws-eks

Initial apply works fine; subsequent changes to the cluster fail with terraform attempting to connect to the k8s API on localhost. The parallelism workaround has no effect.

@Rodrigonavarro23

I've been using the Kubernetes provider version 2.4.1 and none of the above solutions worked for me on their own. My configuration uses the gke_auth module to get the cluster configuration. Setting parallelism to 1, avoiding the kubeconfig, and moving to a lower version of the provider together fixed the issue; I'm now using version 2.3.2.

My provider config:

# required_providers must be nested inside a terraform block.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">=3.78.0"
    }

    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.3.2"
    }
  }
}

provider "kubernetes" {
  cluster_ca_certificate = module.gke_auth.cluster_ca_certificate
  host                   = module.gke_auth.host
  token                  = module.gke_auth.token
}

@marksumm

For EKS, you can try to use an aws_eks_cluster data resource to ask AWS for the cluster endpoint details (based on the cluster name). That way, even if the original aws_eks_cluster resource attributes are empty, the correct information will still be obtained.
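
A rough sketch of that suggestion, assuming the cluster name is available as a hypothetical variable var.cluster_name. The aws_eks_cluster_auth data source used for the token is an addition for completeness and is not mentioned in the comment above.

```hcl
# Hypothetical: name of an EKS cluster that already exists.
variable "cluster_name" {
  type = string
}

# Ask AWS for the cluster details by name instead of reading them
# from the aws_eks_cluster resource attributes.
data "aws_eks_cluster" "this" {
  name = var.cluster_name
}

# Assumption: a short-lived token is used for authentication.
data "aws_eks_cluster_auth" "this" {
  name = var.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}
```

The lookup only succeeds once the cluster exists, so this fits subsequent applies better than the very first one.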

@seboss666

I have the same problem when trying to import an existing namespace from a cluster :

Terraform v0.12.31
+ provider.google v4.14.0
+ provider.google-beta v4.14.0
+ provider.helm v2.2.0
+ provider.kubernetes v2.3.2
+ provider.random v3.2.0

The provider is configured with the credentials from another resource in another module and it's fine for the other modules and resources deployed. The namespace already contains installed software I can't delete/redeploy now as it's actively used by multiple agencies. And anyway, this action should not fail.

This issue is approaching two years old, and no one has been able to track down the problem?

@l1n

l1n commented Jul 21, 2022

No idea what the issue is here, but heads up that kubectl proxy --port=80 is a reasonable workaround (see https://kubernetes.io/docs/tasks/extend-kubernetes/http-proxy-access-api/ for more details on what this does).

@frctnlss

I am using GKE autopilot. I have a single deployment of resources. The provider is generated by a module:

provider "kubernetes" {
  host                   = module.kubernetes.provider_config.host
  token                  = module.kubernetes.provider_config.token
  cluster_ca_certificate = module.kubernetes.provider_config.cluster_ca_certificate
}

versions:

Terraform v1.1.8
on linux_amd64
+ provider registry.terraform.io/hashicorp/google v4.31.0
+ provider registry.terraform.io/hashicorp/google-beta v4.31.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.12.1

Just want to state that under normal usage, I do not experience any issues regarding the provider setup being incorrect.

I began having the issue of dial out as soon as I added a moved block directly related to Kubernetes resources. It failed to reload the state due to the same permission denied error on localhost as the OP. For me, there is little need to add the moved blocks, but I just wanted to be cleaner.

Of the solutions proposed above, the only one I did not try was kubectl proxy --port=80 as suggested by @l1n. This is a workaround that I am sure would work, but not one I personally wanted to entertain for an ad-hoc issue within CI. I did entertain the parallelism, but sadly that did not work either. I did suspect that the gke_auth module would work as I have used it in other projects to solve the OP issue as well. It did not work in this case either. I did not try the kubeconfig file and am confident that would have worked, but again I am not looking to add more complexity to the CI process. In my case, this would mean deploying terraform code twice. Not a very desirable use case.

Again, I am only experiencing this issue when adding the moved block. I was successful in getting consistent results. Meaning removing the moved block garnered a successful state refresh while adding it back in broke the state refresh resulting in the provider pointing to localhost.

This would be a major issue for teams running stateful set applications, as moved blocks help maintain currently deployed resources and allow teams to release refactoring jobs repeatably.

@klagroix

Just throwing another workaround in the mix for people who experience this issue.

I couldn't work around the Error: Get "http://localhost/api?timeout=32s": dial tcp [::1]:80: connect: connection refused issue with -parallelism=1 or even setting config_path and config_context as follows:

provider "kubernetes" {
  config_path     = "~/.kube/config"
  config_context  = data.aws_eks_cluster.main.arn
}

What has worked for me is exporting KUBE_CONFIG_PATH as an environment variable (export KUBE_CONFIG_PATH=~/.kube/config) and removing config_path from the provider settings:

provider "kubernetes" {
  config_context  = data.aws_eks_cluster.main.arn
}

NimJay added a commit to GoogleCloudPlatform/terraform-ecommerce-microservices-on-gke that referenced this issue Apr 17, 2023
This comment suspects that the use of multiple clusters caused the
"localhost" connection issue during TF refresh:
hashicorp/terraform-provider-kubernetes#1028 (comment)
NimJay added a commit to GoogleCloudPlatform/terraform-ecommerce-microservices-on-gke that referenced this issue Apr 18, 2023
The problem:
Because we're creating the cluster AND its K8s resources in the same
Terraform config, the K8s provider (e.g., during terraform destroy and
the 2nd terraform apply) will connect to localhost.
This comment provides a reasonable suspicion for root cause:
hashicorp/terraform-provider-kubernetes#1028 (comment)

The fix:
We make Terraform rewrite the kubernetes_provider.tf file with
hard-coded cluster credentials.

I've used "MyDelimiterWordForMultiLineString" as the delimiter for the
heredocs (multi-line string). I avoided the conventional "EOT" (end of
text) delimiter because it might show up in the cluster_ca_certificate
value.

This commit hopes to fix: hashicorp/terraform-provider-kubernetes#1028
@NimJay

NimJay commented Apr 18, 2023

Just sharing another (very hacky) workaround... :)

  • ❌ The -parallelism=1 workaround did not work for me. Regardless, it slows down my Terraform, so I want to avoid it.
  • ✅ The -refresh=false workaround worked for me, but my specific situation doesn't allow it.
  • ✅ I'm sharing another workaround below that also worked for me: using Terraform to replace the provider "kubernetes" block with a block containing hard-coded credentials.

Workaround: Write file with hard-coded cluster credentials for provider "kubernetes"

  • I would avoid this workaround unless you're totally out of options.
  • See my commit here containing the workaround.
  • I placed my provider "kubernetes" block in a separate file called kubernetes_provider.tf.
provider "kubernetes" {
  host                   = "https://${google_container_cluster.my_cluster.endpoint}"
  cluster_ca_certificate = base64decode(google_container_cluster.my_cluster.master_auth[0].cluster_ca_certificate)
  token                  = data.google_client_config.default.access_token
}
  • I added a local_sensitive_file resource to my Terraform that will replace the above kubernetes_provider.tf file.
resource "local_sensitive_file" "kubernetes_provider" {
  content  = data.template_file.kubernetes_provider.rendered
  filename = "${path.module}/kubernetes_provider.tf"
}
  • I added a "template" file called kubernetes_provider.tf.template that my Terraform will use to recreate kubernetes_provider.tf.
provider "kubernetes" {
  host                   = "${cluster_host}"
  token                  = "${cluster_token}"
  cluster_ca_certificate = <<MyDelimiterWordForMultiLineString
${cluster_ca_certificate}
MyDelimiterWordForMultiLineString
}
  • I used Terraform's template_file data feature to create the contents of the new kubernetes_provider.tf — hard-coding the token, host, and cluster_ca_certificate:
data "template_file" "kubernetes_provider" {
  template = file("${path.module}/kubernetes_provider.tf.template")
  vars = {
    cluster_host           = "https://${google_container_cluster.my_cluster.endpoint}"
    cluster_token          = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(google_container_cluster.my_cluster.master_auth[0].cluster_ca_certificate)
  }
}

Cautions

  • I have only tested this workaround a few times.
  • I don't know if we need to hardcode all 3 things: token; host; and cluster_ca_certificate — I haven't tested each option individually.
  • Ideally, I would follow the advice from the kubernetes provider about "Stacking with managed ... cluster resources", and avoid using the same terraform apply for both cluster creation and Kubernetes resource management.
  • I am not sure how quickly the token expires. My testing confirms that it lasts at least 1 hour (for Google Kubernetes Engine).

Update (from 2023-04-25)

  • The token doesn't need to be hard-coded for the above workaround.
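
The template_file data source used above comes from the hashicorp/template provider, which has been deprecated. A possible variant, offered as an untested sketch, renders the same kubernetes_provider.tf.template with Terraform's built-in templatefile() function. It reuses the resource and data source names from the comment above and changes nothing else about the workaround.

```hcl
# Same idea as the workaround above, without the template provider.
# google_container_cluster.my_cluster and data.google_client_config.default
# are the names used earlier in this comment.
resource "local_sensitive_file" "kubernetes_provider" {
  filename = "${path.module}/kubernetes_provider.tf"

  # templatefile() reads the template and substitutes the given variables.
  content = templatefile("${path.module}/kubernetes_provider.tf.template", {
    cluster_host           = "https://${google_container_cluster.my_cluster.endpoint}"
    cluster_token          = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(google_container_cluster.my_cluster.master_auth[0].cluster_ca_certificate)
  })
}
```

The cautions listed above apply unchanged: the rendered file still contains hard-coded credentials.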

@VRabadan

VRabadan commented Sep 18, 2023

Ran into the same problem.
Status update calls localhost instead of the address specified on the .tf files.

This has worked before without issue; I'm getting bent around where the problem is :/
The tf state file has all the correct information, but the terraform plan command seems unwilling or unable to use it :(

@antonguzun

I have the same problem with digitalocean k8s cluster.
The following works for me:

provider "kubernetes" {
  config_path     = "~/.kube/config"
}

It is interesting that I do not have that problem if Terraform uses manifest files like this:

data "kubectl_path_documents" "docs" {
  pattern = "./manifests/*.yaml"
}

resource "kubectl_manifest" "kubegres" {
  for_each  = toset(data.kubectl_path_documents.docs.documents)
  yaml_body = each.value
}

@Hitobat

Hitobat commented Nov 24, 2023

I've found the following works for me.
https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/using_gke_with_terraform

My config looks something like the following.
I know this likely won't work if I bring up the module from scratch, since the data cluster lookup would fail. But it's enough that I can update resources again properly. Terraform plan/apply both now work without giving the "localhost" connect error.

locals {
   gke_name = "dev-cluster"
   gke_location = "us-central1"
}

resource "google_container_cluster" "dev_cluster" {
  name     = local.gke_name
  location = local.gke_location
  ...
}

data "google_client_config" "provider" {}

data "google_container_cluster" "dev_cluster" {
  name     = local.gke_name
  location = local.gke_location
}

provider "kubernetes" {
  host  = "https://${data.google_container_cluster.dev_cluster.endpoint}"
  token = data.google_client_config.provider.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.dev_cluster.master_auth[0].cluster_ca_certificate,
  )
}

@simonebenati

Do we have any other workarounds? Running into this on version 2.23.0.

Even if I specify the env variable KUBE_CONFIG_PATH=path/to/kubeconfig (of course the path is just dummy in this case)

Still on the first try it runs against localhost.

@e12e

e12e commented Feb 23, 2024

Just hit this bug working with a managed kubernetes cluster on UpCloud.

In this case, the upcloud cli can provide a kubectl config:

upctl kubernetes config "<uuid of cluster>" --write pilot_kubeconfig.yaml

This is needed for both workarounds.

Note that neither of the following is really a workaround: one can't get the connection information from Terraform, which is the real intent of the config using data sources, as commented out below.

option one - changing hcl:

provider "kubernetes" {
  # Does not work due to bug:
  #
  # host                   = data.upcloud_kubernetes_cluster.pilot.host
  # client_certificate     = data.upcloud_kubernetes_cluster.pilot.client_certificate
  # client_key             = data.upcloud_kubernetes_cluster.pilot.client_key
  # cluster_ca_certificate = data.upcloud_kubernetes_cluster.pilot.cluster_ca_certificate

  config_path              = "path/pilot_kubeconfig.yaml"
}

and option two - proxy (need kubectl config for access):

kubectl proxy --port=8080
sudo kubectl proxy --port=80

I'm on MacOS - it might be possible to eschew sudo for packet filter hackery - but I haven't tried - and anyway it would effectively allow anything to bind to port 80 (for Linux see setcap, eg: sudo setcap 'cap_net_bind_service=+ep' kubectl).

The config for pf is stored in /etc/pf.conf - I believe that you can simply add rules to this file and reload pf for them to take effect.

# Not tested!
# Append the redirect rule (tee -a) rather than overwriting /etc/pf.conf.
echo "rdr pass inet proto tcp from any to any port 80 -> 127.0.0.1 port 8080" | sudo tee -a /etc/pf.conf
sudo pfctl -F all -ef /etc/pf.conf

@shinebayar-g

Regarding @Hitobat's configuration above, where provider "kubernetes" reads its host and CA certificate from data "google_container_cluster":

This approach fails when I want to update the GKE cluster resource (like minMasterVersion); all the kubernetes_ resources then fail with a dial tcp 127.0.0.1:80: connect: connection refused error.

@WoodenMaiden

I stumbled on this issue today as well. Using the KUBE_CONFIG_PATH env variable and deleting the config_path arguments on hashicorp/helm, hashicorp/kubernetes, and gavinbunney/kubectl worked for some reason on a self-hosted OKD cluster.

@papanito

-refresh=false seems to skip the error. But I suspect it is related to an inconsistent state. I had an aks cluster deleted outside of terraform. So I also had to remove the config of the aks cluster from the state...
