Provider doesn't work without kubecontext in some clusters #645

Closed
StepanKuksenko opened this issue Oct 11, 2019 · 5 comments
Labels
acknowledged (Issue has undergone initial review and is in our work queue.), bug, needs investigation

Comments


StepanKuksenko commented Oct 11, 2019

Hi there,

We have several Kubernetes clusters in GKE, and the problem reproduces in only one of them, even though the Kubernetes provider is configured the same way for every cluster.

In short, we wanted to create resources in our Kubernetes clusters without relying on a kubecontext, and this works in every cluster except one.

Terraform Version

Terraform v0.12.9

Affected Resource(s)

unknown

Terraform Configuration Files

# collect data from GKE clusters
data "google_client_config" "default" {}

data "google_container_cluster" "cluster1" {
  name       = "${var.cluster1_kube_master["name"]}"
  location   = "${var.zone}"
}

data "google_container_cluster" "cluster2" {
  name       = "${var.cluster2_kube_master["name"]}"
  location   = "${var.zone}"
}

data "google_container_cluster" "cluster3" {
  name       = "${var.cluster3_kube_master["name"]}"
  location   = "${var.zone}"
}


# connect to kubernetes clusters

provider "kubernetes" {
  version = "~> 1.8.1"
  load_config_file = false

  alias = "cluster1"

  host  = "https://${data.google_container_cluster.cluster1.endpoint}"
  token = "${data.google_client_config.default.access_token}"

  cluster_ca_certificate = "${base64decode(data.google_container_cluster.cluster1.master_auth.0.cluster_ca_certificate)}"
}

provider "kubernetes" {
  version = "~> 1.8.1"
  load_config_file = false

  alias = "cluster2"

  host  = "https://${data.google_container_cluster.cluster2.endpoint}"
  token = "${data.google_client_config.default.access_token}"

  cluster_ca_certificate = "${base64decode(data.google_container_cluster.cluster2.master_auth.0.cluster_ca_certificate)}"
}

provider "kubernetes" {
  version = "~> 1.8.1"
  load_config_file = false

  alias = "cluster3"

  host  = "https://${data.google_container_cluster.cluster3.endpoint}"
  token = "${data.google_client_config.default.access_token}"

  cluster_ca_certificate = "${base64decode(data.google_container_cluster.cluster3.master_auth.0.cluster_ca_certificate)}"
}

# create secrets

resource "kubernetes_secret" "cluster1-key" {
  metadata {
    name = "git-ssh"
  }
  provider = kubernetes.cluster1

  data = {
    id_rsa = "${data.vault_generic_secret.key.data["value"]}"
  }

  type = "kubernetes.io/Opaque"
}

resource "kubernetes_secret" "cluster2-key" {
  metadata {
    name = "git-ssh"
  }
  provider = kubernetes.cluster2

  data = {
    id_rsa = "${data.vault_generic_secret.key.data["value"]}"
  }

  type = "kubernetes.io/Opaque"
}

resource "kubernetes_secret" "cluster3-key" {
  metadata {
    name = "git-ssh"
  }
  provider = kubernetes.cluster3

  data = {
    id_rsa = "${data.vault_generic_secret.key.data["value"]}"
  }

  type = "kubernetes.io/Opaque"
}

Debug Output

I can't provide the full debug output because it contains sensitive information.
If you tell me which parts to check, I'll try to provide them.

Panic Output

Error: Get http://localhost/api/v1/namespaces/default/secrets/key: dial tcp [::1]:80: connect: connection refused
Error: Get http://localhost/api/v1/namespaces/default/secrets/key: dial tcp [::1]:80: connect: connection refused
Error: Get http://localhost/api/v1/namespaces/default/secrets/key: dial tcp [::1]:80: connect: connection refused

Expected Behavior

Plan: 0 to add, 0 to change, 0 to destroy.

Actual Behavior

Error: Get http://localhost/api/v1/namespaces/default/secrets/key: dial tcp [::1]:80: connect: connection refused
Error: Get http://localhost/api/v1/namespaces/default/secrets/key: dial tcp [::1]:80: connect: connection refused
Error: Get http://localhost/api/v1/namespaces/default/secrets/key: dial tcp [::1]:80: connect: connection refused

Steps to Reproduce

  1. Create kubernetes_secret resources in multiple Kubernetes clusters using provider aliases.
    We had done this before with a configuration identical to the one above, except without the option load_config_file = false. Resources were created successfully only when the kubecontext was configured to connect to ANY cluster in the same Google project.
    When the kubecontext pointed at a cluster in a different project, we got Error: Unauthorized.
  2. We then added load_config_file = false to each kubernetes provider block to eliminate the use of kubecontext. It works in the other clusters, but in one cluster we get an error.
  3. Try to apply the Terraform plan.
  4. We get the error above, but only in one cluster; all the other clusters work fine without a kubecontext.

Important Factoids

References


mnothic commented Mar 2, 2020

If you downgrade the Kubernetes provider to 1.9.0 the problem goes away; it appears to be caused by provider versions >= 1.10.0.
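
If you want to try that, a minimal sketch of the pin against one of the aliased provider blocks above might look like this (the version constraint follows the suggestion in this comment and hasn't been verified here):

# Pin the provider to the 1.9.x series instead of letting it float to >= 1.10.0
provider "kubernetes" {
  version          = "~> 1.9.0"
  load_config_file = false

  alias = "cluster1"

  host  = "https://${data.google_container_cluster.cluster1.endpoint}"
  token = "${data.google_client_config.default.access_token}"

  cluster_ca_certificate = "${base64decode(data.google_container_cluster.cluster1.master_auth.0.cluster_ca_certificate)}"
}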


aareet commented May 6, 2020

aareet added the acknowledged label May 27, 2020
aareet added the bug label Jul 2, 2020

dak1n1 commented Nov 5, 2020

Related note: here's a test I did today with static configuration of a GKE cluster. There's a chance an environment variable like KUBE_HOST could be interfering with the provider config. #1037 (comment)

Offhand, in this specific issue, I would suspect a problem with the token = "${data.google_client_config.default.access_token}", which appears to be used in all 3 clusters. That's the part that needs further testing, to see whether the same token can be used across multiple clusters. It depends on whether the Google Cloud API is placing that token on each cluster as a service account token.

I do see this usage listed in the google provider's docs, but I'm not sure if that is the right token to actually use here.

The token returned by google_client_config seems to be a Google Cloud API token, which is different from the Kubernetes service account token that the Kubernetes provider is expecting. (Again, this needs verification, because I have not actually tested google_client_config against GKE clusters.)

The example I linked above might provide a work-around in the meantime.
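
For illustration, a fully static provider configuration in the spirit of that test might look like the sketch below. The variable names are hypothetical, and the values would be copied from gcloud or the GKE console rather than read from data sources, so they are known at plan time:

# Hypothetical static configuration: endpoint, CA and token are supplied directly
# (e.g. copied from gcloud output) instead of being read from google_* data sources,
# so they are known at plan time.
variable "cluster1_endpoint" {}
variable "cluster1_ca_certificate" {}
variable "cluster1_token" {}

provider "kubernetes" {
  load_config_file = false

  alias = "cluster1_static"

  host                   = "https://${var.cluster1_endpoint}"
  token                  = "${var.cluster1_token}"
  cluster_ca_certificate = "${base64decode(var.cluster1_ca_certificate)}"
}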


dak1n1 commented Mar 10, 2021

There have been some changes to authentication in version 2.0.2, and some incoming changes to fix #1179 will help in identifying the configuration that is causing this issue. Given that this issue is fairly old, and that we haven't seen any activity by the original poster, I'm thinking we should close it for now. But here is some information that might help:

Most likely, one of the data sources became unknown during the plan phase. When either data.google_client_config* or data.google_container_cluster* are unknown during plan, the provider will initialize using empty credentials, which causes the Error: Get http://localhost/... errors. It's unfortunately a common problem when you have a single apply that modifies an underlying GKE cluster while there are Kubernetes resources defined on it. For that reason, we recommend using two applies where possible, or at least separating the GKE resources from the Kubernetes resources using separate modules.

We do have an example for GKE which demonstrates separating the two into modules. It might provide some guidance about how best to approach these problems, depending on whether you're replacing the cluster or making other modifications to it.
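
A rough sketch of that layout is below; the ./gke and ./kubernetes-config paths and the output names are illustrative assumptions, not the linked example verbatim:

# Root configuration: GKE infrastructure and Kubernetes resources live in separate
# modules, ideally applied in two steps so the cluster attributes are already known
# when the kubernetes provider inside ./kubernetes-config is configured.
module "gke" {
  source = "./gke"                          # creates or looks up the GKE cluster
  name   = var.cluster1_kube_master["name"]
  zone   = var.zone
}

module "kubernetes_config" {
  source                 = "./kubernetes-config"   # configures the kubernetes provider and creates the secrets
  host                   = module.gke.endpoint
  token                  = module.gke.access_token
  cluster_ca_certificate = module.gke.cluster_ca_certificate
}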

I'm going to close this issue for now, since it is something we're tracking in many other issues and upstream. It should resolve with #1179. But feel free to reopen if I've misunderstood the problem or if it's still ongoing even with the GKE infrastructure separated out from the Kubernetes infrastructure.

dak1n1 closed this as completed Mar 10, 2021

ghost commented Apr 9, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

ghost locked as resolved and limited conversation to collaborators Apr 9, 2021