v2.0.1: Resources cannot be created. Does kubectl reference the kube config properly? #1127

Closed
tantweiler opened this issue Jan 22, 2021 · 15 comments · Fixed by #1132

Comments

@tantweiler

Terraform version: v0.14.4
Kubernetes provider version: v2.0.1
Helm provider version: v1.3.2

Steps to Reproduce

I use a GitLab pipeline to deploy helm charts on my Kubernetes cluster by using the helm terraform provider.

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

Since version v2.0.1 of the Kubernetes provider, the Helm provider is not able to access the kube config file properly. The error messages look like:

module.helm.helm_release.nginx-ingress-internal: Creating...
Error: configmaps is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "configmaps" in API group "" in the namespace "nginx-ingress"
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope
Error: namespaces is forbidden: User "system:serviceaccount:gitlab-prod:default" cannot create resource "namespaces" in API group "" at the cluster scope

The reason why I use Helm provider v1.3.2 is described in this bug report:
hashicorp/terraform-provider-helm#662

Temporary solution

Revert to version v1.13.3 of the Kubernetes provider.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@tantweiler tantweiler added the bug label Jan 22, 2021
@aareet
Contributor

aareet commented Jan 22, 2021

@tantweiler could you share your whole config and a trace log (https://www.terraform.io/docs/internals/debugging.html)? The error message does not seem to be related to a credential error

@dak1n1
Contributor

dak1n1 commented Jan 22, 2021

Offhand, this looks related to RBAC rules in the cluster (which may have been installed by the Helm chart). These commands might help diagnose the permissions issues relating to the service account in the error message.

$ kubectl auth can-i create namespace --as=system:serviceaccount:gitlab-prod:default
$ kubectl auth can-i --list --as=system:serviceaccount:gitlab-prod:default

You might be able to compare that list with other users on the cluster:

$ kubectl auth can-i --list --namespace=default --as=system:serviceaccount:default:default
$ kubectl auth can-i create configmaps
yes

$ kubectl auth can-i create configmaps --namespace=nginx-ingress --as=system:serviceaccount:gitlab-prod:default
no

And investigate related clusterroles:

$ kubectl describe clusterrolebinding system:basic-user
Name:         system:basic-user
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
Role:
  Kind:  ClusterRole
  Name:  system:basic-user
Subjects:
  Kind   Name                  Namespace
  ----   ----                  ---------
  Group  system:authenticated


$ kubectl describe clusterrole system:basic-user
Name:         system:basic-user
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources                                      Non-Resource URLs  Resource Names  Verbs
  ---------                                      -----------------  --------------  -----
  selfsubjectaccessreviews.authorization.k8s.io  []                 []              [create]
  selfsubjectrulesreviews.authorization.k8s.io   []                 []              [create]

My guess is that the chart or Terraform config in question is responsible for creating the service account, and the [cluster] roles and rolebindings, but it might be doing so in the wrong order, or not idempotently (so you get different results on re-install vs the initial install). But we would need to see a configuration that reproduces this error. In my testing of version 2 of the providers on AKS, EKS, GKE, and minikube, I haven't seen this issue come up.
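
As an illustration of making that ordering explicit, here is a minimal sketch using depends_on (the resource names are placeholders, not taken from any config in this thread):

resource "kubernetes_role_binding" "release_rbac" {
  metadata {
    name      = "release-rbac"
    namespace = "example"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = "example-role"
  }
  subject {
    kind      = "ServiceAccount"
    name      = "example-sa"
    namespace = "example"
  }
}

resource "helm_release" "example" {
  name      = "example"
  chart     = "example-chart"
  namespace = "example"

  # Make sure the RBAC objects exist before the chart is installed.
  depends_on = [kubernetes_role_binding.release_rbac]
}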

Feel free to browse these working examples of building specific clusters and using them with Kubernetes and Helm providers. Giving the config a skim might give you some ideas for troubleshooting further.

@elpapi42

I have the same error. Today all the CD pipelines to my Kubernetes cluster suddenly stopped working.

@ghost ghost removed waiting-response labels Jan 22, 2021
@alon-dotan-starkware

alon-dotan-starkware commented Jan 23, 2021

Same here. In my case it looks like the provider fails to read the kubeconfig file and use the proper context.

@jrhouston
Collaborator

jrhouston commented Jan 23, 2021

@alon-dotan-starkware @tantweiler @elpapi42 can you share some info about your environment and how your cluster is being provisioned and how your kubeconfig is generated so we can try and reproduce this?

AFAIK we didn't change anything about the way the kubeconfig gets loaded, just that you have to explicitly specify the path to the file now.

@jrhouston
Collaborator

At second glance, this error looks like it is trying to use the default service account. I see this error when I run terraform inside a pod that doesn't have a service account associated with it. When I assign a serviceaccount with the correct permissions then I don't get the error anymore.

Are you running terraform inside a Kubernetes pod but intending to use a config file inside of the container instead of the serviceaccount token?

@tantweiler
Author

Hello everyone,

Let me explain in a bit more detail what I'm doing here. We run a GitLab instance within a GKE cluster. I created a pipeline in GitLab that deploys cloud infrastructure on different hyperscalers (GCP and Azure). To authenticate against each hyperscaler and to be able to install any kind of infrastructure component, we use service accounts (GKE) or service principals (Azure) with administrative rights. Let's have a look at the Helm part of the pipeline:

test_helmplan:
  stage: test_helmplan
  only:
    - master
  artifacts:
    paths:
      - ${HELM_PATH_TEST}/planfilehelm
      - ${HELM_PATH_TEST}/.terraform
    expire_in: 5 hrs 
  script:
    - export TF_VAR_subscription_id_test=$SUBSCRIPTION_ID_TEST
    - export TF_VAR_client_id_test=$CLIENT_ID_TEST
    - export TF_VAR_client_secret_test=$CLIENT_SECRET_TEST
    - export TF_VAR_tenant_id=$TENANT_ID
    - cd ${HELM_PATH_TEST}
    - echo ${GCP_BUCKET} > devops-tf-bucket.json
    - az login --service-principal -u ${CLIENT_ID_TEST} -p ${CLIENT_SECRET_TEST} --tenant ${TENANT_ID}
    - az aks get-credentials --resource-group ${RG_NAME_TEST} --name ${AKS_CLUSTER_NAME_TEST}
    - sed -i "s~_CI_PROJECT_PATH_~$CI_PROJECT_PATH~g" main.tf
    - terraform init
    - terraform plan -out="planfilehelm"

test_helmdeploy:
  stage: test_helmdeploy
  only:
    - master
  environment:
    name: ${AKS_CLUSTER_NAME_TEST}-environment
    on_stop: test_destroy
  script:
    - cd ${HELM_PATH_TEST}
    - echo ${GCP_BUCKET} > devops-tf-bucket.json
    - az login --service-principal -u ${CLIENT_ID_TEST} -p ${CLIENT_SECRET_TEST} --tenant ${TENANT_ID}
    - az aks get-credentials --resource-group ${RG_NAME_TEST} --name ${AKS_CLUSTER_NAME_TEST}
    - terraform init
    - terraform apply -input=false -auto-approve "planfilehelm"

This is my provider section in the main.tf file now, where I pin the Kubernetes provider to version 1.13.3 (and the Helm provider to v1.3.2, which has the state issue I mentioned in my first comment), since I don't run into this issue with that version:

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

provider "kubernetes" {}

terraform {
  backend "gcs" {
    bucket  = "my-awesome-bucket"
    prefix  = "_CI_PROJECT_PATH_-helm-test"
    credentials = "devops-tf-bucket.json"
  }
  required_providers {
    helm = {
      version = "= 1.3.2"
    }
    kubernetes = {
      version = "= 1.13.3"
    }
  }
}

Some people here mentioned that there might be an issue with insufficient rights or an RBAC issue. Again, the CLIENT_ID and CLIENT_SECRET that we use to authenticate against the Azure cloud have administrative rights! With provider version v1.13.3 everything works fine, but with v2.0.1 something has changed.

@alon-dotan-starkware

alon-dotan-starkware commented Jan 24, 2021

@jrhouston
Here are a bit more details. We use TF to deploy mixed resources (Helm, native k8s) to local and remote k8s clusters, and we use kubectx and kubens to switch contexts. Here is an example of a Helm and k8s deployment:

    "resource": {
        "kubernetes_role_binding": {
            "aerospike": {
                "metadata": {
                    "name": "aerospike",
                    "namespace": "${var.namespace}"
                },
                "role_ref": {
                    "api_group": "rbac.authorization.k8s.io",
                    "kind": "Role",
                    "name": "aerospike"
                },
                "subject": [
                    {
                        "kind": "ServiceAccount",
                        "name": "aerospike",
                        "namespace": "${var.namespace}"
                    }
                ]
            }
        }
    }
}

helm chart:

{
    "resource": {
        "helm_release": {
            "aerospike": {
                "provider": "helm",
                "chart": "aerospike",
                "name": "aerospike",
                "namespace": "${var.namespace}",
                "repository": "s3://xxxxxxxxx/helm-repo/charts",
                "set": {
                    "name": "namespace",
                    "type": "string",
                    "value": "${var.namespace}"
                },
                "values": [
                    "${file(\"${path.module}/files/values.yaml\")}"
                ],
                "version": "5.1.0"
            }
        }
    }
}

providers.tf.json:

{
    "provider": {
        "helm": {
            "version": "2.0.0",
            "kubernetes": {
               "config_path": "~/.kube/config"
            }
        }
    }
}

with provider version > 1.13.0 I got the following error:

Error: Get "http://localhost/apis/rbac.authorization.k8s.io/v1/namespaces/alon/rolebindings/aerospike": dial tcp [::1]:80: connect: connection refused

It looks like the k8s provider can't identify the right context and cluster config from the ~/.kube/config file.

@tantweiler
Author

@aareet I uploaded two logfiles, one for each Kubernetes provider version, to paste.in.

Here is the output for v2.0.1 which does not work:

https://paste.in/SuWloh

And here is the output for v1.13.3 which does work:

https://paste.in/9MjaPm

@jrhouston
Collaborator

@tantweiler In your example I see your provider kubernetes block is empty, but your provider helm block has a config_path set. You need to set it in both provider blocks as both providers need to know the path to the kubeconfig. Did you try that?
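
For clarity, something along these lines (a sketch based on the config already shared in this thread):

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

provider "kubernetes" {
  config_path = "~/.kube/config"
}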

@tantweiler
Author

@jrhouston holy moly! That did the trick! I always thought that the config path only had to be defined for the Helm provider, and that the Kubernetes provider itself would fall back to the default, which is ~/.kube/config. In my pipeline I use the Kubernetes provider to create the namespaces first and then the Helm releases, and the job crashed at the point where it tried to create those namespaces. Something changed in v2.0.1 so that the provider no longer looks at the default kube config file; v1.13.3 definitely does.

From now on I will definitely define the config path for the kubernetes provider as well!

@tantweiler
Author

@jrhouston you said you didn't change "anything about the way the kubeconfig gets loaded". But the changelog says something different:

2.0.0 (January 21, 2021)

BREAKING CHANGES:

Remove default of ~/.kube/config for config_path (#1052)

Honestly, I don't understand that. ~/.kube/config is the standard! So why remove a standard that everyone is actually using?

@aareet
Contributor

aareet commented Jan 25, 2021

@tantweiler we discuss this in the upgrade guide; one of the reasons was that it was causing confusion for folks who manage multiple clusters with Terraform.

@jrhouston
Collaborator

jrhouston commented Jan 25, 2021

@jrhouston you said you didn't change "anything about the way the kubeconfig gets loaded". But the changelog says something different

I worded this poorly, sorry for the confusion! We changed how you configure the path to the config file in the provider block (i.e. you have to set it or use the KUBE_CONFIG_PATH environment variable now), but we didn't change anything about how the provider reads the file and gets contexts and so on; we continue to defer to client-go's loader for that.
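
Roughly, either of these works (pick one; a sketch using the same path as the config earlier in this thread):

# Option 1: set the path explicitly in the provider block.
provider "kubernetes" {
  config_path = "~/.kube/config"
}

# Option 2: leave the block empty and export KUBE_CONFIG_PATH=~/.kube/config
# in the environment that runs terraform (e.g. in the GitLab job script).
provider "kubernetes" {}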

Honestly I don't understand that. ~/.kube/config is the standard! So why removing a standard that everyone is actually using?

I responded to another user with this question on the helm provider with some backstory here: hashicorp/terraform-provider-helm#647 (comment)

We also talk about this in the Upgrade Guide here: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/guides/v2-upgrade-guide#changes-in-v200

And we made an issue soliciting community reactions about these changes here: #909

If you feel strongly about this change please open a new issue advocating to change it back and we can discuss it!

tl;dr: there was a set of users who would get caught out by the implicit default of ~/.kube/config (and by KUBECONFIG) and have their Terraform config applied to the wrong cluster when they were managing multiple environments.

What has happened here, though, is that because you run Terraform inside Kubernetes but didn't supply a path to a config file, the loader has defaulted to using the in-cluster config. Perhaps this is an argument for adding an in_cluster attribute to make that explicit too.
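
For reference, a minimal sketch of what that in-cluster fallback is roughly equivalent to if written out explicitly (the paths are the standard service account mount; this is an illustration, not taken from the reporter's setup):

provider "kubernetes" {
  host                   = "https://kubernetes.default.svc"
  token                  = file("/var/run/secrets/kubernetes.io/serviceaccount/token")
  cluster_ca_certificate = file("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
}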

@ghost

ghost commented Apr 9, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Apr 9, 2021