
kubernetes provider does not respect local config, can operate on other clusters #713

Closed
cdaniluk opened this issue Jan 26, 2020 · 12 comments · Fixed by #784
@cdaniluk

I have issues

The k8s provider does not seem to work reliably when load_config_file = false, as it would be when using this module to create a new cluster. I frequently see Unauthorized errors and/or attempts to call the endpoint on localhost. In one execution, I actually found that this module deleted the aws-auth config map from a cluster that was the default context in my kubeconfig but was not in any way related to my terraform run.

I'm submitting a...

  • bug report

What is the current behavior?

Unless the cluster is already defined in your local kubeconfig, the provider cannot be configured reliably. This is consistent with a number of bugs filed against the terraform-provider-kubernetes project, most notably:

hashicorp/terraform-provider-kubernetes#521

Others seem to confirm plenty of bugs when setting load_config_file = false. This one seemed most relevant, since it also points toward long-standing issues with Terraform itself when interpolated values are used to configure a provider:

hashicorp/terraform#4149

In the execution where it operated on a different cluster, the deletion of the config map succeeded, but the module then attempted to apply the config map back to an endpoint listening on localhost. In other executions, the deletion attempt itself ran against localhost. This suggests a timing issue in which Terraform defers configuring the provider while this module is already attempting to manage the config map.

If this is a bug, how to reproduce? Please include a code sample if relevant.

I was unable to create a cluster using the provided example unless, after cluster creation, I added the cluster's config to my local kubeconfig.

What's the expected behavior?

Regardless of any upstream issues with Terraform and the k8s provider, this module should never operate on a cluster it didn't define. Writing a local kubeconfig file for the new cluster and pointing the provider at that file may be the best option.
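One possible shape for that, sketched below; write_kubeconfig, config_output_path and the generated file name are assumptions based on the v8.x module inputs and may not match exactly:

module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  version      = "~> 8.0"
  cluster_name = "example"
  # ... vpc_id, subnets, worker_groups, etc.

  # Have the module write a kubeconfig for the cluster it just created.
  write_kubeconfig   = true
  config_output_path = "${path.root}/"
}

provider "kubernetes" {
  # Point the provider at that file instead of ~/.kube/config, so it can
  # never pick up an unrelated default context.
  load_config_file = true
  config_path      = "${path.root}/kubeconfig_example"
  version          = "~> 1.10"
}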

Are you able to fix this problem and submit a PR? Link here if you have already.

I'm not sure what the best fix is here. Given the various related bugs on GitHub, I personally feel the only workaround is to always set manage_aws_auth = false until the upstream provider addresses these issues.

Environment details

  • Affected module version: 8.0.0
  • OS: macOS
  • Terraform version: 0.12.20

Any other relevant info

@max-rocket-internet
Contributor

attempts to call the endpoint on localhost

I've seen this also.

@barryib
Member

barryib commented Jan 27, 2020

@cdaniluk

@cdaniluk
Author

  • Which version of the kubernetes provider are you using?
    v1.10.0_x4
  • Do you have env variables which conflict with the provider or the kubernetes go-client (KUBERNETES_xxx)?
$ set | grep KUBE
$

I think this would address one of the cases I've seen (Unauthorized). But not the calls to the localhost endpoint. I see Unauthorized on subsequent calls after the cluster is provisioned but calls to localhost when provisioning a new cluster.

  • If not, can you share your provider definition and debug output?

Here's my provider config; it's literally taken straight from the example:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
  version                = "~> 1.10"
}
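
The data sources it references are the ones from the example as well (reproduced here for completeness; the cluster_id output name is taken from that example):

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}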

Let me know what debug output you would like to see.

Also, FWIW, I figured out why the provider was talking to another cluster entirely (sort of). I had imported the config map from my cluster with the provider config above, but the import used my default kubeconfig context, which at the time pointed at another cluster. So when this module went to delete the config map in order to recreate it (which in and of itself is scary!), it deleted it from the original context and then attempted to recreate it in the undefined/localhost context.

TBH, given the handful of open bugs in the k8s provider, I think the ideal fix would be to support exporting the config map to a file, as in previous releases, for those of us who are scared at the thought of directly managing a resource that can permanently revoke access to the cluster. I'm running manage_aws_auth = false right now, but that means I have to hand-generate the config maps for new clusters. I'd be happy to submit a PR along those lines.
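
Something along these lines, as a rough sketch only; module.eks.worker_iam_role_arn is an assumption about the module's outputs and would need to be checked:

resource "local_file" "aws_auth_configmap" {
  filename = "${path.root}/aws-auth-configmap.yaml"

  # Render the aws-auth ConfigMap to disk instead of applying it through the
  # kubernetes provider; it can then be applied out of band with kubectl.
  content = <<-YAML
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system
    data:
      mapRoles: |
        - rolearn: ${module.eks.worker_iam_role_arn}
          username: system:node:{{EC2PrivateDNSName}}
          groups:
            - system:bootstrappers
            - system:nodes
  YAML
}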

@barryib
Member

barryib commented Jan 28, 2020

I think this would address one of the cases I've seen (Unauthorized). But not the calls to the localhost endpoint. I see Unauthorized on subsequent calls after the cluster is provisioned but calls to localhost when provisioning a new cluster.

Did you try it?

Let me know what debug output you would like to see.

The kubernetes provider output.

@bacchuswng

bacchuswng commented Jan 31, 2020

@cdaniluk (cc @max-rocket-internet) I just posted on the thread you linked to. I was running into a similar problem: either Terraform would complain about a missing kubeconfig file or I would accidentally trigger updates on other clusters (because my KUBECONFIG environment variable was being used despite explicitly setting up a Terraform kubernetes provider).

I found out later that it was actually the helm provider, which I had not explicitly set up, that was causing all the problems. Because I didn't configure my helm provider with the appropriate Kubernetes settings, helm would complain that it couldn't load the default ~/.kube/config file, and when I happened to have KUBECONFIG set, it would use that to spin up new pods.

If you are also using helm, you might want to give that a shot.
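
For reference, a sketch of wiring the helm provider to the same cluster explicitly (this assumes helm provider 1.x, where the kubernetes block still accepts load_config_file):

provider "helm" {
  version = "~> 1.0"

  kubernetes {
    # Reuse the same EKS data sources so helm never falls back to
    # ~/.kube/config or $KUBECONFIG.
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    token                  = data.aws_eks_cluster_auth.cluster.token
    load_config_file       = false
  }
}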

Best of luck!

@cdaniluk
Author

I'm using helm, but not with TF, and I have not set up a helm provider. This module doesn't seem to do so implicitly either.

@max-rocket-internet I need to set up an environment I can safely test this in and haven't had a chance to do so yet. I'll try over the weekend.

@nick4fake

I have the same issue, no KUBE_ env variables.

@alexsomesan

@cdaniluk if you have the EKS cluster resource being created or updated in the same apply operation as the Kubernetes provider, things won't work as you expect. This is due to an issue in Terraform itself.

Please see here https://www.terraform.io/docs/providers/kubernetes/index.html#stacking-with-managed-kubernetes-cluster-resources and the TF docs link in that paragraph.
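
Roughly, the pattern those docs describe is to split cluster creation and the Kubernetes resources into separate applies (a sketch; directory names are illustrative):

# Apply the configuration that creates the EKS cluster first, so its
# endpoint, CA data and token exist before any kubernetes provider is
# configured...
(cd eks-cluster && terraform apply)

# ...then apply the configuration that holds the kubernetes provider and
# resources such as the aws-auth config map.
(cd k8s-resources && terraform apply)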

@cdaniluk
Author

cdaniluk commented Mar 2, 2020

@cdaniluk if you have the EKS cluster resource being created or updated in the same apply operation as the Kubernetes provider, things won't work as you expect. This is due to an issue in Terraform itself.

I'm aware of the limitation, which is why it's all the more confusing that this module basically makes it impossible to bootstrap a cluster, all in the name of loading a simple config map that is easily loaded by hand. In previous versions, you could use the null provider to script injecting the config map, and it all worked just fine. Now, not only does the k8s provider behave inconsistently (as indicated by dozens of open issues in that repo, some due to Terraform and some due to the provider itself), but we can't bootstrap a new cluster at all, which the old version of the module allowed.

I really think adding a flag to dump the config map to the filesystem instead of trying to load it via the provider would be ideal. I don't think the Terraform issues are going to go away any time soon; the issues open for dynamic provider configs, interpolation, etc. have been open forever and aren't on any roadmap.

@cmrust

cmrust commented Mar 13, 2020

In case it's of use to anyone, I'm using the following to generate the aws-auth configmap in conjunction with manage_aws_auth = false:

worker_iam_role="$(terraform state pull | jq -r '.resources[] | select(.type=="aws_iam_role") | select(.name=="workers") | .instances[0].attributes.arn')"

cat << YAML > aws-auth-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: $worker_iam_role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
YAML
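
And to load it into a freshly created cluster out of band (cluster name and region are placeholders):

# Add the new cluster to the local kubeconfig, then apply the generated file.
aws eks update-kubeconfig --name <cluster-name> --region <region>
kubectl apply -f aws-auth-configmap.yaml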

@barryib
Member

barryib commented Mar 13, 2020

@cdaniluk @cmrust sounds like version 1.11.1 solves this problem. Can you confirm, please?

See also #784

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 27, 2022