
kubernetes provider does not respect local config, can operate on other clusters #713

Closed
cdaniluk opened this issue Jan 26, 2020 · 12 comments · Fixed by #784
@cdaniluk

I have issues

The k8s provider does not seem to work reliably when load_config_file = false, as it would be when using this module to create a new cluster. I frequently see Unauthorized errors and/or attempts to call the endpoint on localhost. In one execution, I actually found that this module deleted the aws-auth config map from a cluster that was the default context in my kubeconfig but was not in any way related to my terraform run.

I'm submitting a...

  • bug report

What is the current behavior?

Unless the cluster is already defined in your local kubeconfig, the provider cannot be configured reliably. This is consistent with a number of bugs filed against the terraform-provider-kubernetes project, most notably:

hashicorp/terraform-provider-kubernetes#521

Others seem to confirm plenty of bugs when setting load_config_file = false. This one seemed most relevant, since it also points toward long-standing issues with Terraform itself when interpolated values are used to configure a provider:

hashicorp/terraform#4149

In the execution where it operated on a different cluster, the deletion of the config map succeeded, but the module then attempted to apply the config map back to an endpoint listening on localhost. In other executions, the deletion attempt itself ran against localhost. This suggests a timing issue in which Terraform defers configuring the provider while this module is already attempting to manage the config map.

If this is a bug, how to reproduce? Please include a code sample if relevant.

I was unable to create a cluster using the provided example unless, after cluster creation, I added the cluster's config to my local kubeconfig.

What's the expected behavior?

Regardless of any upstream issues with Terraform and the k8s provider, this module should never operate on a cluster it didn't define. Writing a local kubeconfig file for the new cluster and pointing the provider at that file may be the best option.
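One possible shape for that, sketched below; write_kubeconfig, config_output_path and the generated file name are assumptions based on the v8.x module inputs and may not match exactly:

module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  version      = "~> 8.0"
  cluster_name = "example"
  # ... vpc_id, subnets, worker_groups, etc.

  # Have the module write a kubeconfig for the cluster it just created.
  write_kubeconfig   = true
  config_output_path = "${path.root}/"
}

provider "kubernetes" {
  # Point the provider at that file instead of ~/.kube/config, so it can
  # never pick up an unrelated default context.
  load_config_file = true
  config_path      = "${path.root}/kubeconfig_example"
  version          = "~> 1.10"
}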

Are you able to fix this problem and submit a PR? Link here if you have already.

I'm not sure what the best fix is here. Given the various related bugs on GitHub, I personally feel the only workaround is to always set manage_aws_auth = false until the upstream provider addresses these issues.

Environment details

  • Affected module version: 8.0.0
  • OS: macOS
  • Terraform version: 0.12.20

Any other relevant info

@max-rocket-internet
Contributor

attempts to call the endpoint on localhost

I've seen this also.

@barryib
Member

barryib commented Jan 27, 2020

@cdaniluk

@cdaniluk
Author

  • Which version of the kubernetes provider are you using?
    v1.10.0_x4
  • Do you have env variables which conflict with the provider or the kubernetes go-client (KUBERNETES_xxx)?
$ set | grep KUBE
$

I think this would address one of the cases I've seen (Unauthorized). But not the calls to the localhost endpoint. I see Unauthorized on subsequent calls after the cluster is provisioned but calls to localhost when provisioning a new cluster.

  • If not, can you share your provider definition and debug output?

Here's my provider config; it's literally taken straight from the example:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
  version                = "~> 1.10"
}
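
The data sources it references are the ones from the example as well (reproduced here for completeness; the cluster_id output name is taken from that example):

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}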

Let me know what debug output you would like to see.

Also, FWIW, I figured out why the provider was talking to another cluster entirely (sort of). I had imported the config map from my cluster with the provider config above, but the import used my default kubeconfig context, which at the time pointed at another cluster. So when this module went to delete the config map in order to recreate it (which in and of itself is scary!), it deleted it from the original context and then attempted to recreate it in the undefined/localhost context.

TBH, given the handful of open bugs in the k8s provider, I think the ideal fix would be to support exporting the config map to a file, as in previous releases, for those of us who are scared at the thought of directly managing a resource that can permanently revoke access to the cluster. I'm running manage_aws_auth = false right now, but that means I have to hand-generate the config maps for new clusters. I'd be happy to submit a PR along those lines.
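
Something along these lines, as a rough sketch only; module.eks.worker_iam_role_arn is an assumption about the module's outputs and would need to be checked:

resource "local_file" "aws_auth_configmap" {
  filename = "${path.root}/aws-auth-configmap.yaml"

  # Render the aws-auth ConfigMap to disk instead of applying it through the
  # kubernetes provider; it can then be applied out of band with kubectl.
  content = <<-YAML
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system
    data:
      mapRoles: |
        - rolearn: ${module.eks.worker_iam_role_arn}
          username: system:node:{{EC2PrivateDNSName}}
          groups:
            - system:bootstrappers
            - system:nodes
  YAML
}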

@barryib
Member

barryib commented Jan 28, 2020

I think this would address one of the cases I've seen (Unauthorized). But not the calls to the localhost endpoint. I see Unauthorized on subsequent calls after the cluster is provisioned but calls to localhost when provisioning a new cluster.

Did you try it?

Let me know what debug output you would like to see.

The kubernetes provider output.

@bacchuswng

bacchuswng commented Jan 31, 2020

@cdaniluk (cc @max-rocket-internet) I just posted on the thread you linked to. I was running into a similar problem: either Terraform would complain about a missing kubeconfig file or I would accidentally trigger updates on other clusters (because my KUBECONFIG environment variable was being used despite explicitly setting up a Terraform kubernetes provider).

I found out later that it was actually the helm provider, which I had not explicitly set up, that was causing all the problems. Because I didn't configure my helm provider with the appropriate Kubernetes settings, helm would complain that it couldn't load the default ~/.kube/config file, and when I happened to have KUBECONFIG set, it would use that to spin up new pods.

If you are also using helm, you might want to give that a shot.
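
For reference, a sketch of wiring the helm provider to the same cluster explicitly (this assumes helm provider 1.x, where the kubernetes block still accepts load_config_file):

provider "helm" {
  version = "~> 1.0"

  kubernetes {
    # Reuse the same EKS data sources so helm never falls back to
    # ~/.kube/config or $KUBECONFIG.
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    token                  = data.aws_eks_cluster_auth.cluster.token
    load_config_file       = false
  }
}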

Best of luck!

@cdaniluk
Author

I'm using helm, but not with TF, and I have not set up a helm provider. This module doesn't seem to do so implicitly either.

@max-rocket-internet I need to set up an environment I can safely test this in and haven't had a chance to do so yet. I'll try over the weekend.

@nick4fake

I have the same issue, no KUBE_ env variables.

@alexsomesan

@cdaniluk if you have the EKS cluster resource being created or updated in the same apply operation as the Kubernetes provider, things won't work as you expect. This is due to an issue in Terraform itself.

Please see here https://www.terraform.io/docs/providers/kubernetes/index.html#stacking-with-managed-kubernetes-cluster-resources and the TF docs link in that paragraph.
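
Roughly, the pattern those docs describe is to split cluster creation and the Kubernetes resources into separate applies (a sketch; directory names are illustrative):

# Apply the configuration that creates the EKS cluster first, so its
# endpoint, CA data and token exist before any kubernetes provider is
# configured...
(cd eks-cluster && terraform apply)

# ...then apply the configuration that holds the kubernetes provider and
# resources such as the aws-auth config map.
(cd k8s-resources && terraform apply)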

@cdaniluk
Author

cdaniluk commented Mar 2, 2020

@cdaniluk if you have the EKS cluster resource being created or updated in the same apply operation as the Kubernetes provider, things won't work as you expect. This is due to an issue in Terraform itself.

I'm aware of the limitation, which is why it's all the more confusing that this module basically makes it impossible to bootstrap a cluster, all in the name of loading a simple config map that is easily loaded by hand. In previous versions, you could use the null provider to script injecting the config map, and it all worked just fine. Now, not only does the k8s provider behave inconsistently (as indicated by dozens of open issues in that repo, some due to Terraform and some due to the provider itself), but we can't bootstrap a new cluster at all, which the old version of the module allowed.

I really think adding a flag to dump the config map to the filesystem instead of trying to load it via the provider would be ideal. I don't think the Terraform issues are going to go away any time soon; the issues open for dynamic provider configs, interpolation, etc. have been open forever and aren't on any roadmap.

@cmrust

cmrust commented Mar 13, 2020

In case it's of use to anyone, I'm using the following to generate the aws-auth configmap in conjunction with manage_aws_auth = false:

worker_iam_role="$(terraform state pull | jq -r '.resources[] | select(.type=="aws_iam_role") | select(.name=="workers") | .instances[0].attributes.arn')"

cat << YAML > aws-auth-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: $worker_iam_role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
YAML
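
And to load it into a freshly created cluster out of band (cluster name and region are placeholders):

# Add the new cluster to the local kubeconfig, then apply the generated file.
aws eks update-kubeconfig --name <cluster-name> --region <region>
kubectl apply -f aws-auth-configmap.yaml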

@barryib
Member

barryib commented Mar 13, 2020

@cdaniluk @cmrust sounds like version 1.11.1 solves this problem. Can you confirm, please?

See also #784

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 27, 2022