Don't fallback to localhost cluster #1479

Open
ikarlashov opened this issue Nov 2, 2021 · 12 comments
@ikarlashov

Hi folks,

We have GitLab CI runners running pipelines in an EKS cluster. Whenever the Kubernetes provider can't establish a connection to the desired cluster via its provider config block, it falls back to localhost and tries to modify the cluster the pipeline itself runs in. This is very dangerous behavior and should have to be enabled EXPLICITLY in the provider settings (if there's a real use case for it at all).

Terraform Version, Provider Version and Kubernetes Version

Terraform version: 1.0.1
Kubernetes provider version: 2.6.1
Kubernetes version: 1.19

Affected Resource(s)

Authentication mechanism for provider

Debug Output

Fallback to localhost:
https://gist.github.com/ikarlashov/7af79c1225e9383bd6ca135cca2e0aa3

Steps to Reproduce

Misconfigure the cluster connection settings in the kubernetes provider block.

Expected Behavior

Fail with an error message (as it does when run in a non-Kubernetes environment).

Actual Behavior

The provider tries to modify the wrong cluster (whatever answers on localhost).

@jrhouston
Collaborator

Thanks for opening this @ikarlashov. This seems to be the default behaviour of client-go (we don't set any explicit configuration for in-cluster config; it's just what happens when no options are specified and the client is running inside a cluster). I need to investigate whether there is a way to disable this and make it configurable.

Is the KUBERNETES_MASTER environment variable being set in the pod you are running Terraform in? A workaround here may be to unset that variable before Terraform runs.

@ikarlashov
Author

ikarlashov commented Nov 10, 2021

@jrhouston no problem :)

I don't think there's such an env variable. I exec'd into the gitlab-runner pod and these are the only k8s-related vars:

KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT=tcp://172.20.0.1:443
KUBERNETES_SERVICE_PORT=443
FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=false
KUBERNETES_SERVICE_HOST=172.20.0.1
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP_ADDR=172.20.0.1
KUBERNETES_PORT_443_TCP=tcp://172.20.0.1:443

@jrhouston
Collaborator

@ikarlashov Can you share some more information about how you are configuring the provider block in your Terraform config? After investigating, it seems like the provider shouldn't fall back to the in-cluster config unless the provider block ends up with empty values.
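
(For reference, a minimal sketch of how a provider block can end up empty, with hypothetical names rather than anyone's actual config: if the host comes from a variable or data source that resolves to an empty string, client-go receives no endpoint and silently falls through to its defaults.)

variable "cluster_endpoint" {
  type    = string
  default = ""   # e.g. accidentally left unset in CI
}

variable "cluster_token" {
  type    = string
  default = ""
}

# With both values empty, the provider raises no error; client-go
# falls through to in-cluster discovery or localhost instead.
provider "kubernetes" {
  host  = var.cluster_endpoint
  token = var.cluster_token
}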

@jrhouston
Collaborator

Looks like client-go uses KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to get the in-cluster config here. You could try unsetting those (unset KUBERNETES_SERVICE_HOST KUBERNETES_SERVICE_PORT before Terraform runs) as a workaround for now.

@chandankashyap19

chandankashyap19 commented Dec 1, 2021

Facing the same issue in our environment: the Kubernetes provider works fine with Terraform 0.13 but not with 1.0.x, where it falls back to localhost. Our cluster is AWS EKS.

Configuration used:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws-iam-authenticator"
    args = [
      "token",
      "-i",
      aws_eks_cluster.main.name,
      "--role",
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
    ]
  }
}

Error: Get "http://localhost/api/v1/namespaces/xxxxxxxx": dial tcp 127.0.0.1:80: connect: connection refused
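
One thing worth checking here, as an assumption rather than a confirmed diagnosis: the client.authentication.k8s.io/v1alpha1 exec API was removed from client-go in Kubernetes 1.24, and if the exec plugin fails, the provider can be left with empty credentials and fall back to localhost as above. A sketch of the same block on the v1beta1 API, using aws eks get-token in place of aws-iam-authenticator:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  exec {
    # v1alpha1 was removed in newer clients; v1beta1 is the supported API
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.main.name]
  }
}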

@simwak

simwak commented Mar 22, 2022

In our case, it even tries to connect to a completely different service (a NoMachine web interface), because that service runs on localhost and has a redirect. And this happens even when the cluster endpoint is reachable.

Get "https://127.0.0.1/nxwebplayer": x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs
with module.eks.module.eks.kubernetes_config_map.aws_auth[0],
on .terraform/modules/eks.eks/main.tf line 298, in resource "kubernetes_config_map" "aws_auth":
298: resource "kubernetes_config_map" "aws_auth"

With Helm it reports the following; somehow the configuration went missing entirely.

Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

@apeabody

apeabody commented Apr 8, 2022

Hi Team - We have observed a related issue when the provider "kubernetes" {} block is omitted entirely, resulting in the unexpected behavior of the provider attempting to contact localhost. From a UX standpoint, an invalid-configuration error or a warning for omitted values would be strongly preferable to silently falling back to localhost.
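
Until the provider fails fast on its own, one guard that works today is an input validation that refuses an empty endpoint. A sketch, assuming the connection details are passed in as variables (hypothetical names):

variable "cluster_endpoint" {
  type = string

  validation {
    condition     = length(var.cluster_endpoint) > 0
    error_message = "Cluster endpoint is empty; refusing to let the Kubernetes provider fall back to localhost."
  }
}

variable "cluster_ca_certificate" { type = string }
variable "cluster_token" { type = string }

provider "kubernetes" {
  host                   = var.cluster_endpoint
  cluster_ca_certificate = base64decode(var.cluster_ca_certificate)
  token                  = var.cluster_token
}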

Terraform version: 1.1.6
Kubernetes provider version: v2.10.0_x5


@streamnsight

I have a similar issue with the kubernetes provider on a different cloud provider. The interesting part is that the provider config works fine on the first run, but on subsequent plan or apply it fails with this issue.
It seems like it is just not re-running the exec block, so there is no config and no token, and it defaults to nothing, which somehow turns into localhost.

The real problem lies in the exec block, though; a possible workaround is sketched below.
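
(If the exec plugin is the part that isn't re-running, one workaround, sketched here for the EKS setups discussed above with hypothetical names, is to fetch the token with a data source instead, so the provider gets a concrete value on every plan and apply:)

variable "cluster_name" { type = string }

data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = var.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  # the token is re-read from the data source on every run, so there is
  # no exec step that can silently fail and leave the config empty
  token                  = data.aws_eks_cluster_auth.cluster.token
}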

@casualuser

Hello here!
Any news about this issue? It looks like I've hit the same problem as described here and in #2127.

@github-actions

Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!

@github-actions github-actions bot added the stale label Nov 25, 2024
@itaispiegel

Hey, any update on this? We're also facing this issue and couldn't think of a solution that isn't hacky, so we'd be glad to see it solved.

@github-actions github-actions bot removed the stale label Dec 17, 2024