
Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused #911

Closed
vrathore18 opened this issue Jun 7, 2020 · 53 comments


@vrathore18

vrathore18 commented Jun 7, 2020

I started getting this issue:

Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused

  on .terraform/modules/eks/terraform-aws-eks-11.1.0/aws_auth.tf line 62, in resource "kubernetes_config_map" "aws_auth":
  62: resource "kubernetes_config_map" "aws_auth" {

All my code was working fine, but after I upgraded my Terraform and provider versions I started getting the above issue.

Versions on which everything was working:

Providers:
aws: 2.49
kubernetes: 1.10.0
helm: 0.10.4
eks: 4.0.2

Others:
terraform: 0.11.13
kubectl: 1.11.7
aws-iam-authenticator: 0.4.0-alpha.1

Current versions:
terraform: 0.12.26
kubectl: 1.16.8
aws-iam-authenticator: 0.5.0

eks.yaml

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "12.1.0"

  cluster_name    = var.name
  subnets         = module.vpc.private_subnets
  vpc_id          = module.vpc.vpc_id
  cluster_version = var.cluster_version
  manage_aws_auth = "true"

  kubeconfig_aws_authenticator_additional_args = ["-r", "arn:aws:iam::${var.target_account_id}:role/terraform"]

  worker_groups = [
    {
      instance_type        = var.eks_instance_type
      asg_desired_capacity = var.eks_asg_desired_capacity
      asg_max_size         = var.eks_asg_max_size
      key_name             = var.key_name
    }
  ]

  map_accounts = [var.target_account_id]

  map_roles = [
    {
      rolearn = format("arn:aws:iam::%s:role/admin", var.target_account_id)
      username = format("%s-admin", var.name)
      groups    = ["system:masters"]
    }
  ]

  # don't write local configs, as we do it anyway
  write_kubeconfig      = "false"
}

resource "local_file" "kubeconfig" {
  content  = module.eks.kubeconfig
  filename = "./.kube_config.yaml"
}

In the above code write_kubeconfig = "false" and I create a local kubeconfig file, which I then use in the helm and kubernetes providers.

provider.yaml

provider "aws" {
  region  = var.region
  version = "~> 2.65.0"

  assume_role {
    role_arn = "arn:aws:iam::${var.target_account_id}:role/terraform"
  }
}

provider "kubernetes" {
  config_path = "./.kube_config.yaml"
  version     = "~> 1.11.3"
}

provider "helm" {
  version = "~> 1.2.2"

  kubernetes {
    config_path = "./.kube_config.yaml"
  }
}

On terraform apply, the script is not able to create module.eks.kubernetes_config_map.aws_auth[0].

I tried some of the suggestions mentioned in #817 but they didn't work for me.

@dpiddockcmp
Contributor

If you have manage_aws_auth = true then you need to configure the kubernetes provider as per the documentation in the README.

@arielvinas
Contributor

I think this is a problem with the k8s provider itself... I have it configured correctly and it randomly fails to connect:

hashicorp/terraform#4149

I found a comment in the provider Golang code that explains the problem:
https://github.com/terraform-providers/terraform-provider-kubernetes/blob/master/kubernetes/provider.go#L244

@jurgenweber
Contributor

I get this as well, but when I tried to disable the cluster after creation (so destroy it) the plan fails.

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused

@xsqian

xsqian commented Jul 18, 2020

I encountered a similar issue:
Error: Post "https://0ED4D7D93F983B4B6F3664DA6B0262D0.gr7.us-east-2.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps": dial tcp: lookup 0ED4D7D93F983B4B6F3664DA6B0262D0.gr7.us-east-2.eks.amazonaws.com on 192.168.86.1:53: no such host
Any help would be appreciated!

@deanshelton913

deanshelton913 commented Jul 29, 2020

@dpiddockcmp It would help if you were just a tad bit more specific.

What precisely in the README's example configuration solves this problem? Is it the version number that is called out explicitly in the README? Meaning we can't take the latest version for some reason? The concat functions in use? I tried a copy/paste of that README and I get:

No provider "kubernetes" plugins meet the constraint "1.10,>= 1.11.1".

@deanshelton913

deanshelton913 commented Jul 29, 2020

As a workaround... I was able to use the AWS CLI to write my config, after my first deployment was only partially successful...

aws eks update-kubeconfig --name myApp --region $AWS_DEFAULT_REGION --alias myApp

Putting this ^ before the apply step (obviously it only works after a successful creation of the cluster, when just the update to the aws-auth CM failed) worked for me... But if we ever burn the infra to the ground, we need to do this multi-step process again.
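
A related sketch, in case anyone wants to fold that same CLI call into Terraform itself rather than the pipeline (assumptions on my side, not part of this module: the module block is named eks, a var.region variable exists, and the AWS CLI is installed where Terraform runs). Same caveat as above: it only helps once the cluster actually exists.

resource "null_resource" "write_kubeconfig" {
  # Re-run whenever the cluster (and therefore its name) changes.
  triggers = {
    cluster_name = module.eks.cluster_id
  }

  # Write/update the local kubeconfig after the cluster is created, so later
  # steps (kubectl, helm) have something to read.
  provisioner "local-exec" {
    command = "aws eks update-kubeconfig --name ${module.eks.cluster_id} --region ${var.region}"
  }
}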

@kunickiaj

I get this as well, but when I tried to disable the cluster after creation (so destroy it) the plan fails.

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused

Ran into this as well... given the relative instability around the lifecycle of EKS using this module I'm probably going to consider separating it from other infra in the vpc.

@dpiddockcmp
Contributor

You need to copy the two data sources and the kubernetes provider block from the usage example. Assuming your module definition was called eks:

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
  version                = "~> 1.12"
}

@kunickiaj

@dpiddockcmp yep I get that, the problem is encountered if you set the create_eks flag to false to destroy the cluster and then set it back to true.

I think I hit some other funkiness where the state file even showed the correct host and CA cert but the provider was using the local host and missing CA entirely.

Will see if I can get a more specific set of repro steps.

@cidemaxio

I had the same problem. You have to delete the cluster manually because Terraform just says that it already exists instead of deleting and recreating it. Once you delete the cluster you get this error.

I resolved the problem by running terraform state rm module.saasoptics.module.eks.kubernetes_config_map.aws_auth[0]

@kunickiaj

kunickiaj commented Aug 25, 2020

Yeah, looked into this a bit. It's because terraform tries to refresh the config map resource before deleting it -- however the cluster's already been destroyed.

This module essentially needs to ensure that destruction of the config map happens before cluster destruction if that's possible. Otherwise the manual removal of the configmap from the state seems like the best solution here.

An alternative workaround to cleanly remove the cluster (you must not have gotten yourself into the state where you have the localhost error for this to work):

  1. Use target mode to destroy only the EKS cluster: terraform destroy -target module.eks
  2. Subsequently, set the create_eks flag to false after the first step
  3. Run an apply to clean up the old cluster configuration. terraform apply

@onprema

onprema commented Nov 12, 2020

this fixed it for me

terraform state rm module.eks.kubernetes_config_map.aws_auth

thanks, @cidesaasoptics

@schollii

schollii commented Nov 20, 2020

I got the same error after the following sequence:

  • created the cluster, but the instances would not join the node group, so I deleted the node group
  • upon recreating the cluster, I got the auth map error as Terraform tried to create the map. So terraform state rm was not an option for me, since the map was no longer in the state but was already in the cluster (still unclear how this happened).

I was able to complete the creation by setting manage_aws_auth=false, and later deleting the map with kubectl. Then I was able to set the flag back to true.

@fliphess

fliphess commented Dec 10, 2020

Same issue here: in my pipeline the kubeconfig is not present during apply, as it's a new CI run in a fresh CI container.
This results in the provider connecting to localhost.

EDIT: The workaround mentioned worked for me too: First apply the pipeline with manage_aws_auth=false. After that you can safely remove the cluster without errors and start over.

@andrewalexander

If you are deleting and know the config map is gone, creating a listener and giving the Terraform client a 204 response also seemed to work to make a stuck terraform destroy proceed happily.

I don't see any real difference in the net effect between this and the terraform state rm module.eks.kubernetes_config_map.aws_auth that also worked (as did running terraform apply with manage_aws_auth=false when creating the resources in the first place).

nc -l 80
DELETE /api/v1/namespaces/kube-system/configmaps/aws-auth HTTP/1.1
Host: localhost
User-Agent: HashiCorp/1.0 Terraform/0.14.2
Content-Length: 43
Accept: application/json, */*
Content-Type: application/json
Accept-Encoding: gzip

{"kind":"DeleteOptions","apiVersion":"v1"}
HTTP/1.1 204 OK
module.eks-cluster.module.eks.kubernetes_config_map.aws_auth[0]: Destruction complete after 2m30s

Destroy complete! Resources: 1 destroyed.

@luizmiguelsl

I had the same problem. You have to delete the cluster manually because Terraform just says that it already exists instead of deleting and recreating it. Once you delete the cluster you get this error.

I resolved the problem by running terraform state rm module.saasoptics.module.eks.kubernetes_config_map.aws_auth[0]

That's it. In my case I first ran terraform state list to see what was stored in my state and got module.your-module-name.kubernetes_config_map.aws_auth[0]. After this I just ran terraform state rm module.your-module-name.kubernetes_config_map.aws_auth[0] and now I'm able to run plans and applies again. Thanks!

@leiarenee

leiarenee commented Jan 17, 2021

I guess the root cause of the problem is either the short validity of the cluster token, which is only 15 minutes, or the very long EKS cluster creation times, which sometimes exceed 15 minutes, whichever way you look at it. If internal data resources such as aws_eks_cluster_auth are used to store the cluster token, you end up with the following problems.

  • Refreshing data does not refresh the token. I observed this behaviour after updating to the latest release of Terraform 0.14 and Kubernetes provider version 1.13.3. It seems to be a common problem, discussed in "Unauthorized after update to v1.10 using token auth" #679. I don't know if there is some kind of caching mechanism in Terraform which prevents refreshing the token if it is already in the state file.
  • Since the latest EKS control plane release, creation times have stretched beyond 12 minutes and sometimes go beyond 15 minutes. I asked AWS engineers at an AWS workshop why the EKS control plane update takes so long, and they answered, "We are trying to make sure everything works OK in your K8s master." ;-) OK, thanks for that, but I wish it were shorter and less of a control freak. Anyway, it is what it is and we have to accept it peacefully.

However, in those cases the token is invalidated and has to be refreshed, but within the same Terraform run this is not possible because data refresh occurs at the very beginning of the TF process.

My final workaround to guarantee "Unauthorized"-error-free updates

I'm using Terragrunt to overcome many of Terraform's weaknesses, such as dynamic dependencies and counts, which should be calculated before TF is applied.

I changed the K8s provider authorization configuration as follows, using a kubeconfig file instead of the internal token mechanism.

data "aws_eks_cluster" "cluster" {
  count = var.cluster_id != "" ? 1 : 0
  name  = var.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  count = var.cluster_id != "" ? 1 : 0
  name  = var.cluster_id
}

provider "kubernetes" {
  config_path = "${path.module}/.kubeconfig"
  config_context =  element(concat(data.aws_eks_cluster.cluster[*].arn, list("")), 0)
  
  // host                   = element(concat(data.aws_eks_cluster.cluster[*].endpoint, list("")), 0)
  // cluster_ca_certificate = base64decode(element(concat(data.aws_eks_cluster.cluster[*].certificate_authority.0.data, list("")), 0))
  // token                  = element(concat(data.aws_eks_cluster_auth.cluster[*].token, list("")), 0)
  // load_config_file       = false
}

Notice that data resource aws_eks_cluster_auth is no longer used. Instead a local kubeconfig file is used for authorization.

For every request that is sent to the Kubernetes API:
In the before_hook section of the terragrunt.hcl file I run the following command:

aws --profile $profile eks update-kubeconfig --kubeconfig .kubeconfig --name $cluster > /dev/null 2>&1 || true

Which:

  • Creates a local kubeconfig file in a temporary cache folder, to be consumed by the kubernetes provider for subsequent TF updates.

Where:

  • profile is the environment variable holding the AWS credentials profile name
  • cluster is the environment variable holding the cluster name

This solution worked well for my deployments, which are held outside of the terraform-aws-eks module as separate TG folders depending on the cluster creation TG folder. However, it did not work for the internal config map update of this module, since before creation there is no cluster and the provider is refreshed against that empty state.

POSSIBLE SOLUTION: disable the config map in this module: manage_aws_auth = false

Then take resource "kubernetes_config_map" "aws_auth" from aws_auth.tf in this module and apply it as a separate Terragrunt unit, with a dependency block on the terraform-aws-eks TG folder. Create outputs from the original module to be consumed in your new aws_auth TG file.

New TF file, to be executed after cluster creation:

resource "kubernetes_config_map" "aws_auth" {
  #count      = var.create_eks ? 1 : 0

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
    labels = merge(
      {
        "app.kubernetes.io/managed-by" = "Terraform"
        "terraform.io/module" = "terraform-aws-modules.eks.aws"
      },
      var.aws_auth_config.additional_labels
    )
  }

  data = var.aws_auth_config.data
}

variable "aws_auth_config" {
  description = "aws_auth_config data"
  type        = any
}

This should be added as extra_outputs.tf in your original TG environment, which creates the cluster:

output "aws_auth_config" {
  value = {
    additional_labels = var.aws_auth_additional_labels
    data = {
      mapRoles = yamlencode(
        distinct(concat(
          local.configmap_roles,
          var.map_roles,
        ))
      )
      mapUsers    = yamlencode(var.map_users)
      mapAccounts = yamlencode(var.map_accounts)
    }
  }
}

terragrunt.hcl for applying aws_auth separately:

locals {
  profile="<my_aws_profile>"
  cluster_name="<my-cluster-name>"
}

terraform {
  source = "${get_parent_terragrunt_dir()}/modules/terraform-aws-eks-auth"
  before_hook "refresh_kube_token" {
    commands     = ["apply", "plan","destroy","apply-all","plan-all","destroy-all","init", "init-all"]
    execute      = ["aws", "--profile", local.profile, "eks", "update-kubeconfig", "--kubeconfig", ".kubeconfig", "--name", local.cluster_name]
   }
}

# Inputs passed to the TF module
inputs = {
  aws_auth_config = dependency.cluster.outputs.aws_auth_config
}

dependency "cluster" {
  config_path = "../cluster"  
}

Output:

[terragrunt] 2021/01/18 13:16:08 Executing hook: refresh_eks_token
[terragrunt] 2021/01/18 13:16:08 Running command: aws --profile my-profile eks update-kubeconfig --kubeconfig .kubeconfig --name my-cluster
Updated context arn:aws:eks:eu-central-1:767*****7216:cluster/my-cluster in /Users/***/dev/repo/live/infrastructure/.terragrunt-cache/767****216/dynamic/eu-central-1/shared/k8s/auth/zlnrmfl5SmuqAD673E7b9AUFDew/NYFQi6hkBT1xdxho53XJtCNnpYs/.kubeconfig

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # kubernetes_config_map.aws_auth will be created
  + resource "kubernetes_config_map" "aws_auth" {
      + data = {
          + "mapAccounts" = jsonencode([])
          + "mapRoles"    = <<-EOT
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws:iam::760******216:role/my-cluster20210118081306354500000009"
                  "username": "system:node:{{EC2PrivateDNSName}}"
                - "groups":
                  - "system:masters"
                  "rolearn": "arn:aws:iam::292******551:role/MyTestRole"
                  "username": "MyTestRole"
            EOT
          + "mapUsers"    = jsonencode([])
        }
      + id   = (known after apply)

      + metadata {
          + generation       = (known after apply)
          + labels           = {
              + "app.kubernetes.io/managed-by" = "Terraform"
              + "terraform.io/module"          = "terraform-aws-modules.eks.aws"
            }
          + name             = "aws-auth"
          + namespace        = "kube-system"
          + resource_version = (known after apply)
          + self_link        = (known after apply)
          + uid              = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

kubernetes_config_map.aws_auth: Creating...
kubernetes_config_map.aws_auth: Creation complete after 1s [id=kube-system/aws-auth]

Conclusion:
I hope that in the future this token refreshing problem is solved natively in the kubernetes provider, but until then you can use this hack to overcome the problem. Terraform is all about using such dirty hacks and finding workarounds, right? :-)

Happy Coding

@spkane
Contributor

spkane commented Jan 18, 2021

This also happens when the EKS cluster is deleted out from under Terraform, since it is trying to talk to the K8s API endpoint which no longer exists. I have seen this in some dev workflows. The command from above, terraform state rm module.eks.kubernetes_config_map.aws_auth, will generally allow terraform commands to run correctly again.

@Vlaaaaaaad

I can confirm successful creates and destroys and everything using v14.0.0 of this module, with Terraform 0.14.4, AWS provider v3.26.0, and terraform-provider-kubernetes v2.0.1.

In the new v2 of the Kubernetes provider, there is a dedicated example on how to use it with EKS, which I just copy/pasted 🙂
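
For reference, a sketch along those lines (not copied verbatim from the provider docs; it assumes the module block is named eks, the AWS CLI is available where Terraform runs, and newer CLI versions may expect api_version client.authentication.k8s.io/v1beta1):

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  # Fetch a fresh token on every API call instead of storing one in state,
  # which avoids the 15-minute token expiry problem mentioned earlier.
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}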

@bohdanyurov-gl

bohdanyurov-gl commented Feb 4, 2021

Unfortunately I've just hit the same issue with module version v14.0.0 and terraform 0.14.5. Still trying to find a fix.

Deleting kubernetes_config_map from the state doesn't work.

@matthewmrichter

matthewmrichter commented Feb 7, 2021

I figured out how to pin the version of the Kube provider in Terraform 0.14:

  1. Remove the provider "registry.terraform.io/hashicorp/kubernetes" { ... } block in .terraform.lock.hcl if it's there
  2. Add the following to your top-level terraform { ... } block:
  required_providers {
    kubernetes = {
      source  = "registry.terraform.io/hashicorp/kubernetes"
      version = "~> 1.0"
    }
  }
  3. Re-init your Terraform.

terraform-aws-modules seems to behave much better after.
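
For context, the assembled top-level block looks roughly like this (a sketch; adjust the constraint to whichever 1.x release you are targeting):

terraform {
  required_version = ">= 0.14"

  required_providers {
    kubernetes = {
      source  = "registry.terraform.io/hashicorp/kubernetes"
      version = "~> 1.0"
    }
  }
}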

@acevedomiguel

acevedomiguel commented Feb 19, 2021

It happened to me: after I destroyed the cluster successfully, only the configmap resource was still there, and then I tried to run terraform destroy again:

terraform destroy

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  # module.eks.kubernetes_config_map.aws_auth[0] will be destroyed
  - resource "kubernetes_config_map" "aws_auth" {
      - binary_data = {} -> null
      - data        = {
          - "mapAccounts" = jsonencode([])
          - "mapRoles"    = <<-EOT
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws-cn:iam::XXXXXXX:role/eks-cluster-XXXXX"
                  "username": "system:node:{{EC2PrivateDNSName}}"
            EOT
          - "mapUsers"    = jsonencode([])
        } -> null
      - id          = "kube-system/aws-auth" -> null

      - metadata {
          - annotations      = {} -> null
          - generation       = 0 -> null
          - labels           = {
              - "app.kubernetes.io/managed-by" = "Terraform"
              - "terraform.io/module"          = "terraform-aws-modules.eks.aws"
            } -> null
          - name             = "aws-auth" -> null
          - namespace        = "kube-system" -> null
          - resource_version = "xxx" -> null
          - uid              = "xxxxxxx" -> null
        }
    }

Plan: 0 to add, 0 to change, 1 to destroy.

The cluster doesn't exist anymore, so the destroy will never succeed.

I manually removed it from state (https://www.terraform.io/docs/cli/commands/state/rm.html):

terraform state rm module.eks.kubernetes_config_map.aws_auth

@schollii

Same thing here and manual remove worked. I wonder if a depends_on is missing.

@adv4000

adv4000 commented Mar 4, 2021

Same here, fixed by manually removing state
terraform state rm module.eks.module.eks-cluster.kubernetes_config_map.aws_auth[0]

@annyip

annyip commented Mar 22, 2021

So does anyone know the root cause of this? I've seen this issue on POST and DELETE for the configmap.

@acim

acim commented Mar 23, 2021

https://github.com/terraform-aws-modules/terraform-aws-eks/blob/e5d26e1dcc41f859eb8d2be16460fd3b5b016412/docs/faq.md#configmap-aws-auth-already-exists

Error: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp 127.0.0.1:80: connect: connection refused

Usually this means that the kubernetes provider has not been configured, there is no default ~/.kube/config and so the kubernetes provider is attempting to talk to localhost.

It looks to me that the problem comes when you have multiple clusters defined in ~/.kube/config. It seems this module ignores the current-context and then fails to read the configuration properly. I have used this module for quite a long time and upgraded it many times, but I never had a kubernetes provider block, and it should continue working without it. It should just read ~/.kube/config and respect the current context, or something like that. I also renamed a context in my kube config, which may be another reason, but still, if this module reads the current context it should have correct data. This may be a general Terraform problem, though, maybe not this module.

@schollii

schollii commented Mar 23, 2021

@acim interesting, as long as you configure the kubernetes provider to use the context that corresponds to the Terraform config files, not just the current context of kubectl (which could lead to modifying Kubernetes resources in the wrong cluster). E.g.

data "aws_eks_cluster" "cluster" {
  name  = module.eks.eks_cluster_id
}

provider "kubernetes" {
  config_path = "path/to/.kube/config"
  config_context =  data.aws_eks_cluster.cluster.arn 
}

@acim

acim commented Mar 23, 2021

@acim interesting, as long as you configure the kubernetes provider to use the context that corresponds to the Terraform config files, not just the current context of kubectl (which could lead to modifying Kubernetes resources in the wrong cluster). E.g.

data "aws_eks_cluster" "cluster" {
  name  = module.eks.eks_cluster_id
}

provider "kubernetes" {
  config_path = "path/to/.kube/config"
  config_context =  data.aws_eks_cluster.cluster.arn 
}

This makes sense, thank you :)

@lpkirby

lpkirby commented May 11, 2021

I figured out how to pin the version of the Kube provider in Terraform 0.14:

1. Remove the `provider "registry.terraform.io/hashicorp/kubernetes" { ... }`  block in `.terraform.lock.hcl` if it's there

2. Add the following to your top-level `terraform { ... }` block:
  required_providers {
    kubernetes = {
      source  = "registry.terraform.io/hashicorp/kubernetes"
      version = "~> 1.0"
    }
  }
3. Re-init your Terraform.

terraform-aws-modules seems to behave much better after.

Thank you @matthewmrichter. This solved my problems too.

@ptc-mrucci

Thanks, will consider changing to that, although it's surprising that the simple config_path and config_context options, as in the example from the official docs, are not reliably usable.

provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "my-context"
}

@daroga0002
Contributor

Thanks, will consider changing to that, although it's surprising that the simple config_path and config_context options, as in the example from the official docs, are not reliably usable.

provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "my-context"
}

The issue with this approach is that you must set your cluster/context before running Terraform, so most probably your scenario is:

  1. you created EKS and everything was working
  2. you started playing with minikube or another Kubernetes cluster
  3. you tried to change something via Terraform in EKS

So between steps 2 and 3 you changed your Kubernetes context/cluster, and the Terraform provider relying on kubeconfig tries to connect to the cluster set there in the line:

current-context: arn:aws:eks:us-east-1:REDACTED:cluster/eks-cluster

which doesn't reflect the EKS cluster you are trying to modify.

@ptc-mrucci

ptc-mrucci commented Aug 26, 2021

I don't think the problem is a mismatch between the current-context set in kubeconfig and the "config_context" in terraform. What would be the point of specifying config_context otherwise?

I also just tried setting different contexts in kubeconfig and terraform in a multi-cluster configuration and could not reproduce the issue.

@stale

stale bot commented Sep 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 25, 2021
@matthewmrichter

/remove_stale

@stale stale bot removed the stale label Sep 29, 2021
@stale

stale bot commented Oct 29, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions

github-actions bot commented Nov 9, 2021

This issue was automatically closed because it had been stale for 10 days

@github-actions github-actions bot closed this as completed Nov 9, 2021
@bryantbiggs bryantbiggs reopened this Nov 15, 2021
@github-actions

This issue was automatically closed because it had been stale for 10 days

@creeefs

creeefs commented Dec 23, 2021

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused

Any ideas why I'm getting this error when trying to destroy the cluster?

@acemasterjb

Okay, so I'm using Terratest and I get this issue sometimes. I've now fixed it for the second time, so I'll share my solution here since I have not seen a solution for people using Terratest.

So for those unfamiliar, Terratest is a testing framework that automates testing of IaC, focusing on Terraform and container orchestration services.

A common testing practice is to add a unique identifier to the Terratest scripts to make every EKS cluster created unique. This ensures that if multiple tests are run, they can run in isolation from each other, creating new EKS clusters with unique names every time a test is run. They call this namespacing in the Terratest docs.

Anyway, if something goes wrong with the cleanup, then you probably haven't been able to destroy the EKS cluster and tried to do a terraform destroy that led you here.

What you need to do is go to the EKS dashboard in the AWS console, copy the name of the cluster, and paste it into your Terraform HCL/template files in the cluster_name property of your eks module or variables.tf file.

TL;DR: Once the EKS cluster name on the AWS dashboard is the same as the cluster_name in your EKS *.tf file, you can run terraform destroy again.

@julian-berks-2020

I also found that eks-k8s-role-mapping doesn't work.
It immediately fails with
│ Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp [::1]:80: connect: connection refused

My fix is to wait for it to fail (not ideal), then create a ~/.kube/config using
kubergrunt eks configure --eks-cluster-arn ""

Then adding the following (per a suggestion above) seems to solve it.
provider "kubernetes" {
  config_path = "~/.kube/config"
}

But of course this isn't possible until the cluster is built so not really ideal.
Still, at least my instances finally connect to the cluster

@ptc-mrucci

If it helps anybody, the root cause of my issue was due to differences between the context name (alias) and the cluster name.

In particular:

  • the config_context provider argument is the equivalent of kubectl --context option. AKA the context name or alias.
  • the config_context_cluster is the equivalent of kubectl --cluster option.

Shouldn't there be a clear failure when the provider references a non-existent context or cluster? This would mirror kubectl behaviour:

$ kubectl --context INEXISTENT_CONTEXT get svc 
Error in configuration: context was not found for specified context: INEXISTENT_CONTEXT
$ kubectl --cluster INEXISTENT_CLUSTER get svc       
error: no server found for cluster "INEXISTENT_CLUSTER"
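
For illustration, a minimal sketch of both arguments side by side (the context alias my-eks and the cluster ARN are hypothetical placeholders):

provider "kubernetes" {
  config_path            = "~/.kube/config"
  config_context         = "my-eks"                                                # equivalent of kubectl --context
  config_context_cluster = "arn:aws:eks:eu-central-1:111111111111:cluster/my-eks"  # equivalent of kubectl --cluster
}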

@icicimov

icicimov commented Jul 25, 2022

This:

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

is still not working even with

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.75.2"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.11.0"
    }
  }
}
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.24.1"
}

It errors with:

╷
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│ 
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 437, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  437: resource "kubernetes_config_map_v1_data" "aws_auth" {
│ 
╵

I can see the issue was just left to expire due to inactivity, wasn't this a bug worth attention?

UPDATE: Some observations

I don't have any issues with the initial terraform plan + terraform apply, nor with deleting the cluster, as long as I don't change anything in the EKS module (managed node group), like subnet_ids for example. My module call:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.24.1"

  cluster_name                    = var.k8s_cluster
  cluster_version                 = var.cluster_version

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
  cluster_endpoint_public_access_cidrs = var.cluster_endpoint_public_access_cidrs

  cluster_enabled_log_types       = var.cluster_enabled_log_types

  vpc_id                          = module.vpc.vpc_id
  subnet_ids                      = flatten([for i in range(var.vpc["priv_subnet_sets"]) : module.private-subnets[i].subnet_ids])

  manage_aws_auth_configmap = true
  aws_auth_roles            = concat(local.admin_user_map_roles, local.developer_user_map_roles)
}

and if I change the subnet_ids value once the cluster has been created, for example:

subnet_ids                      = module.private-subnets[1].subnet_ids

in order to trigger a change I get the above error. Until then everything is fine.

For the record, using the provider with the token variant like above or the exec variant like:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

makes no difference; I get the same error in both cases.

@EnriqueHormilla

(quoting @icicimov's comment above in full)

Same bug here: when I tried to change the subnets for the EKS cluster, terraform plan wanted to replace the cluster, but finally threw this error.

╷
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│ 
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 437, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  437: resource "kubernetes_config_map_v1_data" "aws_auth" {}
│ 
╵

I'm testing with the latest version currently available, "18.26.6", and can reproduce the error using the example in the repo:
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v18.26.6/examples/complete/main.tf
@bryantbiggs is this a bug in my Terraform config or a bug in the module?

@bryantbiggs
Member

Changing subnet IDs on the cluster is a destructive operation. This is not controlled by the module but by the AWS EKS API.

@a0s

a0s commented Jul 26, 2022

I don't know why this issue is closed, because the error is still here.

I've got Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp [::1]:80: connect: connection refused in the process of cluster creation (from scratch). I have this in config:

createAwsAuthConfigmap: true,
manageAwsAuthConfigmap: true,
awsAuthRoles: [],
awsAuthUsers: [],

I'm creating the cluster with an AWS provider that assumes a role.

I'm very curious why it tries to connect to localhost.

@icicimov

Changing subnet IDs on the cluster is a destructive operation. This is not controlled by the module but by the AWS EKS API.

@bryantbiggs thanks for still looking at this and trying to help. Does this mean we need to raise this issue with the AWS EKS team? Because from the perspective of Terraform and this module, even in the case of a destructive action like destroying and re-creating the cluster, we as users expect terraform plan to run successfully and tell us all about it in the plan, so we can decide whether to apply the changes or not. Instead we are seeing this cryptic error message about trying to connect to localhost.

@icicimov

icicimov commented Jul 27, 2022

@bryantbiggs please ignore my previous message I should have read your last message before replying, sorry about that :-/

For the others following along: using terraform plan -refresh=false .... as suggested in one of the issues Bryant linked to worked for me, and the plan finished successfully after the subnet change in the module.

P.S. This of course is hardly an acceptable solution (more like a workaround), since there are surely many other modules in people's projects, and running the plan with -refresh=false permanently (like in a CI/CD pipeline) will not reflect any changes made to those modules that one would want to apply in the future.

@bryantbiggs
Member

No worries

@github-actions

github-actions bot commented Nov 9, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 9, 2022