Destroy never succeeds, DependencyViolation for Security Group #285
Comments
I'm also facing this issue and can attest to the same behavior.
Not sure if this is an issue with the module itself. I had the same problem, and to solve it I had to remove all the Kubernetes services before destroying the cluster.
@LuisC09 manually?
Yes, manually. If you create a Service of type LoadBalancer, then Kubernetes will create an ELB and a security group for it, and this will block the destroy process.
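Building on that, a hedged workaround sketch (0.12+ syntax, not part of this module): a destroy-time provisioner that removes LoadBalancer Services before Terraform starts tearing down the cluster, so the Kubernetes-managed ELBs and their security groups are gone first. It assumes kubectl and jq are on PATH, kubeconfig points at this cluster, and that `module.eks.cluster_id` exists as an output.

```hcl
# Hedged sketch only: clean up Kubernetes LoadBalancer Services on destroy so the
# ELBs and the security groups Kubernetes created for them do not block SG deletion.
resource "null_resource" "delete_lb_services" {
  # Depending on a module output (name assumed) forces this resource to be destroyed
  # before the EKS resources, so the destroy-time provisioner runs first.
  triggers = {
    cluster_id = module.eks.cluster_id
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<EOT
kubectl get svc --all-namespaces -o json \
  | jq -r '.items[] | select(.spec.type == "LoadBalancer") | .metadata.namespace + " " + .metadata.name' \
  | while read ns name; do kubectl delete svc -n "$ns" "$name"; done
EOT
  }
}
```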
My latest run of EKS cluster creation followed by destroy was successful. Not sure what has changed, but this didn't work before. Note: in my case, I haven't deployed any apps after provisioning the cluster.
OK, no worries. Feel free to debug and add info here.
I have just used this module, since I have moved from on-premises and am trying to create an EKS cluster with Terraform. In my case I used a slightly modified version of the example fixture: apply and then destroy, without any other interaction with the EKS cluster. I got two DependencyViolation errors for security groups attached to interfaces.
Hi, |
OK cool, then perhaps we merge that PR to solve this issue. It sounds like it would be a popular option. Question: don't you have leftover ENIs and security groups after the cluster is destroyed?
Hi, I don't think so. The creation is very simple since it was just for a PoC. Below are some fragments of the Terraform code:

```hcl
locals {
  tags = {
    Environment = "${var.environment}"
    Owner       = "${var.owner}"
    Workspace   = "${var.cluster_name}"
  }

  worker_groups = [
    {
      instance_type        = "${var.instance_type}"
      key_name             = "${var.key_name}"
      subnets              = "${join(",", var.subnets)}"
      additional_userdata  = "${file("${path.module}/user_data.sh")}"
      asg_desired_capacity = "${var.asg_desired_capacity}"
    },
  ]

  worker_groups_launch_template = [
    {
      instance_type                            = "${var.instance_type}"
      key_name                                 = "${var.key_name}"
      subnets                                  = "${join(",", var.subnets)}"
      additional_userdata                      = "${file("${path.module}/user_data.sh")}"
      asg_desired_capacity                     = "${var.asg_spot_desired_capacity}"
      spot_instance_pools                      = "${var.spot_instance_pools}"
      on_demand_percentage_above_base_capacity = "0"
    },
  ]
}

module "eks" {
  source = "./terraform-aws-eks"

  #source = "terraform-aws-modules/eks/aws"
  #version = "2.3.1"

  cluster_name                         = "${var.cluster_name}"
  subnets                              = ["${var.subnets}"]
  vpc_id                               = "${var.vpc_id}"
  worker_groups                        = "${local.worker_groups}"
  worker_groups_launch_template        = "${local.worker_groups_launch_template}"
  worker_group_count                   = 1
  worker_group_launch_template_count   = 1
  worker_additional_security_group_ids = ["${aws_security_group.eks_sec_group.id}"]
  tags                                 = "${local.tags}"
}

resource "aws_security_group" "eks_sec_group" {
  name_prefix = "eks-sec-group"
  description = "Security to be applied for eks nodes"
  vpc_id      = "${var.vpc_id}"

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"

    cidr_blocks = [
      "10.0.0.0/8",
      "172.16.0.0/12",
      "192.168.0.0/16",
    ]
  }

  tags = "${merge(local.tags, map("Name", "${var.cluster_name}-database_sec_group"))}"
}
```

```hcl
data "aws_availability_zones" "available" {}

locals {
  network_count = "${length(data.aws_availability_zones.available.names)}"

  tags = {
    Environment = "${var.environment}"
    Owner       = "${var.owner}"
    Workspace   = "${var.cluster_name}"
  }
}

resource "aws_route53_zone" "hosted_zone" {
  name    = "eks-lab.com"
  comment = "Private hosted zone for eks cluster"

  vpc {
    vpc_id = "${module.vpc.vpc_id}"
  }

  tags = "${local.tags}"
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "1.60.0"

  name = "${var.cluster_name}"
  cidr = "${var.cidr_block}"
  azs  = ["${data.aws_availability_zones.available.names[0]}", "${data.aws_availability_zones.available.names[1]}", "${data.aws_availability_zones.available.names[2]}"]

  public_subnets = [
    "${cidrsubnet(var.cidr_block, var.cidr_subnet_bits, 0)}",
    "${cidrsubnet(var.cidr_block, var.cidr_subnet_bits, 1)}",
    "${cidrsubnet(var.cidr_block, var.cidr_subnet_bits, 2)}"
  ]

  private_subnets = [
    "${cidrsubnet(var.cidr_block, var.cidr_subnet_bits, local.network_count)}",
    "${cidrsubnet(var.cidr_block, var.cidr_subnet_bits, local.network_count + 1)}",
    "${cidrsubnet(var.cidr_block, var.cidr_subnet_bits, local.network_count + 2)}"
  ]

  database_subnets = [
    "${cidrsubnet(var.cidr_block, var.cidr_subnet_bits, local.network_count + 3)}",
    "${cidrsubnet(var.cidr_block, var.cidr_subnet_bits, local.network_count + 4)}",
    "${cidrsubnet(var.cidr_block, var.cidr_subnet_bits, local.network_count + 5)}"
  ]

  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  tags = "${merge(local.tags, map("kubernetes.io/cluster/${var.cluster_name}", "shared"))}"
}
```

```hcl
terraform {
  required_version = ">= 0.11.8"
}

provider "aws" {
  version = ">= 1.47.0"
  region  = "${var.region}"
}

module "vpc" {
  source = "./vpc"

  owner            = "${var.owner}"
  environment      = "${var.environment}"
  cluster_name     = "${var.cluster_name}"
  cidr_block       = "${var.cidr_block}"
  cidr_subnet_bits = "${var.cidr_subnet_bits}"
}

module "rds" {
  ...
}

module "eks" {
  source = "./eks"

  owner                     = "${var.owner}"
  environment               = "${var.environment}"
  cluster_name              = "${var.cluster_name}"
  vpc_id                    = "${module.vpc.vpc_id}"
  key_name                  = "${module.bastion.key_name}"
  subnets                   = "${module.vpc.private_subnets}"
  instance_type             = "${var.eks_instance_type}"
  asg_desired_capacity      = "${var.eks_asg_desired_capacity}"
  asg_spot_desired_capacity = "${var.eks_asg_spot_desired_capacity}"
}

module "bastion" {
  ...
}
```
Also experiencing this issue in
Running it the second time was successful.
This is happening for me too, and I think I know why. I set up my cluster to have private access only. The ENIs that hang around and prevent deletion of the SG are created by Amazon accounts; I suspect they're created to allow access from the workers to the endpoint via private IPs. In any case, it seems to be an order-of-operations issue: if you first manually destroy the EKS cluster (via console or CLI), the ENIs disappear and destruction of all other resources proceeds without issue. Of course, that confuses things, because destroying the cluster first and then the workers doesn't make much sense. Or maybe it doesn't make a difference? That could be a solution to this.
This issue occurs for me in 5.0.0 (https://cloud.drone.io/astronomer/terraform-kubernetes-astronomer/8/1/4). I think it's because I am using the parameter worker_additional_security_group_ids.
I'm getting the same with 5.1.0 as well, if I use worker_groups to create the worker node pools. The ENIs don't get destroyed with the instances, which prevents the destruction of the worker node security group. But if I use worker_groups_launch_template to create the worker node pools, then the ENIs get destroyed with the instances, and the SG destruction works as expected. Is there a downside to using worker_groups_launch_template? Maybe it could be the default or recommended way of creating worker node pools?
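For context, a hedged illustration (not the module's actual code) of why launch templates can behave differently here: on a raw aws_launch_template, the primary network interface can be marked to delete itself when the instance terminates, so no ENI is left referencing the worker security group after scale-in. The AMI ID and security group ID below are placeholders.

```hcl
# Illustration only, assuming this is the kind of setting involved; it is not the
# exact change made in the module's workers_launch_template.tf.
resource "aws_launch_template" "workers_example" {
  name_prefix   = "eks-worker-"
  image_id      = "ami-0123456789abcdef0" # placeholder EKS-optimized AMI
  instance_type = "m5.large"

  network_interfaces {
    security_groups       = ["sg-0123456789abcdef0"] # placeholder worker SG
    delete_on_termination = true
  }
}
```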
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity since being marked as stale.
/remove-lifecycle stale
I got the following error when destroying EKS with the module terraform-aws-eks v11.0.0:
I think it may not relate to #815 because the cluster is created with a public access endpoint.
Can you find out what was still using the security group when terraform tried to delete it?
We created EKS with custom networking (pod IPs are in different subnets). The security group that Terraform tried to delete, and failed, refers to the ENI of the pod subnets. After it fails, I checked and the ENI was in the "available" state and could be deleted manually; then I could also delete the security group. I suspect this bug may relate to the leaking-ENI issue, where additional ENIs are not deleted when a worker node is decommissioned. Update: I can reproduce the issue. When a worker node is deleted and in
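For anyone who wants to check what is still holding on to the security group, a minimal diagnostic sketch; the security group ID is a placeholder and the aws_network_interfaces data source requires a reasonably recent AWS provider. The equivalent one-liner is `aws ec2 describe-network-interfaces --filters Name=group-id,Values=<sg-id>`.

```hcl
# Hedged sketch: list the ENIs that still reference a security group, which is
# usually what the DependencyViolation is complaining about.
data "aws_network_interfaces" "blocking_sg_deletion" {
  filter {
    name   = "group-id"
    values = ["sg-0123456789abcdef0"] # placeholder: the group from the error
  }
}

output "blocking_eni_ids" {
  value = data.aws_network_interfaces.blocking_sg_deletion.ids
}
```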
I think I ran into this today, but I'm not using this module (I use the AWS provider directly). For context, I had a LoadBalancer deployed via Kubernetes when I started the Terraform destroy. Hope this helps.
Same here, still experiencing this during destroy, but I am using the private endpoint.
I ran into this as well: private ENIs lingering after a terraform destroy (see terraform-aws-eks/workers_launch_template.tf, line 226 in 7de18cd).
So, I added
This seems to have corrected the issue. I am not using templates in my code explicitly (only the ones in the module, implicitly). What I am not understanding is if local.tf has
Hi @sighupper. The
Today I ran into this issue; I will troubleshoot and add more details. I do think it's the deployment of Kubernetes resources into the cluster, which then create AWS resources, that is making this hang.
I found my issue: I had a null_resource creating an IngressRoute which, in turn, created more resources. Although I was running terraform destroy in the directory that created these resources, the null_resource was only for creating... so it had no way to destroy what it created.
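A hedged sketch of the fix described in that comment: pair the create-time provisioner with a destroy-time one, so whatever the null_resource applies is also deleted on destroy. The manifest path is a placeholder; kubectl is assumed to be configured for the cluster.

```hcl
resource "null_resource" "ingress_route" {
  triggers = {
    manifest = "${path.module}/ingressroute.yaml" # placeholder manifest path
  }

  # Create-time provisioner: applies the manifest.
  provisioner "local-exec" {
    command = "kubectl apply -f ${self.triggers.manifest}"
  }

  # Destroy-time provisioner: removes whatever the apply created, so no AWS
  # resources (ELBs, SGs) are left behind to block the rest of the destroy.
  provisioner "local-exec" {
    when    = destroy
    command = "kubectl delete -f ${self.triggers.manifest} --ignore-not-found"
  }
}
```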
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity since being marked as stale.
I occasionally ran into this same issue over the past months. When running
In the table below I recorded 19 runs of
I am still getting the same issue, using Terraform 1.0.3 and AWS provider 3.50.0.
/remove-lifecycle stale
I am still seeing this issue. Not sure why the remote security group keeps lingering around, since there were attached ENIs; once I delete those ENIs, it works fine. Has anyone got it working, or are there any workarounds? Below is the destroy console output:

```
aws_security_group_rule.HRIT-cluster-ingress-workstation-https: Destroying... [id=sgrule-1571703115]
Error: error waiting for EKS Node Group (HRIT:CustA-HR) deletion: Ec2SecurityGroupDeletionFailure: DependencyViolation - resource has a dependent object. Resource IDs: [sg-0f2eef7aff2bb7765]
```
I am also having this issue. Not sure why this issue is closed; it still seems to be a problem.
Also still seeing this.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
I have issues
I'm submitting a...
What is the current behavior?
A cluster cannot be destroyed without manual intervention
If this is a bug, how to reproduce? Please include a code sample if relevant.
Given this (stripped-down but working) version of the cluster:
`terraform apply` completes fine; `terraform destroy`, however, fails with a DependencyViolation on the security group. Network Interfaces in the AWS console end up looking like this (screenshot of the lingering ENIs omitted):
Manually detaching the green-highlighted interfaces from the screenshot and deleting them all allows terraform to complete destruction.
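A hedged sketch (0.12+ syntax) of automating that manual cleanup: a destroy-time provisioner that deletes any ENIs still referencing the worker security group before Terraform tries to remove the group itself. It assumes the AWS CLI is on PATH, the cluster is created through a `module "eks"` block like the snippets earlier in this thread, and that `worker_security_group_id` is an available output (an assumption).

```hcl
resource "null_resource" "cleanup_leftover_enis" {
  # Referencing a module output (name assumed) makes Terraform destroy this resource,
  # and run its destroy-time provisioner, before the module's own resources.
  triggers = {
    sg_id = module.eks.worker_security_group_id
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<EOT
for eni in $(aws ec2 describe-network-interfaces \
    --filters "Name=group-id,Values=${self.triggers.sg_id}" \
    --query 'NetworkInterfaces[].NetworkInterfaceId' --output text); do
  # Only ENIs already in the "available" state can be deleted directly;
  # attached ones would need to be detached first.
  aws ec2 delete-network-interface --network-interface-id "$eni" || true
done
EOT
  }
}
```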
What's the expected behavior?
The cluster can be destroyed entirely by Terraform itself.
Are you able to fix this problem and submit a PR? Link here if you have already.
Environment details
Any other relevant info
Resources remaining after attempted destroy: