aws_ecs_service no longer retries when target group is not attached to load balancer #3495
Comments
@tomelliff can you please provide the full error message?
We removed the blanket `InvalidParameterException` retry. I would personally say that banking on retries for Terraform eventually creating the appropriate resource is not an ideal scenario. There are many factors that would contribute to this error still occurring even if retries are put in place:
Adding the retries for the load balancer listener might provide some convenience and help sometimes, but it does not fix the underlying requirement for ECS wanting the explicit ordering of resources before it.
I would personally disagree here, as the explicit `depends_on` is what guarantees the ordering ECS requires. Until Terraform core supports configuring something like waiting for all child resources of a parent resource to complete, the explicit `depends_on` is the recommended workaround. I hope this makes sense!
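For illustration only, a minimal sketch of the explicit ordering being described, using hypothetical resource names (`aws_alb_listener.front_end`, `aws_ecs_service.app`) that are not taken from this issue:

```hcl
resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = "${aws_ecs_cluster.main.id}"
  task_definition = "${aws_ecs_task_definition.app.arn}"
  desired_count   = 1

  load_balancer {
    target_group_arn = "${aws_alb_target_group.app.arn}"
    container_name   = "app"
    container_port   = 80
  }

  # Wait for the listener so the target group is already attached to the
  # load balancer by the time ECS validates the service.
  depends_on = ["aws_alb_listener.front_end"]
}
```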
Full error was:
I think I saw something in the code base a long time back that looked at the message as well as the status code, so maybe that could be used here if you want to minimise how wide that retry logic is. I'm not sure if there's a good way to have different retry timeouts in Terraform either, but the listener creation is barely slower than instant, so if there is, dropping that to a tiny amount would work for me.

The explanation makes sense, but I'm stuck right now because I can't see a way of linking the listener from the parent module to the child module, even in a hacky way, so I can't force a dependency in Terraform without core enabling modules to depend on things and not do anything in the module until the depends_on is complete. Right now 1.9.0 breaks any time we deploy a new environment/service and we need to retry. I could probably force GitLab CI to retry the job automatically, but I'd rather not have that there long term.

Longer term, the plan is to move to a single ALB per ECS cluster and add listener rules for each service and environment once the auto priority PR is merged, but I'm not sure if I'm going to have the same race condition there.
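As a rough illustration of that per-service listener rule pattern (hypothetical names, not from this issue), with `priority` still set explicitly since the automatic priority assignment mentioned above was not yet available:

```hcl
resource "aws_alb_listener_rule" "service" {
  listener_arn = "${aws_alb_listener.shared.arn}"
  priority     = 100 # would no longer need managing once auto priority lands

  action {
    type             = "forward"
    target_group_arn = "${aws_alb_target_group.service.arn}"
  }

  condition {
    field  = "host-header"
    values = ["service.example.com"]
  }
}
```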
Sorry for the delay here. 😅 The pull request to allow some retries for this condition has been merged into master and will release with version 1.33.0 of the AWS provider, likely middle of next week. Please note: we'll continue to recommend the usage of `depends_on` for this situation.
This has been released in version 1.33.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
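For reference, a minimal provider version constraint for picking up that release (the region is just an example, not from this issue):

```hcl
provider "aws" {
  version = "~> 1.33"
  region  = "eu-central-1" # example region
}
```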
FWIW @bflad, I just tested this and still got the error...

```hcl
resource "aws_ecs_service" "app" {
  # ...

  load_balancer {
    target_group_arn = "${aws_alb_target_group.main.id}"
    container_name   = "${var.container_name}"
    container_port   = "${var.container_port}"
  }

  depends_on = [
    "aws_alb_target_group.main",
  ]
}
```

Tested with:
I believe this issue is not fixed. I get the above error on every run. I'm using
I can't add an explicit `depends_on` in my setup.
Can confirm this is not fixed, or the retries are not working.
Can you share a minimal, complete example of Terraform code that errors out with that error? I've just run the acceptance test again. I have since refactored how we deploy ECS services behind ALBs so I don't have this issue any more, but it was definitely working fine and no longer erroring for us as of a month ago, when previously it would error regularly.

Test output:
Thanks for your work on this so far. Here's a much reduced version of my setup. I'm afraid it's still fairly long since it uses a third-party module to create a VPC to isolate it from other resources. Also, I've kept the ECS service in a separate module.

./vars.tf

```hcl
variable "aws_region" {
  default = "eu-central-1"
}

variable "azs" {
  default = [
    "eu-central-1a",
    "eu-central-1b",
  ]
}

variable "cidr_private" {
  default = [
    "10.0.1.0/24",
    "10.0.2.0/24",
  ]
}

variable "cidr_public" {
  default = [
    "10.0.101.0/24",
    "10.0.102.0/24",
  ]
}
```

./main.tf

```hcl
provider "aws" {
  region = "${var.aws_region}"
}

module "base_vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "1.46.0"

  name = "tf-vpc"
  cidr = "10.0.0.0/16"

  azs = [
    "${var.azs}",
  ]

  private_subnets = [
    "${var.cidr_private}",
  ]

  public_subnets = [
    "${var.cidr_public}",
  ]

  enable_nat_gateway = true
  single_nat_gateway = true
}

resource "aws_alb" "alb_for_fargate" {
  internal = false

  subnets = [
    "${module.base_vpc.public_subnets}",
  ]
}

resource "aws_ecs_cluster" "my_cluster" {
  name = "mycluster"
}

module "myapp_instance" {
  source = "myapp/"

  alb_arn         = "${aws_alb.alb_for_fargate.arn}"
  ecs_cluster_arn = "${aws_ecs_cluster.my_cluster.arn}"
  private_subnets = ["${module.base_vpc.private_subnets}"]
  vpc_arn         = "${module.base_vpc.vpc_id}"
}
```

./myapp/vars.tf

```hcl
variable "alb_arn" {}

variable "ecs_cluster_arn" {}

variable "private_subnets" {
  type = "list"
}

variable "vpc_arn" {}
```

./myapp/main.tf

```hcl
resource "aws_alb_target_group" "my_tg" {
  protocol    = "HTTP"
  port        = "80"
  vpc_id      = "${var.vpc_arn}"
  target_type = "ip"
}

resource "aws_ecr_repository" "my_repo" {
  name = "myapp"
}

resource "aws_ecs_task_definition" "my_td" {
  family                   = "myapp"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512

  container_definitions = <<EOF
[
  {
    "name": "myapp",
    "image": "myapp:latest",
    "networkMode": "awsvpc",
    "portMappings": [
      {
        "containerPort": 80,
        "protocol": "tcp"
      }
    ],
    "requiresCompatibilities": [
      "FARGATE"
    ]
  }
]
EOF
}

resource "aws_ecs_service" "my_service" {
  name            = "myapp"
  cluster         = "${var.ecs_cluster_arn}"
  launch_type     = "FARGATE"
  task_definition = "${aws_ecs_task_definition.my_td.arn}"

  network_configuration = {
    subnets = ["${var.private_subnets}"]
  }

  load_balancer {
    target_group_arn = "${aws_alb_target_group.my_tg.arn}"
    container_name   = "myapp"
    container_port   = 80
  }
}
```
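A hedged sketch of the workaround the maintainers recommend, applied to a config shaped like the one above; the listener resource and its name are hypothetical additions, not part of the original report:

```hcl
# ./myapp/main.tf (hypothetical addition)
resource "aws_alb_listener" "my_listener" {
  load_balancer_arn = "${var.alb_arn}"
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = "${aws_alb_target_group.my_tg.arn}"
  }
}

resource "aws_ecs_service" "my_service" {
  # ...arguments as above...

  # Explicit ordering: the listener attaches the target group to the ALB
  # before ECS validates the service's load_balancer block.
  depends_on = ["aws_alb_listener.my_listener"]
}
```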
This is not fixed! Using the `load_balancer` properties for `aws_ecs_service` fails with: `InvalidParameterException: The target group with targetGroupArn arn:aws:elasticloadbalancing:us-east-1:367555685970:targetgroup/depends-on-test-dev/b96ad7836a2fe8e7 does not have an associated load balancer.` The problem is the `load_balancer` option for `aws_ecs_service`: it doesn't matter if it depends on target group creation, it fails trying to attach the service to the target group. If you run this twice, the second run will work because the target group was already attached during the first run.
I still have this issue too. It fails the first time, but the second time it works.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
Terraform Version
1.9.0 AWS provider, all TF versions
Affected Resource(s)
aws_ecs_service
Terraform Configuration Files
Taken from testAccAWSEcsService_healthCheckGracePeriodSeconds but removing the `depends_on` on the `lb_listener` resource:
Expected Behavior
The ECS service should be created at the same time as the LB listener because they both depend on the LB target group. At this point the target group may not yet be attached to the load balancer because the LB listener resource hasn't finished being created. This throws an `InvalidParameterException` which, before @bflad's change in #3240, was then retried.
Actual Behavior
Now it just throws the error and doesn't retry.
Steps to Reproduce
terraform apply
References
Looks like we just need to re-add the `InvalidParameterException` retry, but I'm wary of doing so without understanding why @bflad removed it in the first place. We should probably remove that `depends_on` in the acceptance tests as well, although I think it's still needed for the IAM role policy.

Note that I'm unable to add a `depends_on` to the listener rule because I have nested modules where one module creates an ECS service (potentially a worker-based, non-load-balanced service) and another module, which creates the load balancer and sets up security groups etc., uses the ECS service module, telling it to use the load-balanced service resource. I can provide the config if necessary, but ultimately I don't think we should be forcing people to put in a `depends_on` for a race condition that will resolve itself if we simply retry for as much as a couple of seconds.
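To make that module constraint concrete, a hedged sketch of the nested-module shape described above, with hypothetical module and variable names (none of these are from the actual config):

```hcl
# ./modules/ecs-service/main.tf -- knows nothing about the listener,
# so it cannot declare a depends_on against it.
resource "aws_ecs_service" "this" {
  name            = "${var.name}"
  cluster         = "${var.cluster_arn}"
  task_definition = "${var.task_definition_arn}"
  desired_count   = 1

  load_balancer {
    target_group_arn = "${var.target_group_arn}"
    container_name   = "${var.container_name}"
    container_port   = "${var.container_port}"
  }
}

# ./modules/load-balanced-service/main.tf -- creates the listener and
# wraps the service module.
resource "aws_alb_listener" "this" {
  load_balancer_arn = "${var.alb_arn}"
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = "${aws_alb_target_group.this.arn}"
  }
}

module "service" {
  source              = "../ecs-service"
  name                = "${var.name}"
  cluster_arn         = "${var.cluster_arn}"
  task_definition_arn = "${var.task_definition_arn}"
  target_group_arn    = "${aws_alb_target_group.this.arn}"
  container_name      = "${var.container_name}"
  container_port      = "${var.container_port}"

  # Terraform 0.11 modules do not support depends_on, so there is no way
  # to make the service inside this module wait for aws_alb_listener.this.
}
```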