Flaky TestAccEcsCapacityProviderService #1418

Closed
t0yv0 opened this issue Nov 6, 2024 · 3 comments · Fixed by #1441
Labels: impact/flaky-test (A test that is unreliable), kind/engineering (Work that is not visible to an external user), resolution/fixed (This issue was fixed)

Comments


t0yv0 commented Nov 6, 2024

Failed with:

    aws:ecs:Service (my-service):
      error:   sdk-v2/provider2.go:520: sdk.helper_schema: creating ECS Service (my-service-9669d07): operation error ECS: CreateService, https response error StatusCode: 400, RequestID: 187a5383-55d4-4df4-b185-0f69370c60cf, InvalidParameterException: The target group with targetGroupArn arn:aws:elasticloadbalancing:us-west-2:894850187425:targetgroup/nginx-lb-111793d/91d31eae55038f3a does not have an associated load balancer.: [email protected]
      error: 1 error occurred:
      	* creating ECS Service (my-service-9669d07): operation error ECS: CreateService, https response error StatusCode: 400, RequestID: 187a5383-55d4-4df4-b185-0f69370c60cf, InvalidParameterException: The target group with targetGroupArn arn:aws:elasticloadbalancing:us-west-2:894850187425:targetgroup/nginx-lb-111793d/91d31eae55038f3a does not have an associated load balancer.

But restarting CI fixed the test.

pulumi-bot added the needs-triage label Nov 6, 2024
t0yv0 added the impact/flaky-test label and removed the needs-triage label Nov 6, 2024
mikhailshilkov added the kind/engineering label Nov 18, 2024

t0yv0 commented Dec 16, 2024

I suspect there is actually a race in the example, caused by a missing dependency:

 +   pulumi:pulumi:Stack                           ecs-node-awsx-examples-ecs-nodejs  creating (168s).    
 +   ├─ aws:lb:ApplicationLoadBalancer             nginx-lb                           created (1s)
 +   │  ├─ awsx:lb:ApplicationListener             nginx-lb                           created (1s)
 +   │  │  ├─ awsx:x:ec2:IngressSecurityGroupRule  nginx-lb-external-0-ingress        created (1s)
 +   │  │  │  └─ aws:ec2:SecurityGroupRule         nginx-lb-external-0-ingress        created (1s)
 +   │  │  └─ awsx:x:ec2:EgressSecurityGroupRule   nginx-lb-external-0-egress         created (2s)
 +   │  │     └─ aws:ec2:SecurityGroupRule         nginx-lb-external-0-egress         created (1s)
 +   │  ├─ awsx:lb:ApplicationTargetGroup          nginx-lb                           created (2s)
 +   │  │  └─ aws:lb:TargetGroup                   nginx-lb                           created (3s)    ###### <--- ##### 
 +   │  └─ aws:lb:LoadBalancer                     nginx-lb                           creating (156s) ###### <--- ##### 
 +   ├─ awsx:x:ecs:Cluster                         cluster                            created (1s)
 +   │  ├─ awsx:x:ec2:SecurityGroup                cluster                            created (1s)
 +   │  │  ├─ awsx:x:ec2:IngressSecurityGroupRule  cluster-ssh                        created (3s)
 +   │  │  │  └─ aws:ec2:SecurityGroupRule         cluster-ssh                        created (1s)
 +   │  │  ├─ awsx:x:ec2:EgressSecurityGroupRule   cluster-egress                     created (4s)
 +   │  │  │  └─ aws:ec2:SecurityGroupRule         cluster-egress                     created (3s)
 +   │  │  ├─ awsx:x:ec2:IngressSecurityGroupRule  cluster-containers                 created (5s)
 +   │  │  │  └─ aws:ec2:SecurityGroupRule         cluster-containers                 created (2s)
 +   │  │  └─ aws:ec2:SecurityGroup                cluster                            created (3s)
 +   │  └─ aws:ecs:Cluster                         cluster                            created (12s)
 +   ├─ awsx:x:ec2:SecurityGroup                   nginx-lb                           created (1s)
 +   │  └─ aws:ec2:SecurityGroup                   nginx-lb                           created (3s)
 +   ├─ awsx:x:ec2:Vpc                             default-vpc                        created (2s)
 +   │  ├─ awsx:x:ec2:Subnet                       default-vpc-public-1               created (2s)
 +   │  └─ awsx:x:ec2:Subnet                       default-vpc-public-0               created (2s)
 +   ├─ awsx:ecs:FargateTaskDefinition             fargate-task                       created (1s)
 +   │  ├─ aws:iam:Role                            fargate-task-execution             created (0.79s)
 +   │  ├─ aws:cloudwatch:LogGroup                 fargate-task                       created (1s)
 +   │  ├─ aws:iam:Role                            fargate-task-task                  created (0.79s)
 +   │  ├─ aws:iam:RolePolicyAttachment            fargate-task-execution-9a42f520    created (0.65s)
 +   │  └─ aws:ecs:TaskDefinition                  fargate-task                       created (0.95s)
 +   └─ awsx:ecs:FargateService                    my-service                         created (0.40s)
 +      └─ aws:ecs:Service                         my-service                         creating (147s).. ###### <--- ##### 


t0yv0 commented Dec 16, 2024

The theory is that aws:ecs:Service starts creating without waiting for aws:lb:LoadBalancer to finish creating. aws:lb:TargetGroup has already been created, but nothing here formally depends on the LB.

Looking at the dependencies from the state file:

ECS service:
"urn:pulumi:awsx-examples-ecs-nodejs::ecs-node::awsx:ecs:FargateService$aws:ecs/service:Service::my-service"

Depends on:
"urn:pulumi:awsx-examples-ecs-nodejs::ecs-node::awsx:ecs:FargateTaskDefinition$aws:ecs/taskDefinition:TaskDefinition::fargate-task"

Which in turn depends on the target group:

            "dependencies": [
                "urn:pulumi:awsx-examples-ecs-nodejs::ecs-node::aws:lb:ApplicationLoadBalancer$awsx:lb:ApplicationTargetGroup$aws:lb/targetGroup:TargetGroup::nginx-lb"
            ],

Which only depends on the VPC:
"urn:pulumi:awsx-examples-ecs-nodejs::ecs-node::aws:ec2/vpc:Vpc::default-vpc"
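
The chain above can be checked mechanically. Below is a minimal TypeScript sketch, not Pulumi code: the `Graph` type, the `transitivelyDependsOn` helper, and the short labels standing in for the full URNs are all hypothetical, and it assumes the edges listed above are the only ones that matter. It shows that the service reaches the target group but never the load balancer, and that one explicit edge from the service to the LB would close the gap.

```typescript
// Each key depends on the resources in its array.
// Labels are illustrative shorthand for the URNs quoted above.
type Graph = Record<string, string[]>;

const recorded: Graph = {
    "ecs:Service": ["ecs:TaskDefinition"],
    "ecs:TaskDefinition": ["lb:TargetGroup"],
    "lb:TargetGroup": ["ec2:Vpc"],
    "ec2:Vpc": [],
    // The LB's own dependencies are not listed in this thread (assumed empty here).
    "lb:LoadBalancer": [],
};

// Returns true if `from` depends on `to`, directly or through other resources.
function transitivelyDependsOn(graph: Graph, from: string, to: string): boolean {
    const seen = new Set<string>();
    const stack: string[] = [...(graph[from] ?? [])];
    while (stack.length > 0) {
        const node = stack.pop() as string;
        if (node === to) return true;
        if (seen.has(node)) continue;
        seen.add(node);
        stack.push(...(graph[node] ?? []));
    }
    return false;
}

// Same graph with one explicit edge from the service to the load balancer.
const withExplicitEdge: Graph = {
    ...recorded,
    "ecs:Service": [...(recorded["ecs:Service"] ?? []), "lb:LoadBalancer"],
};

console.log(transitivelyDependsOn(recorded, "ecs:Service", "lb:TargetGroup"));          // true
console.log(transitivelyDependsOn(recorded, "ecs:Service", "lb:LoadBalancer"));         // false
console.log(transitivelyDependsOn(withExplicitEdge, "ecs:Service", "lb:LoadBalancer")); // true
```

With only the recorded edges, nothing orders the service after the LB, which would be consistent with the service and the LB both showing as "creating" at the same time in the output above.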

t0yv0 added a commit that referenced this issue Dec 16, 2024
Pulumi was not recognizing that the underlying aws.ecs.Service should wait for the load balancer to be provisioned,
causing a race that would sporadically make the test flake. This is now compensated for with a dependsOn option.

Fixes #1418
t0yv0 added a commit that referenced this issue Dec 17, 2024
Pulumi was not recognizing that the underlying aws.ecs.Service should
wait for the load balancer to be provisioned, causing a race that
would sporadically make the test flake. This is now compensated for
with a dependsOn option.

Fixes #1418
pulumi-bot added the resolution/fixed label Dec 17, 2024
@pulumi-bot

This issue has been addressed in PR #1441 and shipped in release v2.20.0.


3 participants