
Lambda Associated EC2 Subnet and Security Group Deletion Issues and Improvements #10329

Closed
bflad opened this issue Oct 1, 2019 · 41 comments
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/ec2 Issues and PRs that pertain to the ec2 service. service/lambda Issues and PRs that pertain to the lambda service. upstream Addresses functionality related to the cloud provider.

Comments

@bflad
Contributor

bflad commented Oct 1, 2019

Description

Beginning in September 2019, improved VPC networking for AWS Lambda began rolling out in certain AWS Commercial regions. The underlying AWS infrastructure changes for this improved networking had two unexpected consequences for Terraform: a slight change in the Elastic Network Interface (ENI) description that Terraform used to find and manually delete Lambda ENIs in Elastic Compute Cloud (EC2) Subnets and Security Groups, and an increase in the time required to delete those ENIs. During this Lambda service deployment, HashiCorp, AWS, and the community noticed that deleting EC2 Subnets and Security Groups previously associated with Lambda Functions now failed with DependencyViolation errors after those Terraform resources' default deletion timeouts (20 minutes and 10 minutes respectively). These errors during a Terraform apply operation may look like the following:

$ terraform destroy
...
Error: errors during apply: 2 problems:
        
        - Error deleting subnet: timeout while waiting for state to become 'destroyed' (last state: 'pending', timeout: 20m0s)
        - Error deleting security group: DependencyViolation: resource sg-xxxxxxxxxxxx has a dependent object
          status code: 400, request id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx

Please note: not all DependencyViolation errors like the above are associated with this Lambda service change. The DependencyViolation error occurs when any infrastructure is still associated with an EC2 Subnet or Security Group during deletion. This may occur due to multiple, separate Terraform configurations working with the same subnet/security group or infrastructure manually associated with the subnet/security group.

Working on top of a community contribution (thanks, @ewbankkit and @obourdon!) and in close communication with the AWS Lambda service team to determine the highest percentile deletion times, Terraform AWS Provider version 2.31.0 and later includes automatic handling of the updated ENI description and handles the increased deletion times for the new Lambda infrastructure. See the Terraform documentation on provider versioning for information about upgrading Terraform Providers.

For Terraform environments that cannot yet be updated to Terraform AWS Provider version 2.31.0 or later, this issue can be mitigated by setting the customizable deletion timeouts available for these two Terraform resources to at least 45 minutes, and by ensuring that any Lambda execution IAM Role permissions granting ec2:DeleteNetworkInterface are removed only after the associated subnets/security groups are deleted (for example, with depends_on). This way the Lambda service still has permission to delete the ENIs it created in your VPC.

Example configuration for Terraform AWS Provider versions 2.30.0 and earlier:

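resource "aws_iam_role_policy_attachment" "example" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
  role       = "${aws_iam_role.example.id}"
}

resource "aws_subnet" "example" {
  # ... other configuration ...

  # Give Lambda time to release its Hyperplane ENIs before failing.
  timeouts {
    delete = "45m"
  }

  # Destroyed before the policy attachment, so Lambda keeps the
  # ec2:DeleteNetworkInterface permission while its ENIs are cleaned up.
  depends_on = ["aws_iam_role_policy_attachment.example"]
}

resource "aws_security_group" "example" {
  # ... other configuration ...

  # Give Lambda time to release its Hyperplane ENIs before failing.
  timeouts {
    delete = "45m"
  }

  # Destroyed before the policy attachment, so Lambda keeps the
  # ec2:DeleteNetworkInterface permission while its ENIs are cleaned up.
  depends_on = ["aws_iam_role_policy_attachment.example"]
}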

In those earlier versions of the Terraform AWS Provider, if the IAM Role permissions are removed before Lambda is able to delete its Hyperplane ENIs, subnet and security group deletions will continually fail with a DependencyViolation error until those ENIs are deleted manually. Those ENIs can be found by searching for the ENI description "AWS Lambda VPC ENI*".

Example AWS CLI commands to find Lambda ENIs (see the AWS CLI documentation for additional filtering options):

# EC2 Subnet example
$ aws ec2 describe-network-interfaces --filters 'Name=description,Values="AWS Lambda VPC ENI*"' 'Name=subnet-id,Values=subnet-12345678'
# EC2 Security Group example
$ aws ec2 describe-network-interfaces --filters 'Name=description,Values="AWS Lambda VPC ENI*"' 'Name=group-id,Values=sg-12345678'

Example AWS CLI command to delete an ENI:

$ aws ec2 delete-network-interface --network-interface-id eni-12345678
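The find and delete steps above can be combined into a small script. This is a minimal sketch, assuming a configured AWS CLI; `delete_lambda_enis` is a hypothetical helper name, not part of Terraform or the AWS CLI. It only targets ENIs in one subnet whose description matches `AWS Lambda VPC ENI*`, and EC2 should still reject deletion of an ENI that is attached, so review the matches before running it against shared infrastructure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Terraform or the AWS CLI): delete any
# leftover Lambda Hyperplane ENIs in one subnet.
# Usage: delete_lambda_enis subnet-12345678
delete_lambda_enis() {
  local subnet_id="$1"
  local eni_ids eni_id

  # List matching ENI IDs as whitespace-separated text.
  eni_ids=$(aws ec2 describe-network-interfaces \
    --filters 'Name=description,Values="AWS Lambda VPC ENI*"' \
              "Name=subnet-id,Values=${subnet_id}" \
    --query 'NetworkInterfaces[].NetworkInterfaceId' \
    --output text) || return 1

  # Delete each ENI, stopping at the first failure (for example, an ENI
  # that is still attached cannot be deleted).
  for eni_id in $eni_ids; do
    echo "Deleting ${eni_id}"
    aws ec2 delete-network-interface --network-interface-id "$eni_id" || return 1
  done
}
```

For example, `delete_lambda_enis subnet-12345678`. Stopping at the first failed call, rather than continuing, makes a partially cleaned subnet easy to spot.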

While the deletion issues are now handled (either automatically in version 2.31.0 or later, or manually with the configuration above), the increased deletion time for this infrastructure is less than ideal. HashiCorp and AWS are continuing to closely work together on reducing this time, which will likely be handled by additional changes to the AWS Lambda service without any necessary changes to Terraform configurations. This issue serves as a location to capture updates relating to those service improvements.
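Because Lambda may take a long time to release its Hyperplane ENIs, it can help to check whether any remain before retrying `terraform destroy`. The following is a minimal polling sketch, assuming a configured AWS CLI; `wait_for_lambda_enis` and its arguments are hypothetical, and this is not how the Terraform AWS Provider implements its own waiting. It reuses the security group filter from the commands above and gives up after a caller-supplied timeout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until no Lambda Hyperplane ENIs remain in a
# security group, or give up after a timeout.
# Usage: wait_for_lambda_enis GROUP_ID TIMEOUT_SECONDS [INTERVAL_SECONDS]
# The interval should be a positive number of seconds (default 30).
wait_for_lambda_enis() {
  local group_id="$1" timeout="$2" interval="${3:-30}"
  local waited=0 remaining

  while :; do
    # Count ENIs in the group whose description matches the Lambda pattern.
    remaining=$(aws ec2 describe-network-interfaces \
      --filters 'Name=description,Values="AWS Lambda VPC ENI*"' \
                "Name=group-id,Values=${group_id}" \
      --query 'length(NetworkInterfaces)' \
      --output text) || return 1

    if [ "$remaining" = "0" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "${remaining} Lambda ENI(s) still present in ${group_id} after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, `wait_for_lambda_enis sg-12345678 2700 30 && terraform destroy` waits up to 45 minutes (the deletion timeout suggested for provider versions 2.30.0 and earlier), polling every 30 seconds.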

@bflad bflad added enhancement, upstream, service/lambda, and service/ec2 labels Oct 1, 2019
@github-actions github-actions bot added the needs-triage label Oct 1, 2019
@bflad
Contributor Author

bflad commented Oct 1, 2019

Terraform AWS Provider version 2.31.0, with the increased deletion time handling, is scheduled for release on Thursday, October 3rd, 2019.

@bflad bflad removed the needs-triage label Oct 1, 2019
@bflad bflad pinned this issue Oct 1, 2019
bflad added a commit that referenced this issue Oct 2, 2019
Reference: #10044
Reference: #10114
Reference: #10329

The introduction of [improved VPC networking for Lambda]() brought some welcome enhancements to Lambda functionality, but initially had some unintended consequences when working with Terraform due to the underlying infrastructure changes. The main issue is that these new Hyperplane ENIs associated with Lambda currently take additional time to detach/delete, and that the Lambda service itself owns these ENIs, which prevents early detachment.

In working with the AWS Lambda service team, we have received confirmation of the expected detachment/deletion timeframes for Lambda Hyperplane ENIs. Using this information, we set the Lambda ENI timeout to at least the expected deletion time, matching the service expectations without adjusting the overall default `aws_security_group` or `aws_subnet` resource deletion timeouts. This ensures legitimate `DependencyViolation` errors still return to operators in a fairly timely manner (the defaults are left at 10 minutes and 20 minutes respectively).

Output from AWS Commercial (us-east-2 - Hyperplane enabled)

```
--- PASS: TestAccAWSLambdaFunction_basic (23.37s)
--- PASS: TestAccAWSLambdaFunction_concurrency (30.76s)
--- PASS: TestAccAWSLambdaFunction_concurrencyCycle (43.12s)
--- PASS: TestAccAWSLambdaFunction_DeadLetterConfig (42.40s)
--- PASS: TestAccAWSLambdaFunction_DeadLetterConfigUpdated (41.70s)
--- PASS: TestAccAWSLambdaFunction_EmptyVpcConfig (22.99s)
--- PASS: TestAccAWSLambdaFunction_encryptedEnvVariables (51.21s)
--- PASS: TestAccAWSLambdaFunction_envVariables (45.14s)
--- PASS: TestAccAWSLambdaFunction_expectFilenameAndS3Attributes (10.90s)
--- PASS: TestAccAWSLambdaFunction_importLocalFile (31.12s)
--- PASS: TestAccAWSLambdaFunction_importLocalFile_VPC (1422.82s)
--- PASS: TestAccAWSLambdaFunction_importS3 (22.66s)
--- PASS: TestAccAWSLambdaFunction_Layers (34.75s)
--- PASS: TestAccAWSLambdaFunction_LayersUpdate (54.60s)
--- PASS: TestAccAWSLambdaFunction_localUpdate (31.40s)
--- PASS: TestAccAWSLambdaFunction_localUpdate_nameOnly (24.21s)
--- PASS: TestAccAWSLambdaFunction_nilDeadLetterConfig (12.71s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_java8 (23.05s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_NodeJs10x (26.99s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_NodeJs810 (26.53s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_noRuntime (0.72s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_provided (18.66s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python27 (27.62s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python36 (22.87s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python37 (27.09s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_ruby25 (27.87s)
--- PASS: TestAccAWSLambdaFunction_s3 (22.59s)
--- PASS: TestAccAWSLambdaFunction_s3Update_basic (32.58s)
--- PASS: TestAccAWSLambdaFunction_s3Update_unversioned (31.07s)
--- PASS: TestAccAWSLambdaFunction_tags (42.41s)
--- PASS: TestAccAWSLambdaFunction_tracingConfig (39.12s)
--- PASS: TestAccAWSLambdaFunction_updateRuntime (29.16s)
--- PASS: TestAccAWSLambdaFunction_versioned (28.09s)
--- PASS: TestAccAWSLambdaFunction_versionedUpdate (47.13s)
--- PASS: TestAccAWSLambdaFunction_VPC (1331.55s)
--- PASS: TestAccAWSLambdaFunction_VPC_withInvocation (1376.24s)
--- PASS: TestAccAWSLambdaFunction_VpcConfig_ProperIamDependencies (1327.69s)
--- PASS: TestAccAWSLambdaFunction_VPCRemoval (1490.19s)
--- PASS: TestAccAWSLambdaFunction_VPCUpdate (1685.40s)
```

Output from AWS Commercial (us-west-2 - Hyperplane not deployed)

```
--- PASS: TestAccAWSLambdaFunction_basic (40.50s)
--- PASS: TestAccAWSLambdaFunction_concurrency (47.79s)
--- PASS: TestAccAWSLambdaFunction_concurrencyCycle (62.65s)
--- PASS: TestAccAWSLambdaFunction_DeadLetterConfig (55.95s)
--- PASS: TestAccAWSLambdaFunction_DeadLetterConfigUpdated (50.23s)
--- PASS: TestAccAWSLambdaFunction_EmptyVpcConfig (37.47s)
--- PASS: TestAccAWSLambdaFunction_encryptedEnvVariables (73.66s)
--- PASS: TestAccAWSLambdaFunction_envVariables (80.88s)
--- PASS: TestAccAWSLambdaFunction_expectFilenameAndS3Attributes (22.59s)
--- PASS: TestAccAWSLambdaFunction_importLocalFile (42.78s)
--- PASS: TestAccAWSLambdaFunction_importLocalFile_VPC (39.40s)
--- PASS: TestAccAWSLambdaFunction_importS3 (36.62s)
--- PASS: TestAccAWSLambdaFunction_Layers (53.78s)
--- PASS: TestAccAWSLambdaFunction_LayersUpdate (89.78s)
--- PASS: TestAccAWSLambdaFunction_localUpdate (54.31s)
--- PASS: TestAccAWSLambdaFunction_localUpdate_nameOnly (56.10s)
--- PASS: TestAccAWSLambdaFunction_nilDeadLetterConfig (26.12s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_java8 (46.49s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_NodeJs10x (52.25s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_NodeJs810 (43.59s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_noRuntime (2.71s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_provided (43.88s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python27 (47.91s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python36 (45.95s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python37 (41.40s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_ruby25 (50.32s)
--- PASS: TestAccAWSLambdaFunction_s3 (35.28s)
--- PASS: TestAccAWSLambdaFunction_s3Update_basic (57.89s)
--- PASS: TestAccAWSLambdaFunction_s3Update_unversioned (58.81s)
--- PASS: TestAccAWSLambdaFunction_tags (75.77s)
--- PASS: TestAccAWSLambdaFunction_tracingConfig (55.61s)
--- PASS: TestAccAWSLambdaFunction_updateRuntime (57.19s)
--- PASS: TestAccAWSLambdaFunction_versioned (33.52s)
--- PASS: TestAccAWSLambdaFunction_versionedUpdate (58.25s)
--- PASS: TestAccAWSLambdaFunction_VPC (56.81s)
--- PASS: TestAccAWSLambdaFunction_VPC_withInvocation (86.81s)
--- PASS: TestAccAWSLambdaFunction_VpcConfig_ProperIamDependencies (42.99s)
--- PASS: TestAccAWSLambdaFunction_VPCRemoval (80.28s)
--- PASS: TestAccAWSLambdaFunction_VPCUpdate (81.84s)

--- PASS: TestAccAWSSecurityGroup_basic (10.14s)
--- PASS: TestAccAWSSecurityGroup_Change (19.36s)
--- PASS: TestAccAWSSecurityGroup_CIDRandGroups (31.78s)
--- PASS: TestAccAWSSecurityGroup_DefaultEgress_Classic (6.53s)
--- PASS: TestAccAWSSecurityGroup_DefaultEgress_VPC (25.29s)
--- PASS: TestAccAWSSecurityGroup_drift (7.55s)
--- PASS: TestAccAWSSecurityGroup_drift_complex (31.62s)
--- PASS: TestAccAWSSecurityGroup_Egress_ConfigMode (23.76s)
--- PASS: TestAccAWSSecurityGroup_egressWithPrefixList (24.51s)
--- PASS: TestAccAWSSecurityGroup_failWithDiffMismatch (12.13s)
--- PASS: TestAccAWSSecurityGroup_forceRevokeRules_false (1228.05s)
--- PASS: TestAccAWSSecurityGroup_forceRevokeRules_true (1242.70s)
--- PASS: TestAccAWSSecurityGroup_generatedName (25.26s)
--- PASS: TestAccAWSSecurityGroup_importBasic (12.91s)
--- PASS: TestAccAWSSecurityGroup_importIPRangeAndSecurityGroupWithSameRules (14.68s)
--- PASS: TestAccAWSSecurityGroup_importIPRangesWithSameRules (12.19s)
--- PASS: TestAccAWSSecurityGroup_importIpv6 (30.08s)
--- PASS: TestAccAWSSecurityGroup_importPrefixList (25.01s)
--- PASS: TestAccAWSSecurityGroup_importSelf (31.64s)
--- PASS: TestAccAWSSecurityGroup_importSourceSecurityGroup (30.19s)
--- PASS: TestAccAWSSecurityGroup_Ingress_ConfigMode (23.47s)
--- PASS: TestAccAWSSecurityGroup_ingressWithCidrAndSGs (31.60s)
--- PASS: TestAccAWSSecurityGroup_ingressWithCidrAndSGs_classic (9.86s)
--- PASS: TestAccAWSSecurityGroup_ingressWithPrefixList (44.12s)
--- PASS: TestAccAWSSecurityGroup_invalidCIDRBlock (1.28s)
--- PASS: TestAccAWSSecurityGroup_ipv4andipv6Egress (11.90s)
--- PASS: TestAccAWSSecurityGroup_ipv6 (12.77s)
--- PASS: TestAccAWSSecurityGroup_MultiIngress (12.33s)
--- PASS: TestAccAWSSecurityGroup_namePrefix (6.47s)
--- PASS: TestAccAWSSecurityGroup_RuleDescription (26.52s)
--- PASS: TestAccAWSSecurityGroup_ruleGathering (24.55s)
--- PASS: TestAccAWSSecurityGroup_ruleLimitCidrBlockExceededAppend (48.89s)
--- PASS: TestAccAWSSecurityGroup_ruleLimitExceededAllNew (53.89s)
--- PASS: TestAccAWSSecurityGroup_ruleLimitExceededAppend (50.48s)
--- PASS: TestAccAWSSecurityGroup_ruleLimitExceededPrepend (54.09s)
--- PASS: TestAccAWSSecurityGroup_rulesDropOnError (22.40s)
--- PASS: TestAccAWSSecurityGroup_self (11.93s)
--- PASS: TestAccAWSSecurityGroup_tags (40.86s)
--- PASS: TestAccAWSSecurityGroup_vpc (10.39s)
--- PASS: TestAccAWSSecurityGroup_vpcNegOneIngress (10.55s)
--- PASS: TestAccAWSSecurityGroup_vpcProtoNumIngress (11.84s)

--- PASS: TestAccAWSSubnet_availabilityZoneId (26.56s)
--- PASS: TestAccAWSSubnet_basic (26.69s)
--- PASS: TestAccAWSSubnet_enableIpv6 (42.97s)
--- PASS: TestAccAWSSubnet_ipv6 (69.30s)
```
@richardgavel

richardgavel commented Oct 2, 2019

@bflad I'm a little confused. Is the original comment stating that the ENIs go away on their own and that all that is needed is the increased timeout? That does not appear to be the case: ours don't seem to go away (e.g. the ENIs are still around 50 minutes after the DeleteFunction call). Isn't #10347, which modifies the dangling ENI removal logic, still needed?

@bflad
Contributor Author

bflad commented Oct 2, 2019

@richardgavel yes, #10347, which adjusts the ENI description matching and tweaks the ENI detachment/deletion logic, will be part of version 2.31.0 of the Terraform AWS Provider when it is released tomorrow. 👍 This issue is a follow-up for tracking Lambda service enhancements that reduce the amount of time necessary for Hyperplane ENIs to become eligible for deletion, which are mostly expected to occur within the Lambda service itself.

One thing missing from the above Terraform configuration, for those who cannot upgrade yet (once it's released), is explicit depends_on logic with the Lambda IAM Role permissions so that Lambda can delete its Hyperplane ENIs itself; otherwise they must be deleted manually (due to the change in the ENI description, which older versions of the provider do not know about). I will edit the configuration above shortly to account for that.

bflad added a commit that referenced this issue Oct 2, 2019
Reference: #10044
Reference: #10114
Reference: #10329

The introduction of [improved VPC networking for Lambda]() brought some welcome enhancements to Lambda functionality, but initially had some unintended consequences when working with Terraform due to the underlying infrastructure changes. The main issue is that these new Hyperplane ENIs associated with Lambda currently take additional time to detach/delete, and that the Lambda service itself owns these ENIs, which prevents early detachment.

In working with the AWS Lambda service team, we have received confirmation of the expected detachment/deletion timeframes for Lambda Hyperplane ENIs. Using this information, we set the Lambda ENI timeout to at least the expected deletion time, matching the service expectations without adjusting the overall default `aws_security_group` or `aws_subnet` resource deletion timeouts. This ensures legitimate `DependencyViolation` errors still return to operators in a fairly timely manner (the defaults are left at 10 minutes and 20 minutes respectively).

Output from AWS Commercial (us-east-2 - Hyperplane enabled)

```
--- PASS: TestAccAWSLambdaFunction_basic (23.37s)
--- PASS: TestAccAWSLambdaFunction_concurrency (30.76s)
--- PASS: TestAccAWSLambdaFunction_concurrencyCycle (43.12s)
--- PASS: TestAccAWSLambdaFunction_DeadLetterConfig (42.40s)
--- PASS: TestAccAWSLambdaFunction_DeadLetterConfigUpdated (41.70s)
--- PASS: TestAccAWSLambdaFunction_EmptyVpcConfig (22.99s)
--- PASS: TestAccAWSLambdaFunction_encryptedEnvVariables (51.21s)
--- PASS: TestAccAWSLambdaFunction_envVariables (45.14s)
--- PASS: TestAccAWSLambdaFunction_expectFilenameAndS3Attributes (10.90s)
--- PASS: TestAccAWSLambdaFunction_importLocalFile (31.12s)
--- PASS: TestAccAWSLambdaFunction_importLocalFile_VPC (1422.82s)
--- PASS: TestAccAWSLambdaFunction_importS3 (22.66s)
--- PASS: TestAccAWSLambdaFunction_Layers (34.75s)
--- PASS: TestAccAWSLambdaFunction_LayersUpdate (54.60s)
--- PASS: TestAccAWSLambdaFunction_localUpdate (31.40s)
--- PASS: TestAccAWSLambdaFunction_localUpdate_nameOnly (24.21s)
--- PASS: TestAccAWSLambdaFunction_nilDeadLetterConfig (12.71s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_java8 (23.05s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_NodeJs10x (26.99s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_NodeJs810 (26.53s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_noRuntime (0.72s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_provided (18.66s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python27 (27.62s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python36 (22.87s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python37 (27.09s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_ruby25 (27.87s)
--- PASS: TestAccAWSLambdaFunction_s3 (22.59s)
--- PASS: TestAccAWSLambdaFunction_s3Update_basic (32.58s)
--- PASS: TestAccAWSLambdaFunction_s3Update_unversioned (31.07s)
--- PASS: TestAccAWSLambdaFunction_tags (42.41s)
--- PASS: TestAccAWSLambdaFunction_tracingConfig (39.12s)
--- PASS: TestAccAWSLambdaFunction_updateRuntime (29.16s)
--- PASS: TestAccAWSLambdaFunction_versioned (28.09s)
--- PASS: TestAccAWSLambdaFunction_versionedUpdate (47.13s)
--- PASS: TestAccAWSLambdaFunction_VPC (1331.55s)
--- PASS: TestAccAWSLambdaFunction_VPC_withInvocation (1376.24s)
--- PASS: TestAccAWSLambdaFunction_VpcConfig_ProperIamDependencies (1327.69s)
--- PASS: TestAccAWSLambdaFunction_VPCRemoval (1490.19s)
--- PASS: TestAccAWSLambdaFunction_VPCUpdate (1685.40s)
```

Output from AWS Commercial (us-west-2 - Hyperplane not deployed)

```
--- PASS: TestAccAWSLambdaFunction_basic (40.50s)
--- PASS: TestAccAWSLambdaFunction_concurrency (47.79s)
--- PASS: TestAccAWSLambdaFunction_concurrencyCycle (62.65s)
--- PASS: TestAccAWSLambdaFunction_DeadLetterConfig (55.95s)
--- PASS: TestAccAWSLambdaFunction_DeadLetterConfigUpdated (50.23s)
--- PASS: TestAccAWSLambdaFunction_EmptyVpcConfig (37.47s)
--- PASS: TestAccAWSLambdaFunction_encryptedEnvVariables (73.66s)
--- PASS: TestAccAWSLambdaFunction_envVariables (80.88s)
--- PASS: TestAccAWSLambdaFunction_expectFilenameAndS3Attributes (22.59s)
--- PASS: TestAccAWSLambdaFunction_importLocalFile (42.78s)
--- PASS: TestAccAWSLambdaFunction_importLocalFile_VPC (39.40s)
--- PASS: TestAccAWSLambdaFunction_importS3 (36.62s)
--- PASS: TestAccAWSLambdaFunction_Layers (53.78s)
--- PASS: TestAccAWSLambdaFunction_LayersUpdate (89.78s)
--- PASS: TestAccAWSLambdaFunction_localUpdate (54.31s)
--- PASS: TestAccAWSLambdaFunction_localUpdate_nameOnly (56.10s)
--- PASS: TestAccAWSLambdaFunction_nilDeadLetterConfig (26.12s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_java8 (46.49s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_NodeJs10x (52.25s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_NodeJs810 (43.59s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_noRuntime (2.71s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_provided (43.88s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python27 (47.91s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python36 (45.95s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_python37 (41.40s)
--- PASS: TestAccAWSLambdaFunction_runtimeValidation_ruby25 (50.32s)
--- PASS: TestAccAWSLambdaFunction_s3 (35.28s)
--- PASS: TestAccAWSLambdaFunction_s3Update_basic (57.89s)
--- PASS: TestAccAWSLambdaFunction_s3Update_unversioned (58.81s)
--- PASS: TestAccAWSLambdaFunction_tags (75.77s)
--- PASS: TestAccAWSLambdaFunction_tracingConfig (55.61s)
--- PASS: TestAccAWSLambdaFunction_updateRuntime (57.19s)
--- PASS: TestAccAWSLambdaFunction_versioned (33.52s)
--- PASS: TestAccAWSLambdaFunction_versionedUpdate (58.25s)
--- PASS: TestAccAWSLambdaFunction_VPC (56.81s)
--- PASS: TestAccAWSLambdaFunction_VPC_withInvocation (86.81s)
--- PASS: TestAccAWSLambdaFunction_VpcConfig_ProperIamDependencies (42.99s)
--- PASS: TestAccAWSLambdaFunction_VPCRemoval (80.28s)
--- PASS: TestAccAWSLambdaFunction_VPCUpdate (81.84s)

--- PASS: TestAccAWSSecurityGroup_basic (10.14s)
--- PASS: TestAccAWSSecurityGroup_Change (19.36s)
--- PASS: TestAccAWSSecurityGroup_CIDRandGroups (31.78s)
--- PASS: TestAccAWSSecurityGroup_DefaultEgress_Classic (6.53s)
--- PASS: TestAccAWSSecurityGroup_DefaultEgress_VPC (25.29s)
--- PASS: TestAccAWSSecurityGroup_drift (7.55s)
--- PASS: TestAccAWSSecurityGroup_drift_complex (31.62s)
--- PASS: TestAccAWSSecurityGroup_Egress_ConfigMode (23.76s)
--- PASS: TestAccAWSSecurityGroup_egressWithPrefixList (24.51s)
--- PASS: TestAccAWSSecurityGroup_failWithDiffMismatch (12.13s)
--- PASS: TestAccAWSSecurityGroup_forceRevokeRules_false (1228.05s)
--- PASS: TestAccAWSSecurityGroup_forceRevokeRules_true (1242.70s)
--- PASS: TestAccAWSSecurityGroup_generatedName (25.26s)
--- PASS: TestAccAWSSecurityGroup_importBasic (12.91s)
--- PASS: TestAccAWSSecurityGroup_importIPRangeAndSecurityGroupWithSameRules (14.68s)
--- PASS: TestAccAWSSecurityGroup_importIPRangesWithSameRules (12.19s)
--- PASS: TestAccAWSSecurityGroup_importIpv6 (30.08s)
--- PASS: TestAccAWSSecurityGroup_importPrefixList (25.01s)
--- PASS: TestAccAWSSecurityGroup_importSelf (31.64s)
--- PASS: TestAccAWSSecurityGroup_importSourceSecurityGroup (30.19s)
--- PASS: TestAccAWSSecurityGroup_Ingress_ConfigMode (23.47s)
--- PASS: TestAccAWSSecurityGroup_ingressWithCidrAndSGs (31.60s)
--- PASS: TestAccAWSSecurityGroup_ingressWithCidrAndSGs_classic (9.86s)
--- PASS: TestAccAWSSecurityGroup_ingressWithPrefixList (44.12s)
--- PASS: TestAccAWSSecurityGroup_invalidCIDRBlock (1.28s)
--- PASS: TestAccAWSSecurityGroup_ipv4andipv6Egress (11.90s)
--- PASS: TestAccAWSSecurityGroup_ipv6 (12.77s)
--- PASS: TestAccAWSSecurityGroup_MultiIngress (12.33s)
--- PASS: TestAccAWSSecurityGroup_namePrefix (6.47s)
--- PASS: TestAccAWSSecurityGroup_RuleDescription (26.52s)
--- PASS: TestAccAWSSecurityGroup_ruleGathering (24.55s)
--- PASS: TestAccAWSSecurityGroup_ruleLimitCidrBlockExceededAppend (48.89s)
--- PASS: TestAccAWSSecurityGroup_ruleLimitExceededAllNew (53.89s)
--- PASS: TestAccAWSSecurityGroup_ruleLimitExceededAppend (50.48s)
--- PASS: TestAccAWSSecurityGroup_ruleLimitExceededPrepend (54.09s)
--- PASS: TestAccAWSSecurityGroup_rulesDropOnError (22.40s)
--- PASS: TestAccAWSSecurityGroup_self (11.93s)
--- PASS: TestAccAWSSecurityGroup_tags (40.86s)
--- PASS: TestAccAWSSecurityGroup_vpc (10.39s)
--- PASS: TestAccAWSSecurityGroup_vpcNegOneIngress (10.55s)
--- PASS: TestAccAWSSecurityGroup_vpcProtoNumIngress (11.84s)

--- PASS: TestAccAWSSubnet_availabilityZoneId (26.56s)
--- PASS: TestAccAWSSubnet_basic (26.69s)
--- PASS: TestAccAWSSubnet_enableIpv6 (42.97s)
--- PASS: TestAccAWSSubnet_ipv6 (69.30s)
```
@ghost ghost added the service/iam Issues and PRs that pertain to the iam service. label Oct 3, 2019
@bflad bflad removed the service/iam Issues and PRs that pertain to the iam service. label Oct 3, 2019
@ghost ghost added the service/iam Issues and PRs that pertain to the iam service. label Oct 3, 2019
@bflad
Contributor Author

bflad commented Oct 3, 2019

@richardgavel I tried to add a little more detail in the issue description about the IAM Role permissions issue and Lambda Hyperplane ENIs. I also left some notes on the AWS CLI process for manually finding/deleting Lambda Hyperplane ENIs, if necessary. Please let me know if anything is still unclear or if you are still noticing issues not covered above.

@bflad bflad changed the title Improve Lambda Associated EC2 Subnet and Security Group Deletion Time Lambda Associated EC2 Subnet and Security Group Deletion Issues and Improvements Oct 3, 2019
@bflad bflad removed the service/iam Issues and PRs that pertain to the iam service. label Oct 3, 2019
@richardgavel

@bflad Before I saw this, I was exploring another option, which was a provisioner { when = "destroy" } script on the security group that would simply detach the security group from the ENI, removing the dependency while not having to worry about ENI removal timeframes. Do you think this is something I should continue to explore (many of our teams are still on version 1.x of the provider so we need some due diligence before upgrading to a provider version that has breaking changes)?

@bflad
Contributor Author

bflad commented Oct 3, 2019

@richardgavel the practice of using provisioners is discouraged (documentation fairly recently added) and destroy provisioners in general tend to be more problematic/buggy in Terraform than creation provisioners. They may work as expected in your scenario, however I think avoiding them if possible is preferred.

@ewbankkit
Contributor

AWS Compute blog post:

@MarvinChen003

The same issue is happening again, but it is slightly different.

Terraform Version

terraform_version=0.12.19
provider_aws_version=2.43.0
provider_null_version=2.1.2
provider_template_version=2.1.2
provider_archive_version=1.3.0
provider_tls_version=2.1.1

Terraform Configuration Files

resource aws_lambda_function moogsoft_alerting {
  ...
  function_name    = "xxx"
  ... 

  vpc_config {
    security_group_ids = [aws_security_group.moogsoft_alerting_sg.id]
    subnet_ids         = local.private_subnet_ids
  }
}

resource aws_security_group moogsoft_alerting_sg {
  ...
  vpc_id      = [xxx]
}

resource aws_subnet private_subnet {
  vpc_id            = aws_vpc.vpc.id
  cidr_block        = [xxx]
  availability_zone = [xxx]
}

Expected Behavior

The VPC attached lambda, ENI, security groups and subnets should all be destroyed. The ENIs take a long time to delete and terraform should wait before deleting security groups and subnets.

Actual Behavior

From time to time, one of the security groups OR one of the subnets can not be destroyed which leaves terraform in an inconsistent state.

Error with security group

module.health_check_agent.aws_security_group.health_check_agent_sg[0]: Still destroying... [id=sg-048867b1721b2ee0b, 5m30s elapsed]
...
module.health_check_agent.aws_security_group.health_check_agent_sg[0]: Still destroying... [id=sg-048867b1721b2ee0b, 15m10s elapsed]
...
module.health_check_agent.aws_security_group.health_check_agent_sg[0]: Still destroying... [id=sg-048867b1721b2ee0b, 24m0s elapsed]
...
module.networking.aws_subnet.private_subnet[1]: Still destroying... [id=subnet-011055026fce20f52, 28m0s elapsed]
module.networking.aws_subnet.private_subnet[1]: Still destroying... [id=subnet-011055026fce20f52, 28m10s elapsed]
module.networking.aws_subnet.private_subnet[1]: Destruction complete after 28m17s
Error: error deleting Lambda ENIs using Security Group (sg-0ce5ea1b233712f5f): error detaching Lambda ENI (eni-0b2f852b8ddb9eca9): error detaching ENI (eni-0b2f852b8ddb9eca9): IncorrectState: The instance is not in a valid state for this operation.
status code: 400, request id: b72d747e-bbe7-455f-a9b2-82001f5a6a04
Error: error deleting Lambda ENIs using Security Group (sg-048867b1721b2ee0b): error deleting Lambda ENI (eni-0693a5ba589e136cb): error deleting ENI (eni-0693a5ba589e136cb): InvalidParameterValue: Network interface 'eni-0693a5ba589e136cb' is currently in use.
status code: 400, request id: d67d3a5b-96fa-490e-b76c-86cbf406adce

Error with subnet

module.networking.aws_subnet.private_subnet[2]: Still destroying... [id=subnet-0d969b793d450a291, 40s elapsed]
...
module.networking.aws_subnet.private_subnet[2]: Still destroying... [id=subnet-0d969b793d450a291, 15m40s elapsed]
...
module.networking.aws_subnet.private_subnet[2]: Still destroying... [id=subnet-0d969b793d450a291, 16m40s elapsed]
...
module.moogsoft_alerting.aws_security_group.moogsoft_alerting_sg: Still destroying... [id=sg-0a3eb7f09729871ed, 29m30s elapsed]
module.moogsoft_alerting.aws_security_group.moogsoft_alerting_sg: Still destroying... [id=sg-0a3eb7f09729871ed, 29m40s elapsed]
module.moogsoft_alerting.aws_security_group.moogsoft_alerting_sg: Destruction complete after 29m49s

error deleting Lambda ENIs using subnet (subnet-0d969b793d450a291): error deleting Lambda ENI (eni-082e57ec7b8e3b66a): error deleting ENI (eni-082e57ec7b8e3b66a): InvalidParameterValue: Network interface 'eni-082e57ec7b8e3b66a' is currently in use.
status code: 400, request id: 13f90dff-2a51-401d-a9b6-89191a364b0d
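The errors above fall into two kinds: ones that suggest the ENI is still being released (`IncorrectState`, and `InvalidParameterValue` with "is currently in use") and ones that waiting will not fix. A hedged Go sketch of such a classifier follows; the retryable list is an assumption drawn from the errors quoted in this thread (plus `DependencyViolation` from the issue description), not the provider's actual rules.

```go
package main

import (
	"fmt"
	"strings"
)

// retryableENIError reports whether an EC2 error seen while detaching or
// deleting a Lambda ENI looks transient. The code list is an assumption
// based on the logs in this thread, not the provider's real retry rules.
func retryableENIError(code, message string) bool {
	switch code {
	case "DependencyViolation", "IncorrectState":
		return true
	case "InvalidParameterValue":
		// Only the "currently in use" variant suggests the ENI is still draining.
		return strings.Contains(message, "currently in use")
	default:
		// e.g. UnauthorizedOperation (403) will not resolve itself by waiting.
		return false
	}
}

func main() {
	fmt.Println(retryableENIError("InvalidParameterValue",
		"Network interface 'eni-0693a5ba589e136cb' is currently in use."))
	fmt.Println(retryableENIError("UnauthorizedOperation",
		"You are not authorized to perform this operation."))
}
```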

Retry fails

Terraform is in an inconsistent state and won't run because of error:

Error: leftover module module.xxx in state that should have been removed; this is a bug in Terraform and should be reported

Steps to Reproduce

terraform destroy

@bflad
Contributor Author

bflad commented Jan 22, 2020

For this particular error:

Error: leftover module module.xxx in state that should have been removed; this is a bug in Terraform and should be reported

I believe this was addressed upstream in hashicorp/terraform#23821 and will release with Terraform version 0.12.20.

@Mjb141

Mjb141 commented Jan 28, 2020

Is there anywhere that progress on this issue can be tracked, aside from this issue? The note is still present at https://www.terraform.io/docs/providers/aws/r/lambda_function.html and it's still a very real barrier to using Terraform.

@kotfic

kotfic commented Feb 6, 2020

FWIW I am seeing this issue with Lambdas that are in a security group that allows access to an EMR cluster. The security group fails to delete because it is attached to the Lambda ENI during the extended destruction time. If it were possible to detach all security groups from the ENI before destruction, that would resolve my issue. Not to muddy the waters, but any thoughts on how this might be achieved?

@schollii

schollii commented Feb 8, 2020

Is there a temporary workaround to avoid this problem or is it currently not possible to use tf to provision lambdas (because destroy will corrupt state)?

@jufemaiz
Contributor

jufemaiz commented May 1, 2020

The ENI impact isn't just on a terraform destroy but also on anything that causes a security group to be destroyed and recreated.

We're currently trying to get serverless working with Terraform nicely and this is another sticking point.

I'm not sure if this would cause a similar issue in Terraform, but given the above I am inclined to believe it does.

@xdays

xdays commented May 29, 2020

We also use Terraform and the Serverless Framework together. I'm afraid there's no elegant method, as Terraform has no idea who depends on the security group. My current solution is:

  1. use terraform apply to create a new security group
  2. change the security group in serverless.yml and deploy the Lambda function
  3. use terraform destroy to delete the old security group.

@jufemaiz
Contributor

@xdays I think you're right with the blue-green security group approach. It's a little disappointing but c'est la vie :(

We may end up shifting back to full terraform. A module for local development isn't the end of developer happiness.

@schollii

schollii commented May 30, 2020

The page at https://www.serverless.com/framework/docs/providers/aws/guide/variables/ documents several ways that serverless.yml can be parameterized.

A couple that seem promising are additional yml input files, and reading from json/yaml files. It might be possible therefore to add a local file element in tf file that generates that additional config file to contain the security group to use, and a local exec to call sls. This could allow some degree of automation for the workaround of first switching lambdas to use a new security group, followed by destroying the old one once switch complete.

It would be even nicer if one of the parameterization methods supported by Serverless were Terraform outputs. It already supports CloudFormation as input, so it may not be all that hard to patch Serverless and submit a PR.

@jefftucker

The issue happens with Transit Gateway attachments too. Any subnets included in a Transit Gateway attachment will also have ENIs that prevent the subnet from being deleted. Terraform correctly identifies that the attachment needs to be modified, but it probably can't delete the ENIs, since I can't delete them manually from the console either (AWS returns an error saying I can't manage that type of ENI, or something similar). The only way to make it work is to first detach the Transit Gateway attachment, which removes the ENIs for each subnet. As soon as I did that, deleting the subnets was nearly instantaneous. I suspect this is a bug in Terraform's dependency graph: if it detects that it's changing the subnets used in an aws_ec2_transit_gateway_vpc_attachment resource because one of those subnets is being deleted, then the attachment itself should also be tainted and forced to be re-created, with the Transit Gateway attachment deleted first, followed by the subnets.
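The ordering suggested here (delete the Transit Gateway attachment first, then the subnets) is what falls out when every resource is destroyed before the resources it depends on. A minimal Go sketch of that reverse dependency ordering using Kahn's algorithm follows; the resource names are hypothetical, Terraform's real graph walk is considerably more involved, and this does not model the taint/re-create behavior proposed above.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// destroyOrder returns resources so that each one is destroyed before
// anything it depends on. deps maps a resource to the resources it depends on.
func destroyOrder(deps map[string][]string) ([]string, error) {
	nodes := map[string]bool{}
	dependents := map[string]int{} // how many not-yet-destroyed resources depend on each key
	for r, ds := range deps {
		nodes[r] = true
		for _, d := range ds {
			nodes[d] = true
			dependents[d]++
		}
	}
	var ready []string
	for n := range nodes {
		if dependents[n] == 0 {
			ready = append(ready, n) // nothing depends on it, so it can go first
		}
	}
	var order []string
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic output
		n := ready[0]
		ready = ready[1:]
		order = append(order, n)
		for _, d := range deps[n] {
			dependents[d]--
			if dependents[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	if len(order) != len(nodes) {
		return nil, errors.New("dependency cycle detected")
	}
	return order, nil
}

func main() {
	order, err := destroyOrder(map[string][]string{
		"aws_ec2_transit_gateway_vpc_attachment.example": {"aws_subnet.a", "aws_subnet.b"},
		"aws_subnet.a": {"aws_vpc.main"},
		"aws_subnet.b": {"aws_vpc.main"},
	})
	fmt.Println(order, err)
}
```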

@anGie44 anGie44 unpinned this issue Nov 12, 2020
@wes-novack

wes-novack commented Jan 15, 2021

I'm still seeing intermittent errors/failures occur when attempting to terraform destroy AWS Lambda functions and their associated security_group resources using Terraform 0.12.29 and terraform-provider-aws_v3.24.0_x5

This only happens some of the time, but it is highly frustrating, as we can't rely on our pipeline to destroy and clean up resources every time, as it should.

I have added the following attributes to the security_group resource definition in Terraform:

timeouts {
  delete = "45m"
}

That delete timeout attribute has not resolved the problem, as our latest failure occurred after ~6.5 minutes of attempting to destroy the security_group:

2021/01/15 17:33:24 [ERROR] module.ps_vault_dotnet_acceptance_tests_app_sg: eval: *terraform.EvalApplyPost, err: error deleting Lambda ENIs using Security Group (sg-<redacted>): error deleting Lambda ENI (eni-<redacted>): error deleting ENI (eni-<redacted>): UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: <redacted>
	status code: 403, request id: a85da372-be6c-4018-8bce-d7ded509c948
2021/01/15 17:33:24 [ERROR] module.ps_vault_dotnet_acceptance_tests_app_sg: eval: *terraform.EvalSequence, err: error deleting Lambda ENIs using Security Group (sg-<redacted>): error deleting Lambda ENI (eni-<redacted>): error deleting ENI (eni-<redacted>): UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: <redacted>
	status code: 403, request id: a85da372-be6c-4018-8bce-d7ded509c948
2021/01/15 17:33:24 [ERROR] module.ps_vault_dotnet_acceptance_tests_app_sg: eval: *terraform.EvalOpFilter, err: error deleting Lambda ENIs using Security Group (sg-<redacted>): error deleting Lambda ENI (eni-<redacted>): error deleting ENI (eni-<redacted>): UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: <redacted>
	status code: 403, request id: a85da372-be6c-4018-8bce-d7ded509c948
2021/01/15 17:33:24 [TRACE] [walkApply] Exiting eval tree: module.ps_vault_dotnet_acceptance_tests_app_sg.aws_security_group.sg (destroy)
2021/01/15 17:33:24 [TRACE] vertex "module.ps_vault_dotnet_acceptance_tests_app_sg.aws_security_group.sg (destroy)": visit complete
2021/01/15 17:33:24 [TRACE] dag/walk: upstream of "provider.aws (close)" errored, so skipping
2021/01/15 17:33:24 [TRACE] dag/walk: upstream of "data.terraform_remote_state.vpc (destroy)" errored, so skipping
2021/01/15 17:33:24 [TRACE] dag/walk: upstream of "provider.terraform (close)" errored, so skipping
2021/01/15 17:33:24 [TRACE] dag/walk: upstream of "module.ps_vault_dotnet_acceptance_tests_app_sg.output.id" errored, so skipping
2021/01/15 17:33:24 [TRACE] dag/walk: upstream of "output.sg_is_app_ps_vault_dotnet_acceptance_tests_id" errored, so skipping
2021/01/15 17:33:24 [TRACE] dag/walk: upstream of "meta.count-boundary (EachMode fixup)" errored, so skipping
2021/01/15 17:33:24 [TRACE] dag/walk: upstream of "root" errored, so skipping
2021-01-15T17:33:25.355Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
Error: error deleting Lambda ENIs using Security Group (sg-<redacted>): error deleting Lambda ENI (eni-<redacted>): error deleting ENI (eni-<redacted>): UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: <redacted>
	status code: 403, request id: a85da372-be6c-4018-8bce-d7ded509c948

Are there any other suggested solutions for this issue?

@rajaie-sg

Seeing this issue when attempting to delete a security group that was never attached to a Lambda function, only an RDS instance.

@cgomestw

cgomestw commented May 1, 2021

I used this hack to change the security group on NICs attached to older Lambda versions. Although it doesn't solve the problem, it may help.

data "aws_network_interfaces" "lambda_nic" {
  filter {
    name   = "group-name"
    values = ["secg-${var.name}-${var.environment}*","${local.security_group_name}*"]
  }
}

resource "null_resource" "changeSG" {
  for_each = length(data.aws_network_interfaces.lambda_nic.ids) > 0 ? toset(data.aws_network_interfaces.lambda_nic.ids) : []

  provisioner "local-exec" {
    command = "aws ec2 modify-network-interface-attribute --network-interface-id ${each.value} --groups ${aws_security_group.sg[0].id} --region ${var.region}"
  }

  triggers = {
    aws_security_group = aws_security_group.sg[0].id
  }

  depends_on = [aws_security_group.sg]
}

resource "aws_security_group" "sg" {
  count = var.vpc_less ? 0 : 1

  name        = "${local.security_group_name}-lambda"
  description = "Security Group created for ${local.security_group_name}"
  vpc_id      = data.aws_vpc.vpc[0].id

  tags = local.tags

  lifecycle {
    create_before_destroy = true
  }

  depends_on = [aws_iam_role_policy_attachment.role_policy,aws_iam_policy.role_policy,aws_iam_role_policy_attachment.lamba_exec_role_eni]
}

@hiroshi-nakagoe

I still got this issue. This is just a port of @richardgavel's solution to bash. My workaround reduced my resource destroy time from 1 hour to 1 second.

resource "null_resource" "assign_default_sg" {
  triggers = {
    sg       = aws_security_group.this.id
    func     = aws_lambda_function.this.id
    vpc_name = var.vpc_name
    sg_name  = var.sg_name
  }

  provisioner "local-exec" {
    when    = destroy
    # Destroy-time provisioners can only reference self, so pass the trigger keys defined above.
    command = "/bin/bash /path/to/update-lambda-sg.sh ${self.triggers.vpc_name} ${self.triggers.sg_name}"
  }
}

Here is update-lambda-sg.sh.

#!/bin/bash

VPC_NAME=$1
SG_NAME=$2

VPC=$(aws ec2 describe-vpcs --filters Name=tag:Name,Values=${VPC_NAME} --output text --query 'Vpcs[0].VpcId')
SG=$(aws ec2 describe-security-groups --filters Name=description,Values='default VPC security group' Name=vpc-id,Values=${VPC} --query 'SecurityGroups[0].GroupId')
ENIS=$(aws ec2 describe-network-interfaces --filters Name=group-name,Values=${SG_NAME} --query 'NetworkInterfaces[*].NetworkInterfaceId')

enis=$(echo ${ENIS} | jq -c -r '.[]' | tr '\n' ' ')
SG=$(echo ${SG} | jq -r)

# change security group to default
for item in ${enis}; do
  aws ec2 modify-network-interface-attribute --network-interface-id ${item} --groups ${SG}
done

echo detached ${enis} from ${SG}

@debugger24

Hi @hiroshi-nakagoe

In my case, the Lambda function and SG are destroyed instantly, as expected.

But those ENIs are not deleted and are left with the status Available.

@kapilt

kapilt commented Apr 20, 2022

@hiroshi-nakagoe thanks for sharing. I made a few minor tweaks, but can confirm this goes from 1hr+ of waiting to a few seconds, at least for app security group removal.

  • drop jq and tr usage via using builtin aws cli output formatting
  • allow for optional specifying of target sg (default sg doesn't always exist in some envs)
  • use an interface-type filter to only target ENIs created by Lambda; this requires a newer AWS CLI (I'm on 2.5.6)
  • use env to locate bash for better portability

@debugger24 it's really the Lambda service (or, behind the scenes, AWS Hyperplane) that's allocating and deallocating the ENIs. I've observed removal times ranging from minutes to hours, or fully orphaned (never deleted). This isn't really a Terraform issue, but an AWS one. It's not clear that the original change raising the timeout here to 45m reflects any SLA on the AWS side, rather than a point-in-time guarantee.

#!/usr/bin/env bash

# we get the default group for the given vpc
VPC_ID=$1

# source security group that we will remove from enis
SOURCE_SG_ID=$2

# optional - target security group that we will move the enis to, if not specified use default sg
TARGET_SG_ID=$3

if [ -z "$TARGET_SG_ID" ]; then
    TARGET_SG_ID=$(aws ec2 describe-security-groups \
		     --filters Name=description,Values='default VPC security group' \
		     Name=vpc-id,Values=${VPC_ID} \
		     --output text \
		     --query 'SecurityGroups[0].GroupId')
fi

enis=$(aws ec2 describe-network-interfaces \
	   --filters Name=group-id,Values=${SOURCE_SG_ID} Name=interface-type,Values=lambda \
	   --output text \
	   --query 'NetworkInterfaces[*].NetworkInterfaceId')

# Change security groups on the lambda nics
for item in ${enis}; do
  aws ec2 modify-network-interface-attribute --network-interface-id ${item} --groups ${TARGET_SG_ID}
done

echo detached ${enis} from ${SOURCE_SG_ID} to ${TARGET_SG_ID}
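Both scripts rely on `aws ec2 modify-network-interface-attribute --groups`, which replaces the ENI's entire security group set rather than removing only the source group. The in-memory Go model below makes that explicit; the `eni` type is hypothetical and no AWS calls are made. It selects the same ENIs as the script above: interface type `lambda` with the source group attached.

```go
package main

import "fmt"

// eni is a simplified, hypothetical view of an EC2 network interface.
type eni struct {
	ID            string
	InterfaceType string
	Groups        []string
}

// contains reports whether x is in xs.
func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// swapGroups mirrors the script: every Lambda ENI that uses sourceSG gets its
// group list replaced by targetSG alone, as --groups replaces the whole set.
// It returns the IDs of the ENIs it changed.
func swapGroups(enis []eni, sourceSG, targetSG string) []string {
	var changed []string
	for i := range enis {
		if enis[i].InterfaceType != "lambda" || !contains(enis[i].Groups, sourceSG) {
			continue
		}
		enis[i].Groups = []string{targetSG} // any other groups are dropped too
		changed = append(changed, enis[i].ID)
	}
	return changed
}

func main() {
	enis := []eni{
		{ID: "eni-1", InterfaceType: "lambda", Groups: []string{"sg-app", "sg-extra"}},
		{ID: "eni-2", InterfaceType: "interface", Groups: []string{"sg-app"}},
		{ID: "eni-3", InterfaceType: "lambda", Groups: []string{"sg-other"}},
	}
	changed := swapGroups(enis, "sg-app", "sg-default")
	fmt.Println(changed, enis[0].Groups)
}
```

In this model `sg-extra` on `eni-1` is dropped along with `sg-app`, which is harmless for ENIs that are about to be released but worth knowing if the function is still running.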

@stasrelevanceai

Still have this issue, but thanks for the workaround, guys. Works like a charm. Much appreciated.

garutilorenzo added a commit to garutilorenzo/k3s-aws-terraform-cluster that referenced this issue Oct 14, 2022
## New features

* Added EFS for persistent storage via [AWS EFS csi driver](https://github.com/kubernetes-sigs/aws-efs-csi-driver).
* Added the possibility to expose the kubeapi server to the internet. Access is granted only from *my_public_ip_cidr* for security reasons.
* Added Lambda function to automatically delete removed nodes from k3s cluster (EC2 spot instances)
* Added *null_resource* to fix Lambda orphan ENI. [Ref.](hashicorp/terraform-provider-aws#10329)

## Minor changes

* Fixed resource tags/names
* Renamed *agent* resources to *worker* resources
* Added SSM Policy to access EC2 from AWS console

## Bug fixes

* Added lifecycle on k3s-workers
* Fix allow_strict security group deletion on apply
@gdavison
Contributor

gdavison commented Nov 1, 2022

Implementation notes: determine which ENIs are associated with the Lambda Function using tfec2.FindNetworkInterfacesByAttachmentInstanceOwnerIDAndDescription and detach them using tfec2.DetachNetworkInterface
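These implementation notes describe a two-step flow: find the function's ENIs by attachment instance owner ID and description, then detach them. A minimal in-memory Go sketch of that flow follows. The `networkInterface` type, the `amazon-aws` owner ID and the `AWS Lambda VPC ENI-<function name>` description format are assumptions for illustration; the real `tfec2` helpers are not reproduced here.

```go
package main

import "fmt"

// networkInterface is a hypothetical, simplified ENI record; a real
// implementation would read these fields from DescribeNetworkInterfaces.
type networkInterface struct {
	ID                string
	Description       string
	AttachmentID      string // empty when the ENI is not attached
	AttachmentOwnerID string // instance owner ID reported on the attachment
}

// lambdaENIDescription builds the description to match.
// The "AWS Lambda VPC ENI-" prefix is an assumption, not confirmed here.
func lambdaENIDescription(functionName string) string {
	return "AWS Lambda VPC ENI-" + functionName
}

// findByOwnerAndDescription filters ENIs on attachment owner ID and exact description.
func findByOwnerAndDescription(all []networkInterface, ownerID, description string) []networkInterface {
	var out []networkInterface
	for _, e := range all {
		if e.AttachmentOwnerID == ownerID && e.Description == description {
			out = append(out, e)
		}
	}
	return out
}

// detachAll calls detach for every attached ENI and stops at the first error.
func detachAll(enis []networkInterface, detach func(attachmentID string) error) error {
	for _, e := range enis {
		if e.AttachmentID == "" {
			continue // already detached
		}
		if err := detach(e.AttachmentID); err != nil {
			return fmt.Errorf("detaching %s: %w", e.ID, err)
		}
	}
	return nil
}

func main() {
	all := []networkInterface{
		{ID: "eni-1", Description: lambdaENIDescription("my-function"), AttachmentID: "eni-attach-1", AttachmentOwnerID: "amazon-aws"},
		{ID: "eni-2", Description: lambdaENIDescription("other-function"), AttachmentID: "eni-attach-2", AttachmentOwnerID: "amazon-aws"},
	}
	matches := findByOwnerAndDescription(all, "amazon-aws", lambdaENIDescription("my-function"))
	err := detachAll(matches, func(id string) error {
		fmt.Println("would detach", id)
		return nil
	})
	fmt.Println(len(matches), err)
}
```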

@ash-murphy-colibri

This is still an issue, ENIs hanging around and blocking subnet from being deleted

@jamengual

same issue here....

@cringdahl

Is there no way for Terraform to issue the ENI delete request to AWS, then just call it done and gone from a state perspective? It's a hack, sure, but dealing with this asynchronously would save us all a lot of time.

@ascopes

ascopes commented Feb 6, 2023

How does CloudFormation deal with this same issue? Does CF have to wait over an hour for deletion of a Lambda for the same reason?

@jar-b
Member

jar-b commented Feb 10, 2023

v4.54.0 of the AWS Provider includes a pair of new Lambda function attributes which may reduce security group deletion times under certain conditions. See #29289 for a complete write up and limitations.

resource "aws_lambda_function" "example" {
  # ... other configuration ...

  # by itself, this attribute will replace all security groups defined in vpc_config.security_group_ids
  # with the default security group after the function is destroyed
  replace_security_groups_on_destroy = true

  # if this attribute is configured, these groups are used instead of the default security group
  replacement_security_group_ids = ["sg-1234"]

  vpc_config {
    subnet_ids         = [aws_subnet.example.id]
    security_group_ids = [aws_security_group.example.id]
  }
}
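Read from the comments in the configuration above, the choice of replacement groups reduces to a small rule. A hedged Go sketch of that rule only follows: it makes no AWS calls, and it assumes `replacement_security_group_ids` has no effect unless `replace_security_groups_on_destroy = true` (see #29289 for the actual behavior and limitations).

```go
package main

import "fmt"

// replacementGroups returns the security groups assigned to the function's
// ENIs after it is destroyed: replacement_security_group_ids if set,
// otherwise the VPC default group. nil means the groups are left unchanged.
func replacementGroups(replaceOnDestroy bool, replacementIDs []string, vpcDefaultSG string) []string {
	if !replaceOnDestroy {
		return nil
	}
	if len(replacementIDs) > 0 {
		return replacementIDs
	}
	return []string{vpcDefaultSG}
}

func main() {
	fmt.Println(replacementGroups(true, nil, "sg-default"))
	fmt.Println(replacementGroups(true, []string{"sg-1234"}, "sg-default"))
	fmt.Println(replacementGroups(false, []string{"sg-1234"}, "sg-default"))
}
```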

As this issue was originally intended to provide guidance for pre-v2.30.0 configurations and relevant updates to the AWS Lambda service itself, we feel it has now served that purpose and can be closed. For bug reports or new feature requests, please open a new issue referring back to this one as necessary. Thank you!

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 13, 2023