Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DescribeClientVpnTargetNetworks #14380

Closed
wants to merge 2 commits into from

Conversation

nijave
Copy link
Contributor

@nijave nijave commented Jul 29, 2020

Amazon API doesn't return the correct results when using
"AssociationIds" but works correctly when using a filter for the same
thing

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

Closes #11586

Release note for CHANGELOG:

Fix Client Access VPN network association refresh

Output from acceptance testing:

$ make testacc TESTARGS='-run=TestAccAwsEc2ClientVpn_serial/NetworkAssociation'
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./aws -v -count 1 -parallel 20 -run=TestAccAwsEc2ClientVpn_serial/NetworkAssociation -timeout 120m
=== RUN   TestAccAwsEc2ClientVpn_serial
=== PAUSE TestAccAwsEc2ClientVpn_serial
=== CONT  TestAccAwsEc2ClientVpn_serial
=== RUN   TestAccAwsEc2ClientVpn_serial/NetworkAssociation_basic
=== PAUSE TestAccAwsEc2ClientVpn_serial/NetworkAssociation_basic
=== RUN   TestAccAwsEc2ClientVpn_serial/NetworkAssociation_disappears
=== PAUSE TestAccAwsEc2ClientVpn_serial/NetworkAssociation_disappears
=== RUN   TestAccAwsEc2ClientVpn_serial/NetworkAssociation_securityGroups
=== PAUSE TestAccAwsEc2ClientVpn_serial/NetworkAssociation_securityGroups
=== CONT  TestAccAwsEc2ClientVpn_serial/NetworkAssociation_basic
=== CONT  TestAccAwsEc2ClientVpn_serial/NetworkAssociation_securityGroups
=== CONT  TestAccAwsEc2ClientVpn_serial/NetworkAssociation_disappears
    resource_aws_ec2_client_vpn_network_association_test.go:148: [INFO] Got non-empty plan, as expected
--- PASS: TestAccAwsEc2ClientVpn_serial (0.86s)
    --- PASS: TestAccAwsEc2ClientVpn_serial/NetworkAssociation_securityGroups (613.89s)
    --- PASS: TestAccAwsEc2ClientVpn_serial/NetworkAssociation_basic (755.98s)
    --- PASS: TestAccAwsEc2ClientVpn_serial/NetworkAssociation_disappears (832.45s)
PASS
ok      github.com/terraform-providers/terraform-provider-aws/aws       833.015s

...

We submitted an AWS ticket about this back in February and they said "Use filter attribute". I've reopened it and will update if they admit this is a bug

@nijave nijave requested a review from a team July 29, 2020 17:20
@ghost ghost added size/XS Managed by automation to categorize the size of a PR. service/ec2 Issues and PRs that pertain to the ec2 service. needs-triage Waiting for first response or review from a maintainer. labels Jul 29, 2020
@borisroman
Copy link
Contributor

This also relates to #9585

@borisroman
Copy link
Contributor

Hi @nijave

Thanks for the PR! Can you update this PR to resolve the conflict and also include the call to DescribeClientVpnTargetNetworks in (https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/internal/service/ec2/waiter/status.go#L121)?

@nijave nijave force-pushed the nv-fix-access-vpn-assoc-lookup branch from aa06a04 to c01c03c Compare September 10, 2020 23:35
@ghost ghost added dependencies Used to indicate dependency changes. size/S Managed by automation to categorize the size of a PR. and removed size/XS Managed by automation to categorize the size of a PR. labels Sep 10, 2020
@nijave nijave force-pushed the nv-fix-access-vpn-assoc-lookup branch from c01c03c to 58d0084 Compare September 10, 2020 23:39
@ghost ghost added size/XS Managed by automation to categorize the size of a PR. and removed size/S Managed by automation to categorize the size of a PR. labels Sep 10, 2020
@nijave nijave force-pushed the nv-fix-access-vpn-assoc-lookup branch from 58d0084 to 2596fb5 Compare September 10, 2020 23:43
@ghost ghost added the tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. label Sep 10, 2020
Amazon API doesn't return the correct results when using
"AssociationIds" but works correctly when using a filter for the same
thing
@nijave nijave force-pushed the nv-fix-access-vpn-assoc-lookup branch from 2596fb5 to a266e25 Compare September 10, 2020 23:44
@nijave
Copy link
Contributor Author

nijave commented Sep 10, 2020

I think probably the integration test should be updated here to include two subnets instead of a single one and maybe that would weed out the issue?

@ghost ghost added size/M Managed by automation to categorize the size of a PR. and removed size/XS Managed by automation to categorize the size of a PR. labels Sep 11, 2020
@nijave
Copy link
Contributor Author

nijave commented Sep 11, 2020

Hmm don't think the test is working right since it still seems to pass without the changes

@bflad bflad removed the dependencies Used to indicate dependency changes. label Oct 2, 2020
@caphrim007
Copy link

caphrim007 commented Oct 22, 2020

@gdavison @anGie44 @bflad this bug effectively makes it not possible to use multiple subnet associations. there are several issues open that are related to this bug, and the fix is straight-forward (as proposed by @nijave ).

I have independently compiled and verified that the proposed patch works. Would you be able to review and merge this?

Before the patch, TF would report a change on an existing resource which was imported

# module.vpn.aws_ec2_client_vpn_network_association.this["subnet-e41194c9"] must be replaced
-/+ resource "aws_ec2_client_vpn_network_association" "this" {
      ~ association_id         = "cvpn-assoc-020e45fccc33e866f" -> (known after apply)
        client_vpn_endpoint_id = "cvpn-endpoint-00602a889f4647142"
      ~ id                     = "cvpn-assoc-09f5c539655b051a0" -> (known after apply)
      ~ security_groups        = [
          - "sg-0a0c33042a7758338",
        ] -> (known after apply)
      ~ status                 = "associated" -> (known after apply)
      ~ subnet_id              = "subnet-3d1da366" -> "subnet-e41194c9" # forces replacement
      ~ vpc_id                 = "vpc-ff1ae199" -> (known after apply)
    }

This is because the provider assumes there will be a single network association returned

https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_ec2_client_vpn_network_association.go#L141

If there is more than 1 subnet, the first subnet will only ever be "found" and imported into the state. All other subnets will then show in the (subsequent) TF plans as changing this.

Filtering constrains the returned list to a single item, and the provisioner imports correctly once again.

After removing the instances of broken state and re-importing with the patched version of the provider, the state is now accurate and shows 2 associations instead of 1 broken association

          "index_key": "subnet-3d1da366",
          "attributes": {
            "association_id": "cvpn-assoc-020e45fccc33e866f",
            "client_vpn_endpoint_id": "cvpn-endpoint-00602a889f4647142",
            "id": "cvpn-assoc-020e45fccc33e866f",
            "security_groups": [
              "sg-0a0c33042a7758338"
            ],
            "status": "associated",
            "subnet_id": "subnet-3d1da366",
            "vpc_id": "vpc-ff1ae199"
          },
          "private": "..."
        },
        {
          "index_key": "subnet-e41194c9",
          "schema_version": 0,
          "attributes": {
            "association_id": "cvpn-assoc-09f5c539655b051a0",
            "client_vpn_endpoint_id": "cvpn-endpoint-00602a889f4647142",
            "id": "cvpn-assoc-09f5c539655b051a0",
            "security_groups": [
              "sg-0a0c33042a7758338"
            ],
            "status": "associated",
            "subnet_id": "subnet-e41194c9",
            "vpc_id": "vpc-ff1ae199"
          },

@DOboznyi
Copy link

Hey, I asked support team bout this issue and they also said to use filters. They also opened an internal ticket for fix this issue in API but has no ETA.

@Jorge-Rodriguez
Copy link

@nijave @DOboznyi are the issues you've posted with AWS public? Could you link them here for reference?

@nijave
Copy link
Contributor Author

nijave commented Dec 27, 2020

@nijave @DOboznyi are the issues you've posted with AWS public? Could you link them here for reference?

Unfortunately not, it was through AWS Support (we have a support contract)

@nijave
Copy link
Contributor Author

nijave commented Dec 27, 2020

I kind of expect they'll just ignore it. Afaik we hit the same issue with ruby aws-sdk and it appeared the API wasn't working correctly https://github.com/aws/aws-sdk-go/issues/3713

@caphrim007
Copy link

caphrim007 commented Jan 8, 2021

Would those on this thread please check to see if this is still an issue? My recent tests are not seeing this issue in either us-east-1 or sa-east-1, and I don't remember seeing a similar patch merged. Not sure if I might have forgotten my original tests which I used when posting my first comment

@moserke
Copy link

moserke commented Jan 11, 2021

FWIW this does seem to be working for me now. A resource like

resource "aws_ec2_client_vpn_network_association" "example" {
  for_each               = data.aws_subnet_ids.private.ids
  client_vpn_endpoint_id = aws_ec2_client_vpn_endpoint.example.id
  subnet_id              = each.value
  security_groups        = [aws_security_group.vpn-ingress.id]
}

Is functioning as I would expect it to this morning. This is in us-west-2 for me. This configuration previously would not work for me as it would constantly want to delete and recreate the associations.

@caphrim007
Copy link

Is functioning as I would expect it to this morning. This is in us-west-2 for me. This configuration previously would not work for me as it would constantly want to delete and recreate the associations.

My AWS reps said that this had been fixed in 2 regions, which is what prompted my follow-up. They said it would be rolling out world-wide by end of last/this week. So if it's not yet fixed in a particular region, suggest folks on thread re-try until around the 15th.

@nijave
Copy link
Contributor Author

nijave commented Jan 11, 2021 via email

@caphrim007
Copy link

caphrim007 commented Jan 11, 2021

Do you know what regions?

sa-east-1 and us-east-1 per an email thread i had with them end of last week

@caphrim007
Copy link

caphrim007 commented Jan 11, 2021

but @moserke sees it working in us-west-2 so it must have been rolled out even further

@nijave
Copy link
Contributor Author

nijave commented Jan 11, 2021

but @moserke sees it working in us-west-2 so it must have been rolled out even further

I asked our TAMs if they can provide more info. Afaik they have some sort of internal issue tracker they can check and attach impacted customers to

@dmerrick
Copy link

I'm still experiencing this issue on us-east-1

@caphrim007
Copy link

@dmerrick weird. I wasn't seeing the issue on us-east-1. Can you use the AWS cli to see if you get incorrect data returned?

If you have more than 1 association in your VPN, the following command will return all of them if its still broken. A fixed command will only return 1 association (the one you specify)

aws ec2 describe-client-vpn-target-networks --client-vpn-endpoint-id my-endpoint-id --association-ids my-association-id

@nijave
Copy link
Contributor Author

nijave commented Jan 11, 2021

Here's a simple bash script you can use to see if it's returning the same items (set endpoint to your vpn endpoint id first)

for id in $(aws ec2 describe-client-vpn-target-networks --client-vpn-endpoint-id $endpoint | jq -r ".ClientVpnTargetNetworks[].AssociationId"); do 
printf "%s %s\n" $id $(aws ec2 describe-client-vpn-target-networks --client-vpn-endpoint-id $endpoint --association-ids $id | jq -r ".ClientVpnTargetNetworks[0].AssociationId");
done

It outputs like

expected-id api-result

@dmerrick
Copy link

$ aws ec2 describe-client-vpn-target-networks --client-vpn-endpoint-id my-endpoint-id --association-ids my-association-id
{
    "ClientVpnTargetNetworks": [
        {
            "AssociationId": "cvpn-assoc-[redacted]",
            "VpcId": "vpc-[redacted]",
            "TargetNetworkId": "subnet-[redacted]",
            "ClientVpnEndpointId": "cvpn-endpoint-[redacted]",
            "Status": {
                "Code": "associated"
            },
            "SecurityGroups": [
                "sg-[redacted]"
            ]
        }
    ]
}

Just the one, which is expected (as I have only one association)

$ for id in $(aws ec2 describe-client-vpn-target-networks --client-vpn-endpoint-id $endpoint | jq -r ".ClientVpnTargetNetworks[].AssociationId"); do 
printf "%s %s\n" $id $(aws ec2 describe-client-vpn-target-networks --client-vpn-endpoint-id $endpoint --association-ids $id | jq -r ".ClientVpnTargetNetworks[0].AssociationId");
done
cvpn-assoc-[redacted] cvpn-assoc-[redacted]

The two associations returned are identical, which I think is what is expected as well.

But when I run terraform apply I get:

Error: Error reading Client VPN network association: InvalidClientVpnAssociationIdNotFound: AssociationId 'cvpn-assoc-[redacted]' does not exist

And indeed, the associationId in that response is not the correct one. Should I terraform state rm the association and terraform import the correct one?

@caphrim007
Copy link

@dmerrick if you can tolerate doing that then yeah I would try.

@dmerrick
Copy link

Aight that was scary but it resolved my issue

@OliverBailey
Copy link

@caphrim007 I've never had an issue with this beforehand, until it cropped up today on eu-west-2 which is then stopping my deployment. Just working on how to bypass this at the moment

@caphrim007
Copy link

@OliverBailey are you saying that it used to work on eu-west-2 and then stopped working?

@OliverBailey
Copy link

Nit entirely sure if it is related to this. I had two associations as such

resource "aws_ec2_client_vpn_endpoint" "redacted_VPN" {
    description = "VPN-for-redacted"
    server_certificate_arn = data.aws_acm_certificate.vpn_server_cert.arn
    client_cidr_block = "192.168.24.0/22"
    dns_servers = ["10.0.0.2"]

    authentication_options {
        type = "certificate-authentication"
        root_certificate_chain_arn = data.aws_acm_certificate.vpn_root_cert.arn
    }

    connection_log_options {
    enabled = false
  }
}

data "aws_acm_certificate" "vpn_server_cert" {
    domain = "server"
    statuses = ["ISSUED"]
}

data "aws_acm_certificate" "vpn_root_cert" {
    domain = "test2.tld"
    statuses = ["ISSUED"]
}

resource "aws_ec2_client_vpn_network_association" "redacted_VPN_network_association" {
  client_vpn_endpoint_id = aws_ec2_client_vpn_endpoint.redacted_VPN.id
  subnet_id = aws_subnet.Prod_private[0].id
  security_groups = [aws_security_group.vpn-sg.id]
}

resource "aws_ec2_client_vpn_network_association" "redacted_VPN_network_association-1" {
  client_vpn_endpoint_id = aws_ec2_client_vpn_endpoint.redacted_VPN.id
  subnet_id = aws_subnet.Prod_private[1].id
  security_groups = [aws_security_group.vpn-sg.id]
}

Yet one of them somehow managed to completely dissapear, and in turn meant that the entire terraform build failed. Appears that it was the first one redacted_VPN_network_Association which in turn then meant any subsequent runs after the fact failed as the endpoint didn't exist.

Looking through logs, for some reason, one of them decided to destroy itself without any reason/cause and no change within the tfstate or .tf file which I found bizzare.

I had to terraform state rm aws_ec2_client_vpn_network_association.redacted_VPN_network_association
and then re-run the terraform apply via codebuild to have it apply as intended.

Base automatically changed from master to main January 23, 2021 00:58
@breathingdust breathingdust requested a review from a team as a code owner January 23, 2021 00:58
@bill-rich
Copy link
Contributor

Since this was fixed on the AWS side, I have created a new PR based off of this one that just includes the acceptance test. The new PR is #18199. Thanks for the work on this!

@ghost
Copy link

ghost commented Apr 18, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Apr 18, 2021
@breathingdust breathingdust removed the needs-triage Waiting for first response or review from a maintainer. label Sep 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
service/ec2 Issues and PRs that pertain to the ec2 service. size/M Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faulty Read of Client VPN Network associations break state