NE-1324: E2E tests for Assume Role in Shared VPC Cluster #198

Merged: 2 commits merged into openshift:main on Oct 26, 2023

@gcs278 (Contributor) commented Sep 8, 2023

Add E2E tests for the AssumeRole functionality added in #195. The test creates an ExternalDNS object with AssumeRole configured inside a Shared VPC cluster. It then confirms that an example DNS record is created in the targeted AWS account's hosted zone, and queries the DNS record from inside the cluster to test DNS record functionality end to end.

This E2E test update also depends on openshift/release#42894, which adds a new CI test job, e2e-aws-shared-vpc-phz-operator, to this repo. Without that job, the new E2E test is skipped.

Depends on #195 and will need a rebase once it merges.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 8, 2023
@openshift-ci-robot commented Sep 8, 2023

@gcs278: This pull request references NE-1324 which is a valid jira issue.

In response to this:

WIP
Dependent on #195 and will need rebase when it merges.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 8, 2023
@openshift-ci openshift-ci bot requested review from candita and knobunc September 8, 2023 23:47
@openshift-ci-robot commented Sep 8, 2023

@gcs278: This pull request references NE-1324 which is a valid jira issue.

In response to this:

TODO:

  • The case where spec.provider.credentials.name must be empty is not tested in this PR. Will it be covered in a follow-up e2e PR?

@gcs278 gcs278 force-pushed the shared-vpc-e2e branch 2 times, most recently from e50251b to 9abefdb Compare September 11, 2023 21:41
@openshift-ci-robot commented Sep 11, 2023

@gcs278: This pull request references NE-1324 which is a valid jira issue.

@gcs278 gcs278 force-pushed the shared-vpc-e2e branch 2 times, most recently from 8135714 to 8d93253 Compare September 12, 2023 15:54
@gcs278 gcs278 changed the title [WIP] NE-1324: E2E tests for Assume Role in Shared VPC Cluster NE-1324: E2E tests for Assume Role in Shared VPC Cluster Sep 12, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 12, 2023
@openshift-ci-robot commented Sep 12, 2023

@gcs278: This pull request references NE-1324 which is a valid jira issue.

@alebedev87 (Contributor)

/assign

@gcs278 (Contributor Author) commented Sep 19, 2023

/test e2e-aws-shared-vpc-phz-operator

@gcs278 (Contributor Author) commented Sep 19, 2023

/retest

@gcs278 (Contributor Author) commented Sep 19, 2023

An error occurred (HostedZoneAlreadyExists) when calling the CreateHostedZone operation: A hosted zone has already been created with the specified caller reference.

It seems the last failed job didn't clean up after itself. I'll look into it.

/test e2e-aws-shared-vpc-phz-operator
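The error text says CreateHostedZone was called with a CallerReference that an earlier request had already used; Route 53 expects a unique CallerReference for every CreateHostedZone request, so a re-run that derives the value only from stable inputs (such as a job or cluster name) can fail this way. A minimal sketch, assuming the CI step that creates the zone builds the caller reference itself (the helper name and prefix are hypothetical, and the real step in openshift/release may not be written in Go):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// uniqueCallerReference returns a CallerReference for a Route 53
// CreateHostedZone request that differs on every call, so a retried or
// re-run CI step does not collide with an earlier request.
// Hypothetical helper: the prefix (for example a CI job ID) is an assumption.
// Keep the prefix short, since Route 53 limits the value's length.
func uniqueCallerReference(prefix string) (string, error) {
	suffix := make([]byte, 4)
	if _, err := rand.Read(suffix); err != nil {
		return "", fmt.Errorf("failed to read random bytes: %w", err)
	}
	// A timestamp plus a random suffix keeps values unique across retries.
	return fmt.Sprintf("%s-%d-%s", prefix, time.Now().UnixNano(), hex.EncodeToString(suffix)), nil
}
```

This only addresses the caller-reference collision; deleting zones left behind by failed jobs is a separate cleanup concern.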

@gcs278 (Contributor Author) commented Sep 20, 2023

Round 1 - Success:
/test e2e-aws-shared-vpc-phz-operator

@gcs278 (Contributor Author) commented Sep 20, 2023

Cluster install failure.
/test e2e-azure-operator

@gcs278 (Contributor Author) commented Sep 20, 2023

openshift/release#43517 merged. Testing again.
/test e2e-aws-shared-vpc-phz-operator

	t.Fatalf("Failed to get dns 'cluster': %v", err)
}
if dnsConfig.Spec.Platform.AWS == nil || dnsConfig.Spec.Platform.AWS.PrivateZoneIAMRole == "" {
	t.Skip("Test skipped on non-shared-VPC cluster")

@alebedev87 (Contributor) commented Sep 28, 2023

Before I start a detailed review I would like to discuss the test/code organization.

The SharedVPC/AssumeRole support is different from other test cases. All the other test cases try to fit into the main pattern:

  • create DNS zone
  • do platform agnostic tests
  • delete DNS zone

In the case of SharedVPC, I see some aspects that stand apart:

  • DNS zone is created by the CI
  • DNS zone is private
  • Platform dependent: PrivateZoneIAMRole is implemented only for AWS

I see that your code tries to balance between "be like the other test cases" and "be specific". On one hand, the providerTestHelper interface is updated, which implies something meant for all providers; on the other hand, the new test case is skipped for all providers except AWS.

To make things more coherent, I see two approaches:

  1. Make an effort to generalize the AssumeRole test case:
    • add a helper.buildExternalDNSWithRole method to the providerTestHelper interface: AWS would get the role from the cluster's DNS config, other platforms would call the existing buildExternalDNS().
    • keep the DNS record listing as part of the providerTestHelper interface but list against different DNS zones: for AWS, the zone from the other account; for the rest, the zone from the same account.
    • dig probing can be kept as is.
  2. Move the new test case into a different file and update CI to run only the new test case in the shared-vpc CI job.

The first approach assumes that SharedVPC/AssumeRole features will come to the other providers too, but it requires a lot of otherwise unnecessary changes. The second approach is not generic but targets the new feature precisely.
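Approach 1 could look roughly like this. It is a minimal sketch with hypothetical, heavily simplified stand-ins for providerTestHelper and the ExternalDNS type (the real interface and API types have many more members); only the buildExternalDNS and buildExternalDNSWithRole method names come from the discussion. It also shows that an AWS helper whose DNS config carries no role simply returns an ExternalDNS without AssumeRole.

```go
package main

import "fmt"

// ExternalDNS is a hypothetical, simplified stand-in for the operator's API type.
type ExternalDNS struct {
	Name          string
	AssumeRoleARN string // empty when no role is assumed
}

// providerTestHelper is a hypothetical reduction of the e2e helper interface.
type providerTestHelper interface {
	buildExternalDNS(name string) ExternalDNS
	buildExternalDNSWithRole(name string) ExternalDNS
}

// awsHelper would read the role from the cluster DNS config
// (dnsConfig.Spec.Platform.AWS.PrivateZoneIAMRole in the real test).
type awsHelper struct {
	privateZoneIAMRole string
}

func (h awsHelper) buildExternalDNS(name string) ExternalDNS {
	return ExternalDNS{Name: name}
}

// buildExternalDNSWithRole fills in the role only if the cluster has one.
func (h awsHelper) buildExternalDNSWithRole(name string) ExternalDNS {
	extDNS := h.buildExternalDNS(name)
	extDNS.AssumeRoleARN = h.privateZoneIAMRole
	return extDNS
}

// azureHelper stands in for any provider without AssumeRole support.
type azureHelper struct{}

func (azureHelper) buildExternalDNS(name string) ExternalDNS {
	return ExternalDNS{Name: name}
}

// Without AssumeRole support, fall back to the existing builder.
func (h azureHelper) buildExternalDNSWithRole(name string) ExternalDNS {
	return h.buildExternalDNS(name)
}

// describe summarizes what the shared test case would create.
func describe(h providerTestHelper) string {
	e := h.buildExternalDNSWithRole("test-assume-role")
	if e.AssumeRoleARN == "" {
		return fmt.Sprintf("%s: no role", e.Name)
	}
	return fmt.Sprintf("%s: assumes %s", e.Name, e.AssumeRoleARN)
}
```

The cost of this design is that every provider helper gains a method that most of them only forward to the existing builder.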

@gcs278 (Contributor Author)

Good discussion; there are a lot of ways to proceed. I struggled a bit with this concept when writing this E2E test. A couple of thoughts and questions:

  • It sounds like the core question is whether it could ever be extended to other platforms. I see evidence (Unable to use IAM Service Account on GKE kubernetes-sigs/external-dns#509) that folks are using service accounts to fulfill the assuming of a role in GCP.
    • It's possible that it could be extended, right? Or maybe there is a fundamental design difference that means an assume-role embedded in External DNS only makes sense in AWS.
  • In addition to testing our new assume role functionality, is it reasonable to test non-assume-role functionality on a Shared VPC cluster?
    • At first, I thought this would be a good test case, but the more I think about it, the more redundant it seems. Testing External DNS on a Shared VPC AWS cluster without using AssumeRole is no different from testing it on a vanilla, non-Shared VPC AWS cluster.

add helper.buildExternalDNSWithRole method to providerTestHelper interface: AWS would get the role from the cluster's DNS config, other platforms would call the existing buildExternalDNS().

👍 Except, I think you mean AWS Shared VPC instead of just AWS. Are you implying we should run the existing E2E tests under the AWS Shared VPC test job? It's possible, but messy. We are using a private hosted zone, so we'd need to change the logic to validate the test differently.

keep the DNS record listing as part of providerTestHelper interface but make the listing against different DNS zones: for AWS - from the other account, for the rest - from the same account.

Are you suggesting treating AWS Shared VPC as its own platform (or provider)? I.e., adding a new awsAssumeRoleHelper in a new file aws_assume_role.go? This could work, but we'd have to turn off the other non-assume-role E2E tests (as I mentioned above) and only run TestExternalDNSAssumeRole (due to the private zone limitation).

I think my questions are:

  1. Are you saying we should run the existing E2E tests under the AWS Shared VPC test job, and basically remove TestExternalDNSAssumeRole as a specific test?
  2. Should AWS Shared VPC be treated as its own platform? I.e., in:
    func initProviderHelper(openshiftCI bool, platformType string) (providerTestHelper, error) {
        switch platformType {
        case string(configv1.AWSPlatformType):
            return newAWSHelper(openshiftCI, kubeClient)
        case string(configv1.AzurePlatformType):
            return newAzureHelper(kubeClient)
        case string(configv1.GCPPlatformType):
            return newGCPHelper(openshiftCI, kubeClient)
        case infobloxDNSProvider:
            return newInfobloxHelper(kubeClient)
        default:
            return nil, fmt.Errorf("unsupported provider: %q", platformType)
        }
    }

@alebedev87 (Contributor) commented Sep 28, 2023

Sounds like the core question is whether it could ever be extended to other platforms. I see evidence (Unable to use IAM Service Account on GKE kubernetes-sigs/external-dns#509) that folks are using service accounts to fulfill the assuming of a role in GCP.

It's possible that it could be extended, right? Or maybe there is a fundamental design difference that assume-role embedded in External DNS only makes sense in AWS.

Indeed, GCP seems to provide a SharedVPC feature too, with the possibility of cross-project DNS updates. However, the way ExternalDNS is granted permissions seems to be different. The credentials are supposed to come from the other (not the cluster's) project; they can be packed into a secret and fed to ExternalDNS through the credentials field. CredentialsRequest cannot be used here, as it would provision credentials for the cluster's project. Still, I don't fully rule out the possibility that the AssumeRole API will be propagated (possibly in a different form) to other providers. I haven't looked at Azure, but even GCP could adopt it, if not upstream then in the ExtDNS Operator.

Except, I think you mean AWS Shared VPC instead of just AWS. Are you implying we should run the existing E2E tests under the AWS Shared VPC test job? It's possible, but messy. We are using a private hosted zone, so we'd need to change the logic to validate the test differently.

No, I meant the AWS test helper. Only the new test scenario would use the new buildExternalDNSWithRole method. The rest of the scenarios would still use the zone created by the e2e test in TestMain. That is already what happens in the current implementation: all the old test cases still use the public DNS zone from the cluster's account.

Are you saying we should run the existing E2E tests under the AWS Shared VPC test job? And basically remove TestExternalDNSAssumeRole as a specific test?
Should AWS Shared VPC be treated as its own platform?

Neither. From the e2e point of view, the CI job you added is an AWS platform job, so it would run the same tests as any other platform. Only the TestExternalDNSAssumeRole test case would differ: it would read the private DNS zone ID and IAM role from the DNSConfig and return an ExternalDNS with the AssumeRole field filled in. The old AWS CI job would still run the old tests plus the new TestExternalDNSAssumeRole, but since there would be no private DNS zone ID or role in the DNSConfig, the returned ExternalDNS instance would not have the AssumeRole field set.
So the old AWS CI job and the one you added would be almost the same except for one test case. Redundant, but TestExternalDNSAssumeRole would fit into the test battery.

I'm starting to lean more towards the approach where your new CI job runs only one test, TestExternalDNSAssumeRole, and this test is not even added to the operator_test.go file but put into a dedicated "AWS specific" file. TestExternalDNSAssumeRole can be copied as is; it can still use all the utils and constants, it just doesn't need to be part of the framework built in TestMain.

	return nil, fmt.Errorf("failed to create pod %s/%s: %v", clientPod.Namespace, clientPod.Name, err)
}
defer func() {
	waitForDeletion(ctx, t, cl, clientPod, 5*time.Minute)

@alebedev87 (Contributor) commented Oct 11, 2023

So, we are waiting for the dig pod to disappear. But if it's still there after 5 minutes, we don't do anything. This doesn't seem much different from the simple background Delete API call we had before in the deferred function. Or am I missing something?

@gcs278 (Contributor Author)

I think the motivation was to ensure the pod is gone before starting another dig pod query attempt, to avoid a naming collision (since they use the same name). I believe the API delete call isn't blocking.

I've updated it to use a random pod name suffix (to avoid naming collisions) and to call Delete without blocking.
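A random suffix lets each dig attempt create a fresh pod while the previous one may still be terminating, so the deferred cleanup can issue Delete without waiting. A minimal sketch, assuming a hypothetical digPodName helper (the actual change may generate the suffix differently; Kubernetes' metadata.generateName, which lets the API server pick the suffix, is another option):

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// suffixAlphabet holds lowercase consonants and digits, which keeps the
// generated name a valid DNS-1123 label.
const suffixAlphabet = "bcdfghjklmnpqrstvwxz2456789"

// digPodName appends a random n-character suffix to base so consecutive
// dig query attempts never reuse the name of a pod that is still terminating.
// Hypothetical helper, not the e2e package's actual function.
func digPodName(base string, n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("failed to generate pod name suffix: %w", err)
	}
	for i, b := range buf {
		// Map each random byte onto the alphabet (slight modulo bias is fine for names).
		buf[i] = suffixAlphabet[int(b)%len(suffixAlphabet)]
	}
	return base + "-" + string(buf), nil
}
```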

@gcs278 (Contributor Author) commented Oct 11, 2023

@alebedev87 I addressed the latest comments and also made a couple of clean-ups: https://github.com/openshift/external-dns-operator/compare/e2330ab8695bfd3d3e0fd803bc95a547772e230a..cd5ab109630a647b08c711a96e5ba12e0e39a699

  • Delete logic for the service (removed waitForDeletion)
  • Using KubeClient in common; it's not really necessary to pass it around.
  • Added a check for the DNS record after the service is deleted
  • GetPlatformType returned the error as its first return value; the error now comes last
  • Replaced a bunch of t.Logf or t.Fatalf calls with t.Log or t.Fatal where no formatting was needed

@alebedev87 (Contributor) left a comment

Looks fine to me; just a couple of last-minute nitpicks and one comment about checking the DNS record removal.

test/common/common.go (outdated, resolved)
test/e2e_awssharedvpc/aws_shared_vpc_test.go (outdated, resolved)
test/e2e_awssharedvpc/aws_shared_vpc_test.go (outdated, resolved)
@gcs278 gcs278 force-pushed the shared-vpc-e2e branch 2 times, most recently from 3c11944 to bfd6593 Compare October 12, 2023 18:17
@gcs278 (Contributor Author) commented Oct 12, 2023

@alebedev87 Ready for another round. I fixed your review comments and also added a make test-e2e-sharedvpc alias to make running the shared VPC e2e tests easy.

Also, I renamed the directory from e2e_awssharedvpc to e2e_sharedvpc. The package name was already e2e_sharedvpc, and I think it's better to be platform-generic in case this is supported on other platforms in the future.

Makefile (outdated, resolved)
@alebedev87 (Contributor)

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 12, 2023
@openshift-ci bot commented Oct 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alebedev87

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 12, 2023
@gcs278 (Contributor Author) commented Oct 12, 2023

* 2023-10-12T22:10:23Z 3x kubelet: Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded

/test unit
/test verify

@gcs278 (Contributor Author) commented Oct 13, 2023

test "verify" failed: pod pending for more than 30m0s: containers have not started in 32m26.844693716s: ci-scheduling-dns-wait, place-entrypoint, sidecar, test:

/test verify
/test unit

@gcs278 (Contributor Author) commented Oct 13, 2023

Cluster install failure.
/test e2e-azure-infoblox-operator

New e2e_sharedvpc E2E test package which contains the E2E tests for
AWS shared VPC clusters. These tests are isolated because of their
unique requirement for a specific cluster configuration. A common
package has been added to support both types of E2E tests. The new E2E
test validates the use of `spec.provider.aws.assumeRole.arn` in a shared
VPC cluster.
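The ARN in spec.provider.aws.assumeRole.arn names an IAM role; in the shared VPC setup described in this PR, that role is expected (an assumption, not something the operator enforces) to belong to the account owning the private hosted zone the test checks. A minimal sketch of a hypothetical roleAccountID helper that extracts that account ID from an ARN of the form arn:partition:iam::account-id:role/name, for example to sanity-check the test's configuration:

```go
package main

import (
	"fmt"
	"strings"
)

// roleAccountID extracts the 12-digit account ID from an IAM role ARN
// (arn:<partition>:iam::<account-id>:role/<path-and-name>).
// Hypothetical helper, not part of the operator API.
func roleAccountID(arn string) (string, error) {
	parts := strings.SplitN(arn, ":", 6)
	// IAM is a global service, so the region field (parts[3]) is empty.
	if len(parts) != 6 || parts[0] != "arn" || parts[2] != "iam" || parts[3] != "" {
		return "", fmt.Errorf("not an IAM ARN: %q", arn)
	}
	if !strings.HasPrefix(parts[5], "role/") {
		return "", fmt.Errorf("not an IAM role ARN: %q", arn)
	}
	account := parts[4]
	if len(account) != 12 {
		return "", fmt.Errorf("invalid account ID %q in %q", account, arn)
	}
	for _, c := range account {
		if c < '0' || c > '9' {
			return "", fmt.Errorf("invalid account ID %q in %q", account, arn)
		}
	}
	return account, nil
}
```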
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 23, 2023
@alebedev87 (Contributor)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 25, 2023
@gcs278 (Contributor Author) commented Oct 26, 2023

Applying the labels myself because:

  1. NE-1323: Add AWS RoleARN for Shared VPC support #195 was already ack'ed by everyone.
  2. This is an E2E test with no user-facing changes to document or test.

/label docs-approved
/label px-approved
/label qe-approved

@openshift-ci openshift-ci bot added docs-approved Signifies that Docs has signed off on this PR px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR labels Oct 26, 2023
@openshift-ci-robot commented Oct 26, 2023

@gcs278: This pull request references NE-1324 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

@openshift-ci bot commented Oct 26, 2023

@gcs278: all tests passed!

@openshift-ci openshift-ci bot merged commit 77106ac into openshift:main Oct 26, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR
4 participants