-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NE-1324: E2E tests for Assume Role in Shared VPC Cluster #198
Conversation
@gcs278: This pull request references NE-1324 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
1142c46
to
e17d10d
Compare
@gcs278: This pull request references NE-1324 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
e50251b
to
9abefdb
Compare
@gcs278: This pull request references NE-1324 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
8135714
to
8d93253
Compare
@gcs278: This pull request references NE-1324 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
8d93253
to
5fc3fcc
Compare
/assign |
24aedfb
to
829cd86
Compare
/test e2e-aws-shared-vpc-phz-operator |
/retest |
/test e2e-aws-shared-vpc-phz-operator |
Round 1 - Success: |
cluster install fail |
openshift/release#43517 merged. testing again. |
test/e2e/operator_test.go
Outdated
t.Errorf("Failed to get dns 'cluster': %v\n", err) | ||
} | ||
if dnsConfig.Spec.Platform.AWS == nil || dnsConfig.Spec.Platform.AWS.PrivateZoneIAMRole == "" { | ||
t.Skipf("Test skipped on non-shared-VPC cluster") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I start a detailed review I would like to discuss the test/code organization.
The SharedVPC/AssumeRole support is different from other test cases. All the other test cases try to fit into the main pattern:
- create DNZ zone
- do platform agnostic tests
- delete DNS zone
In the case of SharedVPC I see some patterns that stand alone:
- DNZ zone is created by the CI
- DNZ zone is private
- Platform dependent - PrivateZoneRole is implemented only for AWS
I see that your code tries to balance in between "be like other test case" and "be specific". From one hand providerTestHelper
interface is updated meaning that something is meant for all the providers while from the other hand the new test case is skipped for all the providers but AWS.
To make things more coherent I see 2 approaches:
- Make an effort and generalize AssumeRole test case:
- add
helper.buildExternalDNSWithRole
method toproviderTestHelper
interface: AWS would get the role from the cluster's DNS config, other platforms would call the existingbuildExternalDNS()
. - keep the DNS record listing as part of
providerTestHelper
interface but make the listing against different DNS zones: for AWS - from the other account, for the rest - from the same account. dig
probing can be kept as is
- Move the new test case into a different file and update CI to run only the new test case in the shared-vpc CI job.
The first approach assumes that SharedVPC/AssumeRole features will come to the other providers too but makes a lot of unnecessary moves . The second approach is not generic but targets precisely the new feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good discussion. A lot of ways to proceed. I struggled a bit with this concept when writing this E2E. Couple of thoughts and questions:
- Sounds like the core question is whether it could ever be extended to other platforms. I see evidence (Unable to use IAM Service Account on GKE kubernetes-sigs/external-dns#509) that folks are using service accounts to fulfill the assuming of a role in GCP.
- It's possible that it could be extended, right? Or maybe there is a fundamental design difference that assume-role embedded in External DNS only makes sense in AWS.
- In addition to testing our new assume role functionality, is it reasonable to test non-assume-role functionality on a Shared VPC cluster?
- At first, I thought this would a good test case, but the more I think about it, the more redundant it seems. Testing External DNS on a Shared VPC AWS cluster without using the Assume Role is not different than testing it in a non-Shared VPC vanilla AWS cluster.
add helper.buildExternalDNSWithRole method to providerTestHelper interface: AWS would get the role from the cluster's DNS config, other platforms would call the existingbuildExternalDNS().
👍 Except, I think you mean AWS Shared VPC
instead of just AWS
. Are you implying we should run the existing E2E tests under the AWS Shared VPC
test job? It's possible, but messy. We are using a private hosted zone, so we'd need to change the logic to validate the test differently.
keep the DNS record listing as part of providerTestHelper interface but make the listing against different DNS zones: for AWS - from the other account, for the rest - from the same account.
Are you suggesting to treat AWS Shared VPC as it's own platform (or provider)? i.e. add a new awsAssumeRoleHelper
with a new file aws_assume_role.go
? This could work, but we'd have to turn off the other non-assume-role E2E tests (like I mentioned above) and only run TestExternalDNSAssumeRole
(due to private zone limitation).
I think my questions are:
- Are you saying we should run the existing E2E tests under the
AWS Shared VPC
test job? And basically removeTestExternalDNSAssumeRole
as a specific test? - Should
AWS Shared VPC
be treated as a it's own platform? I.e. inexternal-dns-operator/test/e2e/operator_test.go
Lines 108 to 120 in 829cd86
func initProviderHelper(openshiftCI bool, platformType string) (providerTestHelper, error) { switch platformType { case string(configv1.AWSPlatformType): return newAWSHelper(openshiftCI, kubeClient) case string(configv1.AzurePlatformType): return newAzureHelper(kubeClient) case string(configv1.GCPPlatformType): return newGCPHelper(openshiftCI, kubeClient) case infobloxDNSProvider: return newInfobloxHelper(kubeClient) default: return nil, fmt.Errorf("unsupported provider: %q", platformType) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like the core question is whether it could ever be extended to other platforms. I see evidence (Unable to use IAM Service Account on GKE kubernetes-sigs/external-dns#509) that folks are using service accounts to fulfill the assuming of a role in GCP.
It's possible that it could be extended, right? Or maybe there is a fundamental design difference that assume-role embedded in External DNS only makes sense in AWS.
Indeed, GCP seems to provide SharedVPC feature too with a possibility to make cross-projects DNS updates. However the way ExternalDNS is granted the permission seems to be different. The credentials are supposed to be from the other (not cluster's) project, they can be packed as a secret and feed to ExternalDNS through the credentials field. CredentialsRequest
cannot be used here as it will provision credentials for the cluster's project. However I don't fully exclude the possibility that AssumeRole API will not be propagated (possibly in a different form) to other providers. I didn't see Azure but even GCP may be somehow adopted if not upstream then in ExtDNS Operator.
Except, I think you mean AWS Shared VPC instead of just AWS. Are you implying we should run the existing E2E tests under the AWS Shared VPC test job? It's possible, but messy. We are using a private hosted zone, so we'd need to change the logic to validate the test differently.
No, I meant the AWS test helper. Only the new test scenario would use the new buildExternalDNSWithRole
method. The rest of the scenarios would still be using the zone created the e2e test in TestMain
. Even in the current implementation that is what's happening - all the old test cases are still using the public DNS zone from the cluster's account.
Are you saying we should run the existing E2E tests under the AWS Shared VPC test job? And basically remove TestExternalDNSAssumeRole as a specific test?
Should AWS Shared VPC be treated as a it's own platform?
Neither. The CI job you added is AWS platform (from the e2e point of view), it would run the same tests as any other platform. Only TestExternalDNSAssumeRole
test case would be different - it will read the private DNZ zone ID and IAM role from the DNSConfig and return ExternalDNS
with AssumeRole
field filled in. The old AWS CI job would still run the old test plus the new TestExternalDNSAssumeRole
but since there will be no private DNS zone ID or role in the DNSConfig the ExternalDNS
instance returned will not have AssumeRole
field.
So this way the old AWS CI job and the one you added are almost the same except for 1 test case. Redundant but TestExternalDNSAssumeRole
will fit into the test battery.
I think that I start to lean more towards the approach where your new CI job runs only 1 test - TestExternalDNSAssumeRole
and this test is not even added to operator_test.go
file but put into a dedicated "AWS specific" file. TestExternalDNSAssumeRole
can be copied as is, it still can use all the utils and constants, it just doesn't need to be part of the framework build in TestMain
.
test/common/common.go
Outdated
return nil, fmt.Errorf("failed to create pod %s/%s: %v", clientPod.Namespace, clientPod.Name, err) | ||
} | ||
defer func() { | ||
waitForDeletion(ctx, t, cl, clientPod, 5*time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we are waiting for the dig pod to disappear. But if it's still there after 5 mins we don't do anything. This doesn't seem to be a lot different from a simple background Delete
API call we had before in the deferred function. Or I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the motivation was to ensure the pod is gone before starting another dig pod query attempt, to avoid a naming collision (since they use the same name). I believe the API delete call isn't blocking.
I've updated to use a random pod name suffix (to ensure no naming collisions), and to call the delete without blocking.
e2330ab
to
cd5ab10
Compare
@alebedev87 Addressed latest comments, I also made couple of clean ups: https://github.com/openshift/external-dns-operator/compare/e2330ab8695bfd3d3e0fd803bc95a547772e230a..cd5ab109630a647b08c711a96e5ba12e0e39a699
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me, just a couple of last minute nit picks and 1 comment about the checking of the DNS record removal.
3c11944
to
bfd6593
Compare
@alebedev87 ready for another round. Fixed your review comments, but also added a Also, I renamed the directory name from |
bfd6593
to
fe8617d
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alebedev87 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test unit |
/test verify |
cluster install |
New e2e_sharedvpc E2E test package which contains the E2E tests for AWS shared VPC clusters. These tests are isolated because of their unique requirement for a specific cluster configuration. A common package has been added to support both types of E2E tests. The new E2E test validates the use of `spec.provider.aws.assumeRole.arn` in a shared VPC cluster.
fe8617d
to
a808e02
Compare
/lgtm |
Applying the labels myself as:
/label docs-approved |
@gcs278: This pull request references NE-1324 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@gcs278: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Add E2E tests for AssumeRole functionality added in #195. It creates a ExternalDNS object with AssumeRole configured inside of a Shared VPC cluster. It then confirms that an example DNS record gets created in a targeted AWS account's hosted zone as well as it queries the DNS record inside the cluster to test end-to-end DNS record functionality.
This E2E test update is also dependent on openshift/release#42894 which adds a new CI test job
e2e-aws-shared-vpc-phz-operator
to this repo. Without this, this new E2E test will be skipped.Dependent on #195 and will need rebase when it merges.