NE-1324: Make caller-reference unique for AWS PHZ creation #43517

@gcs278 gcs278 commented Sep 19, 2023

The caller-reference used when creating a private hosted zone must be unique. In some pre-submit runs for external-dns-operator (e.g. openshift/external-dns-operator#198), the cluster name is reused, which causes the caller-reference to be reused as well. This change adds a timestamp to the caller-reference so that it differs on every run.

As an example, job 1 (failed) and job 2 (failed) had the same cluster name, which produced the following error on the job 2 run:

INFO[2023-09-19T05:06:39Z] Using shared account to create PHZ. Account No: 176500***
creating route53 hosted zone: ci-op-v5xh5zs0-63435.origin-ci-int-aws.dev.rhcloud.com

An error occurred (HostedZoneAlreadyExists) when calling the CreateHostedZone operation: A hosted zone has already been created with the specified caller reference.

It appears to reuse the cluster name upon some types of failures, but I have had a successful run since this naming collision with caller-reference.

PR that introduced e2e-aws-shared-vpc-phz-operator: #42894
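
For illustration only, here is a minimal Python sketch of the idea described above. It is not the code from this PR. The helper name `make_caller_reference`, the `<cluster-name>-<timestamp>` layout, and the in-memory `FakeHostedZoneRegistry` are all hypothetical; the registry only mimics the "caller reference already used" behaviour reported in the error above and makes no AWS calls.

```python
import time


def make_caller_reference(cluster_name, now=None):
    """Return a caller reference that differs between runs sharing a cluster name.

    The "<cluster_name>-<timestamp>" layout is an illustrative assumption,
    not the exact format used by the CI step.
    """
    # Nanosecond resolution makes a collision between two runs very unlikely;
    # it is a practical safeguard rather than a formal uniqueness guarantee.
    stamp = time.time_ns() if now is None else now
    return f"{cluster_name}-{stamp}"


class FakeHostedZoneRegistry:
    """In-memory stand-in that rejects a caller reference it has seen before."""

    def __init__(self):
        self._seen = set()

    def create_hosted_zone(self, name, caller_reference):
        # Mimics the reported failure mode: a reused reference is refused.
        if caller_reference in self._seen:
            raise ValueError("HostedZoneAlreadyExists: caller reference reused")
        self._seen.add(caller_reference)
        return {"Name": name, "CallerReference": caller_reference}
```

With a bare cluster name as the reference, a second run that reuses the name would hit the `ValueError` branch. With the timestamp suffix, both runs get distinct references even when the cluster name repeats.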

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 19, 2023

openshift-ci-robot commented Sep 19, 2023

@gcs278: This pull request references NE-1324 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

The caller-reference when creating a private hosted zone needs to be unique. Since this step is now used in a pre-submit job (e.g. openshift/external-dns-operator#198), the cluster name and base domain are the same between CI runs. Despite the hosted zone getting deleted, caller-references can never be reused.

PR that introduced e2e-aws-shared-vpc-phz-operator: #42894

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


gcs278 commented Sep 19, 2023

/pj-rehearse pull-ci-openshift-external-dns-operator-main-e2e-aws-shared-vpc-phz-operator


openshift-ci-robot commented Sep 19, 2023

@gcs278: This pull request references NE-1324 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

The caller-reference when creating a private hosted zone needs to be unique. Since this step is now used in a pre-submit job (e.g. openshift/external-dns-operator#198), the cluster name and base domain are the same between CI runs. Despite the hosted zone getting deleted, caller-references can never be reused.

You'll get an error on your second job run such as:

INFO[2023-09-19T05:06:39Z] Using shared account to create PHZ. Account No: 176500***
creating route53 hosted zone: ci-op-v5xh5zs0-63435.origin-ci-int-aws.dev.rhcloud.com

An error occurred (HostedZoneAlreadyExists) when calling the CreateHostedZone operation: A hosted zone has already been created with the specified caller reference.

PR that introduced e2e-aws-shared-vpc-phz-operator: #42894


gcs278 commented Sep 19, 2023

@patrickdillon Could you take a quick look at this and give lgtm? It's pretty straightforward, but it is a change to your aws-provision-route53-private-hosted-zone step, which our Shared VPC periodic jobs run.


openshift-ci-robot commented Sep 19, 2023

@gcs278: This pull request references NE-1324 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

The caller-reference when creating a private hosted zone needs to be unique. Since this step is now used in a pre-submit job for external-dns-operator (e.g. openshift/external-dns-operator#198), the cluster name and base domain are the same between CI runs. Despite the hosted zone getting deleted, caller-references can never be reused.

You'll get an error on your second job run such as:

INFO[2023-09-19T05:06:39Z] Using shared account to create PHZ. Account No: 176500***
creating route53 hosted zone: ci-op-v5xh5zs0-63435.origin-ci-int-aws.dev.rhcloud.com

An error occurred (HostedZoneAlreadyExists) when calling the CreateHostedZone operation: A hosted zone has already been created with the specified caller reference.

PR that introduced e2e-aws-shared-vpc-phz-operator: #42894


@gcs278 gcs278 force-pushed the shared-vpc-e2e-caller-reference-fix branch from 85626f0 to 15868d4 Compare September 19, 2023 20:52

openshift-ci-robot commented Sep 19, 2023

@gcs278: This pull request references NE-1324 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

The caller-reference when creating a private hosted zone needs to be unique. Since this step is now used in a pre-submit job for external-dns-operator (e.g. openshift/external-dns-operator#198), it exposed the fact that we aren't getting the entire cluster name, but just a subset of it that is not unique to pre-submit jobs.

You'll get an error on your second job run such as:

INFO[2023-09-19T05:06:39Z] Using shared account to create PHZ. Account No: 176500***
creating route53 hosted zone: ci-op-v5xh5zs0-63435.origin-ci-int-aws.dev.rhcloud.com

An error occurred (HostedZoneAlreadyExists) when calling the CreateHostedZone operation: A hosted zone has already been created with the specified caller reference.

PR that introduced e2e-aws-shared-vpc-phz-operator: #42894


@gcs278 gcs278 force-pushed the shared-vpc-e2e-caller-reference-fix branch 2 times, most recently from 45a9ed7 to e31fbc4 Compare September 19, 2023 20:54

gcs278 commented Sep 19, 2023

/pj-rehearse pull-ci-openshift-external-dns-operator-main-e2e-aws-shared-vpc-phz-operator

Just trying random jobs
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-byo-route53-f14
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-aws-ipi-private-shared-vpc-phz-sts-f14
/pj-rehearse periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-shared-vpc-phz-techpreview

@alebedev87

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 19, 2023

gcs278 commented Sep 19, 2023

/pj-rehearse periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-shared-vpc-phz-techpreview


gcs278 commented Sep 19, 2023

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-byo-route53-f14


gcs278 commented Sep 19, 2023

/hold

cat: /tmp/secret/cluster-id: No such file or directory

I must have missed something.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 19, 2023
@gcs278 gcs278 force-pushed the shared-vpc-e2e-caller-reference-fix branch from e31fbc4 to 403955e Compare September 19, 2023 23:31
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 19, 2023

gcs278 commented Sep 19, 2023

/pj-rehearse pull-ci-openshift-external-dns-operator-main-e2e-aws-shared-vpc-phz-operator


gcs278 commented Sep 20, 2023

jq: error: Could not open file /tmp/secret/metadata.json: No such file or directory

This failed too. I'm not sure why this step doesn't have these files when other steps do.


gcs278 commented Sep 20, 2023

/assign @alebedev87

@gcs278 gcs278 force-pushed the shared-vpc-e2e-caller-reference-fix branch from 403955e to c04511e Compare September 20, 2023 16:18

The caller-reference when creating a private hosted zone needs to be
unique. In some pre-submit job run situations, it reuses the cluster
name, causing caller-reference to be reused. This solution simply adds
a timestamp to caller reference to always ensure it's unique.
@gcs278 gcs278 force-pushed the shared-vpc-e2e-caller-reference-fix branch from c04511e to 42ad710 Compare September 20, 2023 16:27
@openshift-ci-robot

[REHEARSALNOTIFIER]
@gcs278: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
pull-ci-openshift-external-dns-operator-main-e2e-aws-shared-vpc-phz-operator openshift/external-dns-operator presubmit Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.14-arm64-nightly-aws-ipi-byo-route53-f28-destructive N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-private-shared-vpc-phz-sts-f28-destructive N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-aws-ipi-shared-vpc-phz-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.11-arm64-nightly-4.11-upgrade-from-stable-4.11-aws-ipi-byo-route53-f360 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.11-amd64-nightly-aws-ipi-byo-route53-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-4.14-upgrade-from-stable-4.13-aws-ipi-shared-vpc-phz-sts-fips-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-4.13-upgrade-from-stable-4.13-aws-ipi-byo-route53-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-aws-ipi-shared-vpc-phz-sts-fips-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-aws-ipi-byo-route53-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-shared-vpc-phz-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-4.15-upgrade-from-stable-4.14-aws-ipi-shared-vpc-phz-sts-fips-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-aws-ipi-shared-vpc-phz-sts-fips-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-aws-ipi-shared-vpc-phz-sts-fips-f28-destructive N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-arm64-nightly-4.13-upgrade-from-stable-4.13-aws-ipi-byo-route53-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.10-amd64-nightly-aws-ipi-byo-route53-f28-destructive N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-aws-ipi-private-shared-vpc-phz-sts-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.11-amd64-nightly-4.11-upgrade-from-stable-4.10-aws-ipi-byo-route53-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.12-amd64-nightly-4.12-upgrade-from-stable-4.11-aws-ipi-byo-route53-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.12-arm64-nightly-4.12-upgrade-from-stable-4.12-aws-ipi-byo-route53-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-4.15-upgrade-from-stable-4.14-aws-ipi-byo-route53-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-arm64-nightly-aws-ipi-byo-route53-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-arm64-nightly-aws-ipi-byo-route53-f28-destructive N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.10-arm64-nightly-aws-ipi-byo-route53-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-aws-ipi-byo-route53-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-aws-ipi-private-shared-vpc-phz-sts-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-shared-vpc-phz-f28-destructive N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-4.13-upgrade-from-stable-4.12-aws-ipi-shared-vpc-phz-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.12-amd64-nightly-aws-ipi-byo-route53-f28-destructive N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-stable-aws-ipi-shared-vpc-phz-sts-fips-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-4.15-upgrade-from-stable-4.14-aws-ipi-shared-vpc-phz-f14 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.14-arm64-nightly-4.14-upgrade-from-stable-4.14-aws-ipi-byo-route53-f28 N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.11-amd64-nightly-aws-ipi-byo-route53-f28-destructive N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.12-amd64-nightly-aws-ipi-private-shared-vpc-phz-sts-f28-destructive N/A periodic Registry content changed
periodic-ci-openshift-openshift-tests-private-release-4.12-amd64-nightly-aws-ipi-shared-vpc-phz-f14 N/A periodic Registry content changed

A total of 78 jobs have been affected by this change. The above listing is non-exhaustive and limited to 35 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 10 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 20 rehearsals
Comment: /pj-rehearse max to run up to 35 rehearsals
Comment: /pj-rehearse auto-ack to run up to 10 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.


gcs278 commented Sep 20, 2023

/pj-rehearse pull-ci-openshift-external-dns-operator-main-e2e-aws-shared-vpc-phz-operator


gcs278 commented Sep 20, 2023

/pj-rehearse periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-shared-vpc-phz-techpreview


gcs278 commented Sep 20, 2023

/hold cancel
Appears to be working correctly as both test jobs made it past aws-provision-route53-private-hosted-zone. Ready for review.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 20, 2023
@patrickdillon

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 20, 2023

gcs278 commented Sep 20, 2023

/pj-rehearse ack

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Sep 20, 2023
@alebedev87

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 20, 2023

openshift-ci bot commented Sep 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alebedev87, gcs278, patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot commented Sep 20, 2023

@gcs278: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-byo-route53-f14 e31fbc4c8ebbe18664c8f0ff0c25c094cb432ae6 link unknown /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-aws-ipi-byo-route53-f14

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. rehearsals-ack Signifies that rehearsal jobs have been acknowledged