Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSASINFRA-3642: openstack: support setting external LB floating IP #1147

Merged
merged 4 commits into from
Nov 15, 2024

Conversation

EmilienM
Copy link
Member

@EmilienM EmilienM commented Sep 16, 2024

This PR introduces enhancements to the Cluster Ingress Operator to support setting a specific IP address for load balancers on the OpenStack platform.

Changes:

Load Balancer IP Management:

  • Updated the logic to handle LoadBalancer IP settings for OpenStack in the controller.go and load_balancer_service.go files.
  • Added conditions to support external load balancers with floating IPs and ensure proper IP assignment and validation.

Provider Parameters:

  • Added support for OpenStack provider parameters ensuring the correct configuration of load balancers.

Test Cases:

  • Added new test cases in load_balancer_service_test.go to verify the behavior with OpenStack load balancer configurations.

Co-Authored: Grant Spence [email protected]

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 16, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Sep 16, 2024

@EmilienM: This pull request references RFE-6242 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature request to target the "4.18.0" version, but no target version was set.

In response to this:

Add support for setting the (deprecated) Service field: loadBalancerIP on platform OpenStack when the LB is not internal.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@EmilienM
Copy link
Member Author

/cc mdbooth Miciah

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 17, 2024
@EmilienM EmilienM force-pushed the RFE-6242 branch 2 times, most recently from cc8fbd1 to 4e71b22 Compare September 30, 2024 19:33
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 30, 2024
Copy link
Contributor

@Miciah Miciah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should set effectiveStrategy.LoadBalancer.DNSManagementPolicy = operatorv1.UnmanagedLoadBalancerDNS when the platform is OpenStack. You can add a check around the existing if !domainMatchesBaseDomain and if platformStatus.GCP.CloudLoadBalancerConfig.DNSType == configv1.ClusterHostedDNSType checks near the top of setDefaultPublishingStrategy.

At some point, we should probably refactor this logic because setDefaultPublishingStrategy is already quite large, and the logic for determining the value of DNSManagementPolicy somewhat couples the operator's ingress and dns controllers. We can do that refactoring in a follow-up.

@EmilienM
Copy link
Member Author

EmilienM commented Oct 2, 2024

WIP for now because the API is being changed: openshift/api#2051

@Miciah
Copy link
Contributor

Miciah commented Oct 2, 2024

/assign

@Miciah
Copy link
Contributor

Miciah commented Oct 2, 2024

Could you also add some test cases to TestSetDefaultPublishingStrategySetsPlatformDefaults and TestSetDefaultPublishingStrategyHandlesUpdates in pkg/operator/controller/ingress/controller_test.go and to Test_desiredLoadBalancerService in pkg/operator/controller/ingress/load_balancer_service_test.go?

@EmilienM
Copy link
Member Author

EmilienM commented Oct 2, 2024

Could you also add some test cases to TestSetDefaultPublishingStrategySetsPlatformDefaults and TestSetDefaultPublishingStrategyHandlesUpdates in pkg/operator/controller/ingress/controller_test.go and to Test_desiredLoadBalancerService in pkg/operator/controller/ingress/load_balancer_service_test.go?

For TestSetDefaultPublishingStrategySetsPlatformDefaults I think it's not possible because the default is not an LB but a HostNetwork endpoint strategy. But I'll add tests for others.

@candita
Copy link
Contributor

candita commented Oct 2, 2024

@gcs278 assigning to you for visibility, since this impacts one of your pending PRs.

/assign @gcs278

@EmilienM EmilienM changed the title RFE-6242: WIP - openstack: support setting LB IP RFE-6242: openstack: support setting external LB floating IP Oct 4, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 4, 2024

@EmilienM: This pull request references RFE-6242 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature request to target the "4.18.0" version, but no target version was set.

In response to this:

Add support for setting the (deprecated) Service field: floatingIP on platform OpenStack when the LB is not internal.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@EmilienM
Copy link
Member Author

EmilienM commented Oct 4, 2024

The tests will fail (at least they did locally for me) because I bumped k8s to 1.31.1 and more work seems needed.

@Miciah
Copy link
Contributor

Miciah commented Nov 15, 2024

I don't think the CI failures are related. Before retesting these jobs, I'll let maintainers reviewing it.

e2e-azure-ovn failed because [sig-arch][Late] operators should not create watch channels very often failed:

{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:406]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-samples-operator\" produces more watch requests than expected: watchrequestcount=151, upperbound=118, ratio=1.28",
    ]
to be empty
Ginkgo exit error 1: exit with code 1}

I have filed OCPBUGS-44581 CI fails on "[sig-arch][Late] operators should not create watch channels very often" for this issue.

e2e-aws-ovn-techpreview failed because [bz-kube-apiserver][invariant] alert/KubeAPIErrorBudgetBurn should not be at or above info failed:

{  KubeAPIErrorBudgetBurn was at or above info for at least 1m58s on platformidentification.JobType{Release:"4.18", FromRelease:"", Platform:"aws", Architecture:"amd64", Network:"ovn", Topology:"ha"} (maxAllowed=0s): pending for 16m8s, firing for 1m58s:

Nov 14 17:26:14.852 - 118s  E namespace/openshift-kube-apiserver alert/KubeAPIErrorBudgetBurn alertstate/firing severity/critical ALERTS{alertname="KubeAPIErrorBudgetBurn", alertstate="firing", long="6h", namespace="openshift-kube-apiserver", prometheus="openshift-monitoring/k8s", severity="critical", short="30m"}}

I believe OCPBUGS-42083 is tracking this issue.

e2e-hypershift failed because TestCreateClusterV2/Main/break-glass-credentials/independent_signers failed:

{Failed  === RUN   TestCreateClusterV2/Main/break-glass-credentials/independent_signers
    control_plane_pki_operator.go:92: generating new break-glass credentials for more than one signer
    pki.go:75: loading certificate/key pair from disk for signer customer-break-glass, use $REGENERATE_PKI to generate new ones
    control_plane_pki_operator.go:201: creating CSR "o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083" for signer "customer-break-glass", requesting client auth usages
    control_plane_pki_operator.go:211: creating CSRA e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 to trigger automatic approval of the CSR
    control_plane_pki_operator.go:218: Successfully waited for CSR "o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083" to be approved and signed in 1s
    control_plane_pki_operator.go:130: validating that the client certificate provides the appropriate access
    control_plane_pki_operator.go:116: amending the existing kubeconfig to use break-glass client certificate credentials
    control_plane_pki_operator.go:133: issuing SSR to identify the subject we are given using the client certificate
    control_plane_pki_operator.go:153: ensuring that the SSR identifies the client certificate as having system:masters power and correct username
    pki.go:75: loading certificate/key pair from disk for signer sre-break-glass, use $REGENERATE_PKI to generate new ones
    control_plane_pki_operator.go:201: creating CSR "zpxsfoj4y5ltot4gswx1k5t01p50a3bcz3um2rtgbi" for signer "sre-break-glass", requesting client auth usages
    control_plane_pki_operator.go:211: creating CSRA e2e-clusters-f8jng-example-989zr/zpxsfoj4y5ltot4gswx1k5t01p50a3bcz3um2rtgbi to trigger automatic approval of the CSR
    control_plane_pki_operator.go:218: Successfully waited for CSR "zpxsfoj4y5ltot4gswx1k5t01p50a3bcz3um2rtgbi" to be approved and signed in 1s
    control_plane_pki_operator.go:130: validating that the client certificate provides the appropriate access
    control_plane_pki_operator.go:116: amending the existing kubeconfig to use break-glass client certificate credentials
    control_plane_pki_operator.go:133: issuing SSR to identify the subject we are given using the client certificate
    control_plane_pki_operator.go:153: ensuring that the SSR identifies the client certificate as having system:masters power and correct username
    control_plane_pki_operator.go:96: revoking the "customer-break-glass" signer
    pki.go:75: loading certificate/key pair from disk for signer customer-break-glass, use $REGENERATE_PKI to generate new ones
    control_plane_pki_operator.go:253: creating CRR e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 to trigger signer certificate revocation
    eventually.go:100: Failed to get *v1alpha1.CertificateRevocationRequest: context deadline exceeded
    control_plane_pki_operator.go:260: Failed to wait for CRR e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 to complete in 10m0s: context deadline exceeded
    eventually.go:220: observed *v1alpha1.CertificateRevocationRequest e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 invalid at RV 78205 after 10m0s: incorrect condition: wanted PreviousCertificatesRevoked=True, got PreviousCertificatesRevoked=False: WaitingForAvailable(Previous signer certificate not yet revoked.)
    control_plane_pki_operator.go:260: *v1alpha1.CertificateRevocationRequest e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 conditions:
    control_plane_pki_operator.go:260: LeafCertificatesRegenerated=True: AsExpected(All leaf certificates are re-generated.)
    control_plane_pki_operator.go:260: PreviousCertificatesRevoked=False: WaitingForAvailable(Previous signer certificate not yet revoked.)
    control_plane_pki_operator.go:260: NewCertificatesTrusted=True: AsExpected(New signer certificate e2e-clusters-f8jng-example-989zr/customer-system-admin-signer trusted.)
    control_plane_pki_operator.go:260: RootCertificatesRegenerated=True: AsExpected(Signer certificate e2e-clusters-f8jng-example-989zr/customer-system-admin-signer regenerated.)
    control_plane_pki_operator.go:260: SignerClassValid=True: AsExpected(Signer class "customer-break-glass" known.)
            --- FAIL: TestCreateClusterV2/Main/break-glass-credentials/independent_signers (602.07s)
}

I have filed OCPBUGS-44582 for this issue.

* Update openshift/api into a commit that contains the floatingIP
  change needed by this PR.
* Run `go mod tidy && go mod vendor`.
* Run `make update`.
@Miciah
Copy link
Contributor

Miciah commented Nov 15, 2024

Thanks!
/approve
/lgtm

Copy link
Contributor

openshift-ci bot commented Nov 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 15, 2024
@EmilienM EmilienM requested a review from Miciah November 15, 2024 01:40
This commit introduces enhancements to the Cluster Ingress Operator to support
setting a specific IP address for load balancers on the OpenStack platform.

* Updated the logic to handle LoadBalancer IP settings for OpenStack in the controller.go and load_balancer_service.go files.
* Added conditions to support external load balancers with floating IPs and ensure proper IP assignment and validation.
* Added support for OpenStack provider parameters ensuring the correct configuration of load balancers.
* Added new test cases in load_balancer_service_test.go to verify the behavior with OpenStack load balancer configurations.

https://issues.redhat.com/browse/OSASINFRA-3642
Co-Authored: Grant Spence <[email protected]>
These tests were missing, now we have them.
@Miciah
Copy link
Contributor

Miciah commented Nov 15, 2024

Thanks again!
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 15, 2024
@lihongan
Copy link
Contributor

/test e2e-openstack-operator

@EmilienM
Copy link
Member Author

/test e2e-openstack-operator

the job will fail. I plan to work on that early 2025.

@EmilienM
Copy link
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2024
@Miciah
Copy link
Contributor

Miciah commented Nov 15, 2024

/test e2e-openstack-operator

the job will fail. I plan to work on that early 2025.

It's still worth checking to make sure it fails as expected (same tests failing with similar errors).

@Miciah
Copy link
Contributor

Miciah commented Nov 15, 2024

#1048 (comment) will give us a baseline for e2e-openstack-operator.

@lihongan
Copy link
Contributor

@EmilienM Is that possible to do pre-merge test with the PR? If yes, then any requirements to setup the cluster on OSP (Octavia required?), any configuration and function should be checked? Thanks

Copy link
Contributor

openshift-ci bot commented Nov 15, 2024

@EmilienM: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-gatewayapi 46a8f14 link false /test e2e-aws-gatewayapi
ci/prow/e2e-openstack-operator 46a8f14 link false /test e2e-openstack-operator
ci/prow/e2e-aws-ovn-single-node 46a8f14 link false /test e2e-aws-ovn-single-node
ci/prow/e2e-azure-ovn 46a8f14 link false /test e2e-azure-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 0b561f8 into openshift:master Nov 15, 2024
15 of 19 checks passed
@EmilienM EmilienM deleted the RFE-6242 branch November 15, 2024 12:02
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-cluster-ingress-operator
This PR has been included in build ose-cluster-ingress-operator-container-v4.18.0-202411151139.p0.g0b561f8.assembly.stream.el9.
All builds following this will include this PR.

@EmilienM
Copy link
Member Author

@EmilienM Is that possible to do pre-merge test with the PR? If yes, then any requirements to setup the cluster on OSP (Octavia required?), any configuration and function should be checked? Thanks

Sorry, this just merged. I'll do some testing again today and let people know if something's wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants