Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] OCPBUGS-43692: Fix LB Type Defaulting with ProviderParameters Set #1160

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gcs278
Copy link
Contributor

@gcs278 gcs278 commented Oct 22, 2024

WIP: Found more potential nil pointer dereferences.

In the case that alreadyAdmitted was true and the IngressController's spec had providerParameters set, setDefaultProviderParameters incorrectly forced the LB type to Classic.

This fix removes alreadyAdmitted as a parameter from setDefaultProviderParameters and completely prevents it from being invoked when alreadyAdmitted, preventing the update of defaulted values after an IngressController has been admitted.

Previously, setDefaultProviderParameters always ensured that specLB.ProviderParameters.AWS was initialized. Now that the function is conditionally invoked, an additional nil check has been added to handle cases when it's not called.

An unit test has been added for this scenario to prevent regression.

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 22, 2024
@openshift-ci-robot
Copy link
Contributor

@gcs278: This pull request references Jira Issue OCPBUGS-43692, which is invalid:

  • expected the bug to target the "4.18.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Most of the logic in setDefaultProviderParameters didn't execute if alreadyAdmitted is true. However, in the case alreadyAdmitted was true and the IngressController's spec had providerParameters set, setDefaultProviderParameters incorrectly forced the LB type to Classic.

There is no clear advantage to running
setDefaultProviderParameters when alreadyAdmitted is true. This fix removes alreadyAdmitted as a parameter from
setDefaultProviderParameters and instead prevents it from being invoked entirely when alreadyAdmitted is true.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from knobunc and Thealisyed October 22, 2024 20:00
Copy link
Contributor

openshift-ci bot commented Oct 22, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from gcs278. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 22, 2024
@openshift-ci-robot
Copy link
Contributor

@gcs278: This pull request references Jira Issue OCPBUGS-43692, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

In response to this:

Most of the logic in setDefaultProviderParameters didn't execute if alreadyAdmitted is true. However, in the case alreadyAdmitted was true and the IngressController's spec had providerParameters set, setDefaultProviderParameters incorrectly forced the LB type to Classic.

There is no clear advantage to running setDefaultProviderParameters when alreadyAdmitted is true. This fix removes alreadyAdmitted as a parameter from setDefaultProviderParameters and instead prevents it from being invoked entirely when alreadyAdmitted is true.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from lihongan October 22, 2024 20:00
@gcs278 gcs278 changed the title OCPBUGS-43692: Fix LB Type Defaulting with ProviderParameters Set [WIP] OCPBUGS-43692: Fix LB Type Defaulting with ProviderParameters Set Oct 22, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 22, 2024
@gcs278

This comment was marked as resolved.

@gcs278 gcs278 force-pushed the OCPBUGS-43692-lb-type-defaulting-providerParms branch from 8935f7a to 304b7b6 Compare October 22, 2024 20:49
@openshift-ci-robot
Copy link
Contributor

@gcs278: This pull request references Jira Issue OCPBUGS-43692, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

In the case that alreadyAdmitted was true and the IngressController's spec had providerParameters set, setDefaultProviderParameters incorrectly forced the LB type to Classic.

This fix relocates the alreadyAdmitted check to the logic responsible for mutating the LB type, rather than applying it to the logic that determines the defaultLBType value.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Contributor

@gcs278: This pull request references Jira Issue OCPBUGS-43692, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @melvinjoseph86

In response to this:

WIP: Still working through issues.

In the case that alreadyAdmitted was true and the IngressController's spec had providerParameters set, setDefaultProviderParameters incorrectly forced the LB type to Classic.

This fix relocates the alreadyAdmitted check to the logic responsible for mutating the LB type, rather than applying it to the logic that determines the defaultLBType value.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from melvinjoseph86 October 23, 2024 03:51
@gcs278 gcs278 force-pushed the OCPBUGS-43692-lb-type-defaulting-providerParms branch 3 times, most recently from 072c35a to fefdc79 Compare October 23, 2024 17:23
@gcs278 gcs278 changed the title [WIP] OCPBUGS-43692: Fix LB Type Defaulting with ProviderParameters Set OCPBUGS-43692: Fix LB Type Defaulting with ProviderParameters Set Oct 23, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 23, 2024
@gcs278 gcs278 changed the title OCPBUGS-43692: Fix LB Type Defaulting with ProviderParameters Set [WIP] OCPBUGS-43692: Fix LB Type Defaulting with ProviderParameters Set Oct 23, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 23, 2024
@gcs278
Copy link
Contributor Author

gcs278 commented Oct 23, 2024

I've spun my wheels a bit on a couple different ways to approach this problem. All are valid, but it depends on what approach we'd like to take with setDefaultProviderParameters. I'll document here for thoughts.

Option 1: Don't Invoke setDefaultProviderParameters when alreadyAdmitted=True

  • Previously, when alreadyAdmitted was True, setDefaultProviderParameters didn't update any default values (as intended), but it did initialize empty structs in effectiveStrategy.loadBalancer
    • Later code in setDefaultPublishingStrategy depends on these initializations
  • Initially I went this route, but then realized I created a couple of nil deference exceptions
  • Seems like a change in the purpose of setDefaultProviderParameters now that it doesn't always ensure defaults.
  • Solution: master...gcs278:cluster-ingress-operator:OCPBUGS-43692-lb-type-defaulting-providerParms_condition

Option 2: Fix setDefaultProviderParameters to not update LB Type when alreadyAdmitted=True

  • The alreadyAdmitted check could be moved to here so that setDefaultProviderParameters never changes the LB Type after admission
    • However, the issue is that setDefaultProviderParameters still initializes lbs.ProviderParameters.AWS, and AWS.type is required, and the default value for AWS.type is "".
    • So, it's a bit confusing to have lbs.ProviderParameters.AWS.type as "" (empty string) in our code, which then needs to be protected against later in setDefaultPublishingStrategy, because it will try to update the status LB type to "" which causes an error. "" is not a valid AWS.type value.
    • I think our code should try to make sure our values are valid in the API (i.e. consistency with API) so future developers don't get confused.
  • Solution: master...gcs278:cluster-ingress-operator:OCPBUGS-43692-lb-type-defaulting-providerParms_donotupdatelbtype

Option 3: Pass current LB Status into setDefaultProviderParameters and set effective LB Type to status when alreadyAdmitted=True

I think I lean towards option 3 as the most "elegant" solution.

@gcs278 gcs278 force-pushed the OCPBUGS-43692-lb-type-defaulting-providerParms branch from fefdc79 to 82e4b34 Compare October 23, 2024 20:33
@melvinjoseph86
Copy link

/retest

Approach: pass in LoadBalanacerStrategy status so that LB Type can be
defaulted to the current value when already admitted.
@gcs278 gcs278 force-pushed the OCPBUGS-43692-lb-type-defaulting-providerParms branch from 82e4b34 to 3a6ac72 Compare November 20, 2024 14:14
@melvinjoseph86
Copy link

Tested using cluster bot

mjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         82m     Cluster version is 4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest
mjoseph@mjoseph-mac Downloads %  oc edit ingresses.config.openshift.io cluster
ingress.config.openshift.io/cluster edited
mjoseph@mjoseph-mac Downloads %  oc get dns.config cluster -ojson | jq '.spec.baseDomain'
"ci-ln-zby948b-76ef8.aws-2.ci.openshift.org"
mjoseph@mjoseph-mac Downloads % oc apply -f - <<EOF
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: nlb
  namespace: openshift-ingress-operator
spec:
  domain: nlb.ci-ln-zby948b-76ef8.aws-2.ci.openshift.org
  replicas: 1
  endpointPublishingStrategy:
    loadBalancer:
      scope: External
      dnsManagementPolicy: Managed
      providerParameters:
        type: AWS
    type: LoadBalancerService
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
EOF
ingresscontroller.operator.openshift.io/nlb created

mjoseph@mjoseph-mac Downloads % oc get po -n openshift-ingress
NAME                             READY   STATUS    RESTARTS      AGE
router-default-969bfb959-9bnrt   1/1     Running   1 (80m ago)   88m
router-default-969bfb959-rr8zq   1/1     Running   0             5m20s
router-nlb-5b548cf4d4-hpnnb      1/1     Running   0             21s


mjoseph@mjoseph-mac Downloads % oc get co                                                          
NAME                                       VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      63m     
baremetal                                  4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
cloud-controller-manager                   4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      90m     
cloud-credential                           4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      91m     
cluster-autoscaler                         4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
config-operator                            4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      89m     
console                                    4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      75m     
control-plane-machine-set                  4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      87m     
csi-snapshot-controller                    4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
dns                                        4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
etcd                                       4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      87m     
image-registry                             4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      79m     
ingress                                    4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      80m     
insights                                   4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
kube-apiserver                             4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      85m     
kube-controller-manager                    4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      85m     
kube-scheduler                             4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      85m     
kube-storage-version-migrator              4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      89m     
machine-api                                4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      83m     
machine-approver                           4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
machine-config                             4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      86m     
marketplace                                4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
monitoring                                 4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      77m     
network                                    4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      90m     
node-tuning                                4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      84m     
openshift-apiserver                        4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      81m     
openshift-controller-manager               4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      84m     
openshift-samples                          4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      80m     
operator-lifecycle-manager                 4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
operator-lifecycle-manager-catalog         4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
operator-lifecycle-manager-packageserver   4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      84m     
service-ca                                 4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      89m     
storage                                    4.18.0-0.ci.test-2024-11-21-055206-ci-ln-zby948b-latest   True        False         False      88m     
mjoseph@mjoseph-mac Downloads % 
mjoseph@mjoseph-mac Downloads % 
mjoseph@mjoseph-mac Downloads %  oc -n openshift-ingress-operator get ingresscontroller nlb -o yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"operator.openshift.io/v1","kind":"IngressController","metadata":{"annotations":{},"name":"nlb","namespace":"openshift-ingress-operator"},"spec":{"domain":"nlb.ci-ln-zby948b-76ef8.aws-2.ci.openshift.org","endpointPublishingStrategy":{"loadBalancer":{"dnsManagementPolicy":"Managed","providerParameters":{"type":"AWS"},"scope":"External"},"type":"LoadBalancerService"},"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/worker":""}}},"replicas":1}}
  creationTimestamp: "2024-11-21T07:39:26Z"
  finalizers:
  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
  generation: 2
  name: nlb
  namespace: openshift-ingress-operator
  resourceVersion: "53521"
  uid: 2f51a71f-df57-4980-80ce-3d84eb2cb02b
spec:
  clientTLS:
    clientCA:
      name: ""
    clientCertificatePolicy: ""
  domain: nlb.ci-ln-zby948b-76ef8.aws-2.ci.openshift.org
  endpointPublishingStrategy:
    loadBalancer:
      dnsManagementPolicy: Managed
      providerParameters:
        type: AWS
      scope: External
    type: LoadBalancerService
  httpCompression: {}
  httpEmptyRequestsPolicy: Respond
  httpErrorCodePages:
    name: ""
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
  replicas: 1
  tuningOptions:
    reloadInterval: 0s
  unsupportedConfigOverrides: null
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-11-21T07:39:26Z"
    reason: Valid
    status: "True"
    type: Admitted
  - lastTransitionTime: "2024-11-21T07:40:01Z"
    message: The deployment has Available status condition set to True
    reason: DeploymentAvailable
    status: "True"
    type: DeploymentAvailable
  - lastTransitionTime: "2024-11-21T07:40:01Z"
    message: Minimum replicas requirement is met
    reason: DeploymentMinimumReplicasMet
    status: "True"
    type: DeploymentReplicasMinAvailable
  - lastTransitionTime: "2024-11-21T07:40:01Z"
    message: All replicas are available
    reason: DeploymentReplicasAvailable
    status: "True"
    type: DeploymentReplicasAllAvailable
  - lastTransitionTime: "2024-11-21T07:40:01Z"
    message: Deployment is not actively rolling out
    reason: DeploymentNotRollingOut
    status: "False"
    type: DeploymentRollingOut
  - lastTransitionTime: "2024-11-21T07:39:26Z"
    message: The endpoint publishing strategy supports a managed load balancer
    reason: WantedByEndpointPublishingStrategy
    status: "True"
    type: LoadBalancerManaged
  - lastTransitionTime: "2024-11-21T07:39:29Z"
    message: The LoadBalancer service is provisioned
    reason: LoadBalancerProvisioned
    status: "True"
    type: LoadBalancerReady
  - lastTransitionTime: "2024-11-21T07:39:26Z"
    message: LoadBalancer is not progressing
    reason: LoadBalancerNotProgressing
    status: "False"
    type: LoadBalancerProgressing
  - lastTransitionTime: "2024-11-21T07:39:26Z"
    message: DNS management is supported and zones are specified in the cluster DNS
      config.
    reason: Normal
    status: "True"
    type: DNSManaged
  - lastTransitionTime: "2024-11-21T07:39:30Z"
    message: The record is provisioned in all reported zones.
    reason: NoFailedZones
    status: "True"
    type: DNSReady
  - lastTransitionTime: "2024-11-21T07:40:01Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2024-11-21T07:40:01Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2024-11-21T07:40:01Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-11-21T07:39:26Z"
    message: IngressController is upgradeable.
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2024-11-21T07:39:26Z"
    message: No evaluation condition is detected.
    reason: NoEvaluationCondition
    status: "False"
    type: EvaluationConditionsDetected
  domain: nlb.ci-ln-zby948b-76ef8.aws-2.ci.openshift.org
  endpointPublishingStrategy:
    loadBalancer:
      dnsManagementPolicy: Managed
      providerParameters:
        aws:
          networkLoadBalancer: {}
          type: NLB            <----------------------------
        type: AWS
      scope: External
    type: LoadBalancerService
  observedGeneration: 2
  selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=nlb
  tlsProfile:
    ciphers:
    - ECDHE-ECDSA-AES128-GCM-SHA256
    - ECDHE-RSA-AES128-GCM-SHA256
    - ECDHE-ECDSA-AES256-GCM-SHA384
    - ECDHE-RSA-AES256-GCM-SHA384
    - ECDHE-ECDSA-CHACHA20-POLY1305
    - ECDHE-RSA-CHACHA20-POLY1305
    - DHE-RSA-AES128-GCM-SHA256
    - DHE-RSA-AES256-GCM-SHA384
    - TLS_AES_128_GCM_SHA256
    - TLS_AES_256_GCM_SHA384
    - TLS_CHACHA20_POLY1305_SHA256
    minTLSVersion: VersionTLS12
mjoseph@mjoseph-mac Downloads % 

Hence marking as verified

@melvinjoseph86
Copy link

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Nov 21, 2024
@melvinjoseph86
Copy link

/retest

Copy link
Contributor

openshift-ci bot commented Nov 21, 2024

@gcs278: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants