openstack: Don't Delete LB in Case of Security Group Reconciliation Errors #82264
Conversation
Hi @multi-io. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
(branch force-pushed from eeee973 to 7a3f15a)
/ok-to-test
/assign @dims
@dims Is there anything that blocks merging this? In the out-of-tree OpenStack cloud provider the fix has already been merged: kubernetes/cloud-provider-openstack#743
@multi-io Ack! /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dims, multi-io. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
…ete_lb_on_errors openstack: Don't Delete LB in Case of Security Group Reconciliation Errors
…2264-upstream-release-1.16 Automated cherry pick of #82264: openstack: do not delete LB in case of security group
/kind bug
…2264-upstream-release-1.15 Automated cherry pick of #82264: openstack: do not delete LB in case of security group
What type of PR is this?
/kind bug
What this PR does / why we need it:
This fixes the legacy OpenStack cloud provider's LBaaS control loop so that the EnsureLoadBalancer() function no longer deletes the LB if something goes wrong while reconciling the LB's security groups. With the current master, suppose you have an LB service and its associated LB already up, running, and working fine, and then during a reconcile loop (which should change nothing) the OpenStack API goes down temporarily at exactly the wrong moment, i.e. it is still up during the LB and listener reconciliation but down during the SG reconciliation. In that case the whole LB gets deleted. We saw exactly this happen in a real-world customer application, which went offline because of it (the LB is recreated shortly afterwards, but likely with a different floating IP).
Deleting the LB on errors in a "reconcile" (rather than "create") function seems just wrong, and the other parts of EnsureLoadBalancer() don't do it either: e.g. if a transient error occurs while creating a listener, we just return it and leave the LB in a half-created state (
kubernetes/staging/src/k8s.io/legacy-cloud-providers/openstack/openstack_loadbalancer.go
Lines 785 to 788 in c7c89f8
kubernetes/pkg/controller/service/service_controller.go
Lines 255 to 256 in 3fe7a57
This PR just fixes the SG reconciliation to follow the same pattern. It seems to me that the current "delete LB in case of an error" approach was originally not part of a "reconcile" function but of a "create" function, where it would have made more sense.
The same bug is present in the new out-of-tree OpenStack cloud provider; I've submitted a corresponding PR there (kubernetes/cloud-provider-openstack#743). We'd still like to fix this in-tree as well, and also have the fix backported to 1.15 and 1.14 (please?), because our migration to the cloud controller manager is still in the early planning stages and will take more time.
Which issue(s) this PR fixes:
Fixes #35056
Release note: