Bugfix: Delete unused GCE load-balancer resources on ingress spec change #894

Merged
merged 1 commit into from
Nov 13, 2019

Conversation

skmatti
Contributor

@skmatti skmatti commented Oct 16, 2019

Fixes: #32, #465, #764

/assign @bowei @freehan

The Ensure workflow of the load-balancer pool is modified to delete unused GCE resources.
It looks at the user-specified ingress options and annotations to determine whether specific frontend resources need to be deleted.

  • AllowHTTP is set to false:
    if the existing ingress annotations reference HTTP resources, those resources are deleted.
  • SSL is not configured:
    if the existing ingress annotations reference HTTPS resources, those resources are deleted.
  • The user specifies an IP:
    we try to delete the Ingress-managed static IP if it exists. This is performed only when both HTTP and HTTPS are enabled.

Note that the existing behavior of the load-balancer pool does not update the frontend resource annotations in the cases above. This PR changes that behavior so the annotations are updated; they are then used to determine whether frontend resources need to be deleted.
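The decision rules in the description can be illustrated with a minimal sketch. It is not the PR's code: the `ingressOptions` type, the `resourcesToDelete` function, and the `example/...` annotation keys are all hypothetical stand-ins chosen for the illustration.

```go
package main

import "fmt"

// ingressOptions is a hypothetical summary of the user-facing settings the
// description refers to; it is not a type from the PR.
type ingressOptions struct {
	AllowHTTP bool   // false when HTTP is disabled for the ingress
	HasTLS    bool   // true when SSL is configured on the ingress
	StaticIP  string // non-empty when the user specifies their own IP
}

// Placeholder annotation keys standing in for the controller's status
// annotations (assumed names, for illustration only).
const (
	httpProxyKey  = "example/target-proxy"
	httpsProxyKey = "example/https-target-proxy"
	staticIPKey   = "example/static-ip"
)

// resourcesToDelete returns the kinds of frontend resources that are recorded
// in the existing annotations but are no longer wanted by the ingress spec.
func resourcesToDelete(opts ingressOptions, existing map[string]string) []string {
	var out []string
	if _, ok := existing[httpProxyKey]; ok && !opts.AllowHTTP {
		out = append(out, "http")
	}
	if _, ok := existing[httpsProxyKey]; ok && !opts.HasTLS {
		out = append(out, "https")
	}
	// The managed static IP is only considered when the user brings an IP
	// and both protocols are enabled.
	if _, ok := existing[staticIPKey]; ok && opts.StaticIP != "" && opts.AllowHTTP && opts.HasTLS {
		out = append(out, "managed-static-ip")
	}
	return out
}

func main() {
	existing := map[string]string{httpProxyKey: "tp-1", httpsProxyKey: "tps-1"}
	// HTTP was turned off while TLS stayed configured.
	fmt.Println(resourcesToDelete(ingressOptions{AllowHTTP: false, HasTLS: true}, existing))
}
```

Because the decision depends on what the annotations record, it only works if the annotations are kept up to date, which is the behavior change the note above describes.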

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 16, 2019
@k8s-ci-robot
Contributor

Hi @skmatti. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 16, 2019
@skmatti skmatti force-pushed the resource-leak-fix branch 2 times, most recently from 12d20d1 to e91b460 Compare October 16, 2019 01:28
@skmatti skmatti changed the title Delete unused GCE load-balancer resources on ingress spec change Bugfix: Delete unused GCE load-balancer resources on ingress spec change Oct 16, 2019
@skmatti skmatti force-pushed the resource-leak-fix branch 4 times, most recently from 57438f3 to 44ce7f8 Compare October 18, 2019 20:53
@freehan
Contributor

freehan commented Oct 18, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 18, 2019
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 21, 2019
@skmatti
Contributor Author

skmatti commented Oct 21, 2019

/assign @MrHohn

@skmatti skmatti force-pushed the resource-leak-fix branch 2 times, most recently from 604ffda to 39471f2 Compare November 4, 2019 18:23
} else {
    delete(existing, annotations.TargetHttpsProxyKey)
}
if l.ip != nil {
Contributor

why not delete it?

Contributor Author

The IP is retained for use after it's promoted to static. It will be deleted when the load balancer is deleted or when the user specifies a different IP. Added a comment to describe that.

@skmatti
Contributor Author

skmatti commented Nov 12, 2019

/assign @rramkumar1

@skmatti skmatti force-pushed the resource-leak-fix branch 2 times, most recently from 18f6887 to 29c25b4 Compare November 12, 2019 18:15
Contributor

@rramkumar1 rramkumar1 left a comment

First round of comments.

@@ -77,6 +77,25 @@ const (
    // - annotations:
    // networking.gke.io/v1beta1.FrontendConfig: 'my-frontendconfig'
    FrontendConfigKey = "networking.gke.io/v1beta1.FrontendConfig"

    // UrlMapKey is the annotation key used by controller to record GCP URL map.
    UrlMapKey = StatusPrefix + "/url-map"
Contributor

does StatusPrefix need to be exported anymore?

Contributor Author

There are a couple of other places where this is referenced.

resourceName, _ = ingAnnotations[fmt.Sprintf("%v/%v", annotations.StatusPrefix, resourceName)]

pkg/loadbalancers/l7.go (outdated, resolved)
        annotations.TargetHttpProxyKey,
    }...)
default:
    klog.Fatalf("Invalid frontend resource protocol %v", protocol)
Contributor

Reaching this point would technically be a programming error, so ideally unit tests should catch it. If they don't, do we want the behavior to be klog.Fatalf? I wonder if we should log this but not do anything else, since I'm not sure we want the controller to crash because of it. Worth thinking about alternatives like returning an error.

Contributor Author

Makes sense, modified this to surface an error instead.
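A minimal sketch of what surfacing an error from the default branch might look like, as opposed to crashing the process. The `protocol` type, `annotationKeysFor`, and the `example/...` keys are hypothetical; the thread does not show the final code.

```go
package main

import (
	"errors"
	"fmt"
)

// protocol and its values are stand-ins for the namer protocol constants.
type protocol string

const (
	httpProtocol  protocol = "HTTP"
	httpsProtocol protocol = "HTTPS"
)

// errInvalidProtocol lets callers detect this case with errors.Is.
var errInvalidProtocol = errors.New("invalid frontend resource protocol")

// annotationKeysFor returns the (placeholder) annotation keys tied to a
// protocol. An unknown protocol yields an error rather than a fatal log, so
// the caller can fail one sync without taking the controller down.
func annotationKeysFor(p protocol) ([]string, error) {
	switch p {
	case httpProtocol:
		return []string{"example/forwarding-rule", "example/target-proxy"}, nil
	case httpsProtocol:
		return []string{"example/https-forwarding-rule", "example/https-target-proxy"}, nil
	default:
		return nil, fmt.Errorf("%w: %q", errInvalidProtocol, p)
	}
}

func main() {
	if _, err := annotationKeysFor("SCTP"); err != nil {
		fmt.Println("sync failed:", err)
	}
}
```

Wrapping a sentinel with `%w` keeps the message readable while still letting a caller or a unit test check for this specific failure.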

return nil
}
staticIPName := l.namer.ForwardingRule(namer.HTTPProtocol)
Contributor

What was the motivation for changing the name of this var?

Contributor Author

I had to move this line to the top of the loop. At that point, we are still not sure whether this is the static IP name; getEffectiveIP may return a different name for the static IP.

if key, err = l.CreateKey(fwsName); err != nil {
// deleteHttp deletes http forwarding rule and target http proxy.
func (l *L7) deleteHttp(versions *features.ResourceVersions) error {
frName := l.namer.ForwardingRule(namer.HTTPProtocol)
Contributor

The first couple lines of this func and deleteHttps are the same. Maybe consider wrapping that logic in a func with unit tests?

Contributor Author

Done
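As an illustration of the suggested refactor, the shared steps could live in one helper that both the HTTP and HTTPS paths call. This is a sketch under assumptions: `frontendDeleter`, `deleteFrontend`, and the `recorder` fake are hypothetical names, and the PR uses its own composite helpers for the cloud calls.

```go
package main

import "fmt"

// frontendDeleter is a hypothetical stand-in for the cloud calls involved.
type frontendDeleter interface {
	DeleteForwardingRule(name string) error
	DeleteTargetProxy(name string) error
}

// deleteFrontend removes the forwarding rule first and the target proxy
// second, since a proxy that is still referenced by a rule cannot be removed.
func deleteFrontend(d frontendDeleter, ruleName, proxyName string) error {
	if err := d.DeleteForwardingRule(ruleName); err != nil {
		return fmt.Errorf("deleting forwarding rule %q: %w", ruleName, err)
	}
	if err := d.DeleteTargetProxy(proxyName); err != nil {
		return fmt.Errorf("deleting target proxy %q: %w", proxyName, err)
	}
	return nil
}

// recorder is a fake that records calls so the order can be unit tested.
type recorder struct{ calls []string }

func (r *recorder) DeleteForwardingRule(n string) error {
	r.calls = append(r.calls, "fr:"+n)
	return nil
}

func (r *recorder) DeleteTargetProxy(n string) error {
	r.calls = append(r.calls, "tp:"+n)
	return nil
}

func main() {
	r := &recorder{}
	if err := deleteFrontend(r, "example-fw", "example-tp"); err != nil {
		fmt.Println("delete failed:", err)
	}
	fmt.Println(r.calls)
}
```

Putting the cloud calls behind a small interface is what makes the helper testable with a fake, which is the unit-test half of the reviewer's suggestion.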

versions := features.GAResourceVersions
certName1 := feNamer.SSLCertName(GetCertHash("cert1"))

testCases := []struct {
Contributor

I would add a description to the test case struct itself. That is the typical way it is done.

Contributor Author

Done
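For readers unfamiliar with the convention being requested, here is a small sketch of a table-driven case list with a `desc` field on the struct. The function under test, `shouldDeleteHTTPS`, is hypothetical and only exists to give the table something to exercise.

```go
package main

import "fmt"

// shouldDeleteHTTPS is a hypothetical function under test, not code from the PR.
func shouldDeleteHTTPS(hasHTTPSAnnotation, tlsConfigured bool) bool {
	return hasHTTPSAnnotation && !tlsConfigured
}

func main() {
	testCases := []struct {
		desc               string // human-readable name, reported on failure
		hasHTTPSAnnotation bool
		tlsConfigured      bool
		want               bool
	}{
		{desc: "https resources recorded, TLS removed", hasHTTPSAnnotation: true, want: true},
		{desc: "https resources recorded, TLS still set", hasHTTPSAnnotation: true, tlsConfigured: true},
		{desc: "no https resources recorded"},
	}
	for _, tc := range testCases {
		got := shouldDeleteHTTPS(tc.hasHTTPSAnnotation, tc.tlsConfigured)
		if got != tc.want {
			fmt.Printf("FAIL %s: got %t, want %t\n", tc.desc, got, tc.want)
			continue
		}
		fmt.Printf("ok   %s\n", tc.desc)
	}
}
```

In a `_test.go` file the loop body would normally sit inside `t.Run(tc.desc, func(t *testing.T) { ... })`, so each description becomes the subtest name.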

@freehan
Contributor

freehan commented Nov 12, 2019

Discussed offline, put it here for reference:

The code added here is quite embedded into the workflow. My main concern was that there is no easy way to flag-gate this behavior. I would recommend gathering all the logic, moving it into a trim function, and calling it after edgeHop with flag gating.
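The shape of that recommendation might look roughly like the sketch below. Only `edgeHop` is a name taken from the thread; the `lb` type, the `gcEnabled` field, and `trimUnused` are hypothetical, and the real flag name is not shown in this conversation.

```go
package main

import "fmt"

// lb is a stand-in for the load-balancer object; steps records what ran.
type lb struct {
	gcEnabled bool // hypothetical flag gating the cleanup behavior
	steps     []string
}

// edgeHop stands in for the existing step that ensures frontend resources.
func (l *lb) edgeHop() error {
	l.steps = append(l.steps, "edgeHop")
	return nil
}

// trimUnused gathers all deletion of no-longer-needed frontend resources in
// one place instead of spreading it through the ensure workflow.
func (l *lb) trimUnused() error {
	l.steps = append(l.steps, "trim")
	return nil
}

// ensure runs the existing workflow and only then, if the flag is on, trims.
func (l *lb) ensure() error {
	if err := l.edgeHop(); err != nil {
		return err
	}
	if !l.gcEnabled {
		return nil
	}
	return l.trimUnused()
}

func main() {
	for _, enabled := range []bool{true, false} {
		l := &lb{gcEnabled: enabled}
		if err := l.ensure(); err != nil {
			fmt.Println("ensure failed:", err)
		}
		fmt.Println(enabled, l.steps)
	}
}
```

With the cleanup isolated in one function behind one check, turning the flag off restores the previous behavior without touching the rest of the workflow.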

@rramkumar1
Contributor

> Discussed offline, put it here for reference:
>
> The code added here is quite embedded into the workflow. My main concern was that there is no easy way to flag-gate this behavior. I would recommend gathering all the logic, moving it into a trim function, and calling it after edgeHop with flag gating.

+1

@skmatti
Contributor Author

skmatti commented Nov 12, 2019

> > Discussed offline, put it here for reference:
> > The code added here is quite embedded into the workflow. My main concern was that there is no easy way to flag-gate this behavior. I would recommend gathering all the logic, moving it into a trim function, and calling it after edgeHop with flag gating.
>
> +1

Done

pkg/loadbalancers/l7.go (outdated, resolved)
if certErr != nil {
return certErr
if err := utils.IgnoreHTTPNotFound(composite.DeleteSslCertificate(l.cloud, key, versions.SslCertificate)); err != nil {
klog.Errorf("Old cert delete failed - %v", err)
Contributor

Is it necessary to log here if the error is returned? Is the error logged higher up in the stack?

Contributor Author

This is not logged higher up in the stack. Also, this was just refactored from the existing workflow. Maybe we want to keep this?
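For context on the pattern under discussion, the sketch below shows one way a delete can treat "not found" as success and attach context to any other error so that a single log at the call site is enough. It is an assumption-laden illustration: `apiError`, `ignoreNotFound`, and `deleteCert` are hypothetical, and the controller's own `utils.IgnoreHTTPNotFound` inspects the GCE API error rather than this stand-in type.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// apiError is a hypothetical error type carrying an HTTP status code.
type apiError struct{ Code int }

func (e *apiError) Error() string { return fmt.Sprintf("api error: status %d", e.Code) }

// ignoreNotFound returns nil for a 404 and passes every other error through.
func ignoreNotFound(err error) error {
	var ae *apiError
	if errors.As(err, &ae) && ae.Code == http.StatusNotFound {
		return nil
	}
	return err
}

// deleteCert wraps any remaining error with the cert name, so the caller can
// log once with full context instead of logging here and again higher up.
func deleteCert(name string, del func(string) error) error {
	if err := ignoreNotFound(del(name)); err != nil {
		return fmt.Errorf("old cert %q delete failed: %w", name, err)
	}
	return nil
}

func main() {
	alreadyGone := func(string) error { return &apiError{Code: http.StatusNotFound} }
	serverError := func(string) error { return &apiError{Code: http.StatusInternalServerError} }
	fmt.Println(deleteCert("cert1", alreadyGone))
	fmt.Println(deleteCert("cert1", serverError))
}
```

Whether to keep the extra log line, as the author proposes, or rely on a wrapped error is a judgment call; the sketch only shows that the second option does not lose the context the log line carried.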

@rramkumar1
Contributor

Looks good to me. Will leave final approval to @freehan

Contributor

@freehan freehan left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 13, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: freehan, skmatti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2019
@k8s-ci-robot k8s-ci-robot merged commit 4741042 into kubernetes:master Nov 13, 2019
@skmatti skmatti deleted the resource-leak-fix branch November 19, 2019 19:51
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[GLBC] Changing front-end configuration does not remove unnecessary target proxies/ssl-certs
6 participants