
Remove backend from external backends if same backend name #8430

Closed
wants to merge 10 commits

Conversation


@freddyesteban freddyesteban commented Apr 4, 2022

What this PR does / why we need it:

Removes a backend from the external backends table when a new backend has the same name. This prevents a stale cached external backend from being used after the backend's type changes.

When the Service object's type is changed from ExternalName to ClusterIP, the backend is never removed from backends_with_external_name in the Lua balancer, so the external backend keeps serving traffic. This PR removes the backend from backends_with_external_name when a non-external backend with the same name is synced.
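The idea can be sketched as follows. This is an illustration only, not the actual patch: the table name backends_with_external_name comes from the PR description, while the helper function and the shape of the backend table here are simplified assumptions.

```lua
-- Minimal sketch of the fix (illustrative; not the real balancer.lua).
-- backends_with_external_name caches backends backed by ExternalName
-- Services; sync_backend runs for every backend on each configuration push.
local backends_with_external_name = {}

local function is_external_name(backend)
  -- assumption: ExternalName backends carry the service type on the spec
  return backend.service ~= nil
     and backend.service.spec ~= nil
     and backend.service.spec.type == "ExternalName"
end

local function sync_backend(backend)
  if is_external_name(backend) then
    backends_with_external_name[backend.name] = backend
    return
  end

  -- the fix: when a non-ExternalName backend is synced under a name that
  -- is still cached as external, evict the stale entry so the old
  -- external backend stops serving traffic
  if backends_with_external_name[backend.name] then
    backends_with_external_name[backend.name] = nil
  end

  -- ... regular ClusterIP backend sync would continue here ...
end

-- demonstration: a backend that flips from ExternalName to ClusterIP
sync_backend({ name = "demo", service = { spec = { type = "ExternalName" } } })
sync_backend({ name = "demo", service = { spec = { type = "ClusterIP" } } })
assert(backends_with_external_name["demo"] == nil)
```

In the real controller, sync_backend also resolves endpoints and updates shared balancer state; the sketch only shows the eviction this PR adds.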

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation only

Which issue/s this PR fixes

fixes #8440

How Has This Been Tested?

  • Updated balancer.lua with the fix
  • Ran make dev-env
  • Created a Deployment to serve a static page using tag nginx:latest
  • Created Service object of type ExternalName pointing to an external website, traffic is routed appropriately
  • Changed Service object to type ClusterIP to route traffic to Deployment pod, traffic is routed appropriately (prior to this PR, traffic was still routed using external backend).

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have added tests to cover my changes.
  • All new and existing tests passed.


linux-foundation-easycla bot commented Apr 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: freddyesteban / name: Freddy Esteban Perez (a826cea, fb2b55b)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Apr 4, 2022
@k8s-ci-robot
Contributor

@freddyesteban: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 4, 2022
@k8s-ci-robot
Contributor

Hi @freddyesteban. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. area/lua Issues or PRs related to lua code labels Apr 4, 2022
@longwuyuan
Contributor

  • I think there is too much assumption here.
  • It would be nice to see how a user lands in the situation where there is a lingering backend.
  • I think there was an issue raised about a similar situation, but there too, a detailed description and step-by-step instructions to reproduce the lingering-backend problem were not provided.
  • It may well be true that there is a lingering backend that needs to be manually removed, but this comment is to get clarity on the simple question of why someone would create an ingress with an externalName-type service as a backend in the first place, and then follow that up by editing it instead of deleting the Ingress and creating a new one.

@freddyesteban
Author

  • I think there is too much assumption here.
  • It would be nice to see how a user lands in the situation where there is a lingering backend.
  • I think there was an issue raised about a similar situation, but there too, a detailed description and step-by-step instructions to reproduce the lingering-backend problem were not provided.
  • It may well be true that there is a lingering backend that needs to be manually removed, but this comment is to get clarity on the simple question of why someone would create an ingress with an externalName-type service as a backend in the first place, and then follow that up by editing it instead of deleting the Ingress and creating a new one.

@longwuyuan thank you for taking the time to look at our PR.

Our use case for changing the Service type in flight is to spin down all pods when not in use (scale to zero) and point to an external service that signals to the user that the pod is spun down (a "please wait" page) until it wakes up. Our automation "wakes up" the pods by scaling the deployment replicas back to 1, and the Service object is switched back to ClusterIP. This behavior worked on an older version of the controller, though the nginx controller would trigger a reload for the change. This version of the controller attempts to perform the change dynamically using the Lua balancer, which is great because we can hopefully avoid the reload.

You've suggested deleting the ingress object, and we could do that, but we found that just updating the Service type avoids a reload. Our clusters are relatively big, so this could be very advantageous for our use case.

To replicate the issue, I've put together a step-by-step guide here.

To see the difference with our change, follow step-by-step guide here.

Why is removing the cached external backend with the same name important to us?
Changing the Service type to ClusterIP is a dynamic reconfiguration without a reload, as shown in the logs I provided here.

We're aware that we could delete the ingress object, or even update it in place, and that would work, but it causes a reload. That could be acceptable, since the old ingress controller did this, but we'd like to take advantage of the reload-free reconfiguration our fix provides.

Here is a step-by-step guide for the approach of deleting the ingress object and recreating it; the logs will show a backend reload, see here.

Here is a step-by-step guide for the approach of updating the ingress object to use a separate Service object; the logs will show a backend reload, see here.

@longwuyuan
Contributor

Hi @freddyesteban ,
Thanks for the detailed explanation and the reproduction procedure. It helps.
My first request is that you create a new issue and put all the details you explained here in that issue.

  • Please include all the data related to reproducing the problem in that issue.
  • Link that issue here with the string fixes <pound/hash symbol> <issuenumber>

My next request is: please write tests. I think there should be some assurance that just checking for a pre-existing stale backend in Lua does not interfere with any other code path. Please write tests that you think will provide this assurance. I don't expect any panic from an if-condition check, but a test should confirm that there will be no impact on users who don't create an externalName-type service and edit it later. I don't even know whether an externalName-type service that points not to the internet but to a custom destination makes a difference or not. Basically, please write all the tests that will provide the needed assurance.

@longwuyuan
Contributor

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Apr 7, 2022
@longwuyuan
Contributor

@freddyesteban on a very different note: if you have already tried https://kubernetes.github.io/ingress-nginx/examples/customization/custom-errors/#custom-errors, please write a note on why custom errors are not a preferred solution for the use case you described. It would be such a clean and supported way to serve the "please wait" page from a custom backend.

@tao12345666333
Member

Please sign CLA

@tao12345666333
Member

/assign

@freddyesteban
Author

Please sign CLA

@tao12345666333 We were under the impression that we, as a company, had already done that, but I think the project moved to EasyCLA and that's no longer the case. My manager has filed a ticket to get that fixed. Thank you.

@freddyesteban
Author

freddyesteban commented Apr 7, 2022

@longwuyuan

Thank you. I created the issue and linked it to this PR.

Regarding testing: the change only removes the backend from backends_with_external_name after sync_backend runs for a non-external backend, so removing it from the externals table does not affect a user who has no external backend. The lookup of the backend name in the table is safe and will not panic. A change between external types, e.g. an external backend being reconfigured to point to a custom destination, is not affected, because to reach that code path the user would have to change the Service type to a non-external one.

I attempted to add a test before, but in order to test the backends_with_external_name table I'd have to break encapsulation, because the function updating the table and the table itself are not exported. I'm working on it anyway at the moment. Any thoughts on exporting sync_backends and backends_with_external_name for testing purposes?

I'm new to Lua; apologies if there's a better way to approach the testing. If you have any suggestions or could point me in the right direction, I'd appreciate it.

@freddyesteban
Author

@freddyesteban on a very different note: if you have already tried https://kubernetes.github.io/ingress-nginx/examples/customization/custom-errors/#custom-errors, please write a note on why custom errors are not a preferred solution for the use case you described. It would be such a clean and supported way to serve the "please wait" page from a custom backend.

For us, at least, it's more about routing to a particular external service when the Service changes than about creating a default backend that could handle our particular use case. With enough work, I think we could find multiple solutions, including your suggestion of deleting ingress objects. We'd like to keep our "please wait" service decoupled from the nginx controller deployments, as it serves multiple clusters; that's just one factor.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 7, 2022
@freddyesteban
Author

@longwuyuan @tao12345666333 thoughts on the changes?

@rikatz
Contributor

rikatz commented May 1, 2022

/ok-to-test
@tao12345666333 can you please take a look? It does make sense to me, but I'm a bit worried every time I mess with Lua code ;)

Thanks

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 1, 2022
@tao12345666333
Member

Sure. It's on my queue.

@tao12345666333
Member

/test pull-ingress-nginx-test-lua

@tao12345666333
Member

/retest

@tao12345666333
Member

The errors in CI are not related to the code changes; they may have something to do with test-infra. I will do a code review.

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/ingress-nginx/8430/pull-ingress-nginx-test-lua/1521739764124880896/build-log.txt

build/run-in-docker.sh: line 65: USER: unbound variable
build/run-in-docker.sh: line 65: USER: unbound variable
build/run-in-docker.sh: line 65: docker: command not found
make: *** [Makefile:146: lua-test] Error 127

@longwuyuan
Contributor

The errors in CI are not related to the code changes; they may have something to do with test-infra. I will do a code review.

Ricardo had to remove an if condition that checks for DIND, because prow was failing e2e while local/laptop e2e was working.
Now you are reporting a run-in-docker.sh related error message. I hope we are aware of any underlying infra/prow changes, to avoid spiralling out of control. There was no announcement, though, and I had a successful e2e run on my laptop in the last 24 hours, so it is surely related to prow.

@tao12345666333
Member

/retest

@freddyesteban
Author

@tao12345666333 any updates on this or anything I should be doing to help get this over the line? TIA

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 5, 2022
@freddyesteban
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 11, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2022
@freddyesteban
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 12, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: freddyesteban
Once this PR has been reviewed and has the lgtm label, please ask for approval from tao12345666333. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

@freddyesteban: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-ingress-nginx-test-lua | Commit: 39ecf8b | Details: link | Required: true | Rerun command: /test pull-ingress-nginx-test-lua

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Labels
  • area/lua (Issues or PRs related to lua code)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • kind/feature (Categorizes issue or PR as related to a new feature.)
  • needs-priority
  • needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stale external backend when changing service type from ExternalName to ClusterIP
6 participants