New option to set a shutdown grace period #5855

bcarlsson · 2020-07-06T20:07:59Z

What this PR does / why we need it:

When running the ingress controller behind a load balancer in AWS, we need to have a grace period for the shutdown process in case of a scale down event. If not, the pod might be terminated before it has been removed from the load balancer target group.

This PR gives the user an option to set a grace period for when the shutdown phase has been triggered and the actual shutdown begins.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Which issue/s this PR fixes

fixes #4726

How Has This Been Tested?

It has been deployed in a running Kubernetes cluster and verified by monitoring the /healthz endpoint.

Checklist:

My change requires a change to the documentation.
I have updated the documentation accordingly.
I've read the CONTRIBUTION guide
I have added tests to cover my changes.
All new and existing tests passed.

k8s-ci-robot · 2020-07-06T20:08:03Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot · 2020-07-06T20:08:07Z

Welcome @bcarlsson!

It looks like this is your first PR to kubernetes/ingress-nginx 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/ingress-nginx has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2020-07-06T20:08:07Z

Hi @bcarlsson. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

aledbf · 2020-07-06T22:45:10Z

@bcarlsson please add e2e tests.
How this change interacts with the wait-shutdown preStop ?

aledbf · 2020-07-06T22:46:20Z

/ok-to-test

aledbf · 2020-07-06T22:46:26Z

/hold

k8s-ci-robot · 2020-07-06T22:48:44Z

@bcarlsson: The following test failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
pull-ingress-nginx-e2e-1-15	806423e51132815971513683bd485e77fe6d5741	link	`/test pull-ingress-nginx-e2e-1-15`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

bcarlsson · 2020-07-07T09:49:40Z

@bcarlsson please add e2e tests.
How this change interacts with the wait-shutdown preStop ?

/wait-shutdown preStop will trigger /healthz endpoint to return not ok while /livez remains ok during the shutdown grace period.
We'll look into e2e tests.

bcarlsson · 2020-08-28T11:37:11Z

@aledbf I've now added e2e tests to verify both endpoints during the shutdown grace period.

bcarlsson · 2020-09-04T12:35:56Z

/assign @bowei

aledbf · 2020-09-04T12:42:46Z

@bcarlsson please don't assign PRs manually

bcarlsson · 2020-09-04T12:54:12Z

@aledbf ok sorry, tried to follow an earlier comment by the robot.

codecov-io · 2020-10-07T10:15:51Z

Codecov Report

Merging #5855 (54b13bd) into master (2f08e87) will increase coverage by 0.07%.
The diff coverage is 64.22%.

@@            Coverage Diff             @@
##           master    #5855      +/-   ##
==========================================
+ Coverage   55.75%   55.82%   +0.07%     
==========================================
  Files          91       94       +3     
  Lines        6451     6579     +128     
==========================================
+ Hits         3597     3673      +76     
- Misses       2411     2447      +36     
- Partials      443      459      +16

Impacted Files	Coverage Δ
internal/ingress/controller/nginx.go	`28.71% <0.00%> (-0.15%)`	⬇️
internal/ingress/controller/template/configmap.go	`75.84% <ø> (ø)`
internal/ingress/sslcert.go	`0.00% <0.00%> (ø)`
internal/ingress/types.go	`0.00% <0.00%> (ø)`
internal/ingress/types_equals.go	`14.15% <0.00%> (-0.18%)`	⬇️
...ternal/ingress/annotations/globalratelimit/main.go	`64.70% <64.70%> (ø)`
internal/ingress/controller/controller.go	`46.68% <71.42%> (+0.07%)`	⬆️
internal/ingress/controller/template/template.go	`76.82% <79.59%> (+0.19%)`	⬆️
internal/net/ipnet.go	`92.00% <81.81%> (+6.28%)`	⬆️
cmd/nginx/flags.go	`81.63% <100.00%> (+0.18%)`	⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 032fccb...54b13bd. Read the comment docs.

bcarlsson · 2020-10-07T10:23:16Z

@aledbf as I wrote in an earlier comment, we do no longer see the point of a new /livez endpoint since the pod would already be in a shutting down state.
I've reverted the /livez commits and added a test for only the shutdown grace period.

aledbf · 2020-10-07T23:44:35Z

@bcarlsson please squash the commits

bcarlsson · 2020-10-08T15:50:21Z

@aledbf I've squashed the commits.

jeroenj · 2020-12-10T08:15:23Z

It looks like this is good to go? (except for the PR title)

We've been eagerly waiting to get this merged in as on clusters that scale heavily we get some failed requests when the ingress-nginx pods are shutting down but the (GKE) load balancers are still sending traffic to it.

Zhann · 2021-01-25T14:26:42Z

What's the state of this? Anything blocking?

aledbf · 2021-01-25T16:57:07Z

This change is missing the update of the helm chart values to show how to configure the endpoint and also an e2e test using it.

bcarlsson · 2021-01-26T07:45:45Z

@aledbf I apologize for any confusion. As I wrote in a comment earlier on September 14th, we realized that a new /livez endpoint did not add any value since the pod would already be in a "shutting down" state and would be terminated after TerminationGracePeriodSeconds had elapsed.

As commented on Oct 7, I did revert all changes for a new /livez endpoint but kept the feature to set a grace period before the actual shutdown would begin.

Is a update of the helm chart needed for this as there are no changes to a /livez endpoint anymore?

I know that only the last part of the PR description reflects the actual change now. Do you want me to update the title and description?

jeroenj · 2021-01-26T09:40:47Z

I know that only the last part of the PR description reflects the actual change now. Do you want me to update the title and description?

Personally I'd update the description (and title) to reflect the actual change. That said I do think it's gook to keep the initial description around for historic reference as to how this started out and how it came to the current state. That's something we do internally too.

aledbf · 2021-01-26T22:14:36Z

Do you want me to update the title and description?

Yes, please.

bcarlsson · 2021-01-27T15:41:22Z

Do you want me to update the title and description?

Yes, please.

I've now updated the title and description.

aledbf · 2021-01-27T16:57:38Z

/test pull-ingress-nginx-e2e-1-19
/test pull-ingress-nginx-e2e-1-20

aledbf · 2021-01-27T16:59:25Z

/lgtm
/approve

aledbf · 2021-01-27T16:59:32Z

@bcarlsson thanks!

k8s-ci-robot · 2021-01-27T16:59:40Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, bcarlsson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [aledbf]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

aledbf · 2021-01-27T17:00:20Z

@bcarlsson please remove the "Old description for reference" section. There is no code related to that description. Even if that was the initial intention, is confusing.

bcarlsson · 2021-01-27T17:16:18Z

@bcarlsson please remove the "Old description for reference" section. There is no code related to that description. Even if that was the initial intention, is confusing.

Done

aledbf · 2021-01-27T18:25:32Z

/hold cancel

k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jul 6, 2020

k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 6, 2020

k8s-ci-robot requested review from cmluciano and ElvinEfendi July 6, 2020 20:08

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 6, 2020

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 6, 2020

bcarlsson force-pushed the master branch from 806423e to 33fc6ce Compare August 28, 2020 09:01

bcarlsson force-pushed the master branch from 33fc6ce to 3b9657a Compare August 28, 2020 09:08

k8s-ci-robot assigned bowei Sep 4, 2020

aledbf assigned aledbf and unassigned bowei Sep 4, 2020

aledbf mentioned this pull request Sep 9, 2020

Add wait-before-shutdown command line option #5237

Closed

8 tasks

bcarlsson force-pushed the master branch from 3b9657a to beca338 Compare October 7, 2020 10:09

bcarlsson force-pushed the master branch from beca338 to 55a6684 Compare October 8, 2020 08:36

calvinbui mentioned this pull request Nov 11, 2020

Zero downtime upgrade without race conditions #6105

Closed

Add flag to allow setting a shutdown grace period

54b13bd

bcarlsson changed the title ~~New livez endpoint added with option to set shutdown grace period~~ New option to set a shutdown grace period Jan 27, 2021

bcarlsson force-pushed the master branch from 55a6684 to 54b13bd Compare January 27, 2021 15:40

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 27, 2021

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2021

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 27, 2021

k8s-ci-robot merged commit 54b6843 into kubernetes:master Jan 27, 2021

ravisharda mentioned this pull request Jan 29, 2021

How to avoid connection resets upon scale-down of NGINX Ingress pods? #6791

Closed

kubernetes deleted a comment from aledbf May 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New option to set a shutdown grace period #5855

New option to set a shutdown grace period #5855

bcarlsson commented Jul 6, 2020 •

edited

Loading

k8s-ci-robot commented Jul 6, 2020

k8s-ci-robot commented Jul 6, 2020

k8s-ci-robot commented Jul 6, 2020

aledbf commented Jul 6, 2020 •

edited

Loading

aledbf commented Jul 6, 2020

aledbf commented Jul 6, 2020

k8s-ci-robot commented Jul 6, 2020 •

edited

Loading

bcarlsson commented Jul 7, 2020 •

edited

Loading

bcarlsson commented Aug 28, 2020

bcarlsson commented Sep 4, 2020

aledbf commented Sep 4, 2020

bcarlsson commented Sep 4, 2020

codecov-io commented Oct 7, 2020 •

edited

Loading

bcarlsson commented Oct 7, 2020

aledbf commented Oct 7, 2020

bcarlsson commented Oct 8, 2020

jeroenj commented Dec 10, 2020

Zhann commented Jan 25, 2021

aledbf commented Jan 25, 2021

bcarlsson commented Jan 26, 2021

jeroenj commented Jan 26, 2021

aledbf commented Jan 26, 2021

bcarlsson commented Jan 27, 2021

aledbf commented Jan 27, 2021

aledbf commented Jan 27, 2021

aledbf commented Jan 27, 2021

k8s-ci-robot commented Jan 27, 2021

aledbf commented Jan 27, 2021

bcarlsson commented Jan 27, 2021

aledbf commented Jan 27, 2021

New option to set a shutdown grace period #5855

New option to set a shutdown grace period #5855

Conversation

bcarlsson commented Jul 6, 2020 • edited Loading

What this PR does / why we need it:

Types of changes

Which issue/s this PR fixes

How Has This Been Tested?

Checklist:

k8s-ci-robot commented Jul 6, 2020

k8s-ci-robot commented Jul 6, 2020

k8s-ci-robot commented Jul 6, 2020

aledbf commented Jul 6, 2020 • edited Loading

aledbf commented Jul 6, 2020

aledbf commented Jul 6, 2020

k8s-ci-robot commented Jul 6, 2020 • edited Loading

bcarlsson commented Jul 7, 2020 • edited Loading

bcarlsson commented Aug 28, 2020

bcarlsson commented Sep 4, 2020

aledbf commented Sep 4, 2020

bcarlsson commented Sep 4, 2020

codecov-io commented Oct 7, 2020 • edited Loading

Codecov Report

bcarlsson commented Oct 7, 2020

aledbf commented Oct 7, 2020

bcarlsson commented Oct 8, 2020

jeroenj commented Dec 10, 2020

Zhann commented Jan 25, 2021

aledbf commented Jan 25, 2021

bcarlsson commented Jan 26, 2021

jeroenj commented Jan 26, 2021

aledbf commented Jan 26, 2021

bcarlsson commented Jan 27, 2021

aledbf commented Jan 27, 2021

aledbf commented Jan 27, 2021

aledbf commented Jan 27, 2021

k8s-ci-robot commented Jan 27, 2021

aledbf commented Jan 27, 2021

bcarlsson commented Jan 27, 2021

aledbf commented Jan 27, 2021

bcarlsson commented Jul 6, 2020 •

edited

Loading

aledbf commented Jul 6, 2020 •

edited

Loading

k8s-ci-robot commented Jul 6, 2020 •

edited

Loading

bcarlsson commented Jul 7, 2020 •

edited

Loading

codecov-io commented Oct 7, 2020 •

edited

Loading