
Feature request: Drain based on annotation #4188

Closed
anton-johansson opened this issue Jun 11, 2019 · 23 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@anton-johansson

Is this a request for help?
No

What keywords did you search in NGINX Ingress controller issues before filing this one?
nginx ingress controller drain annotation
nginx ingress controller drain label

I found #2322, a similar request that was closed because drain is a commercial-only feature. I understand that, but maybe we can work out a solution that works without the NGINX Plus function (as was already done for sticky sessions). If I understand things correctly, we use Lua scripting to handle the load balancing and sticky sessions. It should be possible to check upstream pod annotations there to decide whether or not a pod should be considered for new sessions.


Is this a BUG REPORT or FEATURE REQUEST?
Feature request

NGINX Ingress controller version:

0.24.1

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: Bare metal, 3 masters and 5 worker nodes.
  • OS: Ubuntu 18.04.2 LTS (Bionic Beaver)
  • Kernel: 4.15.0-50-generic
  • Install tools: I use Ansible to install Kubernetes services, as described here.
  • Others:

Feature request:

I have a scenario similar to the one described in issue #2322. Our application does not have session replication, and we need a better way of running version rollouts. We need a way to tell NGINX not to send new sessions to older deployments. I was thinking that we could use a Pod-level annotation for this, maybe:

nginx.ingress.kubernetes.io/drain: true

... or this one (to avoid mixing it up with NGINX Plus' built-in functionality):

nginx.ingress.kubernetes.io/accept-new-sessions: false

This would of course only have an effect when sticky sessions are used, which could be a little confusing.
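
For illustration only, here is a rough sketch of how the proposed annotation might be applied to a pod from an older rollout (the annotation does not exist in ingress-nginx, and the pod name and image are made up):

apiVersion: v1
kind: Pod
metadata:
  name: my-app-old-abc12                                       # hypothetical pod from the previous rollout
  annotations:
    nginx.ingress.kubernetes.io/accept-new-sessions: "false"   # proposed annotation, not implemented
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0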

Looking around in the code, I assume that this functionality would take place somewhere in the pick_new_upstream function of sticky.lua.

Thoughts? Ideas? I could try and see if I can develop the changes myself, but I need to know if it's something that is actually wanted and if it's the right approach.

@anton-johansson anton-johansson changed the title Drain based on annotation Feature request: Drain based on annotation Jun 13, 2019
@aledbf aledbf added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 11, 2019
@aledbf
Member

aledbf commented Sep 3, 2019

@anton-johansson please check #4514 (not merged yet)

@anton-johansson
Author

anton-johansson commented Sep 11, 2019

@aledbf That looks interesting indeed, but I'm not sure it's related to marking certain pods as draining (i.e. not receiving new sessions).

@ElvinEfendi
Member

@anton-johansson can you not edit the specific pod's manifest and change the readiness probe in a way that makes it fail? Then ingress-nginx will remove that pod from its list and stop proxying new connections to it (it will still process the existing ones).
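
For illustration, since probe fields on an already-running pod generally cannot be patched in place, one sketch of making a readiness probe fail on demand is to have it check a flag file that can be deleted when the pod should stop receiving new traffic (container name, image and file path are assumptions):

containers:
  - name: app                                   # assumed container name
    image: registry.example.com/my-app:1.0      # assumed image
    readinessProbe:
      exec:
        command: ["test", "-f", "/tmp/ready"]   # fails once the file is removed, e.g. kubectl exec <pod> -- rm /tmp/ready
      periodSeconds: 5
      failureThreshold: 1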

@anton-johansson
Author

Are you sure that's how it works? I get the feeling that pods that turn Unready will be removed from load balancing altogether, including existing sessions.

If I'm wrong though, your solution seems perfectly valid and is surely something that I'd like to try out. :D

@ElvinEfendi
Member

Give it a try ;)

@wknapik

wknapik commented Sep 27, 2019

We have this exact problem. Stateful app, no way to migrate sessions, need draining until sessions expire when updating.

I implemented a PoC based on nginx-ingress 0.25.1 and it seems to be working, but I had to patch sticky.lua to modify pick_new_upstream. I really want to avoid this, since it will require endless maintenance - I can already see that the patch will not work against upstream master.

So... Ideally, I'd like to see session draining based on annotations implemented in nginx-ingress, but if that's not going to happen, I'd at least like to be able to implement it myself without needing to patch upstream code.

It could be done with minimal effort - say, pick_new_upstream could exclude upstreams listed in a shared nginx dict with a specific name. Or maybe we could provide a Lua snippet via a configmap that returns a dict of upstreams to exclude (like get_drained_upstreams, mimicking get_failed_upstreams)? Anything like that would work.

Once I'm done turning the PoC into something production-ready, I'd be happy to open a PR, but it would only be a PR for the hook described above, since every complete implementation would/could be different.

@anton-johansson
Author

@wknapik: Cool! I'd love to see your patch. I don't want to run a patched version either, but I'm still interested in seeing your solution.

Either way, an annotation-based approach seems like the optimal solution.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 25, 2020
@wknapik

wknapik commented Jan 25, 2020

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 25, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 24, 2020
@wknapik

wknapik commented Apr 26, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 25, 2020
@wknapik

wknapik commented Jul 27, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2020
@ArronaxKP

This would be a nice feature to have. The only ingress controller I have found that supports this without additional cost or a deployment outside of the cluster is jcmoraisjr/haproxy-ingress.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 17, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 16, 2021
@evenh

evenh commented Jan 16, 2021

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 16, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 16, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 16, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@angeloxx

angeloxx commented Sep 10, 2024

Even if the issue is closed (and outdated), I'll add our experience from recent days. We used the canary deployment strategy: you can create a new ingress with the same host that points to a service with a reduced set of pods (selecting only the pods where new users should land) and set:

nginx.ingress.kubernetes.io/canary: "true"
nginx.ingress.kubernetes.io/canary-weight: "100"

All new users are directed to that subset, old users keep their sessions on the old pods, and when the migration has finished you can remove the canary ingress. The persistence cookie will still be valid and the standard ingress will honor it.
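
For illustration, a minimal sketch of such a canary ingress (host, service and resource names are assumptions):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "100"   # route all new sessions to the canary backend
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com                            # same host as the primary ingress
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-new                       # service selecting only the new pods
                port:
                  number: 80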
