Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for draining node during pod termination #1473

Closed
gavinbunney opened this issue Apr 26, 2019 · 36 comments
Closed

Support for draining node during pod termination #1473

gavinbunney opened this issue Apr 26, 2019 · 36 comments
Labels
legacy Problems that require legacy mode to work around pinned Issue cannot be marked stale
Milestone

Comments

@gavinbunney
Copy link

gavinbunney commented Apr 26, 2019

Please describe your use case / problem.

We have a legacy very-stateful application (tomcat/java), which needs sticky sessions. When we deploy new versions of our applications, or a scale down event happens, we need to stop sending new connections to a server, while sending bound sessions to the old server. Please note: this is not referring to in-flight requests, we're needing the active tomcat sessions to expire, which normally takes about an hour.

We have this working currently in Kubernetes with HAProxy-Ingress by setting its drain-support flag, which drains a pod when it transitions to Terminating, but keeps existing sessions attached to that pod. We then have a preStop hook which blocks the shutdown until users have finished their sessions.

When looking at Ambassador to fulfill this ingress role, we see the pod being removed from routing as soon as it transitions to Terminating. This was tested with a simple application returning the hostname of the pod:

  • Getting two sessions against different pods - Pod A and Pod B
  • Scale the deployment down to 1 - Pod A moved to Terminating, Pod B still active
  • Session immediately switched over to the remaining pod - that is, Pod A session is dropped

Describe the solution you'd like

We'd love for ambassador to drain a pod when it is in Terminating, and keep existing connections alive, whilst routing new connections to other active Pods.

Additional context

Looking at the Kubernetes docs, it looks like they are dropping the pods from the Service registry, as soon as they pod moves to terminating. As per: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods

3. Pod shows up as “Terminating” when listed in client commands

4. (simultaneous with 3) When the Kubelet sees that a Pod has been marked as terminating because the time in 2 has been set, it begins the pod shutdown process.
...
5. (simultaneous with 3) Pod is removed from endpoints list for service, and are no longer considered part of the set of running pods for replication controllers. Pods that shutdown slowly cannot continue to serve traffic as load balancers (like the service proxy) remove them from their rotations.

Test Details

Ambassador version: 0.60.1

Ambassador Service Configuration

getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind: KubernetesEndpointResolver
      name: my-resolver
      ---
      apiVersion: ambassador/v1
      kind:  Module
      name:  ambassador
      config:
        resolver: my-resolver
        load_balancer:
          policy: round_robin

Ambassador Target Service Configuration:

apiVersion: v1
kind: Service
metadata:
  name: sticky
  labels:
    app: sticky
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind:  Mapping
      name:  sticky_mapping
      prefix: /sticky/
      service: sticky
      resolver: my-resolver
      load_balancer:
        policy: maglev
        cookie:
          name: sticky-cookie
          ttl: 300s
spec:
  ports:
  - name: http
    port: 80
  selector:
    app: sticky
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sticky
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sticky
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        app: sticky
    spec:
      terminationGracePeriodSeconds: 120
      containers:
      - name: hello
        image: nginxdemos/hello
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sleep
              - '60'
        ports:
        - containerPort: 80
@gavinbunney
Copy link
Author

Related issue for nginx ingress controller: nginx/kubernetes-ingress#267

@richarddli
Copy link
Contributor

richarddli commented Apr 29, 2019

Somewhat related use-case cc: @AaronTriplett:

Could someone answer the following question regarding the use of ring_hash load-balancing with a header? If we’re using a header (ex: user-id) to create session affinity between a specific user and a pod, is the affinity immediately cleared once that pod is downed by k8s (in case of a failure or new deployment)?

We’re using multiple long-lived gRPC connections for each user (Chat app) and have the idea of using session affinity to ensure any connections for a specific user end up on the same pod. When set up in this fashion, it would allow us to have those server / pod instances reroute data to the appropriate pod in which is affinitzed to a specific user by setting the user-id header to the “destination” user and essentially calling their own route through Ambassador.

It seems like its worth a try, but I want to ensure that if a pod is downed, all of the user-id header values affinitized to said pod get evicted and the next incoming request with that header is forwarded and affinitized to a new pod. (edited)

Aaron Triplett [11:45 PM]
Looks like I found the answer in an envoy issue: envoyproxy/envoy#2819
Long-lived connections would stay connected but new requests would hit a new pod. Split-brain issue, and likely not a good idea until Envoy has connection termination support.

@AaronTriplett
Copy link
Contributor

I've also opened this on the envoy side: envoyproxy/envoy#6730

@AaronTriplett
Copy link
Contributor

Wanted to note this needs to happen on any host change, scaling up, scaling down, and pod termination or failure.

@concaf concaf self-assigned this May 8, 2019
@kflynn kflynn added this to the parc-güell milestone May 9, 2019
@kflynn
Copy link
Member

kflynn commented May 9, 2019

I'm estimating this at 5 story points because it involves Envoy and Kubernetes and Ambassador. Honestly fitting this into parc-güell may be a bit aggressive.

@kflynn kflynn modified the milestones: parc-güell, pedrera May 13, 2019
@kflynn kflynn modified the milestones: pedrera, sagrada-familia Jun 6, 2019
@concaf
Copy link
Contributor

concaf commented Jun 9, 2019

I'm currently working on this, and this is at a state where the terminating pods have made their way to Envoy - but indicating Envoy that this connection is being drained is proving to be a problem.

I've opened an issue in Envoy regarding this at envoyproxy/envoy#7218

Also, from Slack -

I've tried the following approaches -

Whenever a pod goes into terminating, Ambassador captures it and puts the terminating pod's IP right back there with the endpoints with -
1. setting `load_balancing_weight: 1` (allowed values are 1-128) - but this does not works as expected! The new requests are not routed to this IP - but the older sticky sessions do not make their way to the terminating pod. I _thought_ this was working but scaling the pods to more than 2 busted this myth. I'm surprised as this is the implementation used by haproxy-ingress - but envoy behaves differently.

2. setting `'health_status': 'DRAINING'` for the endpoint (See https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/health_check.proto#envoy-api-enum-core-healthstatus). This makes envoy believe that the endpoint is unhealthy so no requests are routed to this, but again, the older sticky sessions do not persist.

I've tested with websocket connections as well as normal ones, but no luck. Now there could be something wrong with my testing as well. In subsequent `curl` requests from the command line, `curl` does not re-use the same connection again - which I think is fine as long as the sticky conditions are met i.e. the right cookie is set. Is the right?

Also, I've noticed that when a pod with a preStop hook preventing it to die goes into terminating state, the connection between host and the pod remains intact even though Ambassador (and hence Envoy) removes the host from the endpoint list - which makes me think that this issue is for subsequent sticky connections than for single persistent connections.

Any leads/pointers appreciated.

@stale
Copy link

stale bot commented Aug 17, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue is stale and will be closed label Aug 17, 2019
@richarddli
Copy link
Contributor

not stale

@stale stale bot removed the stale Issue is stale and will be closed label Aug 17, 2019
@stale
Copy link

stale bot commented Oct 16, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue is stale and will be closed label Oct 16, 2019
@concaf
Copy link
Contributor

concaf commented Oct 17, 2019

not stale

@stale stale bot removed the stale Issue is stale and will be closed label Oct 17, 2019
@stale
Copy link

stale bot commented Dec 16, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue is stale and will be closed label Dec 16, 2019
@jwm
Copy link
Contributor

jwm commented Dec 16, 2019

not stale

@stale stale bot removed the stale Issue is stale and will be closed label Dec 16, 2019
@jwm
Copy link
Contributor

jwm commented Apr 16, 2020

not stale

@stale stale bot removed the stale Issue is stale and will be closed label Apr 16, 2020
@stale
Copy link

stale bot commented Jun 15, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issue is stale and will be closed label Jun 15, 2020
@daveharding
Copy link

not stale

@stale stale bot removed the stale Issue is stale and will be closed label Jun 15, 2020
@hugocarrasco
Copy link

Hi, Would this issue include other active connections? Like draining TCP connections

@imranismail
Copy link
Contributor

Contour already has this and their implementation seems pretty solid projectcontour/contour#145

@psalaberria002
Copy link
Contributor

Any updates on this one? I see Contour has added a shutdown manager sidecar for handling clean shutdowns in https://github.com/projectcontour/contour/pull/2227/files . I tried to to add a preStop hook for draining connections and failing envoy healthchecks, but 8877/ambassador/v0/check_ready was still reporting 200 OK, while 8001/ready was reporting 503s (as it should). I also tried setting the health checks in 8001/ready, but it seems Envoy's admin port does not bind to 0.0.0.0.

@concaf concaf added the pinned Issue cannot be marked stale label Jul 31, 2020
@rbtcollins
Copy link

We have a patch we'll put up soon for this in legacy mode, but non-legacy is looking a bit tricky, because dgroup makes assumptions - in particular that there are no actions accessible to clients of dgroup during graceful shutdowns: https://github.com/datawire/dlib/blob/master/dgroup/group.go#L294

One possible way is to permit fallible user supplied calls to be called-back-to by the graceful shutdown logic. But also dgroup explicitly doesn't support dependencies (https://github.com/datawire/dlib/blob/master/dgroup/group.go#L19) I note ambex that similarly needs to manage TERM logic and doesn't use dgroup.

Another possibility would be a config option that disables signal handling for TERM, allowing the TERM logic to be written outside of dgroup, rather than the all-or-nothing signal handling present today.

Some guidance on this would be needed before we start investing in coding.

@LukeShu
Copy link
Contributor

LukeShu commented Nov 9, 2021

@rbtcollins

Yeah, I've been growing increasingly dissatisfied with dgroup's signal
handling. Not so much as an excuse, but as explanation: It's more or
less exactly what the amb-sidecar part of AES needed for signal
handling, since dgroup was originally factored out from AES.

That said, I'm not sure I follow what you need dgroup to do that it
won't let you. The way to have graceful shutdown code would be have a
goroutine that just blocks on <-ctx.Done() before doing anything:

grp.Go("drainer", func(ctx context.Context) error {
	<-ctx.Done() // wait for graceful shutdown to start
	ctx = dcontext.HardContext(ctx)

	// graceful shutdown code goes here

})

That said, if you're still unsure, but you've got a working patch for
legacy mode, I'd say to go ahead and just PR that to release/v1.14
(legacy mode is "gone" in master and release/v2.0). From there,
I'd happily do/assist in porting it over to non-legacy mode and v2.

@rbtcollins
Copy link

Ah, thanks for the hint. Yes, we'll get the legacy patch up in a day or two , just sorting attribution stuff internally and validating the most recent forward port - its currently against master, since all the code is still there as of late last week, but we'll see :)

@LukeShu
Copy link
Contributor

LukeShu commented Nov 10, 2021

A lot of the legacy-mode code is still on master, but a lot of it has been deleted (in particular: cmd/watt and pkg/supervisor).

@rbtcollins
Copy link

@LukeShu the problem is that dexec.CommandContext kills off things when the graceful shutdown starts. Which is the opposite of graceful when dealing with longer lived sockets in a subprocess.

We could perhaps write a layered Context.Background() :

    group.Go("envoy", func(ctx context.Context) error {
        child_ctx,cancel  := context.WithCancel(Context.Background())
        go runEnvoy(ctx, envoyHUP)
        <- ctx.Done()
        ctx = dcontext.HardContext(ctx)
        ... graceful drain of envoy
       cancel()
    })

This might also need the same done for ambex, to ensure that envoy doesn't get configuration errors during draining.

Thoughts?

@rbtcollins
Copy link

Righto, PR up at #3926

@rbtcollins
Copy link

@LukeShu ping [on the PR and the golang thoughts here]

@marianafranco
Copy link

Any updates on this one? We were seeing 5XXs during pod restarts which we were able to fix by calling localhost:8001/drain_listeners?graceful on a preStop lifecycle, but we prefer to have this handled/fixed by ambassador. We are using the AMBASSADOR_LEGACY_MODE.

@KatieLo
Copy link

KatieLo commented Nov 30, 2021

Hey @marianafranco and thread, apologies for the delay, the team will get back to you on this shortly!

@rbtcollins
Copy link

@marianafranco you could use our patch from #3926 for legacy mode. We haven't got a patch for non-legacy together yet.

@khussey khussey added the legacy Problems that require legacy mode to work around label Dec 2, 2021
@concaf concaf removed their assignment Dec 3, 2021
@marianafranco
Copy link

marianafranco commented Dec 4, 2021

@rbtcollins Thanks for this fix! I tested it in one of our staging environments and it's working (no 5XXs during pod restarts/termination) when AMBASSADOR_LEGACY_MODE=true is set.

Unfortunately, I was mistaken and we are not using the AMBASSADOR_LEGACY_MODE in production today. Do you know what would be the downside of move back to the AMBASSADOR_LEGACY_MODE vs the default one (we are using v1.14.2)?

@rbtcollins
Copy link

@rbtcollins Thanks for this fix! I tested it in one of our staging environments and it's working (no 5XXs during pod restarts/termination) when AMBASSADOR_LEGACY_MODE=true is set.

Unfortunately, I was mistaken and we are not using the AMBASSADOR_LEGACY_MODE in production today. Do you know what would be the downside of move back to the AMBASSADOR_LEGACY_MODE vs the default one (we are using v1.14.2)?

I don't know - needless to say we're using legacy mode just fine :).

@khussey khussey added this to the 2022 Cycle 1 milestone Jan 4, 2022
@mohitreddy1996
Copy link

Hi, any updates if this is supported on the non-legacy versions?

If not, curious to understand if switching to LEGACY mode will have any consequences.

@cindymullins-dw
Copy link
Contributor

Hi @mohitreddy1996 , the docs indicate that AMBASSADOR_LEGACY_MODE is not recommended when performance is critical. Closing as it appears the original issue has been addressed.

@goodgravy
Copy link

@cindymullins-dw I'm not sure this has been addressed…

The patch @rbtcollins shared is for legacy mode only, and as you point out legacy mode is not recommended if one needs high performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
legacy Problems that require legacy mode to work around pinned Issue cannot be marked stale
Projects
None yet
Development

No branches or pull requests