Question: How to Gracefully Shutdown Workers? #190

Diasporism · 2023-08-21T23:23:54Z

Hello, this is partially an Elixir question (and please forgive me for my ignorance) and partially (I believe) a question particularly relevant to how this library is structured.

tl;dr: I'm looking for guidance on the best way to take a SIGTERM or System.stop() and

1. toggle the disable_fetch config setting from false to true, live
2. allow currently busy FaktoryWorker.Job processes to finish their work
3. shut down the FaktoryWorker supervisor

in that general order.
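For orientation, here is roughly how those three steps look in plain OTP. Since OTP 20 the VM's default SIGTERM handler calls `init:stop()`, which stops applications and their supervision trees in an orderly way. The sketch below assumes a hypothetical `DrainSketch.Runner` that stands in for whatever process accepts and runs jobs. It is not part of faktory_worker and does not describe how the library is implemented.

```elixir
defmodule DrainSketch.Runner do
  @moduledoc "Hypothetical job runner used only to illustrate draining on shutdown."
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Run `fun` as a job in its own monitored (not linked) process.
  def run(fun), do: GenServer.call(__MODULE__, {:run, fun})

  @impl true
  def init(_opts) do
    # Trapping exits makes the supervisor's :shutdown signal invoke terminate/2
    # instead of stopping this process immediately.
    Process.flag(:trap_exit, true)
    {:ok, %{in_flight: MapSet.new()}}
  end

  @impl true
  def handle_call({:run, fun}, _from, state) do
    {_pid, ref} = spawn_monitor(fun)
    {:reply, :ok, %{state | in_flight: MapSet.put(state.in_flight, ref)}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    {:noreply, %{state | in_flight: MapSet.delete(state.in_flight, ref)}}
  end

  def handle_info(_other, state), do: {:noreply, state}

  @impl true
  def terminate(_reason, state) do
    # Step 1: once terminate/2 is running, no further {:run, fun} calls are handled.
    # Step 2: block until every in-flight job has reported :DOWN.
    Enum.each(state.in_flight, fn ref ->
      receive do
        {:DOWN, ^ref, :process, _pid, _reason} -> :ok
      end
    end)

    # Step 3: returning lets the supervisor carry on shutting down.
    :ok
  end
end
```

The wait in `terminate/2` is bounded by the `:shutdown` value in the runner's child spec, for example `Supervisor.child_spec({DrainSketch.Runner, []}, shutdown: 60_000)`. Once that timeout elapses, the supervisor kills the runner without waiting any longer.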

Long version:

I run a Faktory client on Kubernetes and perform rolling updates where a new pod/container spins up and then Kubernetes sends a SIGTERM to the old pod/container telling it to shut down.

My goal is to have the old container gracefully finish in-flight jobs before shutting down. My jobs involve transferring large amounts of data. Arbitrarily killing/restarting during deploys can cause downstream problems (resumable streams are not always an option).

Excluding all of the Kubernetes configs to make graceful, rolling deployments happen, I'd like to know how to take a SIGTERM to my Elixir application and in turn disable the fetching of new jobs from Faktory, as well as trap the kill message in my FaktoryWorker.Job processes to give them enough time to finish.

All the examples I've found online for doing this relate to simple GenServers and I'm not positive how to translate them to this library.

For context:
This video shows my desired outcome, but for HTTP requests and GenServers. https://www.youtube.com/watch?v=cbCgB9F6RrM

This forum post talks about trapping exit messages, but I think this library hardcodes a :brutal_kill signal, which wouldn't work? https://elixirforum.com/t/graceful-shutdown-on-sigterm/23780/2
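To make the concern concrete, here is a small sketch of why `:brutal_kill` and exit trapping don't mix. It uses only standard OTP and hypothetical `ShutdownDemo` modules. Whether faktory_worker really uses `:brutal_kill` for its workers is an assumption taken from the sentence above, not something verified here.

```elixir
defmodule ShutdownDemo.Worker do
  use GenServer

  def start_link(notify), do: GenServer.start_link(__MODULE__, notify)

  @impl true
  def init(notify) do
    # Without this flag, terminate/2 is not called on a supervisor shutdown.
    Process.flag(:trap_exit, true)
    {:ok, notify}
  end

  @impl true
  def terminate(reason, notify) do
    # Stand-in for "finish the current job"; here we only report that we ran.
    send(notify, {:terminated, reason})
  end
end

defmodule ShutdownDemo do
  # Starts the worker under a supervisor with the given :shutdown value,
  # stops the supervisor, and reports whether terminate/2 got a chance to run.
  def stop_with(shutdown) do
    spec = Supervisor.child_spec({ShutdownDemo.Worker, self()}, shutdown: shutdown)
    {:ok, sup} = Supervisor.start_link([spec], strategy: :one_for_one)
    :ok = Supervisor.stop(sup)

    receive do
      {:terminated, reason} -> {:terminate_ran, reason}
    after
      100 -> :terminate_skipped
    end
  end
end
```

With `ShutdownDemo.stop_with(5_000)` the worker receives an exit signal with reason `:shutdown` and has up to five seconds in `terminate/2`. With `ShutdownDemo.stop_with(:brutal_kill)` the supervisor sends an untrappable `:kill`, so `terminate/2` never runs no matter what the worker traps.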

Any and all help is appreciated!

Diasporism · 2023-08-22T02:36:01Z

After digging into the source code a bit more, it seems like the disable_fetch toggle might happen automatically when a worker receives a kill signal (I think? Please correct me if I'm wrong). In that case, is there a way to customize the worker shutdown grace period to give it enough time to finish longer-running jobs?

If so, then I suppose it'd be a matter of configuring Kubernetes to have a long enough terminationGracePeriodSeconds (among other things like a PodDisruptionBudget), let it send the SIGTERM to the container, and assume this library:

- immediately stops taking new jobs off the Faktory queue(s)
- gives each worker N amount of time to finish by trapping the exit and preventing immediate shutdown, where N is configurable by the application itself
- leaves any jobs that haven't finished by N to be killed when Kubernetes follows up with a SIGKILL
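The "N amount of time" idea above can be sketched as a deadline-bounded wait over monitored job processes. `DrainDeadline.await_down/2` is a hypothetical helper, not a faktory_worker function. It assumes the caller already holds a monitor reference for each in-flight job.

```elixir
defmodule DrainDeadline do
  @doc """
  Waits up to `grace_ms` in total for a :DOWN message for each monitor ref.
  Returns the refs whose processes had not finished by the deadline
  (in reverse order of the input list).
  """
  def await_down(refs, grace_ms) do
    deadline = System.monotonic_time(:millisecond) + grace_ms

    Enum.reduce(refs, [], fn ref, unfinished ->
      # The budget is shared across all refs, so later refs only get
      # whatever time is left before the deadline.
      remaining = max(deadline - System.monotonic_time(:millisecond), 0)

      receive do
        {:DOWN, ^ref, :process, _pid, _reason} -> unfinished
      after
        remaining -> [ref | unfinished]
      end
    end)
  end
end
```

For the last point in the list to hold, N would need to stay comfortably below the pod's terminationGracePeriodSeconds, so the application gives up on unfinished jobs before Kubernetes sends the SIGKILL.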

jeremyowensboggs · 2023-08-22T13:01:52Z

Hi, you might find this article helpful. https://ellispritchard.medium.com/graceful-shutdown-on-kubernetes-with-signals-erlang-otp-20-a22325e8ae98
It describes something similar that we do in one of our apps.

Diasporism · 2023-08-22T16:02:36Z

> Hi, you might find this article helpful. https://ellispritchard.medium.com/graceful-shutdown-on-kubernetes-with-signals-erlang-otp-20-a22325e8ae98 it describes something similar that we do in one of our apps.

Thanks @jeremyowensboggs, that works nicely for apps exposed via a Kubernetes Ingress or Service. Failing the health check probes automatically prevents more incoming traffic, which allows processes to drain.

It's still unclear to me how to achieve a similar result with faktory_worker and background jobs. To be clear, I'm talking about shutting down a FaktoryWorker client, not the Faktory server itself.

Diasporism · 2023-09-07T23:12:23Z

Just checking back in, is there a way to configure how long a job has to finish its work when the workers receive an exit/down signal before it's forcefully terminated?

jeremyowensboggs · 2023-09-08T13:13:29Z

Hi, no, that hasn't been built in to the library yet.

Ch4s3 · 2023-09-15T15:45:07Z

> Just checking back in, is there a way to configure how long a job has to finish its work when the workers receive an exit/down signal before it's forcefully terminated?

I am open to a PR on this.

Diasporism · 2023-09-18T18:59:02Z

Thanks @Ch4s3, I'll see if I can't cook one up one of these nights/weekends when I find some spare time. I think it'd be a great addition to the library.

Ch4s3 · 2023-09-19T18:14:24Z

Awesome, let us know. I'll try to keep an eye out.
