Shut down gracefully on SIGTERM #269

Closed · wants to merge 1 commit

Conversation

@j4mie (Contributor) commented Nov 20, 2019

At the moment, Waitress handles SIGINT nicely enough (it waits 5 seconds for any existing requests to finish before exiting).

However, it doesn't handle SIGTERM - it just dies straight away. Heroku sends SIGTERM to processes when it is stopping a dyno, and indeed the 12 Factor methodology specifically mentions SIGTERM as the signal that should be used to perform a graceful shutdown.

This PR adds a signal handler that simply raises SystemExit when SIGTERM is received. The exception is caught by the existing except block around asyncore.loop, which then calls self.task_dispatcher.shutdown().
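
Roughly, the idea is something like this (a minimal sketch, not the exact diff; the handler name is just illustrative):

    import signal

    def handle_sigterm(signum, frame):
        # Reuse the same exit path as Ctrl+C: SystemExit propagates out of
        # asyncore.loop() and is caught by the existing except block, which
        # runs the task dispatcher shutdown.
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, handle_sigterm)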

I really wasn't sure how to test this. I borrowed the threading idea from this stackoverflow answer but it doesn't seem great. On the other hand, there don't seem to be any specific tests for the existing SIGINT behaviour (unless I've just missed them) so maybe this is overkill?

@j4mie (Contributor, Author) commented Nov 20, 2019

I'm not sure I understand that test failure...

@digitalresistor (Member) commented:

Waitress does not support graceful shutdown. Even the shutdown it does right now when you press Ctrl+C is not graceful in any way, shape, or form.

The following things need to be fixed:

  1. We need to keep track of all listening sockets
  2. Upon shutdown we need to iterate over all listening sockets and shut them down
  3. Make sure all listening sockets are removed from the map
  4. Then we need to run asyncore.loop with a timeout that matches how long we want to allow the graceful shutdown to take, or until all connections are drained and the map is empty
  5. Then we can shut down the task dispatcher and threads

This change is inadequate to provide the graceful shutdown you are looking to achieve with SIGTERM, and right now it doesn't matter what signal you send Waitress: there is no graceful shutdown in Waitress.
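
Very roughly, the drain sequence in those steps could look something like this (an illustrative sketch only; listen_sockets, socket_map, and task_dispatcher are placeholder names, not actual Waitress internals):

    import asyncore
    import time

    def drain_and_shutdown(listen_sockets, socket_map, task_dispatcher, grace_period=30):
        # Steps 1-3: stop accepting new connections; closing an asyncore
        # dispatcher also removes it from the socket map.
        for sock in listen_sockets:
            sock.close()

        # Step 4: keep servicing the remaining connections until they drain
        # or the grace period runs out.
        deadline = time.time() + grace_period
        while socket_map and time.time() < deadline:
            asyncore.loop(timeout=1.0, map=socket_map, count=1)

        # Step 5: shut down the task dispatcher and its worker threads.
        task_dispatcher.shutdown()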

@j4mie (Contributor, Author) commented Dec 2, 2019

@bertjwregeer thanks! This sounds more complicated than I thought.

Just to check though - it does look like Waitress supports graceful-ish shutdown. I made a test app that just does sleep(5) and then responds. If I send SIGINT to Waitress while a request is in progress, it does wait for the request to finish before exiting. If I send SIGTERM it just dies. So are you saying that Waitress is not graceful enough, or have I misunderstood?
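
The test app was essentially just a slow WSGI callable along these lines (a reconstruction of the idea, not the exact code):

    import time
    from waitress import serve

    def app(environ, start_response):
        time.sleep(5)  # simulate a slow request
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'done\n']

    serve(app, listen='127.0.0.1:8080')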

@digitalresistor (Member) commented:

I'd love to see that test app, because I don't believe any data you return from your WSGI app is actually sent to the network.

Waitress right now waits for the thread to finish (up to a certain amount of time), but how useful is it for waitress to have the WSGI app finish doing work, but not actually send the data that was generated to the network? That doesn't seem very graceful...

@j4mie (Contributor, Author) commented Dec 3, 2019

You are absolutely right. I was looking at the behaviour of the server and ignoring the fact that the client didn't get a response. Apologies for the confusion.

Implementing this fully sounds complicated. I have a loose understanding of the internals of Waitress but I'm not familiar enough to know exactly how to turn your list of steps into code. I'll have a look over the codebase when I get some time and try to figure it out. In the meantime, any further advice would be very welcome.

If I can't get this to work, I feel like I'd have to start looking for alternatives to Waitress, which would be a huge shame as we've been using it in production for 50+ client projects for many years. But we're now looking at autoscaling on Heroku for some projects, and having the web server drop connections whenever things are scaled down is not really workable.

Thanks again for your help!

@digitalresistor (Member) commented:

I feel like this is being solved at the wrong layer. Heroku knows when it is going to scale down; it should be able to pull the instance out of its load balancer, and not need to rely on a TCP RST packet to find out that an instance is being shut down (since that would cause the load balancer to reach out, get a TCP RST, and then have to retry the next IP in its list, and so forth; if many instances are being terminated at once, this leads to very bad behaviour on the part of the load balancer).

If the Heroku load balancer no longer sends traffic to an instance, and waits for all other connections to naturally drain, then whether Waitress receives a SIGTERM or other signal doesn't matter because it should no longer be processing any requests in the first place.

Anyway, this is a feature that other people want for other reasons so it'll happen eventually.

@evandrocoan commented Apr 11, 2020

How can I just shut everything down? (It does not have to be graceful.)

For example, with the Flask built-in server I could use this to shut it down:

    shutdown_function = flask.request.environ.get('werkzeug.server.shutdown')
    if shutdown_function is None:
        raise RuntimeError('Not running with the Werkzeug Server')
    shutdown_function()

@mmerickel (Member) commented:

Sorry @evandrocoan, but please ask usage questions on the mailing list. Hijacking old issues usually gets an ornery message or just gets ignored.

@evandrocoan commented Apr 12, 2020

@mmerickel Thanks for the code on #290! (https://github.com/Pylons/webtest/blob/4b8a3ebf984185ff4fefb31b4d0cf82682e1fcf7/webtest/http.py#L93-L104)

    def shutdown(self):
        """Shutdown the server"""
        # avoid showing traceback related to asyncore
        self.was_shutdown = True
        self.logger.setLevel(logging.FATAL)
        while self._map:
            triggers = list(self._map.values())
            for trigger in triggers:
                trigger.handle_close()
        self.maintenance(0)
        self.task_dispatcher.shutdown()
        return True

This is my implementation: evandroforks/anki@14d4832

@alexellis commented:

@evandrocoan how would we use your approach within a standard Flask/Waitress Python app for a graceful drain?
