Shut down gracefully on SIGTERM #269

Closed · wants to merge 1 commit

Conversation

@j4mie (Contributor) commented Nov 20, 2019

At the moment, Waitress handles SIGINT nicely enough (it waits 5 seconds for any existing requests to finish before exiting).

However, it doesn't handle SIGTERM - it just dies straight away. Heroku sends SIGTERM to processes when it is stopping a dyno, and indeed the 12 Factor methodology specifically mentions SIGTERM as the signal that should be used to perform a graceful shutdown.

This PR adds a signal handler that simply raises SystemExit when SIGTERM is received. The exception is caught by the existing except block around asyncore.loop, which then calls self.task_dispatcher.shutdown().
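
Roughly, the idea is something like this (a minimal sketch, not the exact diff; the handler name is just illustrative):

    import signal

    def handle_sigterm(signum, frame):
        # Reuse the same exit path as Ctrl+C: SystemExit propagates out of
        # asyncore.loop() and is caught by the existing except block, which
        # runs the task dispatcher shutdown.
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, handle_sigterm)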

I really wasn't sure how to test this. I borrowed the threading idea from this stackoverflow answer but it doesn't seem great. On the other hand, there don't seem to be any specific tests for the existing SIGINT behaviour (unless I've just missed them) so maybe this is overkill?

@j4mie (Contributor, Author) commented Nov 20, 2019

I'm not sure I understand that test failure...

@digitalresistor (Member) commented:

Waitress does not support graceful shutdown. Even the shutdown it does right now when you press Ctrl+C is not graceful in any way, shape, or form.

The following things need to be fixed:

  1. We need to keep track of all listening sockets
  2. Upon shutdown we need to iterate over all listening sockets and shut them down
  3. Make sure all listening sockets are removed from the map
  4. Then we need to run asyncore.loop with a timeout that matches how long we want to allow the graceful shutdown to take, or until all connections are drained and the map is empty
  5. Then we can shut down the task dispatcher and threads

This change is inadequate to provide the graceful shutdown you are looking to achieve with SIGTERM, and right now it doesn't matter what signal you send Waitress: there is no graceful shutdown in Waitress.
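
Very roughly, the drain sequence in those steps could look something like this (an illustrative sketch only; listen_sockets, socket_map, and task_dispatcher are placeholder names, not actual Waitress internals):

    import asyncore
    import time

    def drain_and_shutdown(listen_sockets, socket_map, task_dispatcher, grace_period=30):
        # Steps 1-3: stop accepting new connections; closing an asyncore
        # dispatcher also removes it from the socket map.
        for sock in listen_sockets:
            sock.close()

        # Step 4: keep servicing the remaining connections until they drain
        # or the grace period runs out.
        deadline = time.time() + grace_period
        while socket_map and time.time() < deadline:
            asyncore.loop(timeout=1.0, map=socket_map, count=1)

        # Step 5: shut down the task dispatcher and its worker threads.
        task_dispatcher.shutdown()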

@j4mie (Contributor, Author) commented Dec 2, 2019

@bertjwregeer thanks! This sounds more complicated than I thought.

Just to check though - it does look like Waitress supports graceful-ish shutdown. I made a test app that just does sleep(5) and then responds. If I send SIGINT to Waitress while a request is in progress, it does wait for the request to finish before exiting. If I send SIGTERM it just dies. So are you saying that Waitress is not graceful enough, or have I misunderstood?
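
The test app was essentially just a slow WSGI callable along these lines (a reconstruction of the idea, not the exact code):

    import time
    from waitress import serve

    def app(environ, start_response):
        time.sleep(5)  # simulate a slow request
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'done\n']

    serve(app, listen='127.0.0.1:8080')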

@digitalresistor (Member) commented:

I'd love to see that test app, because I don't believe any data you return from your WSGI app is actually sent to the network.

Waitress right now waits for the thread to finish (up to a certain amount of time), but how useful is it for waitress to have the WSGI app finish doing work, but not actually send the data that was generated to the network? That doesn't seem very graceful...

@j4mie (Contributor, Author) commented Dec 3, 2019

You are absolutely right. I was looking at the behaviour of the server and ignoring the fact that the client didn't get a response. Apologies for the confusion.

Implementing this fully sounds complicated. I have a loose understanding of the internals of Waitress but I'm not familiar enough to know exactly how to turn your list of steps into code. I'll have a look over the codebase when I get some time and try to figure it out. In the meantime, any further advice would be very welcome.

If I can't get this to work, I feel like I'd have to start looking for alternatives to Waitress, which would be a huge shame as we've been using it in production for 50+ client projects for many years. But we're now looking at autoscaling on Heroku for some projects, and having the web server drop connections whenever things are scaled down is not really workable.

Thanks again for your help!

@digitalresistor (Member) commented:

I feel like this is being solved at the wrong layer. Heroku knows when it is going to scale down; it should be able to pull the instance out of its load balancer, and not need to rely on a TCP RST packet to find out that an instance is being shut down (since that would cause the load balancer to reach out, get a TCP RST, and then have to retry the next IP in its list, and so forth; if many instances are being terminated at once, this leads to very bad behaviour on the part of the load balancer).

If the Heroku load balancer no longer sends traffic to an instance, and waits for all other connections to naturally drain, then whether Waitress receives a SIGTERM or other signal doesn't matter because it should no longer be processing any requests in the first place.

Anyway, this is a feature that other people want for other reasons so it'll happen eventually.

@evandrocoan commented Apr 11, 2020

How can I just shut everything down? (It does not have to be graceful.)

For example, with the Flask built-in server I could use this to shut it down:

    shutdown_function = flask.request.environ.get('werkzeug.server.shutdown')
    if shutdown_function is None:
        raise RuntimeError('Not running with the Werkzeug Server')
    shutdown_function()

@mmerickel (Member) commented:

Sorry @evandrocoan, but please ask usage questions on the mailing list. Hijacking old issues usually gets an ornery message or just gets ignored.

@evandrocoan commented Apr 12, 2020

@mmerickel Thanks for the code on #290! (https://github.com/Pylons/webtest/blob/4b8a3ebf984185ff4fefb31b4d0cf82682e1fcf7/webtest/http.py#L93-L104)

    def shutdown(self):
        """Shutdown the server"""
        # avoid showing traceback related to asyncore
        self.was_shutdown = True
        self.logger.setLevel(logging.FATAL)
        while self._map:
            triggers = list(self._map.values())
            for trigger in triggers:
                trigger.handle_close()
        self.maintenance(0)
        self.task_dispatcher.shutdown()
        return True

This is my implementation: evandroforks/anki@14d4832

@alexellis commented:

@evandrocoan how would we use your approach within a standard Flask/Waitress Python app for a graceful drain?
