Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix eventlet graceful timeout handling #1725

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mjjbell
Copy link

@mjjbell mjjbell commented Mar 15, 2018

The StopServer exception can lead to a handler which is blocked waiting for an available greenthread to never be processed. The accepted connection is then terminated, leading to a 502 status code being received by the client due to an invalid response.

This is reproducible with the following settings and a sufficient number of concurrent requests.

    max_requests = 1
    graceful_timeout = 60
    worker_class = 'eventlet'
    worker_connections = 1
    workers = 1

This change ensures we attempt to handle any accepted socket connection
within the graceful timeout period by resubmitting to the pool after receiving the exception.

In addition, we close the socket after all handling is complete, to prevent other listener errors.

Copy link
Collaborator

@tilgovi tilgovi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Good find.

Only one change needs to be made. The sock.close() should be before pool.waitall(). Otherwise, the socket will remain open by the process and therefore by the OS, leaving half open connections in the queue. We close the socket quickly so that the OS starts rejecting connections and any load balancer in front of Gunicorn can fail-over without waiting for a timeout.

Closing the listening socket this way should be safe because:

  1. If other workers or another Gunicorn arbiter process have it open the OS will not shut it down.
  2. The connection is already accepted, so it has its own socket and does not need the listener.

@mjjbell mjjbell force-pushed the eventlet_graceful_shutdown_fix branch from 2dbfc3c to 70c3b91 Compare March 15, 2018 22:51
@mjjbell
Copy link
Author

mjjbell commented Mar 15, 2018

Thanks, that makes sense.

The reason I moved sock.close() before pool.waitall() was because I was still getting errors of this form:

[2018-03-15 18:05:32 +0000] [254] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/gunicorn/gunicorn/workers/async.py", line 66, in handle
    six.reraise(*sys.exc_info())
  File "/gunicorn/gunicorn/six.py", line 625, in reraise
    raise value
  File "/gunicorn/gunicorn/workers/async.py", line 39, in handle
    listener_name = listener.getsockname()
OSError: [Errno 9] Bad file descriptor

This is because the AsyncWorker handle method is calling socket.getsocketname on the potentially closed socket during the graceful shutdown.

However, now I understand a bit better, I can see that the handle method only needs the socket to create a listener name. Therefore, I've made a slight refactoring to make it take a name as an argument instead of the socket itself.

I hope that makes sense.

@mjjbell mjjbell force-pushed the eventlet_graceful_shutdown_fix branch 3 times, most recently from 16badbe to e4db80a Compare March 16, 2018 10:59
tilgovi
tilgovi previously approved these changes Mar 16, 2018
Copy link
Collaborator

@tilgovi tilgovi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! @berkerpeksag could you take a quick look, in case I missed something?

Copy link
Collaborator

@berkerpeksag berkerpeksag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to change the signature of handle() methods? There are at least a couple of AsyncWorker subclasses in the wild and we can't know how they call super() in their implementations. There is a small chance that this may break third party code if they have a snippet like this:

def handle(self, listener, client, addr):
    ...

    super().handle(listener=listener, client=client, addr=addr)

@tilgovi
Copy link
Collaborator

tilgovi commented Mar 17, 2018

I have the same feeling, but I'm not sure what to do about it.

@mjjbell mjjbell force-pushed the eventlet_graceful_shutdown_fix branch 4 times, most recently from 77aa82e to e55a5e9 Compare March 20, 2018 12:53
@mjjbell
Copy link
Author

mjjbell commented Mar 20, 2018

If we use client rather than listener for creating listener_name, then we are able to retrieve the name inside handle() whilst maintaining the function signature (in fact listener becomes unused).

I'm not sure of the side effects of switching the socket object we retrieve the name from, but it appears to be used only for the WSGI SERVER_URL environ variable, so even if the name is different, it should still be something valid.

I've updated the PR to reflect this change. Let me know how to proceed.

@mjjbell mjjbell force-pushed the eventlet_graceful_shutdown_fix branch 2 times, most recently from faf5e43 to 982e36b Compare March 20, 2018 13:33
@mjjbell
Copy link
Author

mjjbell commented Mar 22, 2018

@tilgovi @berkerpeksag any comments on this last change?

@mjjbell mjjbell force-pushed the eventlet_graceful_shutdown_fix branch from 982e36b to ae5c078 Compare April 26, 2018 09:22
The `StopServer` exception can lead to a handler blocked waiting for
an available greenthread to never be processed.
This change ensures we attempt to handle any accepted socket connection
within the graceful timeout period.
@mjjbell mjjbell force-pushed the eventlet_graceful_shutdown_fix branch from ae5c078 to db5a7a1 Compare April 27, 2018 10:09
@tilgovi
Copy link
Collaborator

tilgovi commented Sep 24, 2018

I propose we remove the change in the base class and get this merged.

Then, we could break the API in a separate PR for the next major release, if we want to.

or

We could store a dict of {listener: name} when run() is called and use that to look up the name later, so we don't rely on getsockname() working.

@tilgovi
Copy link
Collaborator

tilgovi commented Apr 21, 2020

I would very much like to finish this up. The listeners should never change names so we could store them when we create the sockets and then pass them later from that cache.

Any other approaches that don't result in breaking changes?

We should make sure this works for every worker, not just eventlet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants