Make graceful shut-down keep-alive behavior consistent #1236
@tilgovi do you need anything from me there?
If you want to "describe the intended behavior" that would be helpful, otherwise I'll propose it.
Sorry I missed your answer. Graceful shutdown only means we allow some time for requests to finish.
Speaking of keep-alive connections, I think we should stop the request loop when the signal is received instead of accepting any new requests. Thoughts?
@benoitc Considering a sync worker without threads, I believe no connections are aborted/lost other than the request currently in process, because connections are queued at the master process and not at the worker process level? If so, can you clarify what you mean by "all still running client connections are closed"? I assume you refer to threaded/async workers here (where multiple requests may be processed concurrently, compared to a sync worker without threads)?
@tuukkamustonen The master doesn't queue any connections; each worker is responsible for accepting a connection. Afaik connections are queued at the system level. When the master receives the HUP signal it notifies the workers, and they stop accepting new connections. Running connections (those already accepted) then get the graceful time to finish or be forcefully closed.
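(For illustration, a minimal pre-fork sketch of the model described above; this is not gunicorn's actual code. The kernel's listen backlog holds pending connections until a worker calls `accept()` itself.)

```python
import os
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 8000))
sock.listen(128)  # pending connections queue here, at the system level

for _ in range(2):                 # the "master" forks two workers
    if os.fork() == 0:             # child: each worker accepts for itself
        while True:                # on HUP it would simply stop accepting
            conn, _ = sock.accept()
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
            conn.close()

for _ in range(2):
    os.wait()                      # the master just supervises the workers
```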
Ah, I wonder how that works / where it's instructed... well, no need to go that deep :)
Ok. This summarizes it nicely.
@tilgovi we probably should close that issue? |
I would like to keep this one open. I'm not convinced we have consistent behavior here yet. |
How to reproduce the problem:
Run apache benchmark:
... See > 4% failed requests, just due to restarted workers (in this case restarted by max-requests)
... See no failed requests (tried on gunicorn up to 20.0.0)
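(A hypothetical reconstruction of the elided commands, assuming a gthread worker and Apache Bench's keep-alive mode; the original flags were not shown.)

```shell
# Worker restarted periodically by --max-requests, hit with keep-alive traffic.
gunicorn --worker-class gthread --workers 2 --max-requests 100 app:app &

# ab with -k reuses connections; requests racing a worker restart can fail.
ab -k -n 10000 -c 20 http://127.0.0.1:8000/
```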
It's probably worth resolving the problem in the following way: in case of graceful shutdown on a keep-alive connection, try to serve one more request after the graceful shutdown request and send Connection: close in the response to force the sender not to reuse this socket for the next request; if no request arrives in a reasonable timeframe (i.e. 1s), just close the connection. Yes, there is a small possibility of a race (when the server decides to close just as the client sends a request),
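(A minimal sketch of this proposal; `handle_request` and `shutting_down` are hypothetical stand-ins, not gunicorn's API.)

```python
import select
import socket

GRACE_WINDOW = 1.0  # the "reasonable timeframe" suggested above

def drain_keepalive(conn: socket.socket, handle_request, shutting_down) -> None:
    # Normal keep-alive loop: serve requests until shutdown is requested.
    while not shutting_down():
        ready, _, _ = select.select([conn], [], [], 1.0)
        if ready:
            handle_request(conn, extra_headers=[])

    # Graceful shutdown: allow at most one more request, answered with
    # "Connection: close" so the client stops reusing this socket.
    ready, _, _ = select.select([conn], [], [], GRACE_WINDOW)
    if ready:
        handle_request(conn, extra_headers=[("Connection", "close")])
    conn.close()  # either answered the last request or timed out idle;
                  # the small race described above remains possible
```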
cc @tilgovi ^^ In fact there are two schools of thought here, imo.
I'm in favour of 2, which may be safer. Thoughts?
I am very much in favor of Option 2. That was the behavior I had assumed, and I've made some changes to this end, linked from #922. I don't know which workers implement all of these behaviors, but we should check:
Any other behaviors we should describe before we audit?
This is #922. I think it is done for all workers and the arbiter.
This is this ticket. We should make sure all workers do this. |
I think this is still done, but we have a new issue due to this at #1725. The same issue might exist for worker types other than eventlet.
I think this is now done for the threaded worker and the async workers in #2288, ebb41da and 4ae2a05.
I'm going to close this issue because I think it's mostly addressed now. I don't think the tornado worker is implementing graceful shutdown, but that can be a separate ticket. |
I've opened #2317 for Tornado and I'll close this. |
Probably I was not clear enough ... for a keep-alive connection there is no way to close the connection "safely". So, the only "safe" way will be either
The gevent and eventlet workers do not have any logic to close keep-alive connections during graceful shutdown. Instead, they have logic to force "Connection: close" on requests that happen during graceful shutdown. So, I believe it is already the case that they will send a "Connection: close" before actually closing the connection. There is always a possibility that a long request ends close enough to the graceful timeout deadline that the client never gets to send another request and discover "Connection: close" before the server closes the connection forcefully. I don't see any way to avoid that. Set a longer graceful timeout to handle this.
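(A sketch of that forced "Connection: close" behavior, assuming a `headers` list of name/value pairs; this is not the workers' actual code.)

```python
def finalize_headers(headers, shutting_down):
    # During graceful shutdown, override any keep-alive negotiation so the
    # client learns this socket will not be reused for further requests.
    if shutting_down:
        headers = [(name, value) for name, value in headers
                   if name.lower() != "connection"]
        headers.append(("Connection", "close"))
    return headers
```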
Re-opening until I can address issues in the eventlet and threaded workers. Just to reiterate, on graceful shutdown a worker should:
Right now, the eventlet worker cannot handle existing keep-alive connections because it fails on ... I'll work to get both PRs submitted this week. I apologize for not being able to do so sooner.
gunicorn = "20.0.4" In my case, when gunicorn master receives SIGHUP signal (sent by consul-template to reload refreshed secrets written in a file on local disk), it creates a new worker and gracefully shuts down old worker. However, during the transition from old to the new worker, http connections cached b/w client and old worker (keep-alive connections) are stale now and any request sent by client to the server that happen to use stale socket will hung and eventually timeout. Essentially, the threaded worker is not able to handle existing keep-alive requests. |
Hi @tilgovi, can this issue be closed?
There are still issues as documented in my last comment. |
Hi, |
No activity in a while, so closing. @tilgovi feel free to reopen if you still want to work on it :)
Following on from #922, the handling of keep-alive connections during graceful shutdown is not really specified anywhere and may not be consistent among workers.