Graceful shutdown for tornado worker #2317
The proposed solution will not help in the case of keep-alive sockets: see proof in comment #1236 (comment). To shut down a keep-alive socket gracefully, we should do either one of:

1. first respond to a request with Connection: close (downgrade keep-alive to a close connection), then close the socket, or
2. catch a window when the client has not yet sent its next request (to finish closing before the client fires the next request, otherwise that request will fail, which is bad).
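To make option 1 concrete, here is a minimal raw-socket sketch of the header downgrade; `respond_and_close` and its arguments are illustrative, not gunicorn internals:

```python
import socket

def respond_and_close(sock: socket.socket, body: bytes) -> None:
    # finish the in-flight request, but tell the client not to reuse
    # the connection (downgrade keep-alive to close)
    sock.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
        b"Connection: close\r\n"
        b"\r\n" + body
    )
    # only now is it safe to close without failing the client's next request
    sock.close()
```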
I don't think the spec expects the server to gracefully close such a socket if no request has started; the client should normally expect a timeout. Do you have any example of a client that breaks with the current behavior?
How to reproduce the problem: run gunicorn with keepalive enabled (a trivial app returning some data), then:

```
$ ab -n 10000 -c 20 -s 1 -k -r 127.0.0.1:8000/
```

More than 4% of requests fail, purely due to restarted workers (in this case restarted by max-requests). Run gunicorn without keepalive (same app) and no requests fail. The errors in the first case are caused by the server closing a keep-alive socket (due to max-requests) while the next request is in progress. A hedged sketch of such a setup follows below.
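For concreteness, a minimal sketch of a reproduction setup under the assumptions above; the module name, worker class, thread count, and max-requests value are illustrative, not taken from the thread:

```python
# app.py - trivial WSGI app returning some data
def application(environ, start_response):
    body = b"hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

```
# keepalive-capable worker, with workers recycled often enough to trigger failures
$ gunicorn -k gthread --threads 4 --keep-alive 5 --max-requests 100 app:application
$ ab -n 10000 -c 20 -s 1 -k -r 127.0.0.1:8000/
```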
As I answered in #1236 (comment), the design is to send "Connection: close" once the server is shutting down and to wait for the graceful timeout before forcing the connection closed. I'm able to reproduce what you see. Based on the distribution of request times, it seems impossible that connections are getting closed during the keepalive timeout; instead they must be getting closed by the shutdown before they receive "Connection: close". I'm not able to reproduce this with gevent, but I can reproduce it with eventlet and the threaded worker. I'll see if I can narrow down a bug in either of those.
I've narrowed down the issue in eventlet to a failing …
Alright, I've isolated the bug for the threaded worker as well. I'll file issues or PRs to fix both tomorrow.
Here's the gist if you want to check my work and try it out before I get a chance to look at it with eyes that have slept:

```diff
diff --git a/gunicorn/workers/gthread.py b/gunicorn/workers/gthread.py
index d531811..ed70dcb 100644
--- a/gunicorn/workers/gthread.py
+++ b/gunicorn/workers/gthread.py
@@ -220,13 +220,37 @@ class ThreadWorker(base.Worker):
             # handle keepalive timeouts
             self.murder_keepalived()
 
-        self.tpool.shutdown(False)
-        self.poller.close()
-
+        # stop accepting new connections
         for s in self.sockets:
+            self.poller.unregister(s)
             s.close()
 
-        futures.wait(self.futures, timeout=self.cfg.graceful_timeout)
+        # handle existing connections until the graceful timeout
+        deadline = time.time() + self.cfg.graceful_timeout
+        while self.nr_conns:
+            # notify the arbiter we are alive
+            self.notify()
+
+            timeout = deadline - time.time()
+            if timeout <= 0:
+                break
+
+            # clean up finished requests
+            result = futures.wait(self.futures, timeout=0)
+            for fut in result.done:
+                self.futures.remove(fut)
+
+            # handle keepalive requests
+            events = self.poller.select(1.0)
+            for key, _ in events:
+                callback = key.data
+                callback(key.fileobj)
+
+            # handle keepalive timeouts
+            self.murder_keepalived()
+
+        self.tpool.shutdown(False)
+        self.poller.close()
 
     def finish_request(self, fs):
         if fs.cancelled():
@@ -238,7 +262,7 @@ class ThreadWorker(base.Worker):
             (keepalive, conn) = fs.result()
             # if the connection should be kept alived add it
             # to the eventloop and record it
-            if keepalive and self.alive:
+            if keepalive:
                 # flag the socket as non blocked
                 conn.sock.setblocking(False)
```

The first change ensures that the worker continues to handle in-progress keepalive requests while shutting down. The second chunk fixes a bug that prevented keepalive requests started before the shutdown began from handling their next request and receiving the "Connection: close" they need to shut down gracefully.
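As an aside, here is a minimal sketch of the poller/callback pattern the shutdown loop above relies on, using the stdlib selectors module (the gthread worker uses the same idea); the names `on_readable` and `srv` are illustrative, not the worker's actual attributes:

```python
import selectors
import socket

poller = selectors.DefaultSelector()

def on_readable(sock: socket.socket) -> None:
    # in the worker, this would re-enqueue a keepalive connection for handling
    print("readable:", sock)

# register a listening socket with the callback stored in the key's data slot
srv = socket.create_server(("127.0.0.1", 0))
poller.register(srv, selectors.EVENT_READ, data=on_readable)

# one iteration of the loop: wait up to 1 second, then dispatch callbacks
for key, _mask in poller.select(1.0):
    key.data(key.fileobj)  # key.data holds the callback, as in the diff
```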
Following from #1236, the tornado worker should behave like the other workers. On graceful shutdown it should:

- stop accepting new connections,
- send "Connection: close" on open keep-alive connections so clients stop reusing them, and
- wait up to the graceful timeout before forcing connections closed.

I think that the call to `server.stop()` already stops it from accepting, but I don't know if it closes the listener socket, and I don't see any graceful timeout handling. We should also double-check that calling `server.stop()` prevents open connections from being reused for keep-alive requests.
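For illustration, a hypothetical sketch of the sequence this issue asks for, assuming Tornado's `HTTPServer.stop()` stops the listener and that `close_all_connections()` can be awaited to drain in-flight connections; this is not the worker's actual code:

```python
# Hypothetical sketch, not gunicorn's tornado worker: drain a
# tornado.httpserver.HTTPServer within a graceful timeout.
import asyncio
from tornado.httpserver import HTTPServer

async def graceful_stop(server: HTTPServer, graceful_timeout: float) -> None:
    # stop accepting new connections (whether this closes the listener
    # socket is exactly what the issue asks us to verify)
    server.stop()
    try:
        # wait for open connections to finish, bounded by the same
        # graceful timeout the other workers honor
        await asyncio.wait_for(server.close_all_connections(), graceful_timeout)
    except asyncio.TimeoutError:
        # deadline reached: remaining connections get forced closed
        pass
```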