Clarify what/how timeout and graceful_timeout work #1493
Hmm, I think #1236 (comment) confirms my assumptions about |
Seems like how I interpreted However, it seems like if response is returned between I tested with:

```python
import time
from flask import Flask

app = Flask(__name__)

@app.route('/foo')
def foo():
    time.sleep(3)
    return 'ok'
```

Then:
Then I send |
It seems that on This exception gets thrown through the Flask stack, and any In fact, I don't see Also what might be weird is that https://github.com/benoitc/gunicorn/blob/master/gunicorn/arbiter.py#L392 does not check |
@tuco86 The timeout is here to prevent busy workers from blocking other requests. If they don't notify the arbiter in a time less than the |
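As described above, the timeout is about how long ago each worker last notified the arbiter, not about how long any one request has been running. A minimal sketch of that check in isolation follows. The helper `find_timed_out` and the plain dict of timestamps are hypothetical; as far as I know gunicorn itself records each worker's heartbeat as a timestamp on a per-worker temporary file, which this code does not reproduce.

```python
import time


def find_timed_out(last_heartbeat: dict[int, float], now: float, timeout: float) -> list[int]:
    """Return the pids whose last heartbeat is more than `timeout` seconds old.

    Hypothetical helper modelling the arbiter's check; not gunicorn code.
    """
    return sorted(pid for pid, seen in last_heartbeat.items() if now - seen > timeout)


if __name__ == "__main__":
    now = time.monotonic()
    # Worker 101 checked in 5 s ago; worker 102 has been silent for 45 s.
    heartbeats = {101: now - 5, 102: now - 45}
    print(find_timed_out(heartbeats, now, timeout=30))  # [102]
```

A worker stuck in one long request stops sending heartbeats and gets flagged, while a worker that keeps notifying the arbiter is never flagged, however long its requests take.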
Uh, alright. Does
I mean, what if the worker doesn't shut down in
I assume you meant |
If the worker doesn't shut down within the graceful timeout, it will be killed without any further delay. |
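The answer above describes a two-phase shutdown: ask politely, wait up to the graceful timeout, then kill. A minimal sketch of that pattern using `subprocess` on POSIX follows. `stop_gracefully` is a hypothetical helper, not gunicorn's arbiter code, and the child process is a stand-in for a worker stuck in a request.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, graceful_timeout: float) -> str:
    """Send SIGTERM, wait up to `graceful_timeout` seconds, then SIGKILL.

    Hypothetical helper illustrating the two-phase shutdown; POSIX only.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=graceful_timeout)
        return "exited"
    except subprocess.TimeoutExpired:
        # Still alive after the graceful period: kill it without further delay.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # A child that ignores SIGTERM, standing in for a worker that won't stop.
    stubborn = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time\n"
         "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
         "print('ready', flush=True)\n"
         "time.sleep(60)"],
        stdout=subprocess.PIPE, text=True,
    )
    stubborn.stdout.readline()  # wait until the child has installed SIG_IGN
    print(stop_gracefully(stubborn, graceful_timeout=1.0))  # killed
    stubborn.stdout.close()
```

The `ready` handshake avoids a race: without it, `SIGTERM` could arrive before the child has installed its handler, and the child would simply exit.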
Of course. Thanks for clearing things up! |
@benoitc Asking in the context of this old ticket: what does the last sentence in
Not being a native English speaker, I have a hard time understanding this. Does it mean that |
@tuukkamustonen Maybe it would be a good idea for us to change the name if other people find this confusing. |
@tilgovi
Would you have an example of when |
That's correct. An asynchronous worker that relies on an event loop core might perform a CPU-intensive procedure that doesn't yield within the timeout. |
Not only a bug, in other words. Although, sometimes it may indicate a bug, such as a call to a blocking I/O function when an asyncio protocol would be more appropriate. |
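To see why a CPU-bound step or a blocking call inside an event loop can trip the timeout, here is a self-contained asyncio sketch. It is not gunicorn code: the periodic `heartbeat` coroutine stands in for the worker's notification to the arbiter, and `time.sleep` stands in for blocking work. gunicorn's gevent and eventlet workers use their own event loops rather than asyncio, but I believe the same principle applies.

```python
import asyncio
import time


async def heartbeat(gaps: list[float], interval: float = 0.1) -> None:
    """Record how long each tick of a periodic 'heartbeat' actually took."""
    last = time.monotonic()
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def main(block: bool) -> float:
    """Return the longest gap between heartbeats while some 'work' runs."""
    gaps: list[float] = []
    task = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.25)  # let a couple of ticks happen
    if block:
        time.sleep(1.0)  # blocking: the event loop cannot run anything else
    else:
        await asyncio.sleep(1.0)  # cooperative: the loop keeps ticking
    await asyncio.sleep(0.25)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return max(gaps)


if __name__ == "__main__":
    print(f"longest gap, cooperative: {asyncio.run(main(False)):.2f}s")  # roughly 0.1s
    print(f"longest gap, blocking:    {asyncio.run(main(True)):.2f}s")   # roughly 1s
```

If the blocking step lasted longer than the configured timeout, the heartbeat would go silent for that long, which is exactly what the arbiter treats as a stuck worker.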
Getting stuck in a CPU-intensive task is a good example, thanks. Calling blocking I/O in async code is one as well, but I'm not sure how it applies in this context: I'm running a traditional Flask app with blocking code but running it with an async worker ( |
Also, what is the heartbeat interval? What would be a sane value to use for |
The gthread worker is not asynchronous, but it does have a main thread for the heartbeat, so it won't time out either. With that worker, you probably won't see a timeout unless the worker is very overloaded or, more likely, you call a C extension module that doesn't release the GIL. You probably don't have to change the timeout unless you start seeing worker timeouts. |
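For reference, a hypothetical `gunicorn.conf.py` for the threaded worker discussed above. The bind address, worker count and thread count are arbitrary illustrations; `timeout` and `graceful_timeout` are shown at what I understand to be their documented default of 30 seconds, with comments paraphrasing the settings documentation.

```python
# gunicorn.conf.py: a hypothetical example; values are illustrative, not recommendations.
# gunicorn reads module-level settings from this file, e.g. `gunicorn -c gunicorn.conf.py app:app`.
bind = "127.0.0.1:8000"
workers = 2
worker_class = "gthread"  # threaded worker; its main thread keeps notifying the arbiter
threads = 4

# Workers silent (no heartbeat to the arbiter) for longer than this are killed and restarted.
timeout = 30

# After a restart/stop signal, workers get this long to finish in-flight requests
# before they are force-killed.
graceful_timeout = 30
```

Following the advice above, the usual approach is to leave `timeout` alone and only raise it if worker timeouts actually show up in the logs.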
Alright. Just one more thing:
It may be a little bit confusing that In a nutshell, the |
Glad you asked! The threaded worker does not use asyncio and does not inherit from the base asynchronous worker class. We should clarify the documentation. I think it may have been listed as async because the worker timeout is handled concurrently, making it behave more like the async workers than the sync worker with respect to ability to handle long requests and concurrent requests. It would be great to clarify the documentation and make it more accurately describe all the workers. |
Yeah, the gthread worker shouldn't be listed as an asyncio worker. Maybe having a section that describes the design of each worker would be better? |
Re-opening this so we can track it as work to clarify the section on worker types and timeouts. |
Is there a request timeout option available for async workers? In other words, how can I make the arbiter kill a worker that did not process a request within a specified time? |
@aschatten there is not, unfortunately. See also #1658. |
As a worker may be processing multiple requests concurrently, killing the whole worker because one request times out sounds pretty extreme. Wouldn't that result in all the other requests getting killed in vain? I recall uWSGI was planning to introduce thread-based killing in 2.1 or so, though probably even that applies to sync/threaded workers only (and my recollection on this is vague). |
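The alternative to killing the whole worker is to cancel only the request that overran. gunicorn does not offer this (per the reply above); the sketch below is plain asyncio, with a hypothetical `handle` coroutine standing in for a request handler, and shows per-request cancellation with `asyncio.wait_for`.

```python
import asyncio


async def handle(request_id: int, duration: float) -> str:
    """Hypothetical request handler that takes `duration` seconds."""
    await asyncio.sleep(duration)
    return f"request {request_id}: ok"


async def handle_with_timeout(request_id: int, duration: float, timeout: float) -> str:
    # Cancel only this request if it overruns; the others keep running.
    try:
        return await asyncio.wait_for(handle(request_id, duration), timeout)
    except asyncio.TimeoutError:
        return f"request {request_id}: timed out"


async def main() -> list[str]:
    # Three concurrent requests in one process; only the slow one overruns.
    return await asyncio.gather(
        handle_with_timeout(1, 0.1, timeout=0.5),
        handle_with_timeout(2, 2.0, timeout=0.5),
        handle_with_timeout(3, 0.2, timeout=0.5),
    )


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
    # request 1: ok
    # request 2: timed out
    # request 3: ok
```

This only works when the handler yields to the loop. A CPU-bound or blocking handler cannot be cancelled this way, which is where killing the worker remains the last resort.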
The approach can be the same as for |
We are working on a release this week, at which point it may be time to branch for R20, where we plan to tackle a few major things. That might be the right time to make the current timeout into a proper request timeout for every worker type. |
Commenting here instead of filing a separate issue, since I'm trying to understand how timeout is supposed to work and I'm not sure whether this is a bug or not. The behaviour I'm seeing, which is IMO unexpected, is this: every max-requests'th request (the one after which the worker will be restarted) times out, whereas the other requests complete successfully. In the example below, 4 requests are performed: requests 1, 2, and 4 succeed, whereas request 3 fails. Relevant configuration:
```python
import time

def app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain; charset=utf-8')])
    time.sleep(5)
    return [b"Hello World\n"]
```

gunicorn:
Client:
|
What should be the plan there? I have in mind the following:
Should it be 20.0 or could we postpone it? |
postponing. |
Hey, so this won't be part of 20.0?
|
Clarified. @lucas03, it's unclear what a request timeout would mean there. Please open a ticket if you need something specific. |
Having read the discussion, I have a question. Does it make sense to keep |
they have different purposes. So yes. |
Interesting that Microsoft Azure docs use |
A crude implementation of request timeout for |
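One crude way to get a per-request timeout, assuming a sync worker on POSIX where each request runs in the process's main thread, is a `SIGALRM`-based WSGI middleware. This is a hypothetical sketch, not a gunicorn feature and not the implementation referred to in the comment above. It also assumes nothing else in the process uses `SIGALRM`.

```python
import signal
import sys
import time


class RequestTimeout(Exception):
    """Raised inside the application when a request overruns its time budget."""


def timeout_middleware(app, seconds: float):
    """Hypothetical WSGI middleware that aborts a request after `seconds`.

    Assumes a sync worker on POSIX: Python runs signal handlers only in the
    main thread, so this does not work for threaded or async workers.
    """

    def _on_alarm(signum, frame):
        raise RequestTimeout()

    def wrapped(environ, start_response):
        previous = signal.signal(signal.SIGALRM, _on_alarm)
        signal.setitimer(signal.ITIMER_REAL, seconds)
        try:
            result = app(environ, start_response)
            try:
                # Drain the body here so the timer also covers lazily generated output.
                return list(result)
            finally:
                close = getattr(result, "close", None)
                if close is not None:
                    close()
        except RequestTimeout:
            # Headers have not been sent yet, so passing exc_info lets us
            # replace a status the app may already have set (PEP 3333).
            start_response("504 Gateway Timeout",
                           [("Content-Type", "text/plain; charset=utf-8")],
                           sys.exc_info())
            return [b"request timed out\n"]
        finally:
            signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
            signal.signal(signal.SIGALRM,
                          previous if previous is not None else signal.SIG_DFL)

    return wrapped


if __name__ == "__main__":
    def slow_app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        time.sleep(5)
        return [b"Hello World\n"]

    def fake_start_response(status, headers, exc_info=None):
        print("status:", status)

    print(timeout_middleware(slow_app, 1.0)({}, fake_start_response))
```

If used, `seconds` would presumably be set below gunicorn's `timeout`, so the middleware answers before the arbiter treats the worker as stuck. A long-running C extension call that never returns to the interpreter will not be interrupted until it does.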
(Sorry for the monologue here: simple things got complicated and I ended up digging through the stack. Hopefully what I've documented is helpful for the reader, however.)
As I've understood, by default:

- After 30 seconds (configurable with `timeout`) of request processing, the gunicorn master process sends `SIGTERM` to the worker process, to initiate a graceful restart.
- After a further 30 seconds (configurable with `graceful_timeout`), the master process sends `SIGKILL`. It seems this signal is also sent when the worker does shut down gracefully during the `graceful_timeout` period (d1a0973).

The questions:

- `SIGTERM` signal: in practice, what happens during request processing? Does it just set a flag for the WSGI application (on the werkzeug level) that it should shut down after request processing is complete? Or does `SIGTERM` already somehow affect ongoing request processing, e.g. kill I/O connections or something to speed up request processing?
- On `SIGKILL`, I guess request processing is just forcefully aborted.

I could file a tiny PR to improve the docs about this, once I understand how things actually work.
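On the `SIGTERM` question: one common pattern, and as far as I know the one gunicorn's sync worker follows, is for the handler only to clear an `alive` flag. The request in progress then runs to completion, and the worker exits before accepting the next one. The toy model below illustrates that pattern; `ToyWorker` and its methods are hypothetical, it is not gunicorn code, and it says nothing about how the async workers behave. POSIX only.

```python
import os
import signal


class ToyWorker:
    """Hypothetical worker that stops *between* requests on SIGTERM."""

    def __init__(self) -> None:
        self.alive = True
        self.handled: list[int] = []
        signal.signal(signal.SIGTERM, self.handle_exit)

    def handle_exit(self, signum, frame) -> None:
        # Only flip the flag; the request in progress is left to finish.
        self.alive = False

    def handle_request(self, n: int) -> None:
        if n == 2:
            # Simulate the master sending SIGTERM while request 2 is running.
            os.kill(os.getpid(), signal.SIGTERM)
        self.handled.append(n)  # request 2 still completes

    def run(self, requests: list[int]) -> list[int]:
        for n in requests:
            if not self.alive:
                break  # graceful exit: no new requests are accepted
            self.handle_request(n)
        return self.handled


if __name__ == "__main__":
    print(ToyWorker().run([1, 2, 3, 4]))  # [1, 2]
```

Under this model `SIGTERM` does not speed anything up; it just stops the worker from picking up more work. Anything still running when `graceful_timeout` expires is then cut off by `SIGKILL`.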