-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gunicorn gthread deadlock #3146
Comments
when opening many sockets the worker gets frozen consuming 100% cpu to reproduce: ``` cd examples python standalone_app_gthread.py ``` and then using Apache Bench tool: ``` ab -n1000 -c100 http://0.0.0.0:8080/ ``` Fixes benoitc#3146
@JorisOnGithub could you please confirm if this patch is not breaking all the improvements needed to solve #2917 PS: I couldn't reproduce the issues you described, but I don't have WebView2, and I couldn't reproduce with Microsoft Edge for macos |
Hi @benoitc and team - First off, thank you all for what you do! I tried to contact [email protected] about this issue, but the email bounced. ![]() Since this issue is already out in the public it seemed appropriate to post here instead. This issue is a denial-of-service (DOS) vulnerability although not labeled as such in the original description. While the default of 1000 for worker_connections should hopefully mean that implementors that have reasonable rate limits in upstream reverse proxies/WAFs should be protected, anyone with a rate limit >1000 per IP address would make this trivial to exploit. Additionally, an effective DDoS with a relatively low number of IP addresses should be more plausible. I see that gunicorn has not historically used GitHub's security advisory feature. I just wanted to highlight that it makes it very easy to request a CVE in case you are not familiar with this feature. |
I've debugged the issue, and I think the smallest diff for fixing it (while maintaining the functionality of #2917) is: diff --git a/gunicorn/workers/gthread.py b/gunicorn/workers/gthread.py
index c9c42345..9c1a92ad 100644
--- a/gunicorn/workers/gthread.py
+++ b/gunicorn/workers/gthread.py
@@ -119,6 +119,9 @@ class ThreadWorker(base.Worker):
def accept(self, server, listener):
try:
+ if self.nr_conns >= self.worker_connections:
+ return
+
sock, client = listener.accept()
# initialize the connection object
conn = TConn(self.cfg, sock, client, server)
@@ -218,6 +221,10 @@ class ThreadWorker(base.Worker):
return_when=futures.FIRST_COMPLETED)
else:
# wait for a request to finish
+ events = self.poller.select(0.0)
+ for key, _ in events:
+ callback = key.data
+ callback(key.fileobj)
result = futures.wait(self.futures, timeout=1.0,
return_when=futures.FIRST_COMPLETED)
For those interested, I think the root cause of the bug is the following:
About the patch: by running I had some other clean-ups in mind which would fix this bug, but I'm just posting this minimal version early on in case @benoitc would like to incorporate it. |
This change-set aims change so that we do everything non-related to the request on the main thread. This way, we can hopefully avoid lots of strange race conditions and make it easier to reason about the code. This also fixes benoitc#3146, but there are leaner ways of doing that. Work in progress.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Remove fiddling with connection socket block/not blocked. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. * Use time.monotonic() for timeouts in gthread. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. * Use time.monotonic() for timeouts in gthread. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. * Use time.monotonic() for timeouts in gthread. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. * Use time.monotonic() for timeouts in gthread. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
@chris-griffin this has nothing to do with a security issue but a crtitical bug that need to be fixed asap. I missed this ticket in latest release. This need of course to be fixed ASAP. I am looking at it. |
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. * Use time.monotonic() for timeouts in gthread. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. * Use time.monotonic() for timeouts in gthread. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. * Use time.monotonic() for timeouts in gthread. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
The main purpose is to remove complexity from gthread by: * Removing the lock for handling self._keep and self.poller. This is possible since we now do all such manipulation on the main thread instead. When a connection is done, it posts a callback through the PollableMethodCaller which gets executed on the main thread. * Having a single event queue (self.poller), as opposed to also managing a set of futures. This fixes benoitc#3146 (although there are more minimal ways of doing it). There are other more minor things as well: * Renaming some variables, e.g. self._keep to self.keepalived_conns. * Remove self-explanatory comments (what the code does, not why). * Just decide that socket is blocking. * Use time.monotonic() for timeouts in gthread. Some complexity has been added to the shutdown sequence, but hopefully for good reason: it's to make sure that all already accepted connections are served within the grace period.
hi @benoitc I guess this was deprioritised? I was wondering if you had a chance to verify the fix proposed by @sylt - as I mentioned here it seems to work #3157 (comment) |
apologies @benoitc I double checked and:
because all of the above I am closing this issue, feel free to reopen if you still want it Thanks a lot! :) |
Problem
On receiving multiple connections at the same time I observe a deadlock that causes gunicorn worker to completely freeze forever (the worker timeout does not kill the frozen worker).
This happens when the number of
worker_connections
is smaller than the number of concurrent connections received.Affected versions:
How to reproduce
Run a gunicorn app with gthread worker and these options:
Then run apache bench with enough concurrency (higher than 4):
ab -n1000 -c100 http://0.0.0.0:8080/
I am opening a PR soon with the fix and the script to reproduce this easily (UPDATE: PR #3147).
Proposed solution
~Partial revert of #2918 ~
#3157
The text was updated successfully, but these errors were encountered: