Unexplainable timeout error when using uwsgi behind nginx #1623
Comments
Does the problem still happen when using the current release?
I'm seeing the same thing with uWSGI 2.0.15, with the same setup. nginx in front of uWSGI vending a Python 3 Flask app.
For the requests that don't get a timeout, how long are they typically taking? Are they close to the timeout? Also, are there any logs on the uwsgi side? As to the meaning of those numbers in the nginx logs, you'd probably need to ask the nginx folks :)
Not close at all - timeout is 60 seconds. Most requests are serviced in under 1 second. I see
@funkybob, I haven't switched to a new version. I did eventually increase my
This particular error is only ever triggered when one of my POST endpoints is hit (from another, upstream service). This endpoint is hit relatively few times compared to others. I'm not sure if I have response time data from that service, unfortunately. I can say that the payload from that upstream service was received completely, as I see this application's database updated, which makes the reason for erroring out even more puzzling to me. On the uwsgi side, I'm seeing a lot of this:
around the time the error occurred. I also see @dsully's uwsgi error message, but on a different endpoint.
The "uwsgi_response_writev_headers_and_body_do" message comes when uWSGI is trying to send the headers and body of the response in one go. It is not, it seems, a timeout - that's caught later in the same function. So, yes, your app has handled the request, and returned a response for uWSGI to pass on, but something went wrong. Would be interesting to know exactly how long uWSGI is taking processing that request. I wrote a patch for someone recently that would log request details as it was parsed coming in... |
@bow that looks like a regular "the browser went away" message... The most common cause is the user hitting refresh, but it can also come from XHR cancelling the request. There are ways to tell nginx to signal uwsgi about this, but having your app notice it and handle it well is a little tricky.
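The nginx side of that signalling is controlled by uwsgi_ignore_client_abort; a hedged sketch of a uwsgi location (socket path and values are illustrative):

```nginx
# with uwsgi_ignore_client_abort off (the default), nginx closes the upstream
# connection as soon as the client disappears, which uWSGI then sees as a
# broken pipe / write error; "on" makes nginx wait for the full response
# instead, at the cost of keeping the worker busy for an abandoned request
location / {
    include                   uwsgi_params;
    uwsgi_pass                unix:/tmp/app.sock;
    uwsgi_ignore_client_abort off;
}
```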
This seems very similar to the timeouts I get in #1602.
Any advancements on this?
I have the same issue. So far I haven't solved it properly, but I found that when I set processes to 1, the problem is gone. My stack:
Well, in my case I discovered uwsgi is very CPU hungry, so I had to increase the vCPUs of my instance.
Well, I haven't changed the number of vCPUs; I set the number of uwsgi workers to 1 in the app's uwsgi .ini file. You have 5 in your configuration. I know it's not the configuration you want if you have more CPUs, but could you please try it to check whether it fixes the 'broken pipe' issue?
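A minimal sketch of the single-worker setting being suggested, purely as a diagnostic rather than a production fix:

```ini
# diagnostic uwsgi.ini fragment: run a single worker and see whether the
# broken pipe / timeout behaviour disappears ("workers" is an alias)
[uwsgi]
processes = 1
```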
I am facing the same issue. Has anyone found a solution?
Hi guys, I think we had the same problem: uwsgi closed the connection but continued processing and setting/extending the
I am facing the same issue after I upgraded my app from py2 to py3, with the same nginx and ini files.
Please kindly help me out, thanks.
Like the other people above me, I too was pulling my hair out over why my requests were timing out. My symptoms are as follows. At first, everything would work as expected. But as soon as I hit my endpoints a couple of times a second, it just locks up, and Nginx serves me a timeout error.
The suggestion by @kmadac does wonders, however! Setting processes to 1 makes the problem go away. I understand that this certainly won't be an appropriate fix for everyone, but it might be a good workaround for others until the underlying issue is found and fixed. My setup:
Unfortunately, after testing the 'workaround' some more, the problem started appearing again. So it does not go away; it just happens less frequently. To my perception it occurs randomly.
The following worked for me as a workaround. I am temporarily downloading a huge file, which might be the case for others too.
I met the same problem, but the same code works well when running standalone Flask.
For me the issue was from Redis - it happened just on pages where Redis was called.
Same for me: Keras + uwsgi + Flask = 504.
I'm seeing the same issue with an nginx -> uwsgi -> django configuration, all hooked up in a docker-compose network. Here's the trio of messages I'm seeing in the logs:
I then also see my nginx container returning a 499 (client closed the connection) response code. And my config:
It's probably also worth noting that I saw an uptick in these exceptions after moving my database from the EC2 instance it shared with my application to an external RDS.
@MacFarlaneBro does anything improve if you update to a current release of uWSGI?
@funkybob I'll try that out and report back.
Exactly the same problem using uwsgi with a unix socket: Python 3.5.5 / Python 3.6.3. Our projects are deployed on a k8s cluster, and I have the same problem in all the deployed Python 3 projects. Python 2.7.14
@funkybob So I upgraded to uWSGI 2.0.17 a couple of days ago and I'm still seeing the issue.
I'm experiencing something similar with uwsgi 2.0.17 behind nginx 1.10.3 (communicating over a socket file, uwsgi protocol), running in a Debian 9 Docker container with Python 3.6.5. In my case the first requests sometimes work. After some random amount of time (or a condition that I haven't been able to identify; it has even happened that no requests ever work), all further requests hang. Nginx times out after the configured
But uwsgi is not logging anything at all for those requests (the initialization messages do come up).
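When workers hang silently like that, one commonly used way to at least surface (and recycle) the stuck worker is uWSGI's harakiri timer; a hedged sketch, not a fix for the underlying cause (the value is illustrative):

```ini
# illustrative uwsgi.ini fragment: kill and log any worker that spends more
# than 30 seconds on a single request, so hangs become visible in the uwsgi log
[uwsgi]
harakiri = 30
harakiri-verbose = true
```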
@adibalcan might be on to something when they mention Redis. The two possibly related bugs I opened (#1602 and #1716) both use Redis. In fact the OSError in #1716 is coming from Redis. @MacFarlaneBro says the problem became worse when switching to remote RDS, but I find the opposite. When I develop locally with Redis/Postgres I constantly get timeouts even as a single user, but it's pretty rare in production with Elasticache/RDS until the server comes under heavier load.
@kylemacfarlane I solved the problem: my Redis connection lasts just 5 minutes; after that my app tries to reconnect to Redis but takes a very long time (maybe forever), and the server responds with a timeout.
Also experiencing a similar issue with random 504s. The error in my uwsgi log shows a different line number:
Error in the nginx log:
Maybe I started to get into trouble when using
I added this line to my ini file.
It works for me!
For me, just
So what is the best configuration behind nginx when there are too many users online?
In my case (Nginx + uWSGI + Flask) the error was:
This happens when the client makes a request and then closes it (either because the server took too long to respond or the client was disrupted) but uwsgi is still processing that request. In my case it did not take too long to respond (< 2 seconds), so the issue was the client going away before the response was ready. To remove these errors from the uwsgi log I have added the following to the Nginx config (for the uwsgi routes):
Alternatively you could just disable the logging of write errors:
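The poster's exact snippets were not captured above; for reference, the uWSGI options usually reached for when silencing these write errors look something like this (a sketch, not necessarily their configuration):

```ini
# illustrative uwsgi.ini fragment: stop logging, and stop raising exceptions
# for, writes to a peer that has already gone away
[uwsgi]
ignore-sigpipe = true
ignore-write-errors = true
disable-write-exception = true
```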
I've experienced the same issue when proxying to uwsgi with Envoy instead of nginx (uwsgi==2.0.18), causing Envoy to fail requests.
It worked for me by commenting out a line in the uwsgi.ini file.
Does lazy-apps even mean anything without master? My understanding was that it would delay initialising Python / the app from the master process into the workers.
Explanation of lazy-apps: load apps in each worker instead of in the master.
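A minimal uwsgi.ini sketch of that combination, for anyone wondering where the option goes (assuming a standard master/worker setup; worker count illustrative):

```ini
# illustrative uwsgi.ini fragment: with lazy-apps, each worker loads the
# application itself after fork, instead of inheriting it from a single
# load done up front
[uwsgi]
master = true
processes = 4
lazy-apps = true
```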
Is it possible to log how much time uwsgi took? It would be helpful for debugging which part was responsible for the timeout.
I am very lucky! It worked for me after commenting out "master = true" in the uwsgi.ini file.
I would like to note that none of y'all are using thunder-lock. If you're trying to maximize throughput and your responses are always very fast, you can get some extra perf by not using it, but then you get into strange situations where a single process gets to serve all the requests - I guess that's a scenario where a timeout could occur if you have a few responses that are very slow? https://uwsgi-docs.readthedocs.io/en/latest/articles/SerializingAccept.html
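The serialized-accept behaviour described in that article is enabled with the thunder-lock option; a minimal sketch (worker count illustrative):

```ini
# illustrative uwsgi.ini fragment: serialize accept() across workers so a
# single worker doesn't end up grabbing (and queueing) most of the requests
[uwsgi]
master = true
processes = 4
thunder-lock = true
```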
I have a similar problem, but nothing has helped. I'm running an Ubuntu 20.04 (LTS) x64 machine on a DigitalOcean droplet. My Flask app opens a socket, sends some information to an external server and waits for a response (it takes around 8 minutes to respond). I tried everything suggested here and more, like pinging the server every 50 seconds, running the socket in an async task, etc. After some time I always get a 504 Gateway Time-out error. The app worked great until I installed Nginx and removed the test port. The weird thing is that even when I change the timeout options in nginx.conf and flaskapp.ini there is no difference in the time before the timeout (I'm doing a full reboot to make sure the new options are used). Edit: fixed it by adding
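For anyone tracing which timeout actually fires in a setup like this, the knobs usually involved on the nginx side look something like the following; values are illustrative, and if uWSGI's harakiri is set it has to be longer than the slowest expected response as well (a sketch, not necessarily what fixed the comment above):

```nginx
# illustrative nginx fragment: the 504 is produced by nginx when the upstream
# does not answer within uwsgi_read_timeout for this location
location / {
    include               uwsgi_params;
    uwsgi_pass            unix:/run/app.sock;   # socket path is illustrative
    uwsgi_connect_timeout 60s;
    uwsgi_send_timeout    600s;
    uwsgi_read_timeout    600s;   # must exceed the longest expected response
}
```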
Hello. (I apologize in advance for the rough wording - the translation is a bit crooked, I'm using Google Translate...) Hardware and software configuration:
I ran into a problem (I don't even know what to call it, so I'll try to describe it).
In general, I wrote an API in Flask - a regular route that returns make_response({'key': 'value'}). There is a form on the page that calls this API with a JS request. Everything seems to be fine, but at that moment the process (if you look in "top") goes into Sl status and starts eating twice the RAM of the entire application; main_process has Ssl status, the rest S. For reference: I first suspected the part of the application where I use threading to unroll multi-level for loops and speed things up, but that's not it. I also checked is_alive() on the threads and wrote a whole module for managing threads, but again that's not it (the threads do not hang, and if they do hang, they are all killed). I also checked the behaviour of uWSGI parameters in the .ini, such as: This problem does not occur on Windows; however, on Windows I run the server in debug mode. Now let's talk about the specifics and causes of this bug. My configuration, which worked flawlessly before I added Python threads:
The most important part! I tried different configuration options, for both nginx and uWSGI; below is a list of them (P.S. none of the options helped to solve the problem):
Also to no avail.
After all the work done, I still did not give up and went even further. I decided to check how exactly the RAM consumption changes, and I was very surprised at how unsafe, and most likely erroneous, the exception handling written by the developers themselves is. As for specific RAM consumption figures, BEFORE the bug:
AFTER the bug:
As we can see, after some process takes over handling the response and subsequently bugs out, RAM consumption increases by exactly 2x (excluding caching errors). My final verdict is that the problem lies in the socket connection and incorrect exception handling when threads fail. Although, I want to note that detailed logging of the subthreads showed no errors from them terminating incorrectly (I mean the processing and logging during the operation of the application itself, where I use multithreading). I hope my more detailed comment will significantly speed up fixing this error. Sincerely. 02.04.2022: I decided to see which files are used by the "bugged" processes compared to normal ones, and I saw the differences.
The normal process uses:
As we can see, the difference is in "socket" and "urandom"/"random". I also double-checked the correctness of Flask's own responses, for errors in the "Accept" and "Content-Type" headers.
P.S. strace -tt -T -p 7851
I hope my addition helps.
I started facing this problem when I introduced the Sentry SDK to my Django project. Fortunately the SDK itself suggested to
I played with the nginx + uwsgi + flask configuration for a long time. Then I found a problem in a Python script using concurrent.futures.ProcessPoolExecutor, which ran in ~100 ms but sometimes got stuck under ~30 concurrent requests.
Did you use any scheduling tools in your project?
No. The script, in multiprocess mode, reads text files line by line and calculates the Levenshtein distance to the given string. From the CLI the script works flawlessly, but under uwsgi, with dozens of simultaneous requests, it sometimes did not return a result.
Hey guys, I discovered the source of this problem in MY CASE. For some reason, saving info in the session:
In my experience, the APScheduler package (v3+) also causes this error.
Judging by the fact that the problem has been open since 2017, the developers don't even seem bothered to fix it.
Update: in my case the sessionid was not being set because Postgres was working on localhost but wasn't on AWS. So I just recreated the database and now everything is working great.
My setup is
In my case, I finally figured out that opening a port from uWSGI directly and exposing it in the AWS security group allows the request that unfortunately requires more than 60 seconds. I don't think this should be the solution, just a workaround. Edit: based on this observation, something between uWSGI and NGINX must be the problem (?)
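The "open a port from uWSGI directly" workaround corresponds to letting uWSGI serve HTTP itself instead of speaking the uwsgi protocol to nginx; a hedged sketch of the two modes (paths and ports illustrative):

```ini
# illustrative uwsgi.ini fragment
[uwsgi]
# (a) behind nginx, speaking the uwsgi protocol over a unix socket:
socket = /run/app.sock
# (b) exposed directly over HTTP (the workaround described above); the 60s
#     limit then depends only on whatever sits in front, if anything:
# http = :8080
```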
Hi everyone,
I'm using uwsgi (version 2.0.13.1) behind nginx (version 1.10.3) to run a Flask application on Python 3 on Ubuntu 14.04.03. The connection to nginx is done via https, but nginx connects to uwsgi via an IPC socket (config below).
I've been getting some timeout errors from nginx that I can't explain. Here are some examples:
Here's my uwsgi config file:
And here's my uwsgi_params settings in nginx:
The error occurs randomly. With the exact same payload, the first request could result in an error, but sending it again could work fine. So I'm not exactly sure whether increasing the timeout would help.
What I also don't understand are these parts of the log messages: what does 22887#22887 mean, and why does it say *613 upstream ...? These numbers are different in every error message. And ultimately, is there a way to reproduce this reliably and then fix it?