6.0.24 request routing causes massive performance regression #2579
Comments
No rolling restarts are involved, so the difference between 6.0.23 and 6.0.24 is that older processes are prioritized over less-busy processes. The newness bias is only involved when rolling restarting (i.e. when processes from both before and after a restart began are present). So what you are actually seeing is that older, more-warmed-up processes are responding more slowly than less-busy processes would, which is surprising. It's possible that this is due to another change in 6.0.24, or it might just be that the warmed-up processes are slower; we'll have to gather more data. Can you provide a bit more information: Are you using multi-threading? Can you be more detailed with regard to this bit:
Specifically, is the Passenger process being killed by a watchdog, or are the Passenger-managed app processes being killed by Passenger?
Just a note for myself: I can diff these tags with the following to get the most suspect changes:
OK, so the oldest process is likely being loaded with 10 requests before the next process is chosen, and then Passenger is killing the app process because some requests aren't done in 120s. @FooBarWidget thoughts?
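To make the difference concrete, here is a minimal sketch, not Passenger's actual SessionManagement.cpp code: the Process fields and helper names are illustrative assumptions, used only to contrast the pre-6.0.24 least-busy selection with a fixed preference-order selection. With a per-process concurrency of 10, the second strategy sends the first 10 concurrent requests to the same (oldest) process before any other process is chosen:

```cpp
// Minimal illustrative sketch only; not Passenger's actual code.
#include <vector>

struct Process {
    unsigned int sessions;    // requests currently in flight on this process
    unsigned int concurrency; // per-process limit, e.g. 10 threads
};

// Pre-6.0.24 behavior as described above: pick the process with the fewest
// in-flight sessions that still has spare capacity.
Process *routeLeastBusy(std::vector<Process> &procs) {
    Process *best = nullptr;
    for (Process &p : procs) {
        if (p.sessions < p.concurrency
            && (best == nullptr || p.sessions < best->sessions)) {
            best = &p;
        }
    }
    return best;
}

// 6.0.24-style behavior as described above: walk the processes in a fixed
// preference order (oldest first outside a rolling restart) and take the
// first one with spare capacity. With concurrency = 10, the first process
// absorbs 10 concurrent requests before the second process sees any traffic.
Process *routeFirstWithCapacity(std::vector<Process> &procs) {
    for (Process &p : procs) {
        if (p.sessions < p.concurrency) {
            return &p;
        }
    }
    return nullptr; // every process is saturated
}
```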
We did indeed see high thread use and busyness. (I assumed this was more tied to latency and thus just regular use-the-capacity behavior.) Can we maybe round-robin or otherwise better distribute requests to new Passenger processes, or maybe only start sending to new ones once a threshold of them are ready? (But again, we aren't doing rolling restarts; this is happening in new containers on deployment.)
Can you tell us how you decided on the 12 processes and 10 threads configuration?
I'm having trouble reproducing this issue. I'm able to load nginx+Passenger Enterprise set to 12 procs and 10 threads with 120 concurrent requests and 50,000 total requests, and the worst time I get is 1.6 seconds to finish processing a request+reply, with a mean of 208ms and a median of 185ms. And that's on a 6-core machine, so it's actually a bit overloaded in that configuration.
I've recently been testing an upgrade from 6.0.18 to 6.0.24, and have also noticed an increase in request times. I'm not sure if this is the exact issue (I haven't got around to trying 6.0.23 yet), but there have been no changes other than updating Passenger and nginx (nginx stayed at 1.18; only security updates were applied). I have Locust load testing results below. In 6.0.24, requests per second have dropped, there is a lot more jitter in the response times, and the average p95 has increased. No change in the median response times. Edit: I'm not using Enterprise, if that matters. I just noticed the changelog entry for the routing changes has an [Enterprise] tag on it.
The swap of priority between oldest-process and least-busy-process when routing also applies to Passenger OSS; the Enterprise bit was the prioritization of new processes during a rolling restart, which isn't relevant to this issue. Are you also using the Docker image like @johnmadden-sp?
Not using Docker; I'm running it on Ubuntu 22 on EC2 instances. Used apt to install (https://www.phusionpassenger.com/docs/tutorials/deploy_to_production/installations/oss/aws/ruby/nginx/). Other details:
Actually, come to think of it, @joelngwt, if you're using OSS Passenger then the concurrency per process is set to 1 by default, as you cannot use multithreading, so the new routing would behave exactly the same as before, right @FooBarWidget?
@CamJN 12 procs as the result of CPU count, and threads to get better concurrency with I/O waits. (I can't go into a great level of detail publicly, of course.)
@johnmadden-sp are you using sticky sessions by any chance?
@FooBarWidget I had a thought: since @joelngwt should be seeing the same process selection as 6.0.23 due to OSS not having multithreading, could it be that I messed up the changes to SessionManagement.cpp? Could you please take a look and consider whether I got the logic right there? And if the changes there are fine, can you suggest an approach for getting the rolling-restart state in the session management methods so I can limit the new routing to during rolling restarts?
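For illustration only, here is a sketch of the kind of gating being suggested, reusing the hypothetical Process struct and route* helpers from the sketch earlier in this thread. rollingRestartInProgress is an assumed flag, not an existing Passenger API, and procs is assumed to already be ordered by routing preference (e.g. newer processes first during a rolling restart):

```cpp
// Hypothetical sketch, building on the earlier illustrative helpers.
// Idea: keep least-busy routing in normal operation and apply the new
// preference-order routing only while a rolling restart is in progress.
Process *routeRequest(std::vector<Process> &procs, bool rollingRestartInProgress) {
    if (rollingRestartInProgress) {
        // procs is assumed to be ordered with the preferred (newer)
        // processes first during the restart.
        return routeFirstWithCapacity(procs);
    }
    return routeLeastBusy(procs); // pre-6.0.24 behavior otherwise
}
```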
@CamJN no, no sticky sessions.
Issue report
Are you sure this is a bug in Passenger?
Not necessarily a bug, but the implementation change (we think) caused a severe increase in latency, leading to outages. The 6.0.24 changelog mentions:
We changed the way we route requests. Instead of picking the least-busy process, we now prioritize new processes first.
Reverting to 6.0.23 fixes the problem.
Please try with the newest version of Passenger to avoid issues that have already been fixed
New issue in 6.0.24
Question 1: What is the problem?
Question 2: Passenger version and integration mode:
Question 3: OS or Linux distro, platform (including version):
Question 4: Passenger installation method:
Your answer:
Phusion-supplied Docker images, but with an
apt-get install libnginx-mod-http-passenger-enterprise
the 6.0.24 upgrade takes place.
Question 5: Your app's programming language (including any version managers) and framework (including versions):
Question 6: Are you using a PaaS and/or containerization? If so which one?