
Unable to obtain > 40 RPS after migrating to our own Parse server #2030

Closed
sohagfan opened this issue Jun 10, 2016 · 22 comments

@sohagfan

Our setup is as follows:

  • Elastic Beanstalk
  • 64bit Amazon Linux 2016.03 v2.1.1 running Node.js (v4.4.3)
  • nginx proxy server
  • m4.large single instance hosted in Virginia
  • Parse Server (v2.2.11)
  • Mongo DB driver (v2.1.18)
  • mLab dedicated cluster (v3.0.10)

We are averaging only around 25 RPS, with a peak of 40 RPS. When we exceed that peak, we see high latency and dropped-connection errors in the logs.

An example error from the nginx log is as follows:

"2016/06/09 06:46:34 [error] 2684#0: *254 upstream prematurely closed connection while reading response header from upstream, client: , server: , request: "GET /1/classes/. . . host: “www.example.com""

With the same application, on the hosted Parse.com, we were able to scale to as high as required. We were able to request 70+ RPS successfully without any requests dropped.

Are there any configuration changes to any part of the setup mentioned above (EB, Node.js, nginx, Parse Server, MongoDB driver, mLab), or to something else we have not mentioned or have missed, that would get us better performance?

If you are getting better performance, what is your setup?

Any pointers / comments will be much appreciated.

@bohemima
Contributor

You might want to have a look at the parse-server logs (run with the VERBOSE=1 environment variable), check your indexes, and enable profiling in your mongod to pinpoint slow-running queries.
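For reference, a minimal mongo-shell sketch of enabling the profiler; the 100 ms threshold is just a starting point, not a recommendation:

// Run in the mongo shell against the application database.
// Level 1 captures only operations slower than the threshold (in ms).
db.setProfilingLevel(1, 100)

// After exercising the app, list the slowest captured operations:
db.system.profile.find().sort({ millis: -1 }).limit(10).pretty()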

@flovilmart
Contributor

You may want to run on multiple smaller instances.

@Knana

Knana commented Jun 23, 2016

At NodeChef we have customers doing 150+ req/sec of complex ad-hoc queries with just two 512 MB RAM app containers before they run into the issue you describe. Our databases run on 8 physical cores, the equivalent of either a c4.2xlarge or an r3.2xlarge on AWS. We use bare-metal infrastructure, which provides the best performance. We also give you real-time RPS stats so you can gauge this for yourself. We can help you get started if you are interested.

@flovilmart
Contributor

RAM is definitely not an issue with parse-server, as it is much more I/O- and CPU-bound. That's what I see on my side on GAE: we can process ~100 RPS on 2 instances while keeping CPU < 50% with n1-highcpu-2 machines (2 vCPUs, 1.7 GB of RAM each).

@sohagfan
Author

Thanks for all your comments and suggestions.

@bohemima: We have run the server with VERBOSE=1 but gained no further insight. Thanks for your suggestion of building indexes in MongoDB to address slow queries. We have done this previously and may need to continue doing so. However, we don't feel this is the issue, as the queries we run during the regular course of our application have optimized indexes.

@flovilmart and @Knana: Thanks for your suggestion to run on multiple smaller instances and for the performance information you have provided. In order to test the limits of a single server, we disabled autoscaling. It is good to know that you were able to achieve 150+ RPS with two containers.
Question: Do you see a temporary performance hit when a new EC2 instance fires up?

Your suggestions are all good ones; however, we suspect our problem lies in the interaction between the Parse server and our application.
One more question: do you use Node.js profiling tools? We need to gain insight into where the request-response loop is being delayed within the stack.

Thanks in advance.

@kranzky

kranzky commented Jul 6, 2016

We are using NodeChef and currently run 10 256 MB app containers. No queries to mongo take more than 20ms, and most take a tiny fraction of that, yet we still have performance issues. The problem seems to be a rather complex cloud code function that performs a query returning 10 objects and then performs an additional 4 queries for each of the returned objects, resolving all 40 queries in a single big Parse.Promise.when(...).

This wouldn't be an issue, I don't think, if each of those additional queries hit mongo directly, which would make sense to me; they're cloud code functions running within the server. But they don't: each additional query hits the app containers, which, apart from introducing inefficiencies, means that even running this single cloud code function causes requests to our Parse API to queue up. The end result is that this single cloud code function can take up to ten seconds to complete. I just don't get it.
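For illustration, a hypothetical sketch of the shape of such a function; the class and field names here are made up:

// One query returns 10 objects, then 4 follow-up queries run per object,
// all 40 resolved together in a single Parse.Promise.when.
Parse.Cloud.define("fanOutFunction", function(request, response) {
  var query = new Parse.Query("Post");
  query.limit(10);
  query.find().then(function(posts) {
    var promises = [];
    posts.forEach(function(post) {
      promises.push(new Parse.Query("Comment").equalTo("post", post).find());
      promises.push(new Parse.Query("Like").equalTo("post", post).find());
      promises.push(new Parse.Query("Tag").equalTo("post", post).find());
      promises.push(new Parse.Query("Author").equalTo("post", post).find());
    });
    return Parse.Promise.when(promises);
  }).then(function() {
    response.success("done");
  }, function(error) {
    response.error(error);
  });
});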

@lpremraj

lpremraj commented Jul 6, 2016

Thanks Jason for taking the time to respond.
Much appreciated. We'll look into this and respond.
Thanks again,
Prem


@sohagfan
Author

@flovilmart and @Knana: We are revisiting this issue, which remains a problem for us. Thanks once again for your previous helpful suggestions and insights. I have a couple of new questions for both of you.

As we know, RPS is not available from the Elastic Beanstalk monitoring graphs; they show only latency, CPU utilization, network bytes in and out, and network packets in and out.

My questions are:

Q1. How are you measuring RPS? Are you calculating it using benchmark tests, inferring it from what it was when you had pointed your app at hosted parse.com, or something else altogether?

Q2. Are you primarily using Cloud Code functions (the /functions endpoint), the /batch endpoint, the /classes endpoint, or a mixture of some or all of these?

@jasonhutchens, this may be of interest to you: in one of our benchmark tests, using:

  • Autoscaling 1->4 on
  • Autoscale trigger latency >= 2 seconds
  • c4.xlarge instances
  • 4 instances active
  • the rest of the setup being similar to that mentioned in the first post,

we saw a latency of 20-25 seconds with the VERBOSE environment variable set. When we removed the VERBOSE environment variable, latency dropped an order of magnitude to between 1 and 2 seconds, although 4 instances remained active.

@kranzky

kranzky commented Jul 28, 2016

Thanks @sohagfan; I'll benchmark with verbose logging disabled. Although I still don't understand why Parse queries made from cloud code functions need to be routed back through the API layer when they could bypass a lot of that.

@sohagfan
Author

@jasonhutchens: Sure, no problem. Hope it helps. You may already be aware of this, but in case you aren't: you have to actually delete the environment variable for it to stop taking effect. It doesn't matter whether its value is 1 or 0; setting the value to 0 makes no difference, and it behaves as though the variable were still set.
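A plausible explanation, assuming parse-server only checks whether the variable is set at all: in Node.js every environment variable is a string, and any non-empty string, including "0", is truthy.

// Assumed gating logic, for illustration only; any non-empty value
// would keep verbose logging enabled.
if (process.env.VERBOSE) {
  // still reached when VERBOSE=0, since "0" is a non-empty string
}

console.log(Boolean("0")); // true
console.log(Boolean(""));  // false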

@kranzky

kranzky commented Jul 28, 2016

@sohagfan no, we're running on NodeChef; I'll let them know about this.

@Knana

Knana commented Jul 28, 2016

@sohagfan see below for responses to your questions:

  1. NodeChef metrics measure requests per second as well as the number of connections in real time, so we are actually inferring the RPS from a live app the customer is using in production. This information is available from our dashboard. The RPS is calculated by summing the number of requests dispatched to all app containers within a second, and the connections simply measure the number of open sockets for the app (see the sketch below).
  2. What we measured is typically a mixed-workload scenario: direct class queries as well as cloud code making queries back to a class, and so on.

Hope this helps.
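As a rough illustration of that counting approach, a hypothetical per-container sketch (the actual NodeChef implementation is not public, so this is just the idea):

// Count requests dispatched to this container and report once a second;
// summing these figures across all containers gives the app-wide RPS.
var express = require("express");
var app = express();

var count = 0;
app.use(function(req, res, next) {
  count++;
  next();
});

setInterval(function() {
  console.log("RPS (this container):", count);
  count = 0;
}, 1000);

app.listen(3000);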

@bkprabhak

@flovilmart please let me know what tests you ran to determine that you got ~100 RPS with 2 n1-highcpu-2 machines. We are running tests on similar machines on AWS Elastic Beanstalk through the REST API and are seeing much worse performance numbers. Should we also expect a difference in performance between Cloud Code methods triggered via the REST API vs. clients (through the Parse iOS and Android SDKs)?

Also, what trigger did you set up for autoscaling: latency, CPU, or network out?

@drew-gross @hramos Happy to hear what others are doing as well. We are stuck with our load testing at the moment and it's preventing us from moving to production.

Let me know what tests you recommend for load testing our dev environment so that we can feel comfortable moving to production. Most of our load comes from clients, and we are not sure how to simulate this, given the latency issues we have noticed when running benchmark tests through the REST API.

@flovilmart
Contributor

For now there should be no difference between cloud code and the client SDKs, as all cloud code requests go through the HTTP interface. There is a pull request that attempts to run cloud code with direct access to the JS interface instead of the HTTP one.

@reasonman

@flovilmart I'd be interested in how you got to your 100+ RPS number as well. In my tests using a single n1-highcpu-2 on GCE and testing with Locust, I can get 20-30 RPS before the CPUs peg. I didn't try 2 instances, but I suspect I would only get around double what I get with a single instance.

For reference, here are my load tests using Locust via REST against a clustered Parse Server instance (PM2: https://nodejs.org/api/cluster.html#cluster_cluster and http://pm2.keymetrics.io/docs/usage/cluster-mode/), followed by a sketch of the cluster setup:

f1-micro-1: ~5 RPS @ ~40ms response time
n1-highcpu-2: ~30 RPS @ ~400ms response time
n1-highcpu-4: ~70 RPS @ ~600ms response time
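A minimal sketch of a clustered parse-server entry point using Node's built-in cluster module; the database URI, app ID, master key, and mount path are placeholders:

// Fork one worker per CPU core; each worker runs its own parse-server.
var cluster = require("cluster");
var os = require("os");

if (cluster.isMaster) {
  os.cpus().forEach(function() {
    cluster.fork();
  });
  cluster.on("exit", function(worker) {
    console.log("worker " + worker.process.pid + " died; respawning");
    cluster.fork();
  });
} else {
  var express = require("express");
  var ParseServer = require("parse-server").ParseServer;

  var app = express();
  app.use("/parse", new ParseServer({
    databaseURI: "mongodb://localhost:27017/dev", // placeholder
    appId: "myAppId",                             // placeholder
    masterKey: "myMasterKey"                      // placeholder
  }));
  app.listen(1337);
}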

@flovilmart
Contributor

@reasonman we use neither PM2 nor cluster. I spawned 10 AWS instances and queried random objects from our DB.

In general we see request times below 30ms with Stackdriver.

Since then, we changed our setup to cap at 50 RPS, and the CPU is still below 35%.

Note that the DB is in the same zone as the servers.

@kranzky

kranzky commented Aug 16, 2016

@flovilmart I'm using Parse Server on NodeChef and definitely have problems with the SDK routing everything through the HTTP interface when running cloud code functions. Right now this is blocking us from moving our production apps from Parse to Parse Server.

I spun up a simple NodeChef Parse instance with one app server to demonstrate the issue. I deployed the following main.js to cloud code:

// Baseline: a trivial function that performs no queries.
Parse.Cloud.define("niceFunction", function(request, response) {
  response.success("Hello world!");
});

// Runs a single _User query and resolves with the number it was given.
function allUsers(num) {
  var query = new Parse.Query(Parse.User);
  return query.find().then(function() { return num; });
}

// Fires `num` independent _User queries and resolves them all together.
Parse.Cloud.define("insaneFunction", function(request, response) {
  var num = request.params.num || 1;
  var promises = [];

  for (var i = 0; i < num; ++i) {
    promises.push(allUsers(i));
  }

  Parse.Promise.when(promises).then(function(results) {
    response.success(results);
  }, function(error) {
    response.error(error);
  });
});

I called niceFunction from Postman and got a response latency of 2ms according to the NodeChef stats. I then called insaneFunction without any parameters and got a response latency of 29ms for the function call, noting that an additional API query to _User was made before the request completed (with a response latency of 8ms).

I then called insaneFunction with the num parameter set to 10 and got a response latency of 150ms, with the response latencies of the 10 separate API queries to _User ranging between 14ms and 34ms.

So routing queries through the HTTP interface is causing requests to queue up, basically serialising what should be parallel operations. What concerns me is that many of our production cloud code functions perform many more than 10 queries when called.

My expectation is that calling insaneFunction with num set to 10 should send only a single request to the API and should run 10 mongo queries in parallel, completing in about 15ms, not 150ms. We shouldn't need to scale out our app servers to handle this kind of workload :)

Now, having said all that, I suspect Parse also suffered from the same issue. I'm just curious to know what it would take to have the SDK talk directly to Mongo from cloud code, to avoid having functions spawn requests that queue up at the HTTP interface. I'm hopeful this would be a performance win, and that it would avoid the problem of a single function call blocking access to our entire cluster (which is what happens at the moment with our staging apps, which use 10 app servers and have cloud code functions that may fire off 100 queries, causing all 10 app servers to do work to serve a single request).

@flovilmart
Contributor

There is a PR for that: #2316

@kranzky

kranzky commented Aug 16, 2016

@flovilmart cheers, I'll test that on our NodeChef instance (with help from the team there) in the hope that it improves our numbers. Thanks!

@flovilmart
Contributor

Actually, there are some things to fix in that PR before you can roll it out confidently.

@kranzky

kranzky commented Aug 17, 2016

@flovilmart yes, understood... I just meant we'd do some testing. I can confirm that the insaneFunction call that previously took 150ms now takes 40ms to complete, which is a very nice optimisation. Looking forward to being able to use this in production :)

@flovilmart
Contributor

flovilmart commented Sep 3, 2016

I'm gonna close this issue, as it turned into a discussion and we've explored different ways to improve the server's performance.

Also, #2316 will probably land in the next version, protected by the EXPERIMENTAL flag.

Default cluster support from the CLI (#2596) will also appear in the next version.
