Upgrade to any version > 0.18.2 more than doubles CPU usage #209

Closed
Lidbetter opened this issue Jun 22, 2017 · 11 comments

@Lidbetter

Hi,

Using Rollbar 0.18.2:
We normally see 2-4k requests per minute come through our load balancer, which are then served by 2 AWS m1.large instances at ~40-60% CPU utilization.

Using Rollbar 1.0.1 or 1.1.1 (tried to upgrade twice):
Our 2 AWS m1.large instances immediately jumped to 90-100% CPU utilization. We then scaled to 5 AWS m1.large instances to support the same amount of traffic at ~40-60% CPU utilization.

After experiencing the issue with 1.0.1, I had hoped that PR #158 would fix it. That does not appear to be the case.

When upgrading the library, the entirety of our diff is as follows:

// setup rollbar location:
 -    Rollbar::init([
 +    \Rollbar\Rollbar::init([

// composer.json
 -    "rollbar/rollbar": "^0.18.2",
 +    "rollbar/rollbar": "^1.1.1",

// composer.lock
// snipped, no other dependencies were changed

// manual reporting helpers
function reportErrorMessage($message, $extra_data = []) {
 // ... snip
 -    \Rollbar::report_message($message, Level::ERROR, $extra_data);
 +    \Rollbar\Rollbar::log('error', $message, $extra_data);
}

function reportException(Exception $exception, $extra_data = []) {
 // ... snip
 -    \Rollbar::report_exception($exception, $extra_data);
 +    \Rollbar\Rollbar::log('error', $exception, $extra_data);
}

function notifyException(Exception $exception, $extra_data = []) {
 // ... snip
 -    \Rollbar::report_exception($exception, $extra_data, [
 -        'level' => 'info',
 -    ]);
 +    \Rollbar\Rollbar::log('info', $exception, $extra_data);
}

No other factors or changes were introduced at the same time as the Rollbar upgrade.

Please let me know if there is any other information I can provide which would help with tracking down the cause of this issue.

Thanks.

@rokob
Contributor

rokob commented Jul 6, 2017

Hey, sorry about this. I have been working on profiling the library and fixing the hot spots; PR #217 is that work. The main changes from 0.18.2 to 1.0 are that we are doing more scrubbing/truncation work and the logs are no longer batchable, so I am addressing those two areas. Will update you with what I find.

rokob added this to the v1.3.0 milestone Jul 6, 2017
@Lidbetter
Author

Thanks for the update, looks good so far

rokob self-assigned this Jul 7, 2017
@cordoval
Contributor

@rokob are you using blackfire or what? just curious 👍 great job

@rokob
Contributor

rokob commented Jul 20, 2017

Xdebug and Webgrind

@elazar
Contributor

elazar commented Aug 10, 2017

@rokob Was this resolved by #217? This comment from that PR indicates that the performance fixes are behind a configuration flag that's disabled by default and doesn't appear to be documented anywhere. Can you provide further details here?

@ArturMoczulski
Contributor

@elazar I believe the configuration flag is batched

@rokob
Contributor

rokob commented Aug 14, 2017

So there are a few potential issues that could have been causing performance problems, and I fixed all of the ones that I could find. I also reintroduced the ability to send errors in batches instead of as they arise. However, the way we send batched errors is different, as we no longer support an API endpoint that accepts a batch of errors. Instead, we are using libcurl's multiplexing feature. You can choose to batch errors with the boolean config parameter batched, and you can configure the size of the batch with batch_size, which is 50 by default.

By default, a decent chunk of the performance improvements are already part of the current release, as they do not change the way requests are sent. If you want to try turning on batching you can, but note that it may have a positive or negative impact on your performance depending on the actual workload of your app.
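
For anyone who wants to try it, here is a minimal sketch of the two batching settings described above. The batched and batch_size keys are the ones named in this comment; the token and environment values are placeholders, and the autoload path assumes a standard Composer install, so adjust both to your setup.

```php
<?php
// Assumes a standard Composer layout; adjust the path for your project.
require __DIR__ . '/vendor/autoload.php';

\Rollbar\Rollbar::init([
    'access_token' => 'YOUR_ACCESS_TOKEN', // placeholder, not a real token
    'environment'  => 'production',        // placeholder environment name
    'batched'      => true,                // send errors in batches instead of as they arise
    'batch_size'   => 50,                  // batch size; 50 is the stated default
]);
```

Since batching may help or hurt depending on the workload, it is probably worth comparing CPU usage with batched set to true and to false before rolling it out.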

@Lidbetter
Author

Thanks for your effort on this, @rokob. We will be testing out 1.3.1 this week (probably today).

@rokob
Contributor

rokob commented Aug 19, 2017

Going to close this, please re-open or open a new issue if there is still a problem.

rokob closed this as completed Aug 19, 2017
@Lidbetter
Author

We still experienced a massive performance hit, pretty much unchanged from before, with batching (50) both on and off. I realize that is not super helpful in narrowing down where the problem is. We just got approval for Blackfire, so I will benchmark the differences and hopefully narrow down where the issue lies.

@nandgate7400

I'd like to chime in that I'm also experiencing massive performance issues. I updated to the latest version (1.3.1) and my server load goes up by a factor of 20 or more. It is unusable in a production environment.
