Performance issues after upgrading from 1.0.4 -> 1.0.8 #2857

Closed
neerajrj opened this issue Apr 6, 2015 · 16 comments

@neerajrj (Contributor) commented Apr 6, 2015

Hello,

We are seeing high levels of lock contention after our upgrade to RxJava 1.0.8.

Apologies for not having a unit test to reproduce this; we have a fairly complex system and are having trouble figuring out where to dig deeper to find a reproducible case.

This is a paste from a JMC view. As far as we know, nothing should be getting unsubscribed in our application. We would appreciate it if anyone could shed some light on what kind of behavior would trigger the stack below.

Stack Trace Sample Count    Percentage(%)
java.util.concurrent.locks.LockSupport.unpark(Thread)   1,113   76.233
 java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer$Node) 1,113   76.233
      java.util.concurrent.locks.AbstractQueuedSynchronizer.release(int)    1,113   76.233
         java.util.concurrent.locks.ReentrantLock.unlock()  1,113   76.233
            rx.internal.util.SubscriptionList.remove(Subscription)  1,113   76.233
               rx.internal.schedulers.ScheduledAction$Remover2.unsubscribe()    1,113   76.233
                  rx.internal.util.SubscriptionList.unsubscribeFromAll(Collection)  1,113   76.233
                     rx.internal.util.SubscriptionList.unsubscribe()    1,113   76.233
                        rx.internal.schedulers.ScheduledAction.unsubscribe()    1,113   76.233
                           rx.internal.schedulers.ScheduledAction.run() 1,113   76.233
                              java.util.concurrent.Executors$RunnableAdapter.call() 1,113   76.233
                                 java.util.concurrent.FutureTask.run()  1,113   76.233
                                    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor$ScheduledFutureTask)    1,113   76.233
                                       java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()   1,113   76.233
                                          java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)  1,113   76.233
                                             java.util.concurrent.ThreadPoolExecutor$Worker.run()   1,113   76.233
                                                java.lang.Thread.run()  1,113   76.233

@akarnokd (Member) commented Apr 6, 2015

I guess you are scheduling a lot of work on the computation scheduler. We changed the tracking of tasks from synchronized to j.u.c.Lock because it gives better throughput according to our JMH benchmarks. It appears task addition takes longer while holding the lock, so the unsubscribe side spins and then parks; most of the time, spinning should be enough. Did you measure an actual performance degradation?
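
For illustration only, here is a hypothetical sketch (not the actual rx.internal.util.SubscriptionList source) of the kind of lock-guarded tracking being discussed; the LockSupport.unpark frames dominating the JMC sample above are paid inside unlock() whenever another thread has queued up on the lock:

import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a subscription container guarded by a j.u.c. lock;
// NOT the real RxJava implementation, just the shape of the change discussed.
class LockedSubscriptionList<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<T> items = new LinkedList<>();

    void add(T item) {
        lock.lock();
        try {
            items.add(item);
        } finally {
            // If another thread blocked on the lock while we held it,
            // unlock() ends up calling LockSupport.unpark() on it.
            lock.unlock();
        }
    }

    void remove(T item) {
        lock.lock();
        try {
            items.remove(item);
        } finally {
            lock.unlock();
        }
    }
}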

@neerajrj (Contributor, Author) commented Apr 6, 2015

We do not have formal performance benchmarks for our jobs. We autoscale our cluster based on resource utilization, and we saw our cluster size go up by about 30% - 100% depending on the workload.
To give you a bit more context, we have built a reactive stream processing system that reads data items as they flow in and processes them on a computation thread pool using observeOn. In terms of scale, the jobs seeing the most impact process <= 10k messages/sec of a few kB each.
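
A rough sketch of the kind of pipeline described above (the source, sizes, and names are hypothetical placeholders, not the reporter's actual code):

import rx.Observable;
import rx.schedulers.Schedulers;

public class PipelineSketch {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical source standing in for the incoming message stream.
        Observable<byte[]> messages = Observable.range(1, 10_000)
                .map(i -> new byte[2_048]); // a few kB per message

        messages
                // The observeOn hop hands work off to the computation
                // scheduler; this is where the scheduler-task bookkeeping
                // seen in the stack trace above happens.
                .observeOn(Schedulers.computation())
                .map(PipelineSketch::process)
                .subscribe();

        Thread.sleep(1_000); // let the async work drain in this toy example
    }

    private static int process(byte[] payload) {
        return payload.length; // stand-in for the real per-message work
    }
}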

@akarnokd (Member) commented Apr 6, 2015

Sounds like your data rate reaches a critical frequency where the submission of new values in observeOn overlaps its drain and thus causes extra contention. The change from 1.0.4 to 1.0.8 consists of two parts: the lock in SubscriptionList, and the use of SubscriptionList instead of CompositeSubscription for non-timed tasks inside the computation scheduler. What Java version are you running, and can you name the virtualization environment?
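
The non-timed path mentioned here is the plain worker scheduling that observeOn relies on under the hood; a minimal, purely hypothetical way to exercise it directly looks roughly like this:

import rx.Scheduler;
import rx.schedulers.Schedulers;

public class WorkerSketch {
    public static void main(String[] args) throws InterruptedException {
        // Each schedule() call wraps the action in a ScheduledAction whose
        // add-on-submit / remove-on-completion bookkeeping is what appears
        // in the stack trace at the top of this issue.
        Scheduler.Worker worker = Schedulers.computation().createWorker();
        try {
            for (int i = 0; i < 10_000; i++) {
                worker.schedule(() -> { /* per-task work would go here */ });
            }
            Thread.sleep(500); // give the toy tasks time to run
        } finally {
            worker.unsubscribe(); // release the worker
        }
    }
}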

@neerajrj (Contributor, Author) commented Apr 6, 2015

Hey David,

We are on Java 8 and we are running in a Mesos container inside an AWS instance (m3-2xl series).

Thanks
Neeraj

@akarnokd (Member) commented Apr 6, 2015

Thanks. I'm not sure what the cause is, but you could try shifting the contention by batching before the observeOn and unbatching after it:

source.batch(4).observeOn(Schedulers.computation()).concatMap(v -> Observable.from(v))

@davidmoten (Collaborator) commented Apr 6, 2015

That would be buffer rather than batch, right? I was going to suggest the same thing; it sounds like a fast round-robin on a scheduler.

@akarnokd (Member) commented Apr 6, 2015

@davidmoten Sure.

@neerajrj (Contributor, Author) commented Apr 6, 2015

Hey David,
Batching is a good suggestion; however, that would be a breaking change.

@akarnokd (Member) commented Apr 6, 2015

This is why I suggested using concatMap to flatten the batches again after the observeOn so your subsequent computation chain doesn't need to change.
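
Put together with the buffer correction above (a sketch only; the buffer size of 4, the source, and the class name are placeholders), the batching stays internal to the chain, and everything downstream of the concatMap still sees individual items:

import rx.Observable;
import rx.schedulers.Schedulers;

public class BufferedObserveOnSketch {
    public static void main(String[] args) throws InterruptedException {
        Observable<Integer> source = Observable.range(1, 1_000); // placeholder source

        Observable<Integer> processed = source
                .buffer(4)                                 // cross the thread hop in lists of up to 4
                .observeOn(Schedulers.computation())       // fewer hand-offs across the scheduler boundary
                .concatMap(list -> Observable.from(list))  // flatten back to individual items, in order
                .map(v -> v * 2);                          // downstream stays per-item, unchanged

        processed.subscribe(v -> { });
        Thread.sleep(500); // give the toy pipeline time to drain
    }
}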

@neerajrj (Contributor, Author) commented Apr 6, 2015

Oops, hit the send button too soon.

As I was saying, adding a batch or buffer would change our public API from T to a List.

You mentioned earlier that you saw a significant throughput improvement in the JMH tests; can you provide more details? I am curious why the JMH tests do not reflect the performance we see in production. Were there any other benefits (did it resolve the issue of the NPE in merge as well)?

Thanks
Neeraj

@neerajrj (Contributor, Author) commented Apr 6, 2015

Ah, OK, I see now; let me try out your suggestion. Thanks!

@akarnokd (Member) commented Apr 6, 2015

There are two PRs that made performance enhancements:

#2603, #2773

Benchmark                                   (size)     1.0.4      PR 2603    PR 2773
r.s.ComputationSchedulerPerf.observeOn           1  104110.926  115707.286  113905.358
r.s.ComputationSchedulerPerf.observeOn        1000    3212.434   13020.027   28618.423
r.s.ComputationSchedulerPerf.observeOn     1000000       9.508      16.559      32.166
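
The numbers above are JMH throughput scores (higher is better). A benchmark of this kind looks roughly like the sketch below; this is a hypothetical reconstruction, not the actual ComputationSchedulerPerf source:

import java.util.concurrent.CountDownLatch;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import rx.Observable;
import rx.schedulers.Schedulers;

// Hypothetical reconstruction of an observeOn throughput benchmark in the
// spirit of rx.schedulers.ComputationSchedulerPerf; not the real source.
@State(Scope.Thread)
public class ObserveOnPerfSketch {

    @Param({"1", "1000", "1000000"})
    public int size;

    private Observable<Integer> source;

    @Setup
    public void setup() {
        source = Observable.range(1, size);
    }

    @Benchmark
    public void observeOn() throws InterruptedException {
        // One "op" = pushing `size` items across the computation scheduler.
        CountDownLatch latch = new CountDownLatch(1);
        source.observeOn(Schedulers.computation())
              .subscribe(v -> { }, e -> latch.countDown(), latch::countDown);
        latch.await();
    }
}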

@akarnokd (Member) commented Apr 23, 2015

I've run into this performance degradation myself, and indeed, at some concurrency levels (4+ in my case) the degradation was enormous. Could you check whether PR #2912 fixes your case?

@neerajrj (Contributor, Author)

Hey David,
Our current setup doesn't lend itself to trying out a jar at scale (I can only do local tests). I will try it out, but the results may not be the same at production scale.
Thanks for the fix.
-Neeraj

@neerajrj (Contributor, Author)

Hey David,

The performance is back to what it used to be with release 1.0.4 after we upgraded to RxJava 1.0.10! Looks like your fixes worked.
Thanks!

@akarnokd (Member)

@neerajrj Hi and thanks for confirming.
