Profiling Memory Usage and Object Creation #1204
Comments
The most likely source of garbage is the …
The …
I'd also like to help out here, since it's crucial for our use cases as well.
Thanks @daschl, I'd appreciate your help. Profiling and identifying hot spots is what we need most right now.
I did some GC profiling of my test workloads and I'd also like to nominate:
The bad news is that I had to fall back from Observables to plain execution on the hot code path (aside from the overall wrapping Observable), because using Rx inside that path produces way too much garbage. Moving away from Rx in the hot code path improved my throughput (as reported in the GC logs) from 20% to 80%, and that correlates with my earlier findings: I could not sustain constant IO throughput because full GCs were happening far too frequently.
Not surprised by this. Were you able to identify what the garbage is?
We can definitely improve on the …
I think that many …
I went back in history to 0.16.1 to compare performance of the basic … Here is the code for the test:
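(A representative sketch only: this assumes the same JMH `UseCaseInput` harness used in the `observeOn` and `mapTransformation` snippets later in this thread; the benchmark body itself is an assumption, not the original test code.)

```java
// Sketch of a basic-usage benchmark in the UseCaseInput style used
// elsewhere in this thread; not the original code behind these results.
@GenerateMicroBenchmark
public void basicUsage(UseCaseInput input) throws InterruptedException {
    // Subscribe with no operators to measure the base
    // Observable/Subscriber overhead between versions.
    input.observable.subscribe(input.observer);
    input.awaitCompletion();
}
```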
Results
0.16
Master
GC
On the master branch test I'm seeing GC results like this:
versus 0.16:
Summary
Unless I'm mistaken, current code is better:
I'll start profiling this and improve ... but this does not reveal the source of the problems seen. Possibly it's related to schedulers, or it's a specific operator. I exercised …
The …

```java
@GenerateMicroBenchmark
public void observeOn(UseCaseInput input) throws InterruptedException {
    input.observable.observeOn(Schedulers.computation()).subscribe(input.observer);
    input.awaitCompletion();
}
```

Thus, with an …
By the way, all testing is just being done on my Mac laptop ... so these numbers are all relative and not representative of proper server hardware.
Converting from … to this: …
@benjchristensen I suppose the …
If you want me to run a specific workload or type of test, let me know so we can compare results.
I've been experimenting with FieldUpdaters and Unsafe for the …
@akarnokd since RxJava also runs on Android, I'm not sure how good/standard the support is there. I know the Netty folks have the same issues, and they wrap those Unsafe calls in a PlatformDependent util class.
This sounds like a valid approach for us. As we mature Rx, we'll want to squeeze as much performance out of it as we can while still remaining portable.
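For reference, a minimal sketch of the field-updater approach being discussed: an `AtomicLongFieldUpdater` performs atomic updates on a plain `volatile` field, so no separate `AtomicLong` has to be allocated per instance. The class and field names below are illustrative, not RxJava internals:

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

// Illustrative only; not RxJava's actual class layout.
final class RequestCounter {
    // Plain volatile field -- avoids allocating an AtomicLong per instance.
    private volatile long requested;

    private static final AtomicLongFieldUpdater<RequestCounter> REQUESTED =
            AtomicLongFieldUpdater.newUpdater(RequestCounter.class, "requested");

    long add(long n) {
        return REQUESTED.addAndGet(this, n);
    }
}
```

Because the field updaters are standard `java.util.concurrent.atomic` API, they also work on Android, whereas direct `sun.misc.Unsafe` access is exactly the kind of call a PlatformDependent-style wrapper would need to guard.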
Testing with:
May 21st 0efda07
May 26th a34cba2
According to these results we got slower (though it appears to be within the margin of error, so if not slower, then at least no better).
Which Java version is this? Java 6's intrinsics aren't as good as the newer versions'. Maybe the …
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk
Master branch with /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk
Here is a simple test without JMH (but using the same coding pattern) that shows significant increases in throughput from 0.16 -> 0.17 -> 0.18 -> the current master branch for this code:

```java
public void mapTransformation(UseCaseInput input) throws InterruptedException {
    input.observable.map(i -> {
        return String.valueOf(i);
    }).map(i -> {
        return Integer.parseInt(i);
    }).subscribe(input.observer);
    input.awaitCompletion();
}
```

master
Version 0.18.3
Version 0.17.6 (using …
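The loop-based harness itself is not shown above; a minimal sketch of what such a non-JMH throughput test could look like, assuming RxJava on the classpath (the class name, iteration count, and timing approach are assumptions, not the original code):

```java
import java.util.concurrent.CountDownLatch;
import rx.Observable;

// Rough throughput loop in the spirit of "same coding pattern, no JMH".
// Numbers from a loop like this are only indicative; the JMH results
// above remain the more rigorous measurement.
public class MapTransformationLoop {
    public static void main(String[] args) throws InterruptedException {
        final int count = 1_000_000; // assumed iteration count
        final CountDownLatch latch = new CountDownLatch(1);
        long start = System.nanoTime();
        Observable.range(0, count)
                .map(i -> String.valueOf(i))
                .map(s -> Integer.parseInt(s))
                .subscribe(i -> { }, Throwable::printStackTrace, latch::countDown);
        latch.await();
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println("ops/sec: " + (long) (count / seconds));
    }
}
```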
Very good progress! I'll get back to profiling from master next week.
I ran some benchmarks with …
Well, that's odd, and it doesn't help much when two different ways of measuring give contradictory results :-(
This is creating lots of …
Those were from 0.18.2 ... now with Master, plus a modified …
The … Then the master branch with … The issue is definitely the …
I've added some logging to our production instances and discovered that the large Subscription arrays we see in practice are caused by a prefetching operation which generates many (> 500) … This seems like a valid case to support, and any work that improves performance for large Subscription arrays would be a meaningful improvement.
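As a rough illustration of why the size matters (this is a sketch of an array-backed composite, not RxJava's actual `CompositeSubscription` code): with ~500 entries, each removal is a linear scan plus an array copy, and each addition past capacity triggers a reallocation, which is how large Subscription arrays end up dominating allocation profiles.

```java
import java.util.Arrays;

// Sketch of an array-backed composite; not RxJava's implementation.
final class ArrayComposite<T> {
    private Object[] items = new Object[16];
    private int size;

    synchronized void add(T item) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // reallocation + copy
        }
        items[size++] = item;
    }

    synchronized void remove(T item) {
        for (int i = 0; i < size; i++) {            // linear scan
            if (items[i] == item) {
                // shift the tail down, producing more array churn
                System.arraycopy(items, i + 1, items, i, size - i - 1);
                items[--size] = null;
                return;
            }
        }
    }
}
```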
- significant reduction in object allocations - details on research available at ReactiveX#1204
I have submitted a pull request for this: #1281. We will be testing the code in our environment shortly.
For anyone wanting to dig into this, Java Flight Recorder has been very helpful, and far better than the other profiling tools I've tried for this.
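For anyone reproducing this on the Oracle JDKs of that era (7u40+ / 8), Flight Recorder has to be unlocked with the commercial-feature flags; the invocation looks roughly like this (the recording duration, file name, and application jar are placeholders):

```
java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
     -XX:StartFlightRecording=duration=120s,filename=rxjava-alloc.jfr \
     -jar my-benchmark.jar
```

In Java Mission Control, the allocation views ("in new TLAB" / "outside TLAB") are what surface the object-creation hot spots discussed in this issue.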
Backporting to 0.18.x in #1283
Superb work, guys. I've held off on 0.18.x on Android as I noticed an increase in GC; really glad you take this seriously!
Thanks @chrisjenx ... it looks like the most glaring issues are resolved and the low-hanging fruit taken care of. There are a few other things for us to improve on, but I think we'll release 0.19 early next week. I would appreciate your feedback on whether you see an improvement. I have also opened #1299 to document our attempts at blocking vs non-blocking implementations and to seek input from anyone who can provide better solutions.
@akarnokd Is there anything else that stands out to you that we should fix before closing this issue? I'll continue doing some profiling, but it seems the obvious ones are done. We'll continue working on performance going forward, and those efforts can have their own issues and pull requests, so if nothing else obvious stands out, let's close this issue and not leave it open-ended.
The history List in ReplaySubject: since ArrayList may use more memory than the actual items require, it might be worth compacting it on a terminal state (a one-time operation, but it might be costly and could run out of memory). Alternatively, it could use a fixed-increment expansion strategy. A third option is a cache() overload that passes in a capacity hint to reduce reallocation and wasted space.
I think the object-allocation penalty of resizing after a terminal event would be worse. A cache() overload that takes a capacity hint may be valuable, particularly in the single-item case, where it could use just a single volatile reference instead of an array.
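A minimal sketch of the capacity-hint idea (the constructor and names below are illustrative; this is not ReplaySubject's actual internals or the eventual API): pre-sizing the backing list avoids the repeated grow-and-copy cycles and the wasted tail space, and a known size of one could skip the list entirely in favor of a single volatile reference.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the capacity-hint idea only; not ReplaySubject's code.
final class History<T> {
    private final List<T> values;

    History(int capacityHint) {
        // Pre-sizing avoids ArrayList's incremental grow-and-copy while
        // the history fills, and avoids over-allocated tail space.
        this.values = new ArrayList<T>(capacityHint);
    }

    void add(T value) {
        values.add(value);
    }
}
```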
I have opened a new issue for the cache() overload: #1303
I'm closing this issue out as I believe we have handled the most glaring problems, and I don't want this to be a never-ending issue. We will of course continue seeking to improve performance, but let's use individual issues for each improvement or problem we find. Thanks, everyone, for your involvement on this one; it was rather significant and important. @Xorlev and @daschl, I would appreciate feedback once you've had a chance to try the changes in the master branch (or the portion that was backported to 0.18.4), to know whether you see the improvements or still have issues. @Xorlev, in particular I'd like to know whether the issue you had was only the GC pressure, or if you still see signs of a memory leak (which I have not seen yet).
@benjchristensen Hystrix 1.3.16 w/ RxJava 0.18.4 has been in prod for about a day now, and I'm happy to report a decrease in garbage (and CPU usage in general). I believe the pressure and the suboptimal subscription removal were causing the leak-like behavior. @mattrjacobs's use case matches a few of our own (fan out commands, wait on all), which is likely the source of the large numbers of subscriptions. I'll keep an eye out for any similar issues that might crop up. Thanks a lot for all the help and dedication to improving RxJava.
Excellent, thank you @Xorlev for the confirmation. I'll release Hystrix 1.3.17 in a few days, hopefully with RxJava 0.19 as a dependency and at least one performance optimization I found I can make in Hystrix directly.
We need to spend time profiling memory and object allocation and finding places where we can improve.
I would really appreciate help diving into this and finding problem areas. Even if you don't fix them but just identify use cases, operators, etc., that would be very valuable.
This is partly a result of the fact that in Netflix production we have seen an increase in YoungGen GCs since 0.17.x.
The areas to start should probably be:
If you can or want to get involved in this, please comment here so we can all collaborate.
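Since the motivation above is an increase in YoungGen GCs, a note on reproducing that observation: the JDK 7/8 GC logging flags typically used for this kind of before/after comparison look roughly like the following (the log file name and application jar are placeholders):

```
java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -Xloggc:gc.log -jar my-app.jar
```

Comparing young-generation collection frequency between RxJava versions under the same workload gives the kind of GC results referenced earlier in the thread.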