
1.x: overhead reduction for merge and flatMap #3476

Merged
merged 1 commit into ReactiveX:1.x from FlatMapRangePerfFix1x on Feb 21, 2016

Conversation

akarnokd
Member

Changes to the scalar fast-path were inspired by Project Reactor's flatMap, which was in turn inspired by RxJava 2.x's implementation of flatMap.

Naturally, this will conflict with #3169.

Benchmark for comparison (i7 4770K, Windows 7 x64, Java 8u66):

[benchmark results image]

Just by applying the scalar re-batching, the operator gained a massive 45% throughput increase, from 48 MOps/s to 71 MOps/s.

When the range optimization is also applied, the improvement is even more impressive: +60% throughput, from 48 MOps/s to 79 MOps/s.

The optimization doesn't really affect rangeFlatMapRange; it has a larger run-to-run variance due to GC.

I'm experimenting with the 2.x branch as well, and by applying these two optimizations the throughput increased from 40 MOps/s to 58 MOps/s. I'm investigating whether switching to synchronized would help with the remaining overhead gap.

Note also that the perf tests measure the operator overhead only.
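
To illustrate the scalar re-batching idea in isolation, here is a minimal standalone sketch; it is not the actual MergeSubscriber code, and the class and member names (ScalarRebatcher, prefetch, emit) are made up for the example. Instead of issuing request(1) to the upstream after every scalar emission, emissions are counted and the upstream is re-requested in batches of half the prefetch:

```java
// Standalone illustration only; names (ScalarRebatcher, prefetch, emit) are hypothetical.
public class ScalarRebatcher {
    final int limit;        // re-request threshold, e.g. prefetch / 2
    int emitted;            // scalar emissions since the last re-request
    long upstreamRequests;  // how many times request() was actually called (for the demo)
    long requestedTotal;    // total demand signalled upstream (for the demo)

    ScalarRebatcher(int prefetch) {
        this.limit = Math.max(1, prefetch >> 1);
        requestMore(prefetch);              // initial request
    }

    /** Called once per scalar inner source whose single value was emitted directly. */
    void emit() {
        if (++emitted == limit) {
            emitted = 0;
            requestMore(limit);             // one upstream request per 'limit' emissions
        }
    }

    void requestMore(long n) {
        // In the real operator this is an atomic, serialized request to the upstream;
        // here we only count it to keep the sketch self-contained.
        upstreamRequests++;
        requestedTotal += n;
    }

    public static void main(String[] args) {
        ScalarRebatcher r = new ScalarRebatcher(128);
        for (int i = 0; i < 1_000; i++) {
            r.emit();
        }
        // With prefetch = 128 and limit = 64, the 1000 emissions trigger only 15
        // re-requests instead of 1000 individual request(1) calls.
        System.out.println(r.upstreamRequests + " upstream requests, "
                + r.requestedTotal + " items requested in total");
    }
}
```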

@akarnokd akarnokd added this to the 1.0.x milestone Oct 28, 2015
@akarnokd
Member Author

Rebasing...

@akarnokd akarnokd force-pushed the FlatMapRangePerfFix1x branch from 7c7bb7a to a8ce5e3 Compare November 10, 2015 22:01
@akarnokd
Member Author

... rebased, sort of.

@@ -488,7 +502,10 @@ protected void emitScalar(T value, long r) {
    if (r != Long.MAX_VALUE) {
        producer.produced(1);
    }
    this.requestMore(1);
    if (++scalarEmission == scalarLimit) {
Contributor

Nice!

@artem-zinnatullin
Contributor

👍/2, I was able to understand what you did for Merge, but I got completely lost trying to understand what's going on in Range… Those one-letter variables make the code totally unreadable :(

@akarnokd can you please somehow improve the readability of slowPath in OnSubscribeRange?

Performance improvement looks awesome, thank you for doing that!

@akarnokd
Member Author

For me, it is the opposite; reading long variable names and methods drains my focus. You can always post a cleanup PR.

@davidmoten
Collaborator

@akarnokd there might be some happy medium here. Write your code using one-letter variables, then finish up by refactoring some names to be more readable. These terse PRs are harder to evaluate, and part of the evaluation is judging the author's intent based on variable naming.


@akarnokd akarnokd force-pushed the FlatMapRangePerfFix1x branch from a8ce5e3 to 5865f4a Compare November 11, 2015 09:40
@akarnokd
Member Author

Fine, I've renamed the variables. Now it doesn't fit on my screen and I can't tell what is what because all variable names look the same at a glance. Distinguishing between e, n, r, i, o is much easier; a performance optimization for the mind.

@akarnokd akarnokd force-pushed the FlatMapRangePerfFix1x branch from 5865f4a to 3b93232 Compare January 13, 2016 22:49
@akarnokd
Member Author

Rebased onto main.

    scalarEmissionLimit = Integer.MAX_VALUE;
    request(Long.MAX_VALUE);
} else {
    scalarEmissionLimit = Math.max(1, maxConcurrent >> 1);
Member

A small question: any reason for setting scalarEmissionLimit to maxConcurrent / 2? Or is 2 just a magic number you picked?

Member

Just curious

Member Author

I tend to use 50%-75% of the prefetch as the re-request amount. It's a heuristic and its optimal value depends on the source emission pattern. The idea is to amortize a 40-cycle atomic increment by doing it less frequently. Basically, any value above 1 helps reduce the overhead, but if it's too large, the source-to-consumer pipelining effect is lost.
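
As a rough back-of-the-envelope illustration of that trade-off (the prefetch of 128 and the per-item figures below are assumptions for the example, not measurements):

```java
// Hypothetical arithmetic only: assumes ~40 cycles per atomic request and a
// prefetch (maxConcurrent) of 128, as discussed above.
public class RebatchAmortization {
    public static void main(String[] args) {
        int maxConcurrent = 128;                         // assumed prefetch
        int limit = Math.max(1, maxConcurrent >> 1);     // re-request every 64 scalar emissions
        double withoutBatching = 40.0;                   // one atomic request per item
        double withBatching = 40.0 / limit;              // one atomic request per 'limit' items
        System.out.printf("without batching: %.2f cycles/item%n", withoutBatching);
        System.out.printf("with batching:    %.2f cycles/item%n", withBatching);
    }
}
```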

Member

Thanks for clarifying 👍

@zsxwing
Member

zsxwing commented Feb 21, 2016

👍

int produced = scalarEmissionCount + 1;
if (produced == scalarEmissionLimit) {
    scalarEmissionCount = 0;
    this.requestMore(produced);
Member

nit: use the const scalarEmissionLimit?

Member Author

That equals produced, which is already a local/register-loaded value.

zsxwing added a commit that referenced this pull request Feb 21, 2016
1.x: overhead reduction for merge and flatMap
@zsxwing zsxwing merged commit d3ebd70 into ReactiveX:1.x Feb 21, 2016
@akarnokd akarnokd deleted the FlatMapRangePerfFix1x branch May 18, 2016 22:54