Migrate from SynchronizedObserver to SerializedObserver #962
Conversation
Using the "queue and lock" implementation which won our performance and production testing.
- migrate all usage to Serialized instead of Synchronized
- remove implementations of SerializedObserver that lost the competition (performance and testing in production)
if (canEmit) {
    // we won the right to emit
    try {
        drainQueue(list);
should onError be draining the queue before sending the error?
Yes we can, but if we're winning the right to emit immediately as it's doing here, it's highly unlikely there is anything in the queue.
Sorry, by "yes we can" I meant we can clear it ... which means skip draining it in the onError case.
How about the enqueue/drain approach with synchronized counter increment and queue replacement as drain, similar to your code? (No ConcurrentLinkedQueue.)
Possibly, but it exposes a different problem: starvation. One thread could end up constantly looping and draining the queue and never get a chance to return and emit the data it needs to emit. I considered setting a max number of loops, but when it finishes those loops the other issue would still occur, possibly leaving items in the queue waiting for some period of time until another event arrives. The queue/drain model works far better when there is a single thread dedicated to draining, as in observeOn.
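For illustration, here is a minimal sketch of the enqueue/drain (queue-and-counter) pattern under discussion. This is a hypothetical simplification, not the RxJava source, and it ignores onError/onCompleted handling; the thread that increments the counter from 0 wins the right to emit and keeps draining until the counter drops back to 0, which is exactly where a burst of producers can keep one thread looping:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical queue-and-counter serializer (simplified; no terminal events).
class QueueAndCounterSketch<T> {
    interface Listener<E> { void onNext(E t); }

    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<T>();
    private final AtomicInteger count = new AtomicInteger();
    private final Listener<T> downstream;

    QueueAndCounterSketch(Listener<T> downstream) {
        this.downstream = downstream;
    }

    void onNext(T t) {
        queue.offer(t);
        if (count.getAndIncrement() != 0) {
            // another thread holds the emitter role and will drain this item
            return;
        }
        // we won the right to emit: keep draining until the count returns to 0.
        // If other threads keep offering, this thread keeps looping here,
        // which is the starvation concern described above.
        do {
            downstream.onNext(queue.poll());
        } while (count.decrementAndGet() != 0);
    }
}
```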
But streams are not guaranteed to terminate. They can be silent forever, like Observable.never.
In which case it would never emit anything and never queue anything.
Of course; in which case it's the next event that matters. The most concerning type of use case in the current implementation is one where 2 events are emitted, one is delivered, the other is queued, then a long time passes before any further events occur. In this use case, the second onNext is just sitting in a queue waiting to be delivered. The only model I can think of that solves this requires injecting extra concurrency, such as using a Scheduler to drain the queue.
Could a fast producer see that it's starving another thread and proactively hand off the emitting work?
How would you do that? By keeping track of the number of items emitted? And how would it steal the work?
I don't know if this issue will remain in a few weeks if we end up implementing the continuation back-pressure solution, as the inner queue would be bounded to a relatively small size (128? 512? 1024?) and the origin threads would end up parking themselves if they fill their buffers. This would naturally cause a form of rescheduling and no single thread would starve out another as all would be required to limit the amount they emit.
Glad to see people taking notice of the deadlock issue. Here are some tweaks that might help the queue-and-counter case:
Hi @Strilanc, thanks for getting involved
Are you able to get it to perform as well as the queue-and-lock implementation?
What do you mean by this? Here is the use case that needs to be solved: https://github.com/Netflix/RxJava/blob/master/rxjava-core/src/main/java/rx/operators/OperatorMerge.java#L39
We have chosen the other tradeoff of possibly delayed delivery right now so this doesn't happen. We discuss this tradeoff in the comments above. Your proposed solution is one of the possibilities discussed, but it involves adding additional concurrency, which has traditionally been unwanted (for cases like …).
Definitely. As part of the continuation implementation for back pressure I expect us to end up using optimized ring buffers since we will have fixed-size buffers. We may be able to use a multi-producer-single-consumer ring buffer in some places, but in this case where we're stealing producer threads it's not a traditional producer/consumer use case, so we will likely still need multi-producer-multi-consumer.
Hey @benjchristensen,
I used the optimistic-CAS trick as part of fixing a performance problem in an Objective-C app that used ReactiveCocoa. However, that was essentially a single-threaded case and I can't test on anything even approaching the scale Netflix can.
You're right that it doesn't apply to that case. I was thinking of subscribing to a BehaviorSubject, where it would be wasteful to do the serialization once per observer instead of once inside the subject.
I think those comments are all about passing the draining to another producer. The suggestion I'm giving is to transfer the work to the thread pool. This avoids issues where the queue stops until another producer comes along, and the corresponding tricky synchronization involved in that.
Yes, it is a bit of a unique case. The consumer is only ever in one place, but that place keeps teleporting around.
I don't think the Netflix scale impacts this very much, as many application use cases apply regardless of horizontal or vertical scale. There are 3 primary use cases that occur when serializing:

1. This needs to be optimized for, as it's very common, and the JVM does a great job with locks of recognizing there is no contention and performing well.

2. This happens when tight for/while loops are emitting and being merged. It's somewhat of a bad use case, but it happens. Locks perform rather well per-thread since they block the other threads.

3. This case is easy with locks, queues or CAS, and all perform about the same.
That makes sense.
Agreed. It will be far easier in this case, and much more similar to how the queue-drain model works in observeOn. In Rx it's fairly normal for the producer to move across threads (as long as the emissions are serialized), since merging of streams happens all the time, along with other combining operators. That said, if the back-pressure work does not solve the trade-offs (thread starvation or notification delay) we will likely end up pursuing the fallback solution of moving work to a Scheduler.
I had done this optimization on the state machine one but hadn't on the queue-and-counter implementation, as I'm not quite sure the race condition is safe. It seems it should be good as it drains AFTER emitting, since this only occurs when count is 0, but I get multiple unit test failures when I attempt this optimization. I haven't spent the time to figure out why.

Code updated to not queue in the non-contended case here: https://github.com/benjchristensen/RxJava/blob/serialize-implementations/rxjava-core/src/main/java/rx/observers/SerializedObserverViaQueueAndCounter.java#L50

With this optimization here are the before and after on performance:

However, the queue-and-lock implementation is still faster:

... and this optimization makes the contended cases slower:

Run: 10 - 4,300,229 ops/sec
Run: 11 - 4,395,995 ops/sec
Run: 12 - 4,551,550 ops/sec
Run: 13 - 4,443,235 ops/sec
Run: 14 - 4,158,475 ops/sec

... after "optimizations"

Run: 10 - 2,857,008 ops/sec
Run: 11 - 3,093,778 ops/sec
Run: 12 - 4,009,758 ops/sec
Run: 13 - 3,094,435 ops/sec
Run: 14 - 3,166,119 ops/sec

Even if the performance changes made it better than queue-and-lock, I am not certain this is thread-safe, and there are 6 unit tests across the codebase now failing, whereas queue-and-lock still performs better and all tests pass. I will explore this more as we finish experimenting with back pressure, as that will define how this needs to work.
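For reference, a rough sketch of the fast-path idea being discussed (skip the queue when a CAS from 0 to 1 succeeds). This is a hypothetical simplification, not the implementation linked above, and it ignores onError/onCompleted, which is where the real race-condition difficulty tends to lie:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical "don't queue in the non-contended case" variant of queue-and-counter.
class FastPathQueueAndCounterSketch<T> {
    interface Listener<E> { void onNext(E t); }

    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<T>();
    private final AtomicInteger count = new AtomicInteger();
    private final Listener<T> downstream;

    FastPathQueueAndCounterSketch(Listener<T> downstream) {
        this.downstream = downstream;
    }

    void onNext(T t) {
        if (count.compareAndSet(0, 1)) {
            // fast path: no contention, emit directly without touching the queue
            downstream.onNext(t);
            if (count.decrementAndGet() == 0) {
                return;
            }
            // contention arrived while we were emitting; fall through and drain
        } else {
            queue.offer(t);
            if (count.getAndIncrement() != 0) {
                return; // another thread holds the emitter role and will drain this item
            }
        }
        // slow path: drain items queued by competing threads until the count is 0
        do {
            downstream.onNext(queue.poll());
        } while (count.decrementAndGet() != 0);
    }
}
```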
Thanks for continuing on this. I'm not surprised we can't beat the compiler :-) I'm curious why the CAS operations aren't similarly optimized. This exploration isn't done yet, and I'll come back to it in coming weeks. We don't necessarily need to be faster than locks to end up using CAS; there are other benefits to CAS over locks.
I'm seeing a nasty bug related to this, I believe, where the use of merge + SerializedSubscriber has the result of onCompleted being called before onNext has finished. I thought I'd mention it now while I dig around trying to get a unit test, in case it rings bells for those of you that are familiar with the changed code. My use case worked as expected in 0.17.0 and then not at all in 0.17.1. Unfortunately the code is too complex to present here without adding stupid amounts of noise so I will try to distill a unit test.
@benjchristensen @davidmoten I think the problem has to do with the queue not being drained properly. There's a copy-away-to-avoid-locking thing happening, but it doesn't look right. Actually, I'm kind of surprised any concurrent completions are working: if terminated is set while the queue is being concurrently drained, I don't think anything forwards the completion ever! There's also a race on the terminated state. I think your particular bug is because of this:
There's a race with an onCompleted arriving while the queue is being drained. Still, the hand-off process between producers is wrong in the current queue+lock implementation. It's rotten enough to consider starting from scratch to avoid contamination from the same subtle mistakes. There definitely need to be tests that detect these.
Thanks for that, I'll stop chasing a unit test. I'll leave it to the rxjava team.
It happens right here: https://github.com/Netflix/RxJava/blob/master/rxjava-core/src/main/java/rx/observers/SerializedObserver.java#L160
Can one of you please provide me a test case? The only way I can cause a problem so far is if the source Observable breaks the Rx contract. Here are examples where threads are racing each other:

    @Test
    public void testConcurrency() {
        Observable<Integer> o = Observable.create(new OnSubscribe<Integer>() {
            @Override
            public void call(final Subscriber<? super Integer> s) {
                Schedulers.newThread().schedule(new Action1<Inner>() {
                    @Override
                    public void call(Inner inner) {
                        for (int i = 0; i < 10000; i++) {
                            s.onNext(1);
                        }
                        s.onCompleted();
                    }
                });
            }
        });

        for (int i = 0; i < 1000; i++) {
            Observable<Integer> merge = Observable.merge(o, o, o);
            TestSubscriber<Integer> ts = new TestSubscriber<Integer>();
            merge.subscribe(ts);
            ts.awaitTerminalEvent();
            assertEquals(1, ts.getOnCompletedEvents().size());
            assertEquals(30000, ts.getOnNextEvents().size());
            List<Integer> onNextEvents = ts.getOnNextEvents();
            System.out.println("onNext: " + onNextEvents.size() + " onCompleted: " + ts.getOnCompletedEvents().size());
        }
    }

Another variant, this time with sleeps to cause random concurrent behavior:

    @Test
    public void testConcurrencyWithSleeping() {
        Observable<Integer> o = Observable.create(new OnSubscribe<Integer>() {
            @Override
            public void call(final Subscriber<? super Integer> s) {
                Schedulers.newThread().schedule(new Action1<Inner>() {
                    @Override
                    public void call(Inner inner) {
                        for (int i = 0; i < 100; i++) {
                            s.onNext(1);
                            try {
                                Thread.sleep(1);
                            } catch (InterruptedException e) {
                                e.printStackTrace();
                            }
                        }
                        s.onCompleted();
                    }
                });
            }
        });

        for (int i = 0; i < 100; i++) {
            Observable<Integer> merge = Observable.merge(o, o, o);
            TestSubscriber<Integer> ts = new TestSubscriber<Integer>();
            merge.subscribe(ts);
            ts.awaitTerminalEvent();
            assertEquals(1, ts.getOnCompletedEvents().size());
            assertEquals(300, ts.getOnNextEvents().size());
            List<Integer> onNextEvents = ts.getOnNextEvents();
            System.out.println("onNext: " + onNextEvents.size() + " onCompleted: " + ts.getOnCompletedEvents().size());
        }
    }

Both of those work correctly. I am however able to get it to behave poorly when the Rx contract is broken:

    @Test
    public void testConcurrencyWithBrokenContract() {
        Observable<Integer> o = Observable.create(new OnSubscribe<Integer>() {
            @Override
            public void call(final Subscriber<? super Integer> s) {
                Schedulers.newThread().schedule(new Action1<Inner>() {
                    @Override
                    public void call(Inner inner) {
                        for (int i = 0; i < 10000; i++) {
                            s.onNext(1);
                        }
                        s.onCompleted();
                        for (int i = 0; i < 100; i++) {
                            s.onNext(1);
                        }
                        s.onCompleted();
                        for (int i = 0; i < 100; i++) {
                            s.onNext(1);
                        }
                    }
                });
            }
        });

        for (int i = 0; i < 1000; i++) {
            Observable<Integer> merge = Observable.merge(o, o, o);
            TestSubscriber<Integer> ts = new TestSubscriber<Integer>();
            merge.subscribe(ts);
            ts.awaitTerminalEvent();
            assertEquals(1, ts.getOnCompletedEvents().size());
            assertEquals(30000, ts.getOnNextEvents().size());
            List<Integer> onNextEvents = ts.getOnNextEvents();
            System.out.println("onNext: " + onNextEvents.size() + " onCompleted: " + ts.getOnCompletedEvents().size());
        }
    }

This emits: …

The fix for this is to not decrement the count if …
Just merged the fix for an Observable that emits multiple onCompleted notifications.
Ben, this serialized subscriber complicates my world for this reason: I am merging asynchronous observables. Each observable has a resource that must remain open till all onNext calls have finished for that observable. The OnSubscribe for the observable gets a fresh resource, calls onNext repeatedly, then closes the resource and calls onCompleted. Because the SerializedSubscriber has buffered stuff asynchronously I am encountering the situation where the resource has been closed before the onNext calls have finished and I get errors. Unless I'm mistaken this is a most undesirable effect that will complicate the hell out of writing Observables.
@benjchristensen My guess was that if a subject sent a re-entrant onCompleted in its serialized onNext callback, the onCompleted would not be forwarded. Something like:
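The snippet that originally followed appears to have been lost from this thread; a guess at the kind of scenario being described might look like the following sketch, where the names and structure are illustrative only:

```java
import java.util.concurrent.atomic.AtomicReference;
import rx.Observer;
import rx.observers.SerializedObserver;

// Hypothetical illustration of a re-entrant onCompleted sent from inside a serialized onNext.
public class ReentrantCompleteExample {
    public static void main(String[] args) {
        final AtomicReference<Observer<Integer>> serialized = new AtomicReference<Observer<Integer>>();
        Observer<Integer> downstream = new Observer<Integer>() {
            @Override
            public void onNext(Integer t) {
                // re-entrant terminal notification sent while a serialized onNext is in flight
                serialized.get().onCompleted();
            }

            @Override
            public void onCompleted() {
                System.out.println("completed"); // the concern: is this ever forwarded?
            }

            @Override
            public void onError(Throwable e) {
                e.printStackTrace();
            }
        };
        serialized.set(new SerializedObserver<Integer>(downstream));
        serialized.get().onNext(1); // triggers the re-entrant onCompleted
    }
}
```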
(I apologize if I got some of the method names wrong.) |
@davidmoten It sounds like the problem is that the resource is being closed by the producer, instead of the consumer, when it's the consumer that relies upon it. Would it also break if a delay was added downstream? I don't know the conventions of RxJava well enough to suggest what should be done. My guess would be that the consumer should be responsible for disposing the subscription, even if it completes, and so you'd cleanup-on-dispose rather than cleanup-on-complete-sent.
@Strilanc Good description. Not sure about delay, I'll look it up. Cleanup on completion I suspect is achieved with an existing operator.
@davidmoten I don't know the specifics of the situation so this might be a ridiculous suggestion in context, but what about releasing producing rights after on-complete is sent but only releasing the actual resource based on a subscribed reference count? So, consumer1 subscribes and the resource is allocated. The notifications get sent, and the resource is now no longer "busy" but does still have to exist. Consumer2 subscribes, and does not wait because the resource is not "busy" (despite already being allocated). Consumer2 unsubscribes, and the resource goes back to "allocated but not busy". Consumer1 unsubscribes and the resource gets released.
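As a rough sketch of that reference-counting idea (hypothetical helper types, not an RxJava API): allocate on the first acquire, release only when the last consumer is gone.

```java
// Hypothetical reference-counted resource holder illustrating the suggestion above.
class RefCountedResource<R> {
    interface Factory<E> { E create(); }
    interface Disposer<E> { void dispose(E resource); }

    private final Factory<R> factory;
    private final Disposer<R> disposer;
    private int consumers;   // guarded by 'this'
    private R resource;      // guarded by 'this'

    RefCountedResource(Factory<R> factory, Disposer<R> disposer) {
        this.factory = factory;
        this.disposer = disposer;
    }

    synchronized R acquire() {
        if (consumers++ == 0) {
            resource = factory.create();   // first consumer: allocate
        }
        return resource;                   // "allocated but not busy" can be shared
    }

    synchronized void release() {
        if (--consumers == 0) {
            disposer.dispose(resource);    // last consumer unsubscribed: release
            resource = null;
        }
    }
}
```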
@davidmoten Once an …

@Strilanc The Rx contract is clear that once an onCompleted or onError has been emitted, no further notifications may follow. From what I can tell the difference in behavior you're seeing is coincidental. @davidmoten What is the use case you have where emitted events depend on their original …?
@benjchristensen thanks for your reply. The use case was resource = jdbc connection and item = ResultSet. Thanks again for your time, and to @Strilanc; 'twas all me and it was easily fixed.
Always keep a copy of http://go.microsoft.com/fwlink/?LinkID=205219 next to your keyboard (which reminds me, we should do a refresh of this, and for every language).

@davidmoten Glad it was easy to resolve.
as per discussion at ReactiveX#962 (comment)
@benjchristensen With reference to this problem: The most concerning type of use case in the current implementation is one where 2 events are emitted, one is delivered, the other is queued, then a long time passes before any further events occur. In this use case, the second onNext is just sitting in a queue waiting to be delivered. I'm encountering this problem now using merge. My use case is described here. My assumption at this point is that the problem is here to stay and I'm looking for the least objectionable way of handling it. Do we need to be able to parameterize the use of a SerializedSubscriber so that we can avoid the behaviour if we choose to? In SerializedSubscriber, once winning the right to emit we currently drain the queue before the onNext is called. Is there any scope for draining the queue some limited/configurable number of times AFTER the onNext is called as well? I certainly see problems with this as well because the last drain of the queue still offers the chance of delaying an incoming onNext. For certain use cases however this could reduce the likelihood of the delay happening. Perhaps some probabilistic diminishment is the best we can hope for.
@davidmoten If an event sits in the queue until the next event, instead of eventually being forwarded even if another event doesn't come along, it's a bug. Are you actually experiencing this bug?
@Strilanc Yeah, I'm experiencing it as described here. My first para above quotes Ben on this one and it is a known issue. I realize on reviewing the conversation above that my rambly ideas seem to have been talked about already. I thought I'd push it along a little seeing as I'm bumping into the issue now (though not without a somewhat ugly workaround). It strikes me as a serious side-effect of work that is trying to ameliorate deadlocks and blocking.
I have created #998 for us to continue this discussion and determine how to move forward.
This pull request replaces use of SynchronizedObserver with SerializedObserver.

Why?

1) Deadlocks

Holding a lock while emitting notifications (onNext, onCompleted, onError) allows for deadlocks if the event results in a cycle back to the parent. While testing RxJava 0.17.0 in Netflix production we ran into one of these. The vulnerability has existed all along but we finally hit it.
This issue has also been reported before such as: http://twistedoakstudios.com/blog/Post8424_deadlocks-in-practice-dont-hold-locks-while-notifying
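As a contrived illustration of that hazard (not RxJava code): two emitters that each hold their own lock while notifying, and whose notifications feed back into each other, can deadlock as soon as two threads emit concurrently.

```java
// Contrived sketch: each emitter holds its lock while notifying, and each
// notification cycles back into the other emitter, so two threads can deadlock.
public class HoldLockWhileNotifying {
    interface Listener { void onNext(int value); }

    static class SynchronizedEmitter {
        private Listener listener;

        void setListener(Listener l) { this.listener = l; }

        synchronized void emit(int value) {   // lock held while notifying
            if (listener != null) {
                listener.onNext(value);
            }
        }
    }

    public static void main(String[] args) {
        final SynchronizedEmitter a = new SynchronizedEmitter();
        final SynchronizedEmitter b = new SynchronizedEmitter();
        // a's notifications cycle back into b, and b's back into a
        a.setListener(new Listener() { public void onNext(int v) { if (v > 0) b.emit(v - 1); } });
        b.setListener(new Listener() { public void onNext(int v) { if (v > 0) a.emit(v - 1); } });

        new Thread(new Runnable() { public void run() { a.emit(1000); } }).start();
        new Thread(new Runnable() { public void run() { b.emit(1000); } }).start();
        // Thread 1 holds a's lock and waits for b's; thread 2 holds b's lock and
        // waits for a's: a classic deadlock caused by notifying under a lock.
    }
}
```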
2) Blocking Threads

The use of synchronized can block threads. If it's used in areas such as modifying a data structure this can be okay. When emitting a notification, however, it is a problem, as the result of an onNext can take a non-deterministically long time to complete. This means any other thread trying to emit will be blocked. If the source threads are event loops (such as Vert.x or Netty) this will block the event loops.

For example, if two network calls off two Netty event loops are being merged (such as via flatMap) and one of them does further slow processing that causes onNext to be slow, it will block the other onNext, which blocks the event loop and prevents any further IO on that thread. This is a significant problem for system scale and breaks the promise of Rx being a non-blocking, reactive library.

Solution
The synchronize, SynchronizedObserver and SynchronizedSubscriber operator and classes have been deprecated. They are replaced by serialize, SerializedObserver and SerializedSubscriber.

The SerializedObserver still ensures only a single thread can emit onNext, onCompleted, or onError at a time but does not hold a lock while doing so. Instead of blocking threads it will accept the incoming events into a buffer. Thus, it becomes an asynchronous operator.
The merge operator (which impacts flatMap) now uses SerializedObserver, along with any other place in RxJava that needed synchronization.

Implementation
3 implementations were written and tested: the "state machine", "queue and counter", and "queue and lock" approaches.
Performance testing revealed:
The state machine solution was tested in production but caused performance problems, most likely due to the immense object allocation it needs to do.
The elegant "queue and counter" solution does not perform well enough in the non-contended case.
The "queue and lock" model performs well in the non-contended case and under contention, despite not being a very elegant solution and requiring the use of mutex locks for the state changes (but it does not hold the locks during notification).
Considerations

This does allow unbounded buffer growth, the same as observeOn and zip, instead of blocking the producer threads.
The implementation in this pull request can and likely will be improved over time. The other implementations are purposefully being shown to allow others to provide further insight on how to do this better.
This change is important to ensure RxJava is non-blocking and our canary testing of this change in the Netflix production environment suggests this change is both performant and functional.