-
Notifications
You must be signed in to change notification settings - Fork 7.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PublishSubject's HashMap makes groupBy and such non-deterministic. #282
Comments
Short Version: It looks like a bug with Long Version: PublishSubject will send events through to whatever subscribers it currently has and not replay past events as new subscribers arrive. This is why the The As to the issue you're seeing I think it's not I have replicated the issue and see this in my output:
Note how the observers count increases as each group is triggered. The first observer is the Removing the
I don't have time tonight to figure out what needs to change with Regarding the given sample code, since the The code would need to look more like this: ConnectableObservable<Event> es = connectToEventStream().publish();
es.groupBy(event -> event.source).subscribe(eventStream -> {
eventStream.subscribe(event -> System.out.println(event.message));
});
es.connect(); I also generally recommend not doing nested subscribes as the control of the subscriptions is lost. ConnectableObservable<Event> es = connectToEventStream().publish();
es.groupBy(event -> event.source).mapMany(eventGroupedObservable -> {
return eventGroupedObservable.map(event -> {
return "Source: " + event.source + " Message: " + event.message;
});
}).subscribe(outputMessage -> {
System.out.println(outputMessage);
});
es.connect(); Are you using Following is the full test case I built (in normal Java so it fits into the Java 6 project and not Java 8 so it is far more verbose): package rx.subjects;
import static org.junit.Assert.*;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import org.junit.Test;
import rx.Observable;
import rx.Observer;
import rx.Subscription;
import rx.observables.ConnectableObservable;
import rx.observables.GroupedObservable;
import rx.subscriptions.Subscriptions;
import rx.util.functions.Func1;
public class TestPublishSubject {
@Test
public void testPublishCount() throws Exception {
final AtomicInteger counter = new AtomicInteger();
final CountDownLatch latch = new CountDownLatch(1);
int count = 100;
ConnectableObservable<Event> es = connectToEventStream(count).publish();
// Observable<Event> es = connectToEventStream(count);
es.groupBy(new Func1<Event, Integer>() {
@Override
public Integer call(Event e) {
return e.source;
}
}).mapMany(new Func1<GroupedObservable<Integer, Event>, Observable<String>>() {
@Override
public Observable<String> call(GroupedObservable<Integer, Event> eventGroupedObservable) {
System.out.println("GroupedObservable Key: " + eventGroupedObservable.getKey());
return eventGroupedObservable.map(new Func1<Event, String>() {
@Override
public String call(Event event) {
return "Source: " + event.source + " Message: " + event.message;
}
});
};
}).subscribe(new Observer<String>() {
@Override
public void onCompleted() {
latch.countDown();
}
@Override
public void onError(Exception e) {
e.printStackTrace();
latch.countDown();
}
@Override
public void onNext(String outputMessage) {
System.out.println(outputMessage);
counter.incrementAndGet();
}
});
es.connect();
latch.await(5000, TimeUnit.MILLISECONDS);
assertEquals(count, counter.get());
}
public static Observable<Event> connectToEventStream(final int count) {
return Observable.create(new Func1<Observer<Event>, Subscription>() {
@Override
public Subscription call(final Observer<Event> observer) {
System.out.println("*** Subscribing to EventStream ***");
new Thread(new Runnable() {
@Override
public void run() {
for (int i = 0; i < count; i++) {
Event e = new Event();
e.source = i % 2;
e.message = "Event-" + i;
observer.onNext(e);
try {
Thread.sleep(50);
} catch (Exception ex) {
// ignore
}
}
}
}).start();
return Subscriptions.empty();
}
});
};
public static class Event {
int source;
String message;
@Override
public String toString() {
return "Event => source: " + source + " message: " + message;
}
}
} |
Thanks for the elaborate reply. From your answer I draw the conclusion that both PublishSubject and groupBy contain an error. PublishSubject
ConcurrentHashMap.values however does not return a snapshot of the map's values:
And as quoted before, the iterator does not make guarantees about including or excluding items that are added to the map after creation of the iterator (from the same doc):
Those two properties combined allow PublishSubject.onNext to sometimes include observers that are added while iterating. And, at least in the implementation I am using (OpenJDK8), it does this unpredictably, presumably based on hash values. I guess this bug should be fixed either by always including the newly added observers, or by always excluding them. You say they should be excluded. I think this does sound like the most intuitive behaviour, but maybe it would be more practical to opt for including them, as for example it would directly fix the problem with groupBy. groupBy randomIntSource().groupBy(x -> x%2).mapMany(subsource -> {
Counter counter = new Counter();
return subsource.map(x -> counter);
}).subscribe(counter -> counter.increment()); It would currently create a new random source for the even numbers and one for the odds, though my first expectation would be that it would allow to count the amount of evens and odds in a single random source. About my example code in the first post, I indeed forgot the
That was just to keep the example short. In real code I usually have one observable that reads incoming events and then I Thanks for the tip to use mapMany, I had not used it before. When more group-dependent variables are used in the observer I guess mapMany becomes less convenient though, and nesting subscribes becomes the more legible way. |
Why do you consider the following to be a bug, and how does this have anything to do with the
This is done in a thread-safe manner as that is the point of The nature of If the pub/sub is not being concurrently modified then none of this applies. Here is a flow showing concurrent subscriptions: |
I would suggest you re-consider as you lose the subscription and it can result in memory leaks, observable sequences never closing and other such behavior. You can read others saying the same thing here: http://www.introtorx.com/content/v1.0.10621.0/07_Aggregation.html
|
Can you give me a test case when an
That is the bug with |
Please review these changes: #283 This unit test in particular was derived from this discussion: https://github.com/Netflix/RxJava/pull/283/files#L0R365 |
That commit looks like a good step on first impression, I will scrutinize it tomorrow or Monday. I will also write an example for PublishSubject. |
Hi, Here is an example that shows the unpredictable results of PublishSubject. Implemented once using nested subjects and once using mapMany. It does use lambda expressions, I hope that's not a problem for you. |
Thanks for those code examples. I'm able to replicate the issue and working on a fix. |
I updated PublishSubject to take snapshots of the subscribed observers before iterating so that it behaves deterministically and only emits value to observers already subscribed once onNext starts. This makes the nested subscription behavior deterministic. What it was doing before was allowing new subscriptions from the nested subscribes to get into the iteration ... and that is based on the internal data structure and likely as you suggest the hashing of the object id (which is the only variant that changes between each run). The unit test for this is at https://github.com/Netflix/RxJava/pull/288/files#L0R479 The specific fix for onNext is this line: https://github.com/Netflix/RxJava/pull/288/files#L0R162 |
I first looked into how the Rx.Net implementation of groupBy works (which took longer than I wanted because of some bugs in there, more on this below), and then read your committed code (#284) and tested it. I have four comments on your code:
Rx.Net
Behaviour of groups after unsubscribing the primary observer |
@zsxwing This is another if you have the interest that could use someone's attention and is rather important. |
@benjchristensen I'll try to handle it. |
Thank you @zsxwing |
Closing this and it will be addressed via #570 |
- changed to take snapshot of observers.values() before iterating in onNext/onError/onCompleted so that nested subscriptions that add to observers can't change the values() iteration - single-threaded nested subscriptions are now deterministic - multi-threaded subscriptions will no longer be allowed to race to get into an interating onNext/onError/onCompleted loop, they will always wait until the next - also improved terminal state behavior when subscribing to a PublishSubject that has already received onError/onCompleted ReactiveX#282
ReactiveX#282 Refactored to maintain a single subscription that propagates events to the correct child GroupedObservables.
ReactiveX/RxJava#282 Refactored to maintain a single subscription that propagates events to the correct child GroupedObservables.
In a few cases already I noticed unexpected and non-deterministic behaviour occurring when subscribing (directly or via a chain of observables) to a PublishSubject source. This happens when subscribing from within an onNext method that is directly or indirectly triggered by that same source. The newly subscribed observer may or may not directly receive the onNext call from the same event. An example where this is annoying:
In this example the first event of each source might be skipped and not be logged, but other times it will work fine. To me this seems undesired behaviour. There may be cases where it actually is preferable that the current item will be skipped when subscribing from an onNext method, but this happening unpredictably is never a good idea.
The cause of the unpredictability is the implementation of Map that is used in PublishSubject, which when iterating on the map's values sometimes includes new items:
And supposedly whether an item will be iterated over or not depends on the hashes of the subscriptions, thus totally unpredictable. A sensible option would be to use a different implementation of Map that does iterate over any items that are added during the iterating loop, but it looks like this implementation then has to be written or found first as java seems not to provide anything like this.
As a quick hack I added this code to the PublishSubject, but I considered this solution too ugly to be worth a pull request (commit a7fc861):
The text was updated successfully, but these errors were encountered: