GroupBy "time gap" Issue #844

benjchristensen · 2014-02-10T22:37:06Z

The groupBy operator has a "time gap" issue when used with subscribeOn and observeOn. This exists in Rx.Net as well and was written about at http://blogs.msdn.com/b/rxteam/archive/2012/06/14/testing-rx-queries-using-virtual-time-scheduling.aspx

However, if you introduce asynchrony in the pipeline – e.g. by adding an ObserveOn operator to the mix – you’re effectively introducing a time gap during which we’ve handed out the sequence to you, control has been released on the OnNext channel, but subscription happens at a later point in time, causing you to miss elements. We can’t do any caching of elements because we don’t know when – if ever – someone will subscribe to the inner sequence, so the cache could grow in an unbounded fashion.

In discussion with @headinthebox I have decided to alter the behavior to remove this "time gap" issue so that non-deterministic data loss does not happen for the common use cases of using observeOn and subscribeOn with GroupedObservables from groupBy.

Why? It is common to want to use observeOn or subscribeOn with GroupedObservable do process different groups in parallel.

It comes with a trade-off though: all GroupedObservable instances emitted by groupBy must be subscribed to otherwise it will block. The reason for this is that to solve the "time gap" one of two things must be done:

a) use unbounded buffering (such as ReplaySubject)
b) block the onNext calls until GroupedObservable is subscribed to and receiving the data

We can not choose (a) for the reasons given in the Rx.Net blog post because it breaks backpressure and could buffer bloat until the system fails.

In general it is an appropriate thing to expect people to subscribe to all groups, except in one case where it will be expected to work – using filter.

In this case we can solve the common case by special-casing filter to be aware of GroupedObservable. It's not decoupled or elegant, but it solves the common problem.

Thus, the trade-offs are:

Allow for non-deterministic data loss if observeOn/subscribeOn are used and expect people to learn about this by reading docs.
Behave deterministically when observeOn/subscribeOn are used but block if groups are manually skipped.

Option 2 seems to be easier for developers to run into during dev and solve than option 1 which could often show up randomly – in prod – and be difficult to figure out and solve.

The text was updated successfully, but these errors were encountered:

akarnokd · 2014-02-10T22:51:43Z

How about 1) and present a new operator to cover the parallel processing case directly.

benjchristensen · 2014-02-10T22:55:19Z

That doesn't prevent people from using observeOn on GroupedObservable, such as for rendering to a UI.

benjchristensen · 2014-02-10T22:59:06Z

It's a tough trade-off ... non-determinism when using things that should not inject non-determism ... or risk of blocking.

Another possible solution is we could special case observeOn and subscribeOn instead to not return from onNext until they have subscribed. This would mean I need a hook inside the Subscriber type that tells me once a subscription is registered. I've looked at that as well, not sure which is better yet.

benjchristensen · 2014-02-10T23:00:17Z

Actually ... modifying observeOn and subscribeOn may not result in the deadlock risks so I'm going to test that implementation.

See ReactiveX#844

See ReactiveX#844 Not completely thrilled with or 100% confident in this solution, but it does make the groupBy unit tests pass.

benjchristensen · 2014-02-11T08:16:42Z

I've submitted an attempt at a solution. The issue appears to be resolved but I don't completely like the solution nor do I trust it 100% yet.

I need to sleep and think about it again tomorrow, but I'd appreciate a review and feedback, or a better solution from someone :-)

See ReactiveX#844 Not completely thrilled with or 100% confident in this solution, but it does make the groupBy unit tests pass.

benjchristensen · 2014-02-11T22:25:29Z

There are trade-offs to solving this solution that probably are not worth it.

I'd like people's opinion on whether we should figure out how to make these unit tests pass, or if it's acceptable for them not to:

    @Test
    public void testGroupsWithNestedSubscribeOn() throws InterruptedException {
        final ArrayList<String> results = new ArrayList<String>();
        Observable.create(new OnSubscribe<Integer>() {

            @Override
            public void call(Subscriber<? super Integer> sub) {
                sub.onNext(1);
                sub.onNext(2);
                sub.onNext(1);
                sub.onNext(2);
                sub.onCompleted();
            }

        }).groupBy(new Func1<Integer, Integer>() {

            @Override
            public Integer call(Integer t) {
                return t;
            }

        }).flatMap(new Func1<GroupedObservable<Integer, Integer>, Observable<String>>() {

            @Override
            public Observable<String> call(final GroupedObservable<Integer, Integer> group) {
                return group.subscribeOn(Schedulers.newThread()).map(new Func1<Integer, String>() {

                    @Override
                    public String call(Integer t1) {
                        System.out.println("Received: " + t1 + " on group : " + group.getKey());
                        return "first groups: " + t1;
                    }

                });
            }

        }).toBlockingObservable().forEach(new Action1<String>() {

            @Override
            public void call(String s) {
                results.add(s);
            }

        });

        System.out.println("Results: " + results);
        assertEquals(4, results.size());
    }

@Test
    public void testFirstGroupsCompleteAndParentSlowToThenEmitFinalGroupsWhichThenSubscribesOnAndDelaysAndThenCompletes() throws InterruptedException {
        final CountDownLatch first = new CountDownLatch(2); // there are two groups to first complete
        final ArrayList<String> results = new ArrayList<String>();
        Observable.create(new OnSubscribe<Integer>() {

            @Override
            public void call(Subscriber<? super Integer> sub) {
                sub.onNext(1);
                sub.onNext(2);
                sub.onNext(1);
                sub.onNext(2);
                try {
                    first.await();
                } catch (InterruptedException e) {
                    sub.onError(e);
                    return;
                }
                sub.onNext(3);
                sub.onNext(3);
                sub.onCompleted();
            }

        }).groupBy(new Func1<Integer, Integer>() {

            @Override
            public Integer call(Integer t) {
                return t;
            }

        }).flatMap(new Func1<GroupedObservable<Integer, Integer>, Observable<String>>() {

            @Override
            public Observable<String> call(final GroupedObservable<Integer, Integer> group) {
                if (group.getKey() < 3) {
                    return group.map(new Func1<Integer, String>() {

                        @Override
                        public String call(Integer t1) {
                            return "first groups: " + t1;
                        }

                    })
                            // must take(2) so an onCompleted + unsubscribe happens on these first 2 groups
                            .take(2).doOnCompleted(new Action0() {

                                @Override
                                public void call() {
                                    first.countDown();
                                }

                            });
                } else {
                    return group.subscribeOn(Schedulers.newThread()).delay(400, TimeUnit.MILLISECONDS).map(new Func1<Integer, String>() {

                        @Override
                        public String call(Integer t1) {
                            return "last group: " + t1;
                        }

                    });
                }
            }

        }).toBlockingObservable().forEach(new Action1<String>() {

            @Override
            public void call(String s) {
                results.add(s);
            }

        });

        System.out.println("Results: " + results);
        assertEquals(6, results.size());
    }

benjchristensen · 2014-02-11T22:25:37Z

Here is a subscribeOn unit test that fails when PublishSubject is used if the Scheduler is slower than the time to emit onNext:

    @Test
    public void testSubscribeOnPublishSubjectWithSlowScheduler() {
        PublishSubject<Integer> ps = PublishSubject.create();
        TestSubscriber<Integer> ts = new TestSubscriber<Integer>();
        ps.subscribeOn(new SlowScheduler()).subscribe(ts);
        ps.onNext(1);
        ps.onNext(2);
        ps.onCompleted();

        ts.awaitTerminalEvent();
        ts.assertReceivedOnNext(Arrays.asList(1, 2));
    }

akarnokd · 2014-02-11T22:40:45Z

Maybe subscribeOn has to be smarter about its source (and not everyone else).
I.e., if subscribeOn detects that the source is PublishSubject, It enters a buffering mode and once the time of the actual subscription has come, replay the buffered values and resume as a regular Observer would. This is is different from my previous hack where only the first observer would get the values. Here, everyone who is on a delayed subscription would eventually get the values from the time gap.

benjchristensen · 2014-02-11T22:48:00Z

That's an interesting idea, especially since PublishSubject is definitely the culprit.

What do you think about the lack of back pressure during that buffering period?

The reason is the same as observeOn being changed - not allowing buffer-bloat unless someone consciously chooses to do buffering or windowing.

akarnokd · 2014-02-11T22:59:52Z

Bounded buffering like in observeOn seems to be a workable solution, since the source needs to be blocked until the replay catches up anyway. I'm not sure, however, that my idea is deadlock-free or not. I'll do some experiments tomorrow.

benjchristensen · 2014-02-11T23:07:05Z

Thanks, I look forward to seeing your idea. I appreciate your back-and-forth on this with me.

I ran head-on into the "time gap" (ReactiveX#844) issue while working on a stream processing use case (and new 'pivot' operator I'm writing). This is a solution. It's still not ideal as the Javadocs of BufferUntilSubscriber mention, but this is working better than nothing and does not require blocking threads. A better solution will come as part of the back pressure work where BufferUntilSubscriber will evolve to have a bounded buffer.

akarnokd · 2014-05-08T21:47:56Z

Is this settled?

benjchristensen · 2014-05-20T06:58:48Z

I believe so.

lJoublanc · 2014-05-29T08:19:24Z

Hi Ben, I found your post after worrying that I could run into this situation in my own code. It is implied in your post, but I wanted to be entirely sure: Can this problem arise if you call subscribe on the same scheduler? i.e. if I do

grpdObs subscribe { keyAndObs => 
  val (key,obs) = keyAndObs
  obs subscribe( someObserver )
 }

is it guaranteed that the notification that evokes the first (outer) subscribe will also be pushed to someObserver? in other words, that the notification on the inner subscription won't be skipped on someObserver?
Now, the issue discussed here only applies when the inner subscription is done on another scheduler? correct?

benjchristensen · 2014-05-29T16:37:53Z

As of 0.18 this problem shouldn’t exist any longer as it will buffer until the subscriber arrives.

If the subscribe is occurring synchronously, then it is never a problem, only when the subscribe happens asynchronously (such as using subscribeOn).

Now the choice of same or different scheduler is nuanced if using subscribeOn. Even if it’s the same scheduler, it could schedule it to be done later, so the question is not about what scheduler it is, it is about whether there is a delay (async queueing/scheduling) of the subscribe occurring - which typically is going to happen when using a scheduler.

Your example code would be fine, because the obs subscribe is happening synchronously. If however you changed it to the following it would be vulnerable:

grpdObs subscribe { keyAndObs => 
  val (key,obs) = keyAndObs
  obs subscribeOn(aScheduler) subscribe( someObserver )
 }

The reason it becomes vulnerable is that the function returns and groupBy continues forward while the obs subscribe then happens sometime later.

We put a solution in place for this in #975. The downside is it makes it possible to have a memory leak if groups are skipped (purposefully not subscribed to), and we don’t have a solution for that yet.

We decided that we'd rather avoid the non-deterministic data loss and risk a memory leak as that's easier to explain to people. At some point we'll need to solve the memory leak issue as well (such as emit an error, auto-unsubscribe or something similar if a group is emitted and not subscribed to).

benjchristensen · 2014-05-29T16:42:03Z

I opened #1280 to cover the side-effect of this solution.

… tests. (ReactiveX#844)

benjchristensen added a commit to benjchristensen/RxJava that referenced this issue Feb 11, 2014

Comment Out "Time Gap" Tests on GroupBy

0ab38b4

See ReactiveX#844

benjchristensen added a commit to benjchristensen/RxJava that referenced this issue Feb 11, 2014

Attempt at SubscribeOn "Time Gap" Issue

904f300

See ReactiveX#844 Not completely thrilled with or 100% confident in this solution, but it does make the groupBy unit tests pass.

benjchristensen mentioned this issue Feb 11, 2014

Attempt at SubscribeOn "Time Gap" Issue [Preview] #848

Closed

benjchristensen added a commit to benjchristensen/RxJava that referenced this issue Feb 11, 2014

Attempt at SubscribeOn "Time Gap" Issue

6a7f614

See ReactiveX#844 Not completely thrilled with or 100% confident in this solution, but it does make the groupBy unit tests pass.

benjchristensen added a commit to benjchristensen/RxJava that referenced this issue Feb 11, 2014

Attempt at SubscribeOn "Time Gap" Issue

53384cf

See ReactiveX#844 Not completely thrilled with or 100% confident in this solution, but it does make the groupBy unit tests pass.

benjchristensen mentioned this issue Feb 11, 2014

Subscribe on gap #850

Closed

benjchristensen mentioned this issue Feb 12, 2014

Version 0.17.0 Release Notes [Preview] #802

Closed

This was referenced Feb 12, 2014

Proposed solution to the time gap, using unbounded buffering. #861

Closed

Blocking buffer until experiment #864

Closed

benjchristensen mentioned this issue Feb 13, 2014

subscribeOn + groupBy #869

Merged

jbripley mentioned this issue Mar 9, 2014

Weird behavior of "groupby" #947

Closed

benjchristensen mentioned this issue Mar 20, 2014

GroupBy & Time Gap Fixes #975

Merged

benjchristensen closed this as completed May 20, 2014

benjchristensen mentioned this issue May 29, 2014

Unbounded Buffer for Time Gap Issue #1280

Closed

benjchristensen mentioned this issue Jan 3, 2015

GroupBy behaviour onCompleted #2320

Closed

benjchristensen mentioned this issue Jun 23, 2015

published GroupedObservable emitted from groupBy() only emits a single item #3036

Closed

akarnokd mentioned this issue Nov 6, 2018

GroupBy and ObserveOn race condition causes loss of elements dotnet/reactive#841

Closed

jihoonson pushed a commit to jihoonson/RxJava that referenced this issue Mar 6, 2020

Propagate clock into CircuitBreakerMetrics to allow mocked time-based…

80b632a

… tests. (ReactiveX#844)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GroupBy "time gap" Issue #844

GroupBy "time gap" Issue #844

benjchristensen commented Feb 10, 2014

akarnokd commented Feb 10, 2014

benjchristensen commented Feb 10, 2014

benjchristensen commented Feb 10, 2014

benjchristensen commented Feb 10, 2014

benjchristensen commented Feb 11, 2014

benjchristensen commented Feb 11, 2014

benjchristensen commented Feb 11, 2014

akarnokd commented Feb 11, 2014

benjchristensen commented Feb 11, 2014

akarnokd commented Feb 11, 2014

benjchristensen commented Feb 11, 2014

akarnokd commented May 8, 2014

benjchristensen commented May 20, 2014

lJoublanc commented May 29, 2014

benjchristensen commented May 29, 2014

benjchristensen commented May 29, 2014

GroupBy "time gap" Issue #844

GroupBy "time gap" Issue #844

Comments

benjchristensen commented Feb 10, 2014

akarnokd commented Feb 10, 2014

benjchristensen commented Feb 10, 2014

benjchristensen commented Feb 10, 2014

benjchristensen commented Feb 10, 2014

benjchristensen commented Feb 11, 2014

benjchristensen commented Feb 11, 2014

benjchristensen commented Feb 11, 2014

akarnokd commented Feb 11, 2014

benjchristensen commented Feb 11, 2014

akarnokd commented Feb 11, 2014

benjchristensen commented Feb 11, 2014

akarnokd commented May 8, 2014

benjchristensen commented May 20, 2014

lJoublanc commented May 29, 2014

benjchristensen commented May 29, 2014

benjchristensen commented May 29, 2014