java.util.Comparator is not suitable for distinct and distinctUntilChanged #395

samuelgruetter · 2013-09-20T08:50:58Z

In C#, distinct and distinctUntilChanged take an IEqualityComparer (which provides Equals and GetHashCode).
That's not the same as java.util.Comparator (which provides compare).

Or mathematically speaking: There are two kinds of comparators:

Those which define a total ordering: java.util.Comparator and C#'s IComparer
Those which define an equivalence relation: C#'s IEqualityComparer, RxJava's Func2<T, T, Boolean>, and unfortunately no standard Java interface

The problem is that for distinct and distinctUntilChanged, we're not interested in an order on the elements, but only in equality.

So it would be better to use Func2<? super T, ? super T, Boolean> equality (just like sequenceEqual) instead of Comparator<U> equalityComparator.

The text was updated successfully, but these errors were encountered:

those with custom equality are not yet there because of issue ReactiveX#395

abersnaze · 2013-09-20T13:07:36Z

The 'Func2' sounds fine unless 'getHashCode()' is necessary to make 'distinct()' fast. In that case I think making our own 'EqualComparator' would be the way to go.

benjchristensen · 2013-09-20T14:26:08Z

A Comparator returning 0 means the two items are equal thus it will work for this use case.

int compare(T o1, T o2)

http://docs.oracle.com/javase/6/docs/api/java/util/Comparator.html#compare(T,%20T)

returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.

Moving to EqualityComparator would limit the return type to boolean as opposed to -1, 0, 1 as possible return types, but Comparator can represent equality as well, and both approaches let the developer define equality (using object reference, equals, hashCode, deep comparison, etc) instead of automatically using just the equals method.

Supporting Comparator (and Comparable object types) makes it so people can just reuse the same logic as used for ordering.

Checking for equality is simpler than checking for ordering (binary instead of trinary response), but if the Comparator returns 0 it means the objects are equal.

So if we're not going to use Comparator it should be because we want a simpler return type, not because it doesn't work. That said, I can't remember ever seeing something other than Comparator used for this type of behavior in a Java library because if not using Comparator then the library can just call equals on each Object. Why would something else be needed?

For example, the Set implementations in Java ensure uniqueness and they use the Object.equals method on objects to determine equality. Why would we need to do anything different than the Java collections libraries that use either equals or Comparator?

jmhofer · 2013-09-20T14:46:48Z

A Comparator (per javadoc) is a

comparison function, which imposes a total ordering on some collection of objects.

We don't actually need the total ordering for our purposes in distinct or distinctUntil, only the fact that it incidentally defines equality as well. - But still, if someone defines a Comparator for equality, not making it define a total ordering, he or she is basically misusing it.

As an aside, Comparators that are inconsistent with equals are dangerous beasts. It's way too easy to use them for collections like TreeSet which simply won't work in that case.

Also, at least for distinct, there is indeed an efficiency problem when (mis)using Comparator as we do now: There is no way to create an efficient set without either a total ordering or some kind of hash code. All the Set implementations in Java always depend on at least one of these two concepts afaik.

Imho, for doing this right, we actually need a real Java equivalent to IEqualityComparer, which defines both the concepts of equality and a hash code. - Or we could leave that whole thing out of the API and make people write their own wrapper classes for that use case, which is not very convenient, of course.

benjchristensen · 2013-09-20T14:53:06Z

The compare method on a Comparator can use hashcode and equals just the same as a new interface would allow, but I accept that having a trinary response is confusing when only a binary boolean is needed.

My vote is for Func2<? super T, ? super T, Boolean>, not creating a new special type for this very limited use case.

jmhofer · 2013-09-20T15:05:11Z

The compare implementation can of course use whatever it likes internally. However, our code can't ever get a hashCode from a Comparator. It could get one from a new interface and use it to make distinct efficient, for example.

I agree, though, that it's probably a very limited use case. So I'm ok with Func2, too.

samuelgruetter · 2013-09-20T15:35:26Z

Or maybe distinctUntilChanged taking a Func2<? super T, ? super T, Boolean>, and no custom comparison for distinct, to keep it simple.

benjchristensen · 2013-09-20T16:53:28Z

use it to make distinct efficient, for example.

This is a valid concern, iterating the entire list is not good (though with a Comparator we could maintain a sorted list and do a binary search!).

Another consideration for API design, an interface with 2 methods like IEqualityComparer can not be implemented by lambdas as it is not a functional interface with a single method (lambda body).

Is this use case and performance issue strong enough to have a non-functional interface that requires an anonymous inner class to implement two methods? Also, implementing hashCode is far more obnoxious than implementing equals.

For performance reasons and avoiding hashCode implementations, would Comparator having ordering in mind actually be the right choice so we can test for equality using a binary search algorithm?

How about this change to use TreeSet and Comparator and not require iterating the full list?

            final Subscription sourceSub = source.subscribe(new Observer<T>() {

                // use TreeSet instead of ArrayList
                private final TreeSet<U> emittedKeys = new TreeSet<U>(equalityComparator);

                @Override
                public void onCompleted() {
                    observer.onCompleted();
                }

                @Override
                public void onError(Throwable e) {
                    observer.onError(e);
                }

                @Override
                public void onNext(T next) {
                    U nextKey = keySelector.call(next);
                    if(emittedKeys.add(nextKey)) {
                        observer.onNext(next);
                    }
                }
            });

Unit Test:

        @Test
        public void testDistinctOfObjectWithComparator() {
            Observer<? super RandomObject> observer = mock(Observer.class);
            RandomObject r1 = new RandomObject(1, "one");
            RandomObject r2 = new RandomObject(2, "two");
            RandomObject r2b = new RandomObject(2, "twoB");
            Observable<RandomObject> src = from(r1, r1, r2, r1, r2, r2b);
            create(distinct(src, new Comparator<RandomObject>() {

                @Override
                public int compare(RandomObject o1, RandomObject o2) {
                    int i = o1.i - o2.i;
                    if(i != 0) {
                        return i;
                    }

                    return o1.v.compareTo(o2.v);
                }
            })).subscribe(observer);

            InOrder inOrder = inOrder(observer); 
            inOrder.verify(observer, times(1)).onNext(r1);
            inOrder.verify(observer, times(1)).onNext(r2);
            inOrder.verify(observer, times(1)).onNext(r2b);
            inOrder.verify(observer, times(1)).onCompleted();
            verify(observer, never()).onError(any(Throwable.class));
            inOrder.verifyNoMoreInteractions();
        }

jmhofer · 2013-09-20T17:05:42Z

For TreeSet to work as expected, the Comparator has to be consistent with equals, and in that case, we don't even need it in the first place.

It's not correct in Java to implement equals without implementing hashCode, too.

abersnaze · 2013-09-20T17:18:38Z

This method 'java.util.Objects.hash(Object...)' makes computing hash codes much easier.

benjchristensen · 2013-09-20T18:33:05Z

@jmhofer Yes, if it's not consistent with equals it won't work, but if they have equals they don't need to use Comparator and why would they make them be different?

@abersnaze Yes it does, but I'd like to avoid an API that requires developers to do that and know about some obscure API if we don't need it.

Question about using IEqualityComparer, would it need to calculate the hashCode every time the object is touched since it can't cache it (as it doesn't know if mutation has occurred in an object)?

jmhofer · 2013-09-21T09:09:49Z

If the observable objects already have an equals/hashCode implementation, then the distinct call without any additional parameters will do. Afaik the additional distinct signature using an explicit IEqualityComparer in Rx.NET is for cases where you want to use a different equals from the default one for distinguishing between the observable objects.

Java doesn't really support the concept of multiple equivalence relations per class. It would need something like the IEqualityComparer in its standard library for that, at least (and probably support that interface in the collections library, too). This was the reason why I initially left away the implementation for that specific distinct signature. I still don't think it's much of a use case...

My current implementation is a bit inconsistent: It takes a Comparator but doesn't care for or use the total ordering that should be defined by that Comparator. It just uses the equivalence relation defined by it, and has to live without being able to compute hash codes.

The more I think about it, the more I agree with @samuelgruetter that the Comparator is a bad choice for this. distinct has no need to impose a total ordering. It just needs an equivalence relation. Given a total ordering, we could make distinct efficient, of course, without using hash codes (just not via the Java default set implementations). But I don't see much value in that. It's also something completely different from what Rx.NET does here.

So basically, I'm for using a simple Func2 for distinctUntilChanged and maybe leaving that away from distinct due to efficiency.

If we created an interface like IEqualityComparer, we'd have to impose the same restrictions and contracts on its equals and hashCode methods as Object does. Mutable fields used in hash code computations break the standard Java collection implementations based on hash codes. Therefore I see no strong need to make distinct work with that special case either.

benjchristensen · 2013-12-06T01:18:17Z

We ended up removing Comparator for these operators ... so closing.

those with custom equality are not yet there because of issue ReactiveX#395

Removing these fairly recently added overloads as they turn out to not be the best approach. Discussion ongoing as to how to implement them at ReactiveX#395

samuelgruetter added a commit to samuelgruetter/RxJava that referenced this issue Sep 20, 2013

add distinct and distinctUntilChanged, but

5a813d6

those with custom equality are not yet there because of issue ReactiveX#395

benjchristensen mentioned this issue Sep 21, 2013

Removing 'comparator overloads of distinct` #400

Merged

abersnaze mentioned this issue Sep 24, 2013

combineWithMostRecent #405

Closed

zsxwing mentioned this issue Sep 26, 2013

Implemented the 'Contains' operator #410

Merged

benjchristensen closed this as completed Dec 6, 2013

rickbw pushed a commit to rickbw/RxJava that referenced this issue Jan 9, 2014

add distinct and distinctUntilChanged, but

da85d3b

those with custom equality are not yet there because of issue ReactiveX#395

rickbw pushed a commit to rickbw/RxJava that referenced this issue Jan 9, 2014

Removing 'comparator overloads of distinct`

e5e9626

Removing these fairly recently added overloads as they turn out to not be the best approach. Discussion ongoing as to how to implement them at ReactiveX#395

duarten pushed a commit to duarten/RxJava that referenced this issue Jan 12, 2014

Removing 'comparator overloads of distinct`

bb7b7a3

Removing these fairly recently added overloads as they turn out to not be the best approach. Discussion ongoing as to how to implement them at ReactiveX#395

akarnokd mentioned this issue Jun 9, 2016

1.x: added distinctUntilChanged(comparator) #4001

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

java.util.Comparator is not suitable for distinct and distinctUntilChanged #395

java.util.Comparator is not suitable for distinct and distinctUntilChanged #395

samuelgruetter commented Sep 20, 2013

abersnaze commented Sep 20, 2013

benjchristensen commented Sep 20, 2013

jmhofer commented Sep 20, 2013

benjchristensen commented Sep 20, 2013

jmhofer commented Sep 20, 2013

samuelgruetter commented Sep 20, 2013

benjchristensen commented Sep 20, 2013

jmhofer commented Sep 20, 2013

abersnaze commented Sep 20, 2013

benjchristensen commented Sep 20, 2013

jmhofer commented Sep 21, 2013

benjchristensen commented Dec 6, 2013

java.util.Comparator is not suitable for distinct and distinctUntilChanged #395

java.util.Comparator is not suitable for distinct and distinctUntilChanged #395

Comments

samuelgruetter commented Sep 20, 2013

abersnaze commented Sep 20, 2013

benjchristensen commented Sep 20, 2013

jmhofer commented Sep 20, 2013

benjchristensen commented Sep 20, 2013

jmhofer commented Sep 20, 2013

samuelgruetter commented Sep 20, 2013

benjchristensen commented Sep 20, 2013

jmhofer commented Sep 20, 2013

abersnaze commented Sep 20, 2013

benjchristensen commented Sep 20, 2013

jmhofer commented Sep 21, 2013

benjchristensen commented Dec 6, 2013