Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java.util.Comparator is not suitable for distinct and distinctUntilChanged #395

Closed
samuelgruetter opened this issue Sep 20, 2013 · 12 comments

Comments

@samuelgruetter
Copy link
Contributor

In C#, distinct and distinctUntilChanged take an IEqualityComparer (which provides Equals and GetHashCode).
That's not the same as java.util.Comparator (which provides compare).

Or mathematically speaking: There are two kinds of comparators:

  • Those which define a total ordering: java.util.Comparator and C#'s IComparer
  • Those which define an equivalence relation: C#'s IEqualityComparer, RxJava's Func2<T, T, Boolean>, and unfortunately no standard Java interface

The problem is that for distinct and distinctUntilChanged, we're not interested in an order on the elements, but only in equality.

So it would be better to use Func2<? super T, ? super T, Boolean> equality (just like sequenceEqual) instead of Comparator<U> equalityComparator.

samuelgruetter added a commit to samuelgruetter/RxJava that referenced this issue Sep 20, 2013
those with custom equality are not yet there because of issue ReactiveX#395
@abersnaze
Copy link
Contributor

The 'Func2' sounds fine unless 'getHashCode()' is necessary to make 'distinct()' fast. In that case I think making our own 'EqualComparator' would be the way to go.

@benjchristensen
Copy link
Member

A Comparator returning 0 means the two items are equal thus it will work for this use case.

int compare(T o1, T o2)

http://docs.oracle.com/javase/6/docs/api/java/util/Comparator.html#compare(T,%20T)

returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.

Moving to EqualityComparator would limit the return type to boolean as opposed to -1, 0, 1 as possible return types, but Comparator can represent equality as well, and both approaches let the developer define equality (using object reference, equals, hashCode, deep comparison, etc) instead of automatically using just the equals method.

Supporting Comparator (and Comparable object types) makes it so people can just reuse the same logic as used for ordering.

Checking for equality is simpler than checking for ordering (binary instead of trinary response), but if the Comparator returns 0 it means the objects are equal.

So if we're not going to use Comparator it should be because we want a simpler return type, not because it doesn't work. That said, I can't remember ever seeing something other than Comparator used for this type of behavior in a Java library because if not using Comparator then the library can just call equals on each Object. Why would something else be needed?

For example, the Set implementations in Java ensure uniqueness and they use the Object.equals method on objects to determine equality. Why would we need to do anything different than the Java collections libraries that use either equals or Comparator?

@jmhofer
Copy link
Contributor

jmhofer commented Sep 20, 2013

A Comparator (per javadoc) is a

comparison function, which imposes a total ordering on some collection of objects.

We don't actually need the total ordering for our purposes in distinct or distinctUntil, only the fact that it incidentally defines equality as well. - But still, if someone defines a Comparator for equality, not making it define a total ordering, he or she is basically misusing it.

As an aside, Comparators that are inconsistent with equals are dangerous beasts. It's way too easy to use them for collections like TreeSet which simply won't work in that case.

Also, at least for distinct, there is indeed an efficiency problem when (mis)using Comparator as we do now: There is no way to create an efficient set without either a total ordering or some kind of hash code. All the Set implementations in Java always depend on at least one of these two concepts afaik.

Imho, for doing this right, we actually need a real Java equivalent to IEqualityComparer, which defines both the concepts of equality and a hash code. - Or we could leave that whole thing out of the API and make people write their own wrapper classes for that use case, which is not very convenient, of course.

@benjchristensen
Copy link
Member

The compare method on a Comparator can use hashcode and equals just the same as a new interface would allow, but I accept that having a trinary response is confusing when only a binary boolean is needed.

My vote is for Func2<? super T, ? super T, Boolean>, not creating a new special type for this very limited use case.

@jmhofer
Copy link
Contributor

jmhofer commented Sep 20, 2013

The compare implementation can of course use whatever it likes internally. However, our code can't ever get a hashCode from a Comparator. It could get one from a new interface and use it to make distinct efficient, for example.

I agree, though, that it's probably a very limited use case. So I'm ok with Func2, too.

@samuelgruetter
Copy link
Contributor Author

Or maybe distinctUntilChanged taking a Func2<? super T, ? super T, Boolean>, and no custom comparison for distinct, to keep it simple.

@benjchristensen
Copy link
Member

use it to make distinct efficient, for example.

This is a valid concern, iterating the entire list is not good (though with a Comparator we could maintain a sorted list and do a binary search!).

Another consideration for API design, an interface with 2 methods like IEqualityComparer can not be implemented by lambdas as it is not a functional interface with a single method (lambda body).

Is this use case and performance issue strong enough to have a non-functional interface that requires an anonymous inner class to implement two methods? Also, implementing hashCode is far more obnoxious than implementing equals.

For performance reasons and avoiding hashCode implementations, would Comparator having ordering in mind actually be the right choice so we can test for equality using a binary search algorithm?

How about this change to use TreeSet and Comparator and not require iterating the full list?

            final Subscription sourceSub = source.subscribe(new Observer<T>() {

                // use TreeSet instead of ArrayList
                private final TreeSet<U> emittedKeys = new TreeSet<U>(equalityComparator);

                @Override
                public void onCompleted() {
                    observer.onCompleted();
                }

                @Override
                public void onError(Throwable e) {
                    observer.onError(e);
                }

                @Override
                public void onNext(T next) {
                    U nextKey = keySelector.call(next);
                    if(emittedKeys.add(nextKey)) {
                        observer.onNext(next);
                    }
                }
            });

Unit Test:

        @Test
        public void testDistinctOfObjectWithComparator() {
            Observer<? super RandomObject> observer = mock(Observer.class);
            RandomObject r1 = new RandomObject(1, "one");
            RandomObject r2 = new RandomObject(2, "two");
            RandomObject r2b = new RandomObject(2, "twoB");
            Observable<RandomObject> src = from(r1, r1, r2, r1, r2, r2b);
            create(distinct(src, new Comparator<RandomObject>() {

                @Override
                public int compare(RandomObject o1, RandomObject o2) {
                    int i = o1.i - o2.i;
                    if(i != 0) {
                        return i;
                    }

                    return o1.v.compareTo(o2.v);
                }
            })).subscribe(observer);

            InOrder inOrder = inOrder(observer); 
            inOrder.verify(observer, times(1)).onNext(r1);
            inOrder.verify(observer, times(1)).onNext(r2);
            inOrder.verify(observer, times(1)).onNext(r2b);
            inOrder.verify(observer, times(1)).onCompleted();
            verify(observer, never()).onError(any(Throwable.class));
            inOrder.verifyNoMoreInteractions();
        }

@jmhofer
Copy link
Contributor

jmhofer commented Sep 20, 2013

For TreeSet to work as expected, the Comparator has to be consistent with equals, and in that case, we don't even need it in the first place.

It's not correct in Java to implement equals without implementing hashCode, too.

@abersnaze
Copy link
Contributor

This method 'java.util.Objects.hash(Object...)' makes computing hash codes much easier.

@benjchristensen
Copy link
Member

@jmhofer Yes, if it's not consistent with equals it won't work, but if they have equals they don't need to use Comparator and why would they make them be different?

@abersnaze Yes it does, but I'd like to avoid an API that requires developers to do that and know about some obscure API if we don't need it.

Question about using IEqualityComparer, would it need to calculate the hashCode every time the object is touched since it can't cache it (as it doesn't know if mutation has occurred in an object)?

@jmhofer
Copy link
Contributor

jmhofer commented Sep 21, 2013

If the observable objects already have an equals/hashCode implementation, then the distinct call without any additional parameters will do. Afaik the additional distinct signature using an explicit IEqualityComparer in Rx.NET is for cases where you want to use a different equals from the default one for distinguishing between the observable objects.

Java doesn't really support the concept of multiple equivalence relations per class. It would need something like the IEqualityComparer in its standard library for that, at least (and probably support that interface in the collections library, too). This was the reason why I initially left away the implementation for that specific distinct signature. I still don't think it's much of a use case...

My current implementation is a bit inconsistent: It takes a Comparator but doesn't care for or use the total ordering that should be defined by that Comparator. It just uses the equivalence relation defined by it, and has to live without being able to compute hash codes.

The more I think about it, the more I agree with @samuelgruetter that the Comparator is a bad choice for this. distinct has no need to impose a total ordering. It just needs an equivalence relation. Given a total ordering, we could make distinct efficient, of course, without using hash codes (just not via the Java default set implementations). But I don't see much value in that. It's also something completely different from what Rx.NET does here.

So basically, I'm for using a simple Func2 for distinctUntilChanged and maybe leaving that away from distinct due to efficiency.

If we created an interface like IEqualityComparer, we'd have to impose the same restrictions and contracts on its equals and hashCode methods as Object does. Mutable fields used in hash code computations break the standard Java collection implementations based on hash codes. Therefore I see no strong need to make distinct work with that special case either.

@benjchristensen
Copy link
Member

We ended up removing Comparator for these operators ... so closing.

rickbw pushed a commit to rickbw/RxJava that referenced this issue Jan 9, 2014
those with custom equality are not yet there because of issue ReactiveX#395
rickbw pushed a commit to rickbw/RxJava that referenced this issue Jan 9, 2014
Removing these fairly recently added overloads as they turn out to not be the best approach.

Discussion ongoing as to how to implement them at ReactiveX#395
duarten pushed a commit to duarten/RxJava that referenced this issue Jan 12, 2014
Removing these fairly recently added overloads as they turn out to not be the best approach.

Discussion ongoing as to how to implement them at ReactiveX#395
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants