Unify and fix parametrized metrics + add missing keywords to tests #135

devmotion · 2019-05-28T19:52:00Z

As correctly noticed in #134 (review), the current implementation of PeriodicEuclidean is a bit inconsistent and can be improved, it seems.

As an example, before the last PR #134 was merged,

julia> evaluate(PeriodicEuclidean([4]), 1, 3)
2

julia> evaluate(PeriodicEuclidean([4]), [1], [3])
2.0

julia> evaluate(PeriodicEuclidean([4,5]), 1, 3)
2

julia> evaluate(PeriodicEuclidean([4,5]), [1], [3])
ERROR: DimensionMismatch("arrays have length 1 but parameters have length 2.")
Stacktrace:
 [1] evaluate(::PeriodicEuclidean{Array{Int64,1}}, ::Array{Int64,1}, ::Array{Int64,1}) at /home/david/.julia/dev/Distances/src/metrics.jl:183
 [2] top-level scope at none:0

and now currently on master

julia> evaluate(PeriodicEuclidean([4]), 1, 3)
2.0

julia> evaluate(PeriodicEuclidean([4]), [1], [3])
2.0

julia> evaluate(PeriodicEuclidean([4,5]), 1, 3)
2.0

julia> evaluate(PeriodicEuclidean([4,5]), [1], [3])
ERROR: DimensionMismatch("arrays have length 1 but parameters have length 2.")
Stacktrace:
 [1] evaluate(::PeriodicEuclidean{Array{Int64,1}}, ::Array{Int64,1}, ::Array{Int64,1}) at /home/david/.julia/dev/Distances/src/metrics.jl:183
 [2] top-level scope at none:0

So the incorrect result types were fixed but since the scalar evaluation just takes first(d.periods) even incorrect lengths of d.periods do not throw errors for scalar input whereas they do if arrays are used as inputs.

I tried to fix this problem in this PR by checking the size of d.periods also for scalar inputs:

julia> evaluate(PeriodicEuclidean([4]), 1, 3)
2.0

julia> evaluate(PeriodicEuclidean([4]), [1], [3])
2.0

julia> evaluate(PeriodicEuclidean([4,5]), 1, 3)
ERROR: DimensionMismatch("inputs are scalars but parameters have length 2.")
Stacktrace:
 [1] evaluate(::PeriodicEuclidean{Array{Int64,1}}, ::Int64, ::Int64) at /home/david/.julia/dev/Distances/src/metrics.jl:266
 [2] top-level scope at none:0

julia> evaluate(PeriodicEuclidean([4,5]), [1], [3])
ERROR: DimensionMismatch("arrays have length 1 but parameters have length 2.")
Stacktrace:
 [1] evaluate(::PeriodicEuclidean{Array{Int64,1}}, ::Array{Int64,1}, ::Array{Int64,1}) at /home/david/.julia/dev/Distances/src/metrics.jl:180
 [2] top-level scope at none:0

Adding a result_type implementation that specializes on PeriodicEuclidean and does not directly call evaluate with scalar inputs allows to get rid of the special case for empty periods as well (which, as mentioned in #134 (comment)), was the reason for setting p to oneunit(eltype(d.periods)) for empty periods before). Moreover, I removed all occurrences of parameters() since it was only used to check if d is of type PeriodicEuclidean and made these checks explicit. Maybe one could implement two methods of evaluate for arrays that dispatch on PeriodicEuclidean to make it even more explicit and easier to read - currently half of the implementation of these methods are just handling PeriodicEuclidean.

dkarrasch · 2019-05-29T05:53:08Z

I like this very much, but I'd ask to leave the parameters trait. This was designed for allowing also all weighted UnionMetrics being handled by the same evaluate method. It just hasn't been done yet. Along those lines, we could have a "unified" scalar evaluate for UnionMetrics, which branches over p === nothing. Similarly, one could combine the two result_type functions using the same branching (as it used to be before #134). I could do it, if you prefer.

devmotion · 2019-05-29T07:25:13Z

Ah, to combine it with the weighted metrics is a neat idea! I think it would increase readability if the resulting evaluate logic for both parametrized and non-parametrized metrics is kept separated, i.e., forward parametrized metrics (using the trait) to the methods in wmetrics (with some slight generalizations such as replacing d.weights with parameters(d)) and the other ones to "old" implementation in metrics.

When I quickly checked the weighted metrics, I noted some issues, highlighted by the following example:

julia> evaluate(WeightedSqEuclidean([3]), 1, 3)
4

julia> evaluate(WeightedSqEuclidean([3]), [1], [3])
12

julia> evaluate(WeightedSqEuclidean([3,4]), 1, 3)
4

julia> evaluate(WeightedSqEuclidean([3,4]), [1], [3])
ERROR: DimensionMismatch("arrays have length 1 but weights have length 2.")
Stacktrace:
 [1] evaluate(::WeightedSqEuclidean{Array{Int64,1}}, ::Array{Int64,1}, ::Array{Int64,1}) at /home/david/.julia/dev/Distances/src/wmetrics.jl:60
 [2] top-level scope at none:0

Basically, the weighted metrics require the same updates as PeriodicEuclidean, and in particular https://github.com/JuliaStats/Distances.jl/blob/master/src/wmetrics.jl#L43 is just plain wrong. Moreover, I noted that there's no constructor such as WeightedSqEuclidean(). That seems reasonable to me, since it's completely unclear what a reasonable initialization would be - if the weights are not of the same length as the evaluated numbers/vectors the computation will fail. Due to the same reason I think the constructor PeriodicEuclidean() should be removed, since the resulting metric can only be used to evaluate empty vectors.

I will try to add these changes later today.

KristofferC · 2019-05-29T07:29:16Z

Please try the #107 (comment) benchmark as well. It is a good benchmark to see that things get correctly optimized for a simple case.

dkarrasch · 2019-05-29T07:35:55Z

I guess there is a trade-off between readability and code duplication. @KristofferC made me work hard to not add another evaluate method for PeriodicEuclidean. One reason being the benchmark problem @KristofferC mentioned here #135 (comment). That is at least an argument in favor of having only "one loop", that can be tracked and maintained. OTOH, since we currently have the weighted evaluate anyway, we could put the whole parametrized logic in there. I guess there is no right or wrong here, it's just a design decision. Within the current framework, I have a working version already.

devmotion · 2019-05-29T07:42:11Z

OTOH, since we currently have the weighted evaluate anyway, we could put the whole parametrized logic in there.

That was exactly what I proposed, but I guess I was not clear enough 😃

dkarrasch · 2019-05-29T07:44:42Z

No, you were clear. I should have added "as you said". I didn't mean to steal your argument, I meant to give your argument against mine. 😄

devmotion · 2019-05-29T14:50:23Z

The last commit fixes the issues with weighted metrics that I mentioned above, and unifies the way in which parametrized metrics (i.e., at the moment weighted metrics and PeriodicEuclidean) are evaluated. I ran both @KristofferC's short benchmark script and the full benchmark suite and could not observe any regressions.

dkarrasch · 2019-05-29T15:10:09Z

Wow, amazing work! I see you have included the generic eval_reduce proposal. You could include the minor test "fixes" from #136, and then we can close that one. And you may consider to rename your PR for future reference.

johnnychen94 · 2019-05-29T16:50:11Z

test/test_dists.jl

-        @test pairwise(dist, x, y) ≈ rxy
-        @test pairwise(dist, x) ≈ rxx
+        @test pairwise(dist, x, y, dims=2) ≈ rxy
+        @test pairwise(dist, x, dims=2) ≈ rxx


I suppose these two lines don't need change since pairwise(dist, x, y) still works (although with depwarn).

P.S. L500-L501 is duplicated to L502-L503; we can just remove it if we really need to change.

Hm, yes, either remove, or leave to test for behavior without dims keyword.

I don't see why you would want to keep something that throws a depwarn, so I removed these duplicate lines.

dkarrasch · 2019-05-31T13:37:19Z

src/metrics.jl

+            @simd for I in 1:length(a)
+                ai = a[I]
+                bi = b[I]
+                pi = p[I]


Is there any issue with overwriting the Base constant pi? I was experimenting with that and wasn't sure, so I used pI and aI etc.

I don't think so. Since we don't want to access the mathematical constant here, it should not matter that it is masked by the local variable.

dkarrasch · 2019-06-19T07:04:27Z

Bump. It would be cool if we could review, merge, and tag this, to make the PeriodicEuclidean available.

nalimilan · 2019-06-19T08:45:51Z

Is there any chance you could split this PR into smaller parts, or move out some fixes which can be applied separately first? It's very hard to review as-is as it touches many things. In particular, it would be nice to have a commit that moves code without changing it, so that we can review what actually changes.

devmotion · 2019-06-19T11:16:48Z

Hmm... I'm not sure what would be a good way to split this PR. The first five commits unify and fix the parametrized metrics and fix and extend the tests to account for these changes; maybe the last two commits which fix and simplify different distance implementations could be separated.

it would be nice to have a commit that moves code without changing it

What do you mean?

nalimilan · 2019-06-19T12:08:08Z

maybe the last two commits which fix and simplify different distance implementations could be separated.

Then yes, please do. Anything that isn't strictly required should be separated for clarity.

it would be nice to have a commit that moves code without changing it

What do you mean?

That the "Fix weighted metrics and handle parametrized metrics in unified way" commit moves lots of things (like the docstring for Euclidean([thresh])) from one place to another, making it impossible to spot what changed and want was just moved.

Also notice how the diff for metrics.jl makes it looks like you added a new _pairwise!(r::AbstractMatrix, dist::Euclidean, ...) method somewhere and modified the existing method with that signature to be _pairwise!(r::AbstractMatrix, dist::WeightedEuclidean, ...): this is almost impossible to review. If you moved the code without modifying it in a separate commit, then the diff would show what changed in the existing ::Euclidean and ::WeightedEuclidean methods (if anything). Also, does the code really needs to move?

devmotion · 2019-07-05T15:35:23Z

Sorry for the delay! I tried to figure out how to split the PR, but even splitting the last two commits and rebasing them on master didn't work out well since they already require the restructuring of the weighted distances. Overall I got the impression that all changes are very closely related to each other.

I assumed that by unifying the parametrized metrics and getting rid of the separation between the weighted metrics and the periodic Euclidean metric, which were already defined in metrics.jl, there's no need for a separate file wmetrics.jl anymore, and so I moved the remaining parts of that file to metrics.jl.

I don't know how to do any more advanced rewriting of the git history, so I'm a bit lost here. Maybe it helps if you look at a smaller subset of changes in the Github review interface by, e.g., selecting the last two commits separately?

Simplify PeriodicEuclidean

9b59713

ararslan requested a review from nalimilan May 28, 2019 19:56

Fix weighted metrics and handle parametrized metrics in unified way

c94bcf1

Add missing dims keyword to tests

423bf62

devmotion changed the title ~~Simplify PeriodicEuclidean~~ Unify and fix parametrized metrics + add missing keywords to tests May 29, 2019

dkarrasch mentioned this pull request May 29, 2019

update tests in view of recently added dims keyword (avoid warning) #136

Closed

johnnychen94 reviewed May 29, 2019

View reviewed changes

devmotion added 4 commits May 30, 2019 01:22

Remove duplicate tests

5de5bce

Remove comment

c36c886

Simplify CosineDist and CorrDist

afee96a

Simplify SpanNormDist, Jaccard, and BrayCurtis

80e85a2

dkarrasch reviewed May 31, 2019

View reviewed changes

dkarrasch mentioned this pull request Oct 3, 2019

Extend parameter trait to wmetrics #150

Merged

KristofferC closed this in #150 Apr 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify and fix parametrized metrics + add missing keywords to tests #135

Unify and fix parametrized metrics + add missing keywords to tests #135

devmotion commented May 28, 2019 •

edited

Loading

dkarrasch commented May 29, 2019

devmotion commented May 29, 2019

KristofferC commented May 29, 2019

dkarrasch commented May 29, 2019

devmotion commented May 29, 2019

dkarrasch commented May 29, 2019

devmotion commented May 29, 2019

dkarrasch commented May 29, 2019

johnnychen94 May 29, 2019 •

edited

Loading

dkarrasch May 29, 2019

devmotion May 29, 2019 •

edited

Loading

dkarrasch May 31, 2019

devmotion Jun 1, 2019

dkarrasch commented Jun 19, 2019

nalimilan commented Jun 19, 2019

devmotion commented Jun 19, 2019

nalimilan commented Jun 19, 2019

devmotion commented Jul 5, 2019

Unify and fix parametrized metrics + add missing keywords to tests #135

Unify and fix parametrized metrics + add missing keywords to tests #135

Conversation

devmotion commented May 28, 2019 • edited Loading

dkarrasch commented May 29, 2019

devmotion commented May 29, 2019

KristofferC commented May 29, 2019

dkarrasch commented May 29, 2019

devmotion commented May 29, 2019

dkarrasch commented May 29, 2019

devmotion commented May 29, 2019

dkarrasch commented May 29, 2019

johnnychen94 May 29, 2019 • edited Loading

Choose a reason for hiding this comment

dkarrasch May 29, 2019

Choose a reason for hiding this comment

devmotion May 29, 2019 • edited Loading

Choose a reason for hiding this comment

dkarrasch May 31, 2019

Choose a reason for hiding this comment

devmotion Jun 1, 2019

Choose a reason for hiding this comment

dkarrasch commented Jun 19, 2019

nalimilan commented Jun 19, 2019

devmotion commented Jun 19, 2019

nalimilan commented Jun 19, 2019

devmotion commented Jul 5, 2019

devmotion commented May 28, 2019 •

edited

Loading

johnnychen94 May 29, 2019 •

edited

Loading

devmotion May 29, 2019 •

edited

Loading