Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend parameter trait to wmetrics #150

Merged
merged 7 commits into from
Apr 10, 2020
Merged

Extend parameter trait to wmetrics #150

merged 7 commits into from
Apr 10, 2020

Conversation

dkarrasch
Copy link
Member

@dkarrasch dkarrasch commented Oct 3, 2019

This is conceptually nothing but a rebase of #135, so (almost) all the credit goes to @devmotion.

This first commit simply moves wmetrics.jl to metrics.jl, the second rearranges a few docstrings. Functional changes start only in commit 3.

EDIT: Closes #135, but we should have @devmotion as a co-author.

@dkarrasch dkarrasch changed the title Add parameter trait, merge wmetrics into metrics WIP: Add parameter trait, merge wmetrics into metrics Oct 3, 2019
@dkarrasch
Copy link
Member Author

I think this is ready for review. As for the little regression benchmark from #107, this is what I get on this branch:

julia> @btime colwise(dd, Xcol,Ycol);
  798.898 μs (2 allocations: 781.33 KiB)

julia> @btime colwise_naive(Xcol,Ycol);
  834.077 μs (2 allocations: 781.33 KiB)

The little gap is consistent and in the "desired direction". What #135 was also doing and what is included here is fixing some length checking and return type issues for the weighted metrics. Other than that this establishes the parameter trait across all UnionMetrics (including the weighted ones) and adds a few scalar methods.

@dkarrasch dkarrasch changed the title WIP: Add parameter trait, merge wmetrics into metrics Add parameter trait, merge wmetrics into metrics Oct 4, 2019
src/metrics.jl Outdated Show resolved Hide resolved
@dkarrasch dkarrasch mentioned this pull request Oct 9, 2019
@dkarrasch
Copy link
Member Author

It would be great if we could merge these changes this time before other dramatic PRs come in. Therefore, here's a gentle "bump".

@nalimilan nalimilan requested a review from KristofferC October 13, 2019 13:47
@KristofferC
Copy link
Member

Correct me if I am wrong but by doing stuff like:

const weightedmetrics = (WeightedEuclidean,WeightedSqEuclidean,WeightedCityblock,WeightedMinkowski,WeightedHamming)
const UnionWeightedMetrics = Union{weightedmetrics...}
const UnionMetrics = Union{metrics...,weightedmetrics...}

and then dispatching on UnionMetrics means it is impossible for anyone else to add their own WeightedMetrics.
What is the reason we cannot use standard multiple dispatch instead of these hardcoded list of types?

src/metrics.jl Outdated Show resolved Hide resolved
src/metrics.jl Outdated Show resolved Hide resolved
src/metrics.jl Outdated Show resolved Hide resolved
@dkarrasch
Copy link
Member Author

@KristofferC I'm not sure I understand. Currently, we have this in metrics.jl

const metrics = (Euclidean,SqEuclidean,PeriodicEuclidean,Chebyshev,Cityblock,TotalVariation,Minkowski,Hamming,Jaccard,RogersTanimoto,CosineDist,ChiSqDist,KLDivergence,RenyiDivergence,BrayCurtis,JSDivergence,SpanNormDist,GenKLDivergence)
const UnionMetrics = Union{metrics...}

and this in wmetrics.jl

const weightedmetrics = (WeightedEuclidean,WeightedSqEuclidean,WeightedCityblock,WeightedMinkowski,WeightedHamming)
const UnionWeightedMetrics{W} = Union{map(M->M{W}, weightedmetrics)...}

Are you proposing to replace UnionMetrics by Union{metrics...,weightedmetrics...} in the method definitions? The tuple of types was introduced recently and is used to define the call-by-type methods

for M in (metrics..., weightedmetrics...)
    @eval @inline (dist::$M)(a::AbstractArray, b::AbstractArray) = _evaluate(dist, a, b, parameters(dist))
    if M != SpanNormDist
        @eval @inline (dist::$M)(a::Number, b::Number) = _evaluate(dist, a, b, parameters(dist))
    end
end

This is already the adopted version.

@KristofferC
Copy link
Member

Yeah, I'm sorry if my comment wasn't directly related to the PR but more of a comment about the current state of affairs.

There is a lot of code movement here and it is a bit unclear to me what is the core change in this PR? Is it just a bug fix and the added tests fail on master?

@dkarrasch
Copy link
Member Author

First of all, this PR joins the weighted metrics with the UnionMetrics. To achieve this, parameters is generalized towards a "trait" for all union metrics: the default being nothing, and the d.periods for the periodic Euclidean distance, and d.weights for the classic weighted distances. By doing so, a few minor bugs/inconsistencies in the weighted metrics are fixed: #135 (comment) Finally, this adds a few Number methods for some metrics that were previously missing, and some corresponding tests. A few other tests are simply strengthened to also test for the exact return type. Other than that, a few docstrings are moved for more consistency.

Now that I understand that your comment was not related to the actual PR, I think I understand what you mean, and I was thinking about it when I was writing the periodic distance. Should I introduce some abstract types like UnionMetric <: Metric and UnionSemiMetric <: SemiMetric, make the distances subtypes of the corresponding UnionXYZ and dispatch on that? Or shall we have that in a separate PR?

@KristofferC
Copy link
Member

Should I introduce some abstract types like UnionMetric <: Metric and UnionSemiMetric <: SemiMetric

Yes, that makes sense to me.

Or shall we have that in a separate PR?

I think that is the best.

src/metrics.jl Outdated Show resolved Hide resolved
@dkarrasch
Copy link
Member Author

Bump. 😇

@@ -222,7 +222,7 @@ Base.@propagate_inbounds function _evaluate(d::UnionMetrics, a::AbstractArray, b
@inbounds begin
s = eval_start(d, a, b)
if IndexStyle(a, b) === IndexLinear() || size(a) == size(b)
@simd for I in 1:length(a)
@simd for I in eachindex(a, b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note JuliaLang/julia#28226 (comment).

Is that no longer the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested #150 (comment), and obtained the results above. Wasn't this the test that would determine the slowdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant #107 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that colwise is consistently and measurably faster than naivecolwise in that test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is fixed then...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll quickly double-check on Julia v1.0...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I have tested my but off, and I can't seem to be able to reproduce the slowdown anymore, even on Julia v1.0. While the issue JuliaLang/julia#28226 persists even on nightlies, the issue #100 does not re-occur with this PR, neither on Julia v1.0.5 nor v1.3 or nightlies. AFAICT, we wouldn't introduce a slowdown with the change in the line commented on above.

@dkarrasch
Copy link
Member Author

Gentle bump.

@dkarrasch
Copy link
Member Author

Bump again. colwise is getting ever more faster on each new Julia version. In this test #107 (comment) this branch is significantly faster than colwise_naive!

I'd like to continue with the transition to Union[Pre/Semi]Metric types, which should help resolve #126 and #95.

@devmotion
Copy link
Member

Bumping again 🙂 What is holding this PR back?

@dkarrasch dkarrasch changed the title Add parameter trait, merge wmetrics into metrics Extend parameter trait to wmetrics Mar 31, 2020
@dkarrasch
Copy link
Member Author

Gentle bump once more.

@dkarrasch
Copy link
Member Author

Please, could we make some progress here? There are new PRs starting to pile up (like #161) that would make it unnecessarily tedious to merge this after those.

@KristofferC KristofferC merged commit 00b568b into JuliaStats:master Apr 10, 2020
@dkarrasch dkarrasch deleted the parameter_trait branch April 10, 2020 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants