Unify and fix parametrized metrics + add missing keywords to tests #135
I like this very much, but I'd ask to leave the
Ah, combining it with the weighted metrics is a neat idea! I think it would increase readability if the resulting `evaluate` logic for parametrized and non-parametrized metrics were kept separate, i.e., forward parametrized metrics (using the trait) to the methods in

When I quickly checked the weighted metrics, I noted some issues, highlighted by the following example:

```julia
julia> evaluate(WeightedSqEuclidean([3]), 1, 3)
4

julia> evaluate(WeightedSqEuclidean([3]), [1], [3])
12

julia> evaluate(WeightedSqEuclidean([3,4]), 1, 3)
4

julia> evaluate(WeightedSqEuclidean([3,4]), [1], [3])
ERROR: DimensionMismatch("arrays have length 1 but weights have length 2.")
Stacktrace:
 [1] evaluate(::WeightedSqEuclidean{Array{Int64,1}}, ::Array{Int64,1}, ::Array{Int64,1}) at /home/david/.julia/dev/Distances/src/wmetrics.jl:60
 [2] top-level scope at none:0
```

Basically, the weighted metrics require the same updates as `PeriodicEuclidean`, and in particular https://github.com/JuliaStats/Distances.jl/blob/master/src/wmetrics.jl#L43 is just plain wrong. Moreover, I noted that there's no constructor such as

I will try to add these changes later today.
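In the example above, the scalar calls return (1 - 3)^2 = 4, apparently ignoring the weight, and never check the length of the weights. One way to keep scalar and array inputs consistent is to route scalars through the same checked code path as arrays. The following is only a minimal sketch, assuming a hypothetical `SketchWeightedSqEuclidean` type and a hypothetical `sketch_evaluate` function (neither is part of the Distances.jl API):

```julia
# Hypothetical stand-in for a weighted squared Euclidean metric (not the Distances.jl type).
struct SketchWeightedSqEuclidean{W<:AbstractVector{<:Real}}
    weights::W
end

# Array inputs: both arrays and the weights must have the same length.
function sketch_evaluate(d::SketchWeightedSqEuclidean, a::AbstractVector, b::AbstractVector)
    w = d.weights
    length(a) == length(b) == length(w) || throw(DimensionMismatch(
        "arrays have length $(length(a)) and $(length(b)) but weights have length $(length(w))."))
    s = zero(promote_type(eltype(a), eltype(b), eltype(w)))
    for i in eachindex(a, b, w)
        s += w[i] * abs2(a[i] - b[i])
    end
    return s
end

# Scalar inputs are treated as one-element vectors, so they get the same
# weighting and the same length check as the array method.
sketch_evaluate(d::SketchWeightedSqEuclidean, a::Number, b::Number) =
    sketch_evaluate(d, [a], [b])
```

With these definitions, `sketch_evaluate(SketchWeightedSqEuclidean([3]), 1, 3)` returns 12, matching the array call, and scalar inputs with a two-element weight vector throw a `DimensionMismatch`. Wrapping scalars in vectors allocates, so a real implementation would probably avoid that; the sketch only illustrates the shared checks.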
Please try the #107 (comment) benchmark as well. It is a good benchmark for checking that things get optimized correctly in a simple case.
I guess there is a trade-off between readability and code duplication. @KristofferC made me work hard to not add another
That was exactly what I proposed, but I guess I was not clear enough 😃
No, you were clear. I should have added "as you said". I didn't mean to steal your argument; I meant to give your argument against mine. 😄
The last commit fixes the issues with weighted metrics that I mentioned above, and unifies the way in which parametrized metrics (i.e., at the moment, weighted metrics and `PeriodicEuclidean`) are evaluated. I ran both @KristofferC's short benchmark script and the full benchmark suite and could not observe any regressions.
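Purely as an illustration of the idea of a unified evaluation (not the code from this PR), all parametrized metrics can share one loop and one length check, with each metric supplying only a per-coordinate operation and a final step. All names here (`SketchParamMetric`, `SketchWSqEuclidean`, `SketchPeriodic`, `sketch_params`, `coord_op`, `sketch_finish`, `unified_evaluate`) are hypothetical:

```julia
# Hypothetical abstract type for metrics with one parameter per coordinate.
abstract type SketchParamMetric end

struct SketchWSqEuclidean{W<:AbstractVector{<:Real}} <: SketchParamMetric
    weights::W
end

struct SketchPeriodic{W<:AbstractVector{<:Real}} <: SketchParamMetric
    periods::W
end

# Accessor for the per-coordinate parameters of each metric.
sketch_params(d::SketchWSqEuclidean) = d.weights
sketch_params(d::SketchPeriodic) = d.periods

# Contribution of one coordinate, given that coordinate's parameter.
coord_op(::SketchWSqEuclidean, ai, bi, wi) = wi * abs2(ai - bi)
function coord_op(::SketchPeriodic, ai, bi, per)
    δ = mod(abs(ai - bi), per)
    return abs2(min(δ, per - δ))  # shorter way around the period
end

# Final step applied to the accumulated sum.
sketch_finish(::SketchWSqEuclidean, s) = s
sketch_finish(::SketchPeriodic, s) = sqrt(s)

# A single loop, with a single length check, shared by every parametrized metric.
function unified_evaluate(d::SketchParamMetric, a::AbstractVector, b::AbstractVector)
    p = sketch_params(d)
    length(a) == length(b) == length(p) || throw(DimensionMismatch(
        "arrays have length $(length(a)) and $(length(b)) but parameters have length $(length(p))."))
    s = zero(promote_type(eltype(a), eltype(b), eltype(p)))
    for i in eachindex(a, b, p)
        s += coord_op(d, a[i], b[i], p[i])
    end
    return sketch_finish(d, s)
end
```

In this layout the size check lives in one place, and adding another parametrized metric only means defining its parameter accessor, `coord_op`, and `sketch_finish`.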
Wow, amazing work! I see you have included the generic
```julia
@test pairwise(dist, x, y) ≈ rxy
@test pairwise(dist, x) ≈ rxx
@test pairwise(dist, x, y, dims=2) ≈ rxy
@test pairwise(dist, x, dims=2) ≈ rxx
```
I suppose these two lines don't need to change, since `pairwise(dist, x, y)` still works (although with a depwarn).
P.S. L500-L501 duplicate L502-L503; we can just remove them if we really need to change something.
Hm, yes, either remove them or leave them to test the behavior without the `dims` keyword.
I don't see why you would want to keep something that throws a depwarn, so I removed these duplicate lines.
```julia
@simd for I in 1:length(a)
    ai = a[I]
    bi = b[I]
    pi = p[I]
    # (rest of the loop body not shown in this review excerpt)
end
```
Is there any issue with overwriting the Base constant `pi`? I was experimenting with that and wasn't sure, so I used `pI`, `aI`, etc.
I don't think so. Since we don't want to access the mathematical constant here, it should not matter that it is masked by the local variable.
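This is easy to check: in Julia, assigning to `pi` inside a function creates a new local variable, so the Base constant is hidden only within that scope and is unaffected elsewhere. A small sketch with a hypothetical `shadowing_demo` function that mirrors the local names used in the reviewed loop:

```julia
# Hypothetical example function: weighted squared differences using the local
# names `ai`, `bi`, and `pi` from the reviewed loop.
function shadowing_demo(a, b, p)
    s = zero(eltype(p))
    for I in eachindex(a, b, p)
        ai = a[I]
        bi = b[I]
        pi = p[I]  # a new local variable; Base's `pi` is hidden only inside this loop body
        s += pi * abs2(ai - bi)
    end
    return s
end
```

Calling `shadowing_demo([1, 2], [3, 2], [3, 4])` returns 12, and `pi` outside the function still refers to the mathematical constant.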
Bump. It would be cool if we could review, merge, and tag this, to make the
Is there any chance you could split this PR into smaller parts, or move out some fixes that can be applied separately first? It's very hard to review as-is, since it touches many things. In particular, it would be nice to have a commit that moves code without changing it, so that we can review what actually changes.
Hmm... I'm not sure what would be a good way to split this PR. The first five commits unify and fix the parametrized metrics and fix and extend the tests to account for these changes; maybe the last two commits, which fix and simplify different distance implementations, could be separated.
What do you mean?
Then yes, please do. Anything that isn't strictly required should be separated for clarity.
That the "Fix weighted metrics and handle parametrized metrics in unified way" commit moves lots of things (like the docstring for
Also notice how the diff for metrics.jl makes it look like you added a new
Sorry for the delay! I tried to figure out how to split the PR, but even splitting off the last two commits and rebasing them on master didn't work out well, since they already require the restructuring of the weighted distances. Overall, I got the impression that all changes are very closely related to each other. I assumed that by unifying the parametrized metrics and getting rid of the separation between the weighted metrics and the periodic Euclidean metric, which were already defined in
I don't know how to do any more advanced rewriting of the git history, so I'm a bit lost here. Maybe it helps if you look at a smaller subset of changes in the GitHub review interface by, e.g., selecting the last two commits separately?
As correctly noticed in #134 (review), the current implementation of `PeriodicEuclidean` seems to be a bit inconsistent and could be improved.

As an example, before the last PR #134 was merged,
and now currently on master
So the incorrect result types were fixed, but since the scalar evaluation just takes `first(d.periods)`, even incorrect lengths of `d.periods` do not throw errors for scalar inputs, whereas they do if arrays are used as inputs.

I tried to fix this problem in this PR by checking the size of `d.periods` also for scalar inputs:

Adding a `result_type` implementation that specializes on `PeriodicEuclidean` and does not directly call `evaluate` with scalar inputs allows us to get rid of the special case for empty `periods` as well (which, as mentioned in #134 (comment), was the reason for setting `p` to `oneunit(eltype(d.periods))` for empty `periods` before). Moreover, I removed all occurrences of `parameters()`, since it was only used to check whether `d` is of type `PeriodicEuclidean`, and made these checks explicit. Maybe one could implement two methods of `evaluate` for arrays that dispatch on `PeriodicEuclidean` to make it even more explicit and easier to read; currently, half of the implementation of these methods is just handling `PeriodicEuclidean`.
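The changes proposed in this description (checking the length of `d.periods` for scalar inputs, computing the result type from types alone instead of calling `evaluate`, and dispatching on the metric type instead of checking it explicitly) could look roughly like the following. This is a sketch under assumptions, not the Distances.jl code: `SketchPeriodicEuclidean`, `periodic_diff`, `sketch_result_type`, and `sketch_evaluate` are hypothetical names.

```julia
# Hypothetical stand-in for a periodic Euclidean metric (not the Distances.jl type).
struct SketchPeriodicEuclidean{W<:AbstractVector{<:Real}}
    periods::W
end

# Shortest distance between two coordinates on a circle of circumference `per`.
function periodic_diff(ai, bi, per)
    δ = mod(abs(ai - bi), per)
    return min(δ, per - δ)
end

# Result type computed from the types alone, without calling `sketch_evaluate`,
# so empty `periods` need no special case.
sketch_result_type(d::SketchPeriodicEuclidean, Ta::Type, Tb::Type) =
    float(promote_type(Ta, Tb, eltype(d.periods)))

# Scalar method: requires exactly one period instead of silently using the first one.
function sketch_evaluate(d::SketchPeriodicEuclidean, a::Number, b::Number)
    length(d.periods) == 1 || throw(DimensionMismatch(
        "scalar inputs require exactly one period, but periods have length $(length(d.periods))."))
    T = sketch_result_type(d, typeof(a), typeof(b))
    return convert(T, periodic_diff(a, b, first(d.periods)))
end

# Array method, selected by dispatch on the metric type rather than by an
# explicit type check inside a generic method.
function sketch_evaluate(d::SketchPeriodicEuclidean, a::AbstractVector, b::AbstractVector)
    p = d.periods
    length(a) == length(b) == length(p) || throw(DimensionMismatch(
        "arrays have length $(length(a)) and $(length(b)) but periods have length $(length(p))."))
    s = zero(sketch_result_type(d, eltype(a), eltype(b)))
    for i in eachindex(a, b, p)
        s += abs2(periodic_diff(a[i], b[i], p[i]))
    end
    return sqrt(s)
end
```

Here `sketch_evaluate(SketchPeriodicEuclidean([10]), 1, 9)` gives `2.0` (the short way around a period of 10), while scalar inputs with a two-element `periods` vector throw a `DimensionMismatch`, just like mismatched array lengths. Because `sketch_result_type` only combines types, an empty `periods` vector needs no placeholder value.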