Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colwise ridiculously slow for small columns due to allocating array views #83

Closed
chethega opened this issue Nov 22, 2017 · 3 comments
Closed

Comments

@chethega
Copy link

A quite common need is to have one (or two) data matrices, and wanting to compute distances between certain columns. Using Distances.jl is nice, because it supplies an API, so that users can plug different distance functions into an algorithm. As example, consider NearestNeighbors.jl.

Hence, one needs an API to compute evaluate(dist, X[:,i1], Y[:,i2]).

Unfortunately, Distances.jl uses array views. Array views are harmful, because they allocate (and add indirection!). Until this is fixed, one must provide a zero-overhead way of computing such distances.

A simple API would be evaluate(dist, X, i1, Y, i2). Due to limitations of julia, we cannot bundle the underlying matrix and the indices into a struct or tuple (this is what array views do). This constraint is unfortunate, but I don't think there is any way to have a nice and fast API. As far as I understood, this will not change in julia 1.0; hence an ugly-but-fast API is necessary (if you consider allocating array views a bug, then a workaround is needed).

I am not sure about the final API design; hence no pull request. However, I expect that a lot of packages downstream will want to make use of this.

Indeed, the broken API is used even internally in colwise.

I attached an example benchmark, comparing colwise against a naive loop for euclidean distances between 10-dimensional points. Lower dimensional data has even worse relative speed differences; higher dimensional data reduces the difference.

using Distances
using BenchmarkTools

m = 10
n=100_000
Xcol = rand(m,n)
Ycol=rand(m,n)


function colwise_naive(X,Y)
    m = size(X,1)
    n = size(X,2)
    T = eltype(X)
    assert(m == size(Y,1) && n == size(Y,2) && T==eltype(Y))

    res = Vector{T}(n)
    @inbounds  for j = 1:n
        s_ = zero(T)
        @inbounds @simd for i = 1:m
            dd_ = X[i,j]-Y[i,j]
            s_ += dd_*dd_
        end
        res[j] = sqrt(s_)
    end
    res
end

dd = Euclidean()
bench_colw = @benchmark colwise(dd, Xcol,Ycol)
bench_colw2 = @benchmark colwise_naive(Xcol,Ycol)

println("distances.jl colwise")
show(STDOUT, MIME"text/plain"(), bench_colw)

println("\nnaive colwise")
show(STDOUT, MIME"text/plain"(), bench_colw2)

yielding

colwise
BenchmarkTools.Trial: 
  memory estimate:  9.92 MiB
  allocs estimate:  200002
  --------------
  minimum time:     5.914 ms (0.00% GC)
  median time:      6.522 ms (0.00% GC)
  mean time:        6.724 ms (4.03% GC)
  maximum time:     15.262 ms (0.00% GC)
  --------------
  samples:          742
  evals/sample:     1
naive colwise
BenchmarkTools.Trial: 
  memory estimate:  781.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     1.636 ms (0.00% GC)
  median time:      1.726 ms (0.00% GC)
  mean time:        1.932 ms (0.46% GC)
  maximum time:     8.783 ms (0.00% GC)
  --------------
  samples:          2573
  evals/sample:     1

PS. The combination of evaluate(dist, X, i1, Y, i2) and evaluate(dist, X, Y) [for vectors] works when the distance evaluation is inlined. If it is not inlined, then one should consider an API evaluate(dist, Xptr, Yptr, len), in order to avoid avoidable indirection for Arrays and data copying for SVectors.

@KristofferC
Copy link
Member

Unfortunately, Distances.jl uses array views. Array views are harmful, because they allocate (and add indirection!)
...

As far as I understood, this will not change in julia 1.0; hence an ugly-but-fast API is necessary (if you consider allocating array views a bug, then a workaround is needed).

JuliaLang/julia#23240 might do it.

@chethega
Copy link
Author

True; but I have not seen this in the 1.0 milestones. Hence the need for a workaround, instead of waiting.

It is not obvious to me whether JuliaLang/julia#23240 will fix this reliably (or only dependent on internal states of the inliner/optimizer).

I think that some version of JuliaLang/julia#18632 would be the only way of really fixing this on the language level (allow bundling of references and bitstypes into struct with zero-overhead). Unfortunately, and to my infinite disappointment, this was given up. This makes me believe that we will have to live with this unfortunate state of affairs for the next several years.

My situation is that I am building a small geometry package; I'd love to use the distances.jl API in order to allow users to provide their distances (because 1st, many distances are provided, and 2nd, most other packages use the distances.jl API). Unfortunately I cannot, currently (and I cannot profile/optimize code under the assumption that it will become fast with a julia update)

If I did a PR which offers the modified API for unbundled views, (non-allocating access of columns in matrices), would you merge/maintain it?

@KristofferC
Copy link
Member

Same perf now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants