colwise ridiculously slow for small columns due to allocating array views #83
JuliaLang/julia#23240 might do it.
True, but I have not seen this in the 1.0 milestones; hence the need for a workaround instead of waiting. It is not obvious to me whether JuliaLang/julia#23240 will fix this reliably, or only depending on internal states of the inliner/optimizer. I think that some version of JuliaLang/julia#18632 (allowing references and bitstypes to be bundled into a struct with zero overhead) would be the only way to really fix this at the language level. Unfortunately, and to my infinite disappointment, that effort was given up. This makes me believe that we will have to live with this unfortunate state of affairs for the next several years. My situation is that I am building a small geometry package. I would love to use the Distances.jl API so that users can provide their own distances: first, because many distances are already provided, and second, because most other packages use the Distances.jl API. Unfortunately, I currently cannot (and I cannot profile and optimize code under the assumption that it will become fast with a future Julia update). If I made a PR that offers the modified API for unbundled views (non-allocating access to columns of matrices), would you merge and maintain it?
Same perf now.
A common need is to have one (or two) data matrices and to compute distances between certain columns. Using Distances.jl is nice because it supplies an API, so that users can plug different distance functions into an algorithm. As an example, consider NearestNeighbors.jl.
Hence, one needs an API to compute evaluate(dist, X[:,i1], Y[:,i2]).
Unfortunately, Distances.jl uses array views, and array views are harmful here because they allocate (and add indirection!). Until this is fixed, a zero-overhead way of computing such distances must be provided.
A simple API would be evaluate(dist, X, i1, Y, i2). Due to limitations of Julia, we cannot bundle the underlying matrix and the indices into a struct or tuple without overhead (which is what array views do). This constraint is unfortunate, but I don't think there is any way to have an API that is both nice and fast. As far as I understand, this will not change in Julia 1.0; hence an ugly-but-fast API is necessary (if you consider allocating array views a bug, then a workaround is needed).
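A minimal sketch of what the proposed index-based signature could look like for Euclidean distance. `Euclid` and this 5-argument `evaluate` method are hypothetical stand-ins defined only for illustration; they are not part of Distances.jl. The point is that the column indices are passed alongside the parent matrices, so no view object has to be constructed.

```julia
# Sketch of the proposed index-based API evaluate(dist, X, i1, Y, i2).
# `Euclid` and this 5-argument `evaluate` are hypothetical stand-ins defined
# here for illustration; they are not part of Distances.jl.
struct Euclid end

function evaluate(::Euclid, X::AbstractMatrix, i1::Integer,
                  Y::AbstractMatrix, i2::Integer)
    axes(X, 1) == axes(Y, 1) || throw(DimensionMismatch("row axes differ"))
    checkbounds(X, :, i1)   # validate the column indices once...
    checkbounds(Y, :, i2)
    s = zero(promote_type(eltype(X), eltype(Y)))
    # ...then index the parent matrices directly; no view object is created.
    @inbounds for k in axes(X, 1)
        d = X[k, i1] - Y[k, i2]
        s += d * d
    end
    return sqrt(s)
end

X = [0.0 1.0; 0.0 0.0]           # columns (0, 0) and (1, 0)
Y = reshape([3.0, 4.0], 2, 1)     # a single column (3, 4)
evaluate(Euclid(), X, 1, Y, 1)    # distance between (0, 0) and (3, 4): 5.0
```

The bounds checks run once per call rather than once per element, which keeps the inner loop as tight as a hand-written one.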
I am not sure about the final API design; hence no pull request. However, I expect that many downstream packages will want to make use of this.
Indeed, the broken API is used even internally in colwise.
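For comparison, a view-free version of a colwise-style computation can be written as a plain double loop. This is a hedged sketch: `colwise_euclidean!` is a hypothetical name (not a Distances.jl function), it assumes 1-based arrays and a floating-point output vector `r`, and it hard-codes Euclidean distance instead of dispatching on a distance type.

```julia
# Hypothetical view-free analogue of colwise for Euclidean distance:
# r[j] = norm of X[:, j] - Y[:, j], computed by indexing X and Y directly.
# `colwise_euclidean!` is an illustrative name, not a Distances.jl function.
# Assumes 1-based arrays and a floating-point `r`.
function colwise_euclidean!(r::AbstractVector, X::AbstractMatrix, Y::AbstractMatrix)
    size(X) == size(Y) || throw(DimensionMismatch("X and Y must have the same size"))
    length(r) == size(X, 2) || throw(DimensionMismatch("length(r) must equal size(X, 2)"))
    for j in 1:size(X, 2)
        s = zero(eltype(r))
        for k in 1:size(X, 1)
            d = X[k, j] - Y[k, j]
            s += d * d
        end
        r[j] = sqrt(s)
    end
    return r
end

X = [0.0 1.0; 0.0 1.0]                    # columns (0, 0) and (1, 1)
Y = [3.0 1.0; 4.0 1.0]                    # columns (3, 4) and (1, 1)
r = colwise_euclidean!(zeros(2), X, Y)    # [5.0, 0.0]
```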
I attached an example benchmark comparing colwise against a naive loop for Euclidean distances between 10-dimensional points. Lower-dimensional data shows even worse relative speed differences; higher-dimensional data reduces the difference.
yielding
PS: The combination of evaluate(dist, X, i1, Y, i2) and evaluate(dist, X, Y) [for vectors] works when the distance evaluation is inlined. If it is not inlined, one should consider an API evaluate(dist, Xptr, Yptr, len), in order to avoid unnecessary indirection for Arrays and data copying for SVectors.
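The pointer-based variant can be sketched as follows, assuming dense column-major `Float64` storage (as in a plain `Matrix{Float64}`). `euclidean_ptr` and `col_start` are hypothetical names, and the `dist` argument from the proposed signature is dropped for brevity. The caller is responsible for keeping the arrays alive while raw pointers into them are in use, which is what `GC.@preserve` does.

```julia
# Sketch of the pointer-based variant evaluate(dist, Xptr, Yptr, len) from the PS.
# `euclidean_ptr` is a hypothetical name. It assumes both pointers address at
# least `len` contiguous Float64 values that stay alive for the whole call.
function euclidean_ptr(xp::Ptr{Float64}, yp::Ptr{Float64}, len::Int)
    s = 0.0
    for k in 1:len
        d = unsafe_load(xp, k) - unsafe_load(yp, k)   # k-th element, 1-based
        s += d * d
    end
    return sqrt(s)
end

# Linear index of the first entry of column j in a column-major matrix A.
col_start(A::AbstractMatrix, j::Integer) = (j - 1) * size(A, 1) + 1

X = [0.0 1.0; 0.0 0.0]           # columns (0, 0) and (1, 0)
Y = reshape([3.0, 4.0], 2, 1)     # a single column (3, 4)
n = size(X, 1)
# GC.@preserve keeps X and Y rooted while raw pointers into them are in use.
dist = GC.@preserve X Y euclidean_ptr(pointer(X, col_start(X, 2)),
                                      pointer(Y, col_start(Y, 1)), n)
```

Because only two pointers and a length cross the call boundary, a non-inlined distance function receives no wrapper object at all; the trade-off is that all safety checks move to the caller.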