
Type-stabilizing array concatenations #19387

Merged: 4 commits into JuliaLang:master from the pz/typestablecat branch, Dec 1, 2016

Conversation

@pabloferz (Contributor) commented Nov 22, 2016:

This is a rewrite of some of the array concatenation methods to make them type stable. The PR adds a type-stable version of cat (cat{n}(::Type{Val{n}}, X...)) while preserving the existing API (which cannot be made inferable, but should still be a bit faster than before).

Fixes #13665, #19038 and #19304

NOTE: I believe the first three commits could be backported to 0.5.
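
As a minimal sketch of the idea (written in current Julia syntax with illustrative names, not the PR's 0.6-era cat{n}(::Type{Val{n}}, X...) form): passing the concatenation dimension in the type domain lets inference see the rank of the result.

    using Test  # for @inferred

    A = [1 2; 3 4]

    # Dimension as a runtime value: the rank of the result depends on `dim`,
    # so inference generally cannot produce a concrete return type.
    cat_by_value(dim::Int, x, y) = cat(x, y; dims = dim)

    # Dimension in the type domain: Val(3) fixes the rank at compile time.
    cat_by_type(x, y) = cat(x, y; dims = Val(3))

    @inferred cat_by_type(A, A)        # should infer concretely (Array{Int,3} here)
    # @inferred cat_by_value(3, A, A)  # would typically fail to infer a concrete type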


function _cat(T::Type, shape, sifter, X...)
    N = length(shape)
    A = cat_similar(X[1], T, shape)
Member:

While you're at it, could you choose the return type based on all input types (#2326)? Or do you think it should go into another PR?

Contributor Author:

I'd like to handle that somehow, but I'd say it would have to be addressed elsewhere. For one, I have some ideas that would require the new subtyping algorithm to be in place.
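
For context, what #2326 asks for amounts to choosing the output element type by promoting across all inputs rather than taking it from the first argument; a hedged, purely illustrative helper (not Base's actual promote_eltype):

    # Illustrative only; not the Base implementation.
    promote_eltype_sketch(xs...) = promote_type(map(eltype, xs)...)

    promote_eltype_sketch([1, 2], [1.0, 2.0])   # Float64
    promote_eltype_sketch([1, 2], ["a"])        # Any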

@pabloferz force-pushed the pz/typestablecat branch 2 times, most recently from e72d1c8 to 318cf38, on November 23, 2016 15:10
@stevengj (Member):

Do we have benchmark coverage of these functions for dense and sparse matrices?

@stevengj (Member):

Needs tests?

    hcat(X...)
end
function vcat(Xin::_SparseConcatGroup...)
-   X = SparseMatrixCSC[issparse(x) ? x : sparse(x) for x in Xin]
+   X = map(x -> SparseMatrixCSC(issparse(x) ? x : sparse(x)), Xin)
Member:

Out of curiosity, why replace the comprehensions with maps? Generate Tuples rather than Vectors?

Would SparseMatrixCSC suffice in place of x -> SparseMatrixCSC(issparse(x) ? x : sparse(x))?

Best!

@pabloferz (Contributor Author), Nov 23, 2016:

> Out of curiosity, why replace the comprehensions with maps? Generate Tuples rather than Vectors?

Just to avoid the allocation from creating the array, although it shouldn't make much of a difference.

> Would SparseMatrixCSC suffice in place of x -> SparseMatrixCSC(issparse(x) ? x : sparse(x))?

SparseMatrixCSC([1]) is not the same as SparseMatrixCSC(sparse([1])). Actually, the first one fails.

Member:

I'm confused: don't both map and the comprehension allocate the same number of arrays?

Member:

map returns a tuple for tuple inputs, while a comprehension creates an array.
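
To illustrate the distinction, a small hedged sketch (on modern Julia the sparse functionality lives in the SparseArrays stdlib; it was in Base at the time):

    using SparseArrays

    Xin = ([1 2; 3 4], sparse([1.0 0.0; 0.0 1.0]))   # tuple of one dense, one sparse matrix

    via_map  = map(x -> issparse(x) ? x : sparse(x), Xin)   # Tuple of sparse matrices
    via_comp = [issparse(x) ? x : sparse(x) for x in Xin]   # Vector, i.e. one extra array allocation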

Contributor Author:

It's just to avoid the array that holds the arrays. But as I said, it shouldn't make much of a difference, and I can change it back if you think it's better.

Member:

Your modifications look great to me --- thank you for the explanations!

@pabloferz force-pushed the pz/typestablecat branch 2 times, most recently from a244ad6 to 7c5ea3e, on November 24, 2016 05:33
@pabloferz (Contributor Author) commented Nov 24, 2016:

I don't think there are benchmarks for concatenations of mixed dense and sparse matrices. Is it worth checking the performance anyway?
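
For a quick ad-hoc check, a hedged sketch along these lines could be used (BenchmarkTools assumed; the sizes and density are arbitrary, and the actual BaseBenchmarks suite is organized differently):

    using BenchmarkTools, SparseArrays

    A = rand(100, 100)
    S = sprand(100, 100, 0.05)

    @btime vcat($A, $S);   # mixed dense/sparse vertical concatenation
    @btime hcat($A, $S);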

@kshyatt added the 'needs tests' label (Unit tests are required for this change) on Nov 24, 2016
@pabloferz (Contributor Author):

I think we can remove the 'needs tests' label; there are already tests covering the issues mentioned above.

@pabloferz changed the title from "WIP: Type-stabilizing array concatenations" to "Type-stabilizing array concatenations" on Nov 25, 2016
@pabloferz (Contributor Author):

Unless there is an impact on performance or there are any more comments, this is ready on my side.

@stevengj (Member):

It needs @inferred tests for the type stability, to prevent regressions.
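
For illustration, a hedged sketch of the kind of @inferred regression test being asked for (the cases and values here are hypothetical, not the tests the PR actually added):

    using Test, SparseArrays

    A = rand(2, 2)
    S = sparse(A)

    # @inferred throws if the call's return type cannot be inferred concretely.
    @inferred hcat(A, S)
    @inferred vcat(S, A)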

@pabloferz (Contributor Author):

There are already tests in there, but I can add more if they don't seem enough.

@stevengj (Member) commented Nov 25, 2016:

@pabloferz, I just want to make sure that tests were added for any issues that you fixed, e.g. I didn't see a test for #19304. Oh, now I see it.

    catdims = dims2cat(dims)
    shape = cat_shape(catdims, (), map(cat_size, X)...)
    A = cat_similar(X[1], T, shape)
    if countnz(catdims) > 1 && T <: Number
Member:

Wouldn't putting T <: Number first be better to avoid computing the countnz part when possible?

Contributor Author:

Good call. Changed.
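
Sketched, the reordered check (the fill! body is assumed from the surrounding Base code, and countnz is the 0.6-era spelling of what is now count(!iszero, ...)):

    # Cheap T <: Number subtype test first; countnz(catdims) only runs when it can matter.
    if T <: Number && countnz(catdims) > 1
        fill!(A, zero(T))   # zero-fill the gaps when concatenating along more than one dimension
    end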

@stevengj stevengj removed the needs tests Unit tests are required for this change label Dec 1, 2016
@stevengj (Member) commented Dec 1, 2016:

LGTM.

@stevengj stevengj merged commit 684dc9c into JuliaLang:master Dec 1, 2016
@KristofferC KristofferC added the potential benchmark Could make a good benchmark in BaseBenchmarks label Dec 1, 2016
@tkelman (Contributor) commented Dec 1, 2016:

Why wasn't nanosoldier run here?

@stevengj (Member) commented Dec 1, 2016:

Forgot, sorry.

@pabloferz pabloferz deleted the pz/typestablecat branch December 2, 2016 17:11
@pabloferz (Contributor Author):

Fortunately, the only changes that came out of this were speed improvements. See https://github.com/JuliaCI/BaseBenchmarkReports/blob/a9c6ffef26b60d527d06b80ac3ea2fde79637a2a/daily_2016_12_2/report.md

@tkelman (Contributor) commented Dec 7, 2016:

This broke quite a few packages: https://htmlpreview.github.io/?https://github.com/JuliaCI/pkg.julialang.org/blob/ac017050e0c662e46feef496e77ac12baa85583c/pulse.html

I haven't bisected all of them, so this isn't necessarily at fault for every one, but it's at least the underlying cause for AverageShiftedHistograms and BlackBoxOptim.

@nalimilan (Member):

The new failure in CategoricalArrays tests is due to the fact that vcat(["a"], "az") now returns a Vector{Any}, when it previously returned a Vector{String} (both are ==). Of course this can (should?) be written as vcat(["a"], ["az"]), but I'm not sure whether Any is intended or not. For example, vcat([1], 2) still returns a Vector{Int}.

@pabloferz (Contributor Author):

@nalimilan That's because this now uses promote_eltype for type-stability reasons, but eltype(String) == Char. So we need special handling for scalars (in this case, non-AbstractArray subtypes) before doing the promote_eltype. Fortunately, that is pretty easy to fix.

@tkelman I'll look into the ones you mentioned above to see what the problem is, and if you get a bigger list I can look into that too.
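
To illustrate the eltype point above (a hedged sketch; the Vector{Any} result describes the behavior at the time this was reported, before the follow-up fix):

    eltype(String)                              # Char
    promote_type(eltype(["a"]), eltype("az"))   # promote_type(String, Char) == Any
    # Hence vcat(["a"], "az") produced a Vector{Any} at the time,
    # while vcat(["a"], ["az"]) produced a Vector{String}.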

@pabloferz (Contributor Author):

@tkelman I didn't check them all, but #19523 seems to fix the problems on a bunch of packages.
