Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix type instability in generic matmul #14722

Merged
merged 1 commit into from
Jun 27, 2016

Conversation

simonster
Copy link
Member

For C::AbstractVecOrMat{R}, A::AbstractVecOrMat{T}, B::AbstractVecOrMat{S}, we were sometimes initializing the intermediate sum as zero(R), but it wouldn't stay that way if R was narrower than T and S.

This promotes the intermediate sums (s and Ctmp in the code) to the promotion of R and zero(T)*zero(S) + zero(T)*zero(S). Thus, for A_mul_B!(C::Matrix{Float32}, A::Matrix{Float64}, B::Matrix{Float64}, the intermediate sums are Float64. The other possibility would be to enforce that the intermediate sums are the same type as R (i.e. Float32 in this example) but it seemed more conservative to potentially use higher precision than lower precision.

Before:

julia> C = Array(Float32, 100, 100);
       A = rand(Float64, 100, 100);
       B = rand(Float64, 100, 100);

julia> @time A_mul_B!(C, A, B);
  0.031763 seconds (2.11 M allocations: 32.154 MB, 9.28% gc time)

After:

julia> @time A_mul_B!(C, A, B);
  0.000870 seconds (10 allocations: 448 bytes)

@andreasnoack
Copy link
Member

LGTM

@simonster
Copy link
Member Author

The RootInt type in one of the tests doesn't have zero defined, so that test is failing. In principle I guess we should be setting up the zero values for the blocked algorithm the same way as we currently do for the naive algorithm.

@@ -524,10 +525,11 @@ function _generic_matmatmul!{T,S,R}(C::AbstractVecOrMat{R}, tA, tB, A::AbstractV
if tB == 'N'
for i = 1:mA, j = 1:nB
if isempty(A) || isempty(B)
Ctmp = zero(R)
z2 = zero(T)*zero(S) + zero(T)*zero(S)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If A or B is empty, maybe we could just skip the loop entirely and fill!(C, zero(R))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think this check can then be moved outside the transpose branches which will make the code a bit shorter and simpler.

@andreasnoack
Copy link
Member

In principle I guess we should be setting up the zero values for the blocked algorithm the same way as we currently do for the naive algorithm.

I.e. check for non-emptiness and then use the elements of the arrays instead of the array element types?

@tkelman
Copy link
Contributor

tkelman commented Jan 19, 2016

can this be tested against via @inferred or looking at the @code_warntype output to prevent it from regressing? cc @timholy on the zero issue

@timholy
Copy link
Member

timholy commented Jan 19, 2016

On the zero issue, there's no mathematical reason to prevent defining zero(RootInt). OTOH maybe it's a useful test case for such situations.

Also make sure that intermediate values are promoted to the output type
@simonster simonster force-pushed the sjk/matmatmul-type-instability branch from f870d93 to 10d4d2c Compare June 25, 2016 21:58
@simonster
Copy link
Member Author

Updated to:

  • Bail out early if A or B is empty and just fill C with zeros, instead of putting a bunch of branches everywhere.
  • Use zero based on elements themselves rather than types for the blocked case. (I just use the first elements in both arrays, since if they're isbits the types will be the same. If you tried hard enough, I think you could construct a type where this is not mathematically correct because zero does different things for different elements, but then the current code wouldn't work either.)
  • Add a test that allocation doesn't scale with input size for A_mul_B!(::Matrix{Float32}, ::Matrix{Float64}, ::Matrix{Float64}). It's possibly more fragile than is ideal, but it was the easiest approach I could think of that would catch a regression here.

@simonster simonster merged commit 6c4429b into master Jun 27, 2016
@simonster simonster deleted the sjk/matmatmul-type-instability branch June 27, 2016 16:07
A2 = rand(Float64, 6, 6)
B2 = rand(Float64, 6, 6)
A_mul_B!(C1, A1, B1)
@test @allocated(A_mul_B!(C1, A1, B1)) == @allocated(A_mul_B!(C2, A2, B2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I totally missed this the first time around. Filed #17327 to revert the test. Will look into adding a benchmark instead tomorrow.

simonster added a commit that referenced this pull request Jul 8, 2016
This test apparently breaks with inlining disabled.
simonster added a commit that referenced this pull request Jul 8, 2016
Remove test for #14722 (type instability in generic matmul)
simonster added a commit to simonster/BaseBenchmarks.jl that referenced this pull request Jul 8, 2016
In JuliaLang/julia#14722, I fixed an issue where the inner loop was
type-unstable. This benchmark ensures that doesn't regress.
jrevels pushed a commit to JuliaCI/BaseBenchmarks.jl that referenced this pull request Jul 11, 2016
In JuliaLang/julia#14722, I fixed an issue where the inner loop was
type-unstable. This benchmark ensures that doesn't regress.
mfasi pushed a commit to mfasi/julia that referenced this pull request Sep 5, 2016
This test apparently breaks with inlining disabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants