
Regression of reduce (formerly reducedim) in Julia 0.7 #498

Open
wsshin opened this issue Sep 17, 2018 · 7 comments
Labels
performance runtime performance

Comments

@wsshin
Contributor

wsshin commented Sep 17, 2018

I observe a significant regression in reduce in Julia 0.7 compared to reducedim in Julia 0.6. Below, I add the columns of an m×2 matrix to create an m×1 matrix for different values of m. In Julia 0.6,

julia> VERSION
v"0.6.3-pre.0"

julia> m = 1; A = @SMatrix rand(m, 2); @btime reducedim(+, $A, Val{2});
  1.894 ns (0 allocations: 0 bytes)

julia> m = 2; A = @SMatrix rand(m, 2); @btime reducedim(+, $A, Val{2});
  2.261 ns (0 allocations: 0 bytes)

julia> m = 3; A = @SMatrix rand(m, 2); @btime reducedim(+, $A, Val{2});
  2.261 ns (0 allocations: 0 bytes)

julia> m = 4; A = @SMatrix rand(m, 2); @btime reducedim(+, $A, Val{2});
  2.264 ns (0 allocations: 0 bytes)

julia> m = 5; A = @SMatrix rand(m, 2); @btime reducedim(+, $A, Val{2});
  2.634 ns (0 allocations: 0 bytes)

julia> m = 10; A = @SMatrix rand(m, 2); @btime reducedim(+, $A, Val{2});
  4.208 ns (0 allocations: 0 bytes)

On the other hand, in Julia 0.7:

julia> VERSION
v"0.7.1-pre.0"

julia> m = 1; A = @SMatrix rand(m, 2); @btime reduce(+, $A, dims=Val(2));
  8.015 ns (0 allocations: 0 bytes)

julia> m = 2; A = @SMatrix rand(m, 2); @btime reduce(+, $A, dims=Val(2));
  8.205 ns (0 allocations: 0 bytes)

julia> m = 3; A = @SMatrix rand(m, 2); @btime reduce(+, $A, dims=Val(2));
  210.229 ns (3 allocations: 160 bytes)

julia> m = 4; A = @SMatrix rand(m, 2); @btime reduce(+, $A, dims=Val(2));
  55.512 ns (2 allocations: 128 bytes)

julia> m = 5; A = @SMatrix rand(m, 2); @btime reduce(+, $A, dims=Val(2));
  44.934 ns (2 allocations: 144 bytes)

julia> m = 10; A = @SMatrix rand(m, 2); @btime reduce(+, $A, dims=Val(2));
  57.598 ns (2 allocations: 272 bytes)

Observations:

  • 0.6 does not allocate at all.
  • 0.7 starts allocating for m ≥ 3, and even for m ≤ 2, where it does not allocate, it is 3–4 times slower than 0.6.
  • Something is odd about m = 3 in 0.7: the code runs significantly slower for m = 3 than for m > 3, probably because it uses one extra allocation for some reason.

Any idea why this regression occurs?
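For comparison, summing the two columns by hand avoids the generic reduction path. A minimal sketch (colsum2 is a name I made up for illustration; it assumes an m×2 SMatrix input as above):

```julia
using StaticArrays

# Hypothetical helper: add the two columns of an m×2 SMatrix directly,
# bypassing the generic reduce machinery.
colsum2(A::SMatrix{M,2}) where {M} = A[:, 1] + A[:, 2]

A = @SMatrix rand(4, 2)

# This should agree with the public API, up to shape:
# colsum2 returns an SVector, reduce an m×1 SMatrix.
colsum2(A) == vec(reduce(+, A, dims=Val(2)))
```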

@wsshin
Contributor Author

wsshin commented Sep 17, 2018

Maybe related to #439?

@tkoolen
Contributor

tkoolen commented Sep 17, 2018

See also #494.

@nlw0

nlw0 commented Dec 27, 2018

When I reproduced your tests today, I only saw allocations for m ≥ 6 in the m×2 case, but they appear sooner in my use case (4×4 matrices):

julia> VERSION
v"1.2.0-DEV.66"

julia> A = randn(4,4);

julia> for m = 1:10
           display(m)
           A = @SMatrix rand(m, 2); @btime reduce(+, $A, dims=Val(1));
           A = @SMatrix rand(m, m); @btime reduce(+, $A, dims=Val(1));
       end
1
1.814 ns (0 allocations: 0 bytes)
1.693 ns (0 allocations: 0 bytes)
2
1.695 ns (0 allocations: 0 bytes)
2.024 ns (0 allocations: 0 bytes)
3
1.754 ns (0 allocations: 0 bytes)
2.029 ns (0 allocations: 0 bytes)
4
2.031 ns (0 allocations: 0 bytes)
36.787 ns (2 allocations: 192 bytes)
5
2.032 ns (0 allocations: 0 bytes)
41.070 ns (2 allocations: 256 bytes)
6
37.657 ns (2 allocations: 144 bytes)
53.786 ns (2 allocations: 368 bytes)
7
32.522 ns (2 allocations: 160 bytes)
65.344 ns (2 allocations: 464 bytes)
8
34.810 ns (2 allocations: 176 bytes)
78.683 ns (2 allocations: 624 bytes)
9
35.368 ns (2 allocations: 192 bytes)
92.158 ns (2 allocations: 752 bytes)
10
38.982 ns (2 allocations: 208 bytes)
107.778 ns (2 allocations: 912 bytes)

Are there any open plans to solve this? This looks like a problem I could work around myself in my own project, but it would certainly be better to have it solved in the library. Is there some way a newbie could help?

@andyferris
Member

Yes, this is annoying. The problem might relate to how keyword-argument functions are lowered; I think there was an issue or comment somewhere about automatically adding @propagate_inbounds to the helper functions to fix certain inlining and bounds-checking performance issues.

PS - Personally, I don't love the dims keyword appearing everywhere in method signatures, since it reflects poor separation of concerns. I would rather we write things like sum.(splitdims(A, 1)) (or use the new eachrow, eachcol, and eachslice functions).
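For plain Base Arrays, the slice-based style suggested above already works in Julia ≥ 1.1 via eachcol/eachrow; a minimal sketch (not involving StaticArrays):

```julia
# Column-wise and row-wise sums via broadcasting over slices,
# as an alternative to the dims keyword.
A = [1 2; 3 4; 5 6]          # 3×2 matrix

colsums = sum.(eachcol(A))   # like sum(A, dims=1): [9, 12]
rowsums = sum.(eachrow(A))   # like sum(A, dims=2): [3, 7, 11]
```

Note that, unlike sum(A, dims=...), these return plain Vectors rather than 1×n or m×1 matrices.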

@tkoolen
Contributor

tkoolen commented Jan 2, 2019

Completely agree, I wish the dims kwarg API would just go away.

@c42f c42f added the performance runtime performance label Jul 31, 2019
@c42f
Member

c42f commented Jul 31, 2019

Probably the same root cause as #540.

@mateuszbaran
Collaborator

One possible workaround is to call the StaticArrays._reduce method directly, like this:

f_(A) = StaticArrays._reduce(+, A, Val(2), NamedTuple())

That method was added in #659.
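A sketch of that workaround in use; note that _reduce is an internal, underscore-prefixed function, so its signature may change between StaticArrays versions:

```julia
using StaticArrays

# Internal-API workaround from the comment above; the trailing NamedTuple()
# stands in for the (empty) keyword arguments of reduce.
f_(A) = StaticArrays._reduce(+, A, Val(2), NamedTuple())

A = @SMatrix rand(3, 2)

# Should agree with the public API, without the allocation overhead:
f_(A) == reduce(+, A, dims=Val(2))
```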
