Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.. becomes slow and allocates for 4 or more dimensions #20

Closed
ranocha opened this issue Sep 16, 2020 · 4 comments · Fixed by #23
Closed

.. becomes slow and allocates for 4 or more dimensions #20

ranocha opened this issue Sep 16, 2020 · 4 comments · Fixed by #23

Comments

@ranocha
Copy link
Member

ranocha commented Sep 16, 2020

julia> using BenchmarkTools, EllipsisNotation

julia> u = rand(1, 2, 3);

julia> @benchmark view($u, $1, $..) # that's okay
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.889 ns (0.00% GC)
  median time:      1.907 ns (0.00% GC)
  mean time:        1.929 ns (0.00% GC)
  maximum time:     162.367 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> u = rand(1, 2, 3, 4);

julia> @benchmark view($u, $1, $..) # that's relatively slow and allocates...
BenchmarkTools.Trial: 
  memory estimate:  832 bytes
  allocs estimate:  36
  --------------
  minimum time:     1.623 μs (0.00% GC)
  median time:      1.721 μs (0.00% GC)
  mean time:        1.756 μs (1.44% GC)
  maximum time:     257.162 μs (98.41% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark view($u, $1, $:, $:, $:) # that's fast again
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.305 ns (0.00% GC)
  median time:      2.321 ns (0.00% GC)
  mean time:        2.329 ns (0.00% GC)
  maximum time:     17.121 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

A lot of time seems to be spent in

julia> @benchmark to_indices($u, $(1, ..))
BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  8
  --------------
  minimum time:     290.716 ns (0.00% GC)
  median time:      294.605 ns (0.00% GC)
  mean time:        306.962 ns (2.65% GC)
  maximum time:     8.650 μs (95.44% GC)
  --------------
  samples:          10000
  evals/sample:     271

Is there some inlining/tuple-magic to make .. fast for arrays with four or more dimensions?

@ChrisRackauckas
Copy link
Member

Odd. Yeah, maybe something needs to make use of ntuple or something? @mbauman knows this array stuff well so I always ping him here.

@sloede
Copy link

sloede commented Sep 20, 2020

Just to clarify: the slowness is not dependent on the number of array dimensions but on the number of ellipsis dimensions, i.e., it becomes slow if more than 3 dimensions are "slurped" by the .. operator:

julia> u = rand(1, 2, 3, 4, 5, 6, 7);

julia> @benchmark view($u, $1, $:, $.., $:, $:) # this is fast, as `..` only slurps 3 dimensions (3-5)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.020 ns (0.00% GC)
  median time:      4.866 ns (0.00% GC)
  mean time:        4.997 ns (0.00% GC)
  maximum time:     37.419 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark view($u, $1, $:, $..,$:) # that's again slow with `..` replacing 4 dimensions (3-6)
BenchmarkTools.Trial: 
  memory estimate:  1.53 KiB
  allocs estimate:  66
  --------------
  minimum time:     3.955 μs (0.00% GC)
  median time:      5.232 μs (0.00% GC)
  mean time:        5.570 μs (2.66% GC)
  maximum time:     827.886 μs (98.18% GC)
  --------------
  samples:          10000
  evals/sample:     7

@ChrisRackauckas
Copy link
Member

Yes, it's hit whenever the splat case is hit:

https://github.com/ChrisRackauckas/EllipsisNotation.jl/blob/master/src/EllipsisNotation.jl#L14

I wonder if it needs to be made fully recursive in order to make the compiler fully specialize it, or whether it needs an @generated function.

@sloede
Copy link

sloede commented Sep 20, 2020

Without really knowing what's going on, I think the line you quoted is not the (only) culprit: I can construct a case where the respective fillcolons version is never hit (and here the allocations already occur even though only 3 dimensions are slurped by ..):

julia> u = rand(1, 2, 3, 4);

julia> @benchmark view($u, $1, $:, ..)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.461 ns (0.00% GC)
  median time:      3.982 ns (0.00% GC)
  mean time:        4.327 ns (0.00% GC)
  maximum time:     37.940 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark view($u, $1, ..)
BenchmarkTools.Trial: 
  memory estimate:  800 bytes
  allocs estimate:  35
  --------------
  minimum time:     2.067 μs (0.00% GC)
  median time:      2.307 μs (0.00% GC)
  mean time:        2.462 μs (0.85% GC)
  maximum time:     212.839 μs (98.22% GC)
  --------------
  samples:          10000
  evals/sample:     9

By my count, both versions only use fillcolons L10, L15 (twice in the first case, three times in the second), and finally L12. So the one you reference (L14) does not seem to play a role.
https://github.com/ChrisRackauckas/EllipsisNotation.jl/blob/43f09f9467006f21a462426bd88df707c91d0bbd/src/EllipsisNotation.jl#L10-L15
But as stated above, I am really out of my depth here.

ranocha added a commit to ranocha/EllipsisNotation.jl that referenced this issue Oct 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants