Optimize maybe_macroexpand #1991

Merged: 2 commits merged into main from rh/maybe_macroexpand on Mar 17, 2022

Conversation

rikhuijzer (Collaborator) commented Mar 17, 2022

This PR optimizes and fixes inference for maybe_macroexpand and split_funcname. This reduces the time to first X (TTFX) and running time.

The reason that better type inference lowers TTFX is that maybe_macroexpand is called somewhere inside SessionActions.open. Via the precompile call that we've put on SessionActions.open, the compiler uses type inference to figure out which methods it needs to compile. However, if the compiler cannot infer some type while stepping through the function bodies, it cannot compile the called methods. For example, we can define a function g which calls f, make the object that is passed into f non-inferable via Base.inferencebarrier, and call precompile on the outer function:

julia> f(x) = x;

julia> g(x) = f(Base.inferencebarrier(x));

julia> precompile(g, (Int,))
true

Now, we can see that Julia didn't compile any specializations:

julia> methods(f)[1].specializations
svec()

The compilation will happen just-in-time when we call g:

julia> g(1)
1

julia> methods(f)[1].specializations
svec(MethodInstance for f(::Int64), nothing, nothing, nothing, nothing, nothing, nothing, nothing)

On the other hand, without the inferencebarrier, precompilation hits the inner and outer function straight away:

julia> a(x) = x;

julia> b(x) = a(x);

julia> precompile(b, (Int,))
true

julia> methods(a)[1].specializations
svec(MethodInstance for a(::Int64), nothing, nothing, nothing, nothing, nothing, nothing, nothing)
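As an aside, precompile did compile the outer function g itself in the first example; only the dynamically dispatched call to f was deferred to run time. One can check this in the same session (exact output varies by Julia version):

julia> methods(g)[1].specializations   # should contain a MethodInstance for g(::Int64)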

Another thing that saved a surprising amount of compilation time is getting rid of broadcasting. It sounds a bit annoying, and I love broadcasting too, but in some cases it is worth it if it avoids extra compilation time for every person running Pluto. See also Chris saying "I'm the compiler now!" (SciML/OrdinaryDiffEq.jl#1465). I don't think we have to get rid of all broadcasts, mainly just the places where type inference falls back to Any and gives up on precompiling. These cases can be found quite easily via JET.jl with @report_opt annotate_types=true SessionActions.open; just search for "failed to optimize" in the output.
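To make that concrete, here is a minimal, hypothetical sketch of the kind of rewrite this implies; walk stands in for a per-element function like maybe_macroexpand and is not part of the PR:

walk(x) = x

# Broadcast version: pulls in the Broadcast machinery, which is
# expensive to compile when the element type is inferred as Any.
broadcast_version(args::Vector{Any}) = walk.(args)

# Loop version: far less code for the compiler to specialize.
function loop_version(args::Vector{Any})
    out = Vector{Any}(undef, length(args))
    for (i, x) in enumerate(args)
        out[i] = walk(x)
    end
    return out
end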

Anyway, time for the benchmarks:

maybe_macroexpand

Branch main

$ julia --project -ie 'using Pluto, BenchmarkTools'

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1))
  0.375900 seconds (891.63 k allocations: 48.112 MiB, 21.58% gc time, 99.68% compilation time)

julia> @benchmark Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1))
BenchmarkTools.Trial: 10000 samples with 180 evaluations.
 Range (min … max):  588.978 ns … 16.822 μs  ┊ GC (min … max): 0.00% … 89.32%
 Time  (median):     628.444 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   782.516 ns ± 671.607 ns  ┊ GC (mean ± σ):  3.75% ±  4.45%

  ▇█▅▄▃▃▃▂▂▃▄▃▂▂▂▂▂▃▂▂▂▁▁ ▁▁                                    ▂
  ███████████████████████████████▇████▇█▇█▇▆▆▆▇█▇▇▆▅▆▆▅▆▅▆▆▆▅▄▄ █
  589 ns        Histogram: log(frequency) by time       1.78 μs <

 Memory estimate: 608 bytes, allocs estimate: 13.
────────────────────────────────────────────────────────────────────────────
                                                 Time          Allocations
                                           ───────────────   ───────────────
              Total measured:                   51.1s            3.81GiB

Section                            ncalls     time    %tot     alloc    %tot
────────────────────────────────────────────────────────────────────────────
import Pluto                            1    570ms    1.3%   66.2MiB    1.8%
compiletimes.jl                         1    41.8s   98.7%   3.52GiB   98.2%
  Pluto.Cell                            1   28.2ms    0.1%   3.82MiB    0.1%
  Pluto.Notebook                        1    1.33s    3.1%    164MiB    4.5%
  SessionActions.open                   1    12.8s   30.3%   1.00GiB   27.7%
  SessionActions.shutdown               1    675ms    1.6%   71.4MiB    1.9%
  Configuration.from_flat_kwargs        1    2.38s    5.6%    192MiB    5.2%
  Pluto.run                             1    6.79s   16.1%    759MiB   20.7%
────────────────────────────────────────────────────────────────────────────

Branch rh/maybe_macroexpand (this PR)

$ julia --project -ie 'using Pluto, BenchmarkTools'

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1));
  0.101043 seconds (87.07 k allocations: 4.655 MiB, 99.93% compilation time)

julia> @benchmark Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1))
BenchmarkTools.Trial: 10000 samples with 199 evaluations.
 Range (min … max):  419.814 ns … 16.994 μs  ┊ GC (min … max): 0.00% … 94.32%
 Time  (median):     448.852 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   572.488 ns ± 691.028 ns  ┊ GC (mean ± σ):  5.41% ±  4.49%

  ▇█▆▄▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁                                   ▂
  ███████████████████████████████▇▇██▇▇█▇▇▇▇▇▅▇▆▇▇▇▆▆▆▆▆▅▆▆▅▅▅▅ █
  420 ns        Histogram: log(frequency) by time        1.3 μs <

 Memory estimate: 512 bytes, allocs estimate: 11.
────────────────────────────────────────────────────────────────────────────
                                                 Time          Allocations
                                           ───────────────   ───────────────
              Total measured:                   49.6s            3.73GiB

Section                            ncalls     time    %tot     alloc    %tot
────────────────────────────────────────────────────────────────────────────
import Pluto                            1    549ms    1.3%   64.7MiB    1.8%
compiletimes.jl                         1    40.3s   98.7%   3.45GiB   98.2%
  Pluto.Cell                            1   29.8ms    0.1%   3.82MiB    0.1%
  Pluto.Notebook                        1    1.33s    3.3%    164MiB    4.6%
  SessionActions.open                   1    11.9s   29.0%    950MiB   26.4%
  SessionActions.shutdown               1    647ms    1.6%   71.5MiB    2.0%
  Configuration.from_flat_kwargs        1    2.36s    5.8%    192MiB    5.3%
  Pluto.run                             1    7.05s   17.3%    759MiB   21.1%
────────────────────────────────────────────────────────────────────────────

So, that's about a 50 MiB reduction in allocations for SessionActions.open.

cc: @ghaetinger

@github-actions (Contributor) commented:
Try this Pull Request!

Open Julia and type:

julia> import Pkg
julia> Pkg.activate(temp=true)
julia> Pkg.add(url="https://github.com/fonsp/Pluto.jl", rev="rh/maybe_macroexpand")
julia> using Pluto

@@ -988,9 +992,15 @@ function maybe_macroexpand(ex::Expr; recursive = false, expand_bind = true)
end

if recursive && (result isa Expr)
Expr(result.head, maybe_macroexpand.(result.args; recursive = recursive, expand_bind = expand_bind)...)
Collaborator commented:

Getting rid of broadcasting is totally fine if it speeds stuff up, go for loops!
BUT, did you see how it performed with map(result.args) do x maybe_macroexpand(x, recursive=recursive, expand_bind=expand_bind) end? Just curious if that takes similar compilation-related performance hits.

rikhuijzer (Collaborator, Author) replied:

> Just curious if that takes similar compilation-related performance hits

Noice. Well spotted. map is fine too.
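For reference, the map form from the suggestion would look roughly like this (a sketch, not necessarily the committed code):

expanded = map(result.args) do x
    maybe_macroexpand(x; recursive = recursive, expand_bind = expand_bind)
end
Expr(result.head, expanded...)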

Benchmarks below. All after restarting Julia with julia --project -ie 'using Pluto'.

with broadcasting (original)

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1));
  0.306775 seconds (894.16 k allocations: 48.205 MiB, 22.07% gc time, 99.59% compilation time)

I think this is so bad because the types are unknown, so multiple types show up in various calls in the recursion, and each needs to be compiled just-in-time. Also, broadcasting takes a long time to compile. See @code_typed identity.([1, 2]) for more information.

with map

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1));
  0.079647 seconds (86.74 k allocations: 4.639 MiB, 101.12% compilation time)

with a loop

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1));
  0.090846 seconds (87.07 k allocations: 4.655 MiB, 99.57% compilation time)
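For completeness, the loop variant benchmarked above could be written along these lines (a hypothetical reconstruction, not necessarily the merged code):

expanded = Vector{Any}(undef, length(result.args))
for (i, arg) in enumerate(result.args)
    expanded[i] = maybe_macroexpand(arg; recursive = recursive, expand_bind = expand_bind)
end
Expr(result.head, expanded...)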

@dralletje merged commit cba6a50 into main on Mar 17, 2022
@rikhuijzer deleted the rh/maybe_macroexpand branch on March 25, 2022
@fonsp added the backend (Concerning the julia server and runtime) label on Apr 29, 2022