Latency regression in Plots on master vs 1.7 #41914

Closed
KristofferC opened this issue Aug 18, 2021 · 20 comments
Labels: compiler:latency (Compiler latency), regression (Regression in behavior compared to a previous version)

Comments

@KristofferC
Member

1.7:

julia> @time using Plots
  2.311376 seconds (7.10 M allocations: 490.159 MiB, 2.82% gc time, 0.35% compilation time)

julia> @time (p = plot(rand(2,2)); display(p));
  6.417196 seconds (18.23 M allocations: 959.720 MiB, 3.93% gc time, 17.02% compilation time)

master:

julia> @time using Plots
  3.946437 seconds (9.13 M allocations: 622.219 MiB, 4.27% gc time, 1.18% compilation time)

julia> @time (p = plot(rand(2,2)); display(p));
  8.140353 seconds (18.87 M allocations: 1001.987 MiB, 1.61% gc time, 17.00% compilation time)

I haven't looked deeply into it yet.

@KristofferC added the regression (Regression in behavior compared to a previous version) and compiler:latency (Compiler latency) labels Aug 18, 2021
@aviatesk
Member

aviatesk commented Aug 20, 2021

From the sysimg log, there seems to be a significant difference in the precompile statements:

release/v1.7

    JULIA usr/lib/julia/sys-o.a
Generating REPL precompile statements... 31/31
Executing precompile statements... 1395/1431
Precompilation complete. Summary:
Total ─────── 105.469442 seconds
Generation ──  79.721119 seconds 75.5869%
Execution ───  25.748323 seconds 24.4131%

master

    JULIA usr/lib/julia/sys-o.a
Generating REPL precompile statements... 31/31
Executing precompile statements... 1244/1276
Precompilation complete. Summary:
Total ───────  98.027579 seconds
Generation ──  74.974410 seconds 76.483%
Execution ───  23.053170 seconds 23.517%

so I guess we need to compile more things at runtime on master? I haven't bisected what caused this particular regression though.

EDIT: well, there is not much difference between 1.7.0-beta4.1 and master?

release/v1.7

~/julia3 remotes/origin/release-1.7 aviatesk@amdci2 7s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
  3.693188 seconds (7.44 M allocations: 510.916 MiB, 3.79% gc time, 0.24% compilation time)
  2.870833 seconds (3.60 M allocations: 199.335 MiB, 7.50% gc time, 56.30% compilation time)

master

~/julia avi/noinfer ⇣⇡ aviatesk@amdci2 7s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
  3.460878 seconds (7.19 M allocations: 493.561 MiB, 5.72% gc time, 1.23% compilation time)
  2.995208 seconds (3.49 M allocations: 196.758 MiB, 9.84% gc time, 57.56% compilation time)

@timholy
Member

timholy commented Aug 20, 2021

Plots is a tough case because the most effective way to be cross-version and cross-platform is to use

if ccall(:jl_generating_output, Cint, ()) == 1
    while false; end   # disable the interpreter
    do_work(small_dataset)
end

rather than

if ccall(:jl_generating_output, Cint, ()) == 1
    precompile(do_work, (DataSetType,))
end

but plotting has side effects. Still, they should probably try to transition to this style wherever possible. Currently they rely on the bot to generate precompiles.
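For a package whose workload has display side effects, one possible shape for the workload style is to exercise the plotting pipeline while rendering into an in-memory buffer. This is only a sketch under assumptions: the tiny workload, the IOBuffer redirect, and the "image/png" MIME (which depends on the active backend) are all illustrative, not Plots' actual precompile script:

```julia
# Sketch: inside the package's top-level code. Runs a small workload at
# precompile time while keeping its display side effects off-screen.
if ccall(:jl_generating_output, Cint, ()) == 1
    let p = plot(rand(2, 2))
        io = IOBuffer()
        # Render via `show` so the display path gets compiled too,
        # without opening a GUI window or writing a file.
        show(io, MIME("image/png"), p)
    end
end
```

The point of the redirect is that precompilation then covers the `show`/display code paths that the user's first `display(p)` would otherwise have to compile at runtime.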

@KristofferC
Member Author

KristofferC commented Aug 25, 2021

EDIT: well, there is not much difference between 1.7.0-beta4.1 and master ?

Yes, looks like something regressed vs beta3?

beta3:

julia> @time using Plots
  3.277906 seconds (7.42 M allocations: 511.464 MiB, 9.06% gc time, 0.20% compilation time)

julia> @time (p = plot(rand(2,2)); display(p));
  7.475758 seconds (18.31 M allocations: 962.836 MiB, 2.53% gc time, 15.59% compilation time)

on #41781:

julia> @time using Plots
  4.682400 seconds (8.93 M allocations: 610.531 MiB, 7.02% gc time, 0.16% compilation time)

julia> @time (p = plot(rand(2,2)); display(p));
  9.137335 seconds (19.63 M allocations: 1.005 GiB, 2.10% gc time, 16.63% compilation time)

Should be fairly quick to bisect perhaps.

@JeffBezanson
Member

I get:

1.7.0-rc1:

julia> @time using Plots
  4.155797 seconds (8.16 M allocations: 571.039 MiB, 8.09% gc time, 0.34% compilation time)

julia> @time display(plot(rand(10)))
  6.991758 seconds (10.62 M allocations: 569.403 MiB, 1.73% gc time)

master:

julia> @time using Plots
  6.922687 seconds (12.16 M allocations: 778.810 MiB, 5.00% gc time, 1.01% compilation time)

julia> @time display(plot(rand(10)))
 10.923829 seconds (22.90 M allocations: 1.203 GiB, 2.35% gc time)

@JeffBezanson added the priority (This should be addressed urgently) label Sep 14, 2021
@aviatesk

This comment has been minimized.

@aviatesk
Member

So there are two levels of TTFP regression...:

  • between 1.7-beta3 and 1.7-rc1
  • 1.7-rc1 and master

I filed an issue about throw-block deoptimization (#42258), which I thought could be a trigger of the regression between 1.7-rc1 and master.
I tried the plot benchmark with #42149 reverted, since that PR could be badly affected by the issue, but it seems the regression between master and v1.7-rc1 is not caused by it:

master

~/julia master aviatesk@amdci2 11s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
  7.265401 seconds (12.46 M allocations: 793.437 MiB, 5.08% gc time, 0.65% compilation time)
  2.661832 seconds (3.55 M allocations: 196.283 MiB, 2.25% gc time)

with reverting #42149

~/julia3 master ⇡ aviatesk@amdci2 10m 7s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
  7.294589 seconds (12.44 M allocations: 791.542 MiB, 5.15% gc time, 0.63% compilation time)
  2.622218 seconds (3.64 M allocations: 204.145 MiB, 1.88% gc time)

@KristofferC
Member Author

KristofferC commented Sep 15, 2021

Tried again: I get:

julia> @time using Plots
# 1.6.2:
  2.795078 seconds (8.13 M allocations: 564.960 MiB, 3.26% gc time, 0.10% compilation time)
# 1.7.0-rc1:
  4.379164 seconds (10.48 M allocations: 713.449 MiB, 5.69% gc time, 0.15% compilation time)
# master:
  7.175294 seconds (16.00 M allocations: 1018.494 MiB, 3.80% gc time, 0.17% compilation time)
julia> @time display(plot(rand(10)))
# 1.6.2:
  6.755702 seconds (18.10 M allocations: 1.013 GiB, 4.01% gc time, 15.37% compilation time)
# 1.7.0-rc1:
  8.278466 seconds (18.85 M allocations: 1001.208 MiB, 1.95% gc time)
# master:
 10.549896 seconds (25.17 M allocations: 1.321 GiB, 1.90% gc time, 13.70% compilation time)

So 1.7 is bad and master is really bad. 2x regression in using seems milestone worthy. I'll try to bisect it.

@KristofferC
Member Author

From sysimg log, there seems to be significant difference between precompiled statements

@aviatesk, that's from #41898

@KristofferC added this to the 1.7 milestone Sep 15, 2021
@aviatesk
Member

So 1.7 is bad and master is really bad.

😱 okay, I will also try to bisect it (I think currently we don't have a solid way to define a latency regression)

@KristofferC
Member Author

KristofferC commented Sep 15, 2021

I checked out 1.6.2 locally and get:

julia> @time using Plots
  5.467579 seconds (10.95 M allocations: 743.108 MiB, 4.75% gc time, 0.13% compilation time)

julia> VERSION
v"1.6.2"

And if I use the released version downloaded from the website:

julia> @time using Plots
  2.821719 seconds (7.97 M allocations: 554.232 MiB, 7.47% gc time, 0.15% compilation time)

julia> VERSION
v"1.6.2"

Hard to bisect when the same Julia version can give so dramatically different results... I even cleaned out .julia/compiled in between.

Edit: I need to check when I'm back at the computer that I am not running with assertions on or something like that.

@vchuravy
Member

Hard to bisect when the same Julia version can give so dramatically different results...

Do you have any Make.user flags set?

@KristofferC
Member Author

Literally just updated my comment :)

@aviatesk
Member

I started git bisect between HEAD and 9eade61 with the following script (to find the regression between 1.7-rc1 and master):

# ~/julia4 tags/v1.7.0-rc1 aviatesk@amdci2
# ❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
#   4.071415 seconds (8.09 M allocations: 569.073 MiB, 7.24% gc time, 0.10% compilation time)
#   2.741481 seconds (3.46 M allocations: 194.945 MiB, 5.41% gc time)

# ~/julia master aviatesk@amdci2 4m 33s
# ❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
#   7.275295 seconds (12.46 M allocations: 793.529 MiB, 5.06% gc time, 0.65% compilation time)
#   2.663575 seconds (3.55 M allocations: 196.283 MiB, 2.25% gc time)

make cleanall
make -j 64
./usr/bin/julia -e 'using Pkg; Pkg.precompile()'
./usr/bin/julia -e """
let
    allocated = @allocated using Plots
    if allocated > 750_000_000 # threshold: 750 MB
        exit(1)
    else
        exit(0)
    end
end
"""
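Assuming the script above is saved as an executable file (the name `bisect.sh` below is an assumption), it plugs into git's automated bisection via the exit-code convention, with 0 marking a commit good and 1 marking it bad:

```shell
# Hypothetical driver commands, run from the Julia source checkout:
# mark HEAD (master) bad and 9eade61 good, then let git re-run the
# script at each candidate commit and classify it by exit status.
git bisect start HEAD 9eade61
git bisect run ./bisect.sh
git bisect reset   # return to the original checkout when done
```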

And it bisected to 5405994, which is fairly reasonable: before that commit, @time using Plots ran in the interpreter, but now it is compiled and comes with some overhead.
If we revert that commit, we get back to the 1.7-rc1 latency at least:

# HEAD
~/julia3 master aviatesk@amdci2 10s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
  7.271732 seconds (12.44 M allocations: 791.544 MiB, 5.17% gc time, 0.63% compilation time)
  2.656330 seconds (3.64 M allocations: 204.145 MiB, 2.37% gc time)

# reverted 5405994f2c5a2a91813dccd203e8f07f72b07521
~/julia3 avi/try aviatesk@amdci2 9m 40s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
  4.482852 seconds (9.17 M allocations: 606.832 MiB, 4.89% gc time, 0.93% compilation time)
  2.759052 seconds (3.59 M allocations: 200.138 MiB, 5.18% gc time)

This suggests we may need to change our mind on how to measure latency regressions quickly?
I'd like to hear what @timholy suggests on this here again.

@KristofferC
Member Author

And it bisected to 5405994, which is fairly reasonable: before that commit, @time using Plots ran in the interpreter, but now it is compiled and comes with some overhead.

Thanks, now we just need to find the 1.6 -> 1.7 regression, I guess.

@timholy
Member

timholy commented Sep 15, 2021

With regard to 5405994, I think there was some consensus that @time was supposed to compile its workload first. Does that mean the previous heuristic didn't always work? Of course you can circumvent this with tstart = time(); using Plots; time() - tstart, which is recommended any time you're analyzing latency-related issues.

Was that your question, @aviatesk, or is it something else? I haven't bisected or done anything else like that.

@aviatesk
Member

Was that your question, @aviatesk, or is it something else?

Yes, thanks for your response.

I think there was some consensus that @time was supposed to compile its workload first. Does that mean that the previous heuristic didn't always work?

Yeah, when I merged the PR, I assumed the previous while false; end trick was functional, but it seems it wasn't.

I wonder whether @time will show very different results for this sort of measurement between v1.8 and earlier versions, which could be a source of confusion? Maybe we now want a simple macro intended for latency-related measurement?
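Such a macro could be a thin wrapper over wall-clock time that deliberately does not compile its argument first. A minimal sketch, assuming nothing beyond Base; the name `@loadtime` is made up here, not an existing macro:

```julia
# Sketch of a latency-measurement macro: report wall-clock time for an
# expression, mirroring the `tstart = time(); using Plots; time() - tstart`
# pattern, so the measured expression is not pre-compiled by the macro.
macro loadtime(ex)
    quote
        local t0 = time_ns()
        $(esc(ex))
        local elapsed = (time_ns() - t0) / 1e9
        println("elapsed: ", elapsed, " seconds")
        elapsed
    end
end

# Usage at the REPL / top level:
# @loadtime using Plots
```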

@KristofferC
Member Author

Okay, so, embarrassingly, after making sure I don't run with assertions, I don't see a regression in 1.7 vs 1.6. And the regression on master is explained by #41914 (comment). So I'm not sure it makes sense to keep this open?

@aviatesk
Member

I'm trying to find the difference between 1.7 and 1.6; let me close this if I can't find one.

@aviatesk removed the priority (This should be addressed urgently) label Sep 15, 2021
@aviatesk
Member

aviatesk commented Sep 15, 2021

Okay, I got:

~/julia3 remotes/origin/release-1.6 aviatesk@amdci2 7s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
  4.115876 seconds (7.77 M allocations: 553.396 MiB, 5.08% gc time, 14.70% compilation time)
  2.541933 seconds (3.12 M allocations: 187.996 MiB, 1.98% gc time, 54.41% compilation time)

~/julia3 tags/v1.7.0-rc1 aviatesk@amdci2 10m 28s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3))'
  3.951941 seconds (7.92 M allocations: 551.359 MiB, 5.15% gc time, 0.10% compilation time)
  2.640314 seconds (3.45 M allocations: 193.259 MiB, 1.89% gc time)

So I'd like to conclude there is not much difference between 1.6 and 1.7.
I remember #39305 slightly increased the memory allocation of this super simple benchmark, but I think we prefer the performance benefit from that PR over this minor latency difference.

@KristofferC removed this from the 1.7 milestone Sep 15, 2021
@timholy
Member

timholy commented Sep 15, 2021

There's a very tiny using regression that I see between 1.6 (julia) and 1.8 (juliam):

Fairly recent master:

tim@diva:~/src/julia-master$ juliam -q --project --startup-file=no
julia> @time using Requires
  0.069032 seconds (50.07 k allocations: 4.413 MiB, 50.21% compilation time)

julia> 
tim@diva:~/src/julia-master$ juliam -q --project --startup-file=no
julia> tstart = time(); using Requires; time()-tstart
0.0838780403137207

1.6:

julia> 
tim@diva:~/src/julia-master$ julia -q --project --startup-file=no
julia> @time using Requires
  0.038621 seconds (108.95 k allocations: 6.865 MiB, 92.67% compilation time)

julia> 
tim@diva:~/src/julia-master$ julia -q --project --startup-file=no
julia> tstart = time(); using Requires; time()-tstart
0.06404614448547363

There's run-to-run variability, but these particular instances are pretty typical.

Requires is a small package so we are sensitive to any workload in the using pipeline that hasn't been precompiled. SnoopCompile can't help here because you first have to load it, but we can mimic what it does:

tim@diva:~/src/julia-master$ juliam -q --startup-file=no
julia> Core.Compiler.Timings.reset_timings(); Core.Compiler.__set_measure_typeinf(true); using Requires; Core.Compiler.__set_measure_typeinf(false); Core.Compiler.Timings.close_current_timer()

julia> traw = deepcopy(Core.Compiler.Timings._timings[1]);

julia> using SnoopCompile

julia> tinf = SnoopCompile.SnoopCompileCore.InferenceTimingNode(traw)
InferenceTimingNode: 0.073816/0.073816 on Core.Compiler.Timings.ROOT() with 0 direct children

The 0 direct children means it's not an inference issue.

Of course, this doesn't stress-test things like method table insertion, but the benchmarks above make it seem like this isn't a big issue.
