
update precompilation statements #2718

Merged (3 commits) on Apr 14, 2021

Conversation

@bkamins (Member) commented Apr 13, 2021

Here is the impact:

0.22.7:

julia> using DataFrames

julia> @time df = DataFrame(some_col=1);
  0.066630 seconds (124.23 k allocations: 7.668 MiB)

julia> @time combine(df, :some_col => x -> x);
  0.748427 seconds (2.63 M allocations: 168.141 MiB, 8.86% gc time, 2.41% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.062810 seconds (80.87 k allocations: 5.105 MiB, 99.50% compilation time)

main:

julia> using DataFrames

julia> @time df = DataFrame(some_col=1);
  0.079420 seconds (114.59 k allocations: 6.918 MiB, 26.56% gc time, 99.84% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.459968 seconds (1.15 M allocations: 72.700 MiB, 4.77% gc time, 100.20% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.024984 seconds (6.70 k allocations: 446.336 KiB, 98.04% compilation time)

this PR:

julia> using DataFrames

julia> @time df = DataFrame(some_col=1);
  0.046273 seconds (114.70 k allocations: 6.926 MiB, 99.77% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.245429 seconds (213.39 k allocations: 12.944 MiB, 100.38% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.025664 seconds (6.70 k allocations: 446.336 KiB, 97.91% compilation time)

Of course more testing is welcome (and I will report additional results during the week).

CC @pdeffebach as this affects DataFramesMeta.jl

@bkamins bkamins requested a review from nalimilan April 13, 2021 14:46
@bkamins (Member, Author) commented Apr 13, 2021

@nalimilan - as you can see, after despecialization + hand editing the file is much smaller.
In particular, I removed all precompile statements that depended on specific column-name sets, e.g. [:x, :y] or [:a, :b, :c], which are common in tests but probably rare in practice. The only case I kept is :x1, which is auto-generated in aggregation.
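As an illustration of the kind of statement that was removed (a hypothetical example, not taken from the PR): column names can leak into method signatures through NamedTuple types, e.g. via the keyword-argument DataFrame constructor, so a statement like the following would only help data frames whose columns are literally :x and :y.

```julia
# Hypothetical example: the literal column names (:x, :y) are baked
# into the NamedTuple type, so this statement is useless for data
# frames with any other column names.
precompile(Tuple{Type{DataFrame}, NamedTuple{(:x, :y), Tuple{Vector{Int}, Vector{Int}}}})
```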

@bkamins bkamins added this to the 1.0 milestone Apr 13, 2021
@bkamins bkamins linked an issue Apr 13, 2021 that may be closed by this pull request
@nalimilan (Member) left a comment

Cool!

Comment on lines -10 to -14
# sed -i '/categorical/di' src/other/precompile_tmp.jl # Remove CategoricalArrays uses
# sed -i '/var"/d' src/other/precompile_tmp.jl # var"" is not supported on old Julia versions
# sed -i '/materialize!/d' src/other/precompile_tmp.jl # Work around an inference bug
# sed -i '/setindex_widen_up_to/d' src/other/precompile_tmp.jl # Not present on Julia 1.0
# sed -i '/restart_copyto_nonleaf!/d' src/other/precompile_tmp.jl # Not present on Julia 1.0
Member:

So you didn't use these at all? Any step that can limit the amount of hand editing is good to have.

Member (Author):

None of the signatures generated by SnoopCompile.jl matched any of these rules 😄. Things change fast. (Ah - maybe there were some var"" ones.)

But I see that something is failing on 1.0 - I will check what it is.

@bkamins (Member, Author) commented Apr 13, 2021

OK. So we have two problems. On Julia 1.0.5 we have the following error:

signal (11): Segmentation fault
in expression starting at /home/runner/work/DataFrames.jl/DataFrames.jl/test/join.jl:1017
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1191
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1774
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2162
find_inner_rows at /home/runner/work/DataFrames.jl/DataFrames.jl/src/join/core.jl:558
compose_inner_table at /home/runner/work/DataFrames.jl/DataFrames.jl/src/join/composer.jl:98

which seems to be a bug on the Julia side.

The other problem is on the 32-bit x86 architecture, where the hard-coded Int64 types in the generated statements cause failures (there Int is Int32).

What I did was:

  • change all Int64 to Int and UInt64 to UInt
  • disable precompilation when VERSION < v"1.6"

I also added notes on which editing steps were performed, for future reference.
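The first of these edits can be sketched as a short Julia snippet (an illustration only - the PR edited the generated file by hand; the path refers to the temporary file mentioned in the comments above):

```julia
# Make the generated precompile statements word-size independent.
# "UInt64" contains "Int64" as a substring, so a single replacement
# also rewrites UInt64 to UInt.
src = read("src/other/precompile_tmp.jl", String)
write("src/other/precompile_tmp.jl", replace(src, "Int64" => "Int"))
```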

What do you think?

CC @timholy - both the Julia 1.0 problem and the Int64 problem seem general to SnoopCompile.jl, so this might be something you want to have a look at. Thank you!
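For context, a typical SnoopCompile workflow of that era for regenerating the statements looked roughly like this (a sketch assuming SnoopCompile 2.x; consult the SnoopCompile docs for the exact API of the version you use):

```julia
using SnoopCompile

# Record methods whose inference took at least 10 ms while running a
# representative workload in a fresh session.
inf_timing = @snoopi tmax = 0.01 begin
    using DataFrames
    df = DataFrame(a = 1)
    combine(groupby(df, :a), :a => sum)
end

# Group the resulting signatures per package and write precompile
# files that can then be hand-edited and included in the source tree.
pc = SnoopCompile.parcel(inf_timing)
SnoopCompile.write("/tmp/precompile", pc)
```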

@bkamins (Member, Author) commented Apr 13, 2021

The timings for joins are as follows (they uniformly improve).

innerjoin

0.22.7:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time innerjoin(name, job, on = :ID);
  1.463330 seconds (2.63 M allocations: 149.568 MiB, 2.21% gc time, 99.95% compilation time)

julia> @time innerjoin(name, job, on = :ID);
  0.000135 seconds (185 allocations: 13.750 KiB)

this PR:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time innerjoin(name, job, on = :ID);
  0.918935 seconds (894.04 k allocations: 51.921 MiB, 3.93% gc time, 99.93% compilation time)

julia> @time innerjoin(name, job, on = :ID);
  0.000099 seconds (151 allocations: 11.625 KiB)

outerjoin

0.22.7:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time outerjoin(name, job, on = :ID);
  1.836445 seconds (3.38 M allocations: 192.316 MiB, 2.37% gc time, 99.94% compilation time)

julia> @time outerjoin(name, job, on = :ID);
  0.000109 seconds (196 allocations: 14.734 KiB)

this PR:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time outerjoin(name, job, on = :ID);
  1.116747 seconds (1.24 M allocations: 72.672 MiB, 3.09% gc time, 99.91% compilation time)

julia> @time outerjoin(name, job, on = :ID);
  0.000130 seconds (222 allocations: 16.453 KiB)

crossjoin

0.22.7:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time crossjoin(name, job, makeunique=true);
  0.731172 seconds (2.15 M allocations: 142.263 MiB, 6.85% gc time, 99.95% compilation time)

julia> @time crossjoin(name, job, makeunique=true);
  0.000060 seconds (50 allocations: 3.719 KiB)

this PR:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time crossjoin(name, job, makeunique=true);
  0.523210 seconds (1.17 M allocations: 71.671 MiB, 7.16% gc time, 99.93% compilation time)

julia> @time crossjoin(name, job, makeunique=true);
  0.000058 seconds (47 allocations: 3.547 KiB)

@bkamins (Member, Author) commented Apr 13, 2021

A more complex select is also better.

0.22.7:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> @time select(name, :ID, :Name => ByRow(uppercase), );
  0.905749 seconds (3.04 M allocations: 192.974 MiB, 8.24% gc time)

julia> @time select(name, :ID, :Name => ByRow(uppercase), x -> x.Name);
  0.115613 seconds (216.37 k allocations: 13.594 MiB, 100.25% compilation time)

this PR:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> @time select(name, :ID, :Name => ByRow(uppercase), );
  0.309512 seconds (374.96 k allocations: 22.441 MiB, 99.79% compilation time)

julia> @time select(name, :ID, :Name => ByRow(uppercase), x -> x.Name);
  0.067501 seconds (92.51 k allocations: 5.735 MiB, 101.94% compilation time)

In reshaping we do not see an improvement, but we can work on it in the future:

0.22.7

julia> using DataFrames

julia> wide = DataFrame(id = 1:6,
                        a  = repeat([1:3;], inner = [2]),
                        b  = repeat([1:2;], inner = [3]),
                        c  = 1.0:6.0, d  = 1.0:6.0);

julia> @time long = stack(wide);
  1.674097 seconds (4.89 M allocations: 323.914 MiB, 5.07% gc time, 99.88% compilation time)

julia> @time long = stack(wide);
  0.000068 seconds (86 allocations: 7.406 KiB)

julia> @time unstack(long);
  1.604386 seconds (3.99 M allocations: 232.169 MiB, 3.06% gc time, 99.94% compilation time)

julia> @time unstack(long);
  0.000105 seconds (177 allocations: 16.102 KiB)

this PR

julia> using DataFrames

julia> wide = DataFrame(id = 1:6,
                        a  = repeat([1:3;], inner = [2]),
                        b  = repeat([1:2;], inner = [3]),
                        c  = 1.0:6.0, d  = 1.0:6.0);

julia> @time long = stack(wide);
  1.375234 seconds (3.46 M allocations: 206.148 MiB, 3.79% gc time, 99.86% compilation time)

julia> @time long = stack(wide);
  0.000077 seconds (87 allocations: 7.422 KiB)

julia> @time unstack(long);
  1.900927 seconds (3.22 M allocations: 185.532 MiB, 2.10% gc time, 99.89% compilation time)

julia> @time unstack(long);
  0.000155 seconds (247 allocations: 17.891 KiB)

@bkamins (Member, Author) commented Apr 13, 2021

@nalimilan - this is where we have a problem. In split-apply-combine the first call on the fast path is more expensive (the others are on par). Also, groupby is unfortunately more expensive. However, given the increased complexity we now have there, this probably cannot be helped:

fast path in split-apply-combine

0.22.7

julia> using DataFrames

julia> df = DataFrame(a=1);

julia> @time gdf = groupby(df, :a);
  0.350983 seconds (201.17 k allocations: 11.875 MiB, 6.47% gc time, 99.98% compilation time)

julia> @time gdf = groupby(df, :a);
  0.000018 seconds (24 allocations: 2.344 KiB)

julia> @time combine(gdf, :a => sum);
  1.127700 seconds (1.46 M allocations: 87.028 MiB, 2.59% gc time, 60.86% compilation time)

julia> @time combine(gdf, :a => sum);
  0.000092 seconds (159 allocations: 12.531 KiB)

this PR

julia> using DataFrames

julia> df = DataFrame(a=1);

julia> @time gdf = groupby(df, :a);
  0.806360 seconds (1.48 M allocations: 91.026 MiB, 4.77% gc time, 99.90% compilation time)

julia> @time gdf = groupby(df, :a);
  0.000052 seconds (54 allocations: 3.203 KiB)

julia> @time combine(gdf, :a => sum);
  1.657268 seconds (3.39 M allocations: 213.557 MiB, 2.80% gc time, 99.94% compilation time)

julia> @time combine(gdf, :a => sum);
  0.000134 seconds (200 allocations: 14.312 KiB)

normal path in split-apply-combine

0.22.7

julia> using DataFrames

julia> df = DataFrame(a=1,b=1);

julia> @time gdf = groupby(df, 1:2);
  0.477830 seconds (701.55 k allocations: 41.500 MiB, 5.44% gc time, 99.99% compilation time)

julia> @time gdf = groupby(df, 1:2);
  0.000020 seconds (23 allocations: 2.266 KiB)

julia> @time combine(gdf, :a => x -> x);
  1.701886 seconds (3.29 M allocations: 204.798 MiB, 3.20% gc time, 38.33% compilation time)

julia> @time combine(gdf, :a => x -> x);
  0.104231 seconds (256.41 k allocations: 15.384 MiB, 99.63% compilation time)

this PR

julia> using DataFrames

julia> df = DataFrame(a=1,b=1);

julia> @time gdf = groupby(df, 1:2);
  1.083430 seconds (2.75 M allocations: 164.270 MiB, 5.27% gc time, 99.92% compilation time)

julia> @time gdf = groupby(df, 1:2);
  0.000054 seconds (70 allocations: 3.719 KiB)

julia> @time combine(gdf, :a => x -> x);
  1.804282 seconds (3.68 M allocations: 230.970 MiB, 2.59% gc time, 100.00% compilation time)

julia> @time combine(gdf, :a => x -> x);
  0.083977 seconds (46.77 k allocations: 2.755 MiB, 99.17% compilation time)

Comment on lines 15 to 17
# * disabling precompilation on Julia older than 1.6
function precompile(all=false)
VERSION >= v"1.6" || return nothing
Member:

Maybe let's check that this works on 1.5? Better to avoid regressions for people who haven't moved to 1.6 yet.

Suggested change:
- # * disabling precompilation on Julia older than 1.6
- function precompile(all=false)
- VERSION >= v"1.6" || return nothing
+ # * disabling precompilation on Julia older than 1.5
+ function precompile(all=false)
+ VERSION >= v"1.5" || return nothing

Also, maybe the crash on 1.0.5 is due to a particular precompile call? IIRC the materialize! line above was there to remove a line that created issues on 1.0 too.

Member (Author):

Yes, but the difference was that materialize! was crashing the call to the precompile function. Here the crash happens after precompilation - during the actual function call the Julia compiler gets confused. I am OK with turning it on for 1.5 though. I will test whether everything works on 1.5.

@nalimilan (Member):

> @nalimilan - this is where we have a problem. In split-apply-combine the first call on the fast path is more expensive (the others are on par). Also, groupby is unfortunately more expensive. However, given the increased complexity we now have there, this probably cannot be helped.

Too bad. Yet I see several row_group_slots methods precompiled with IntegerRefPool arguments. But since precompilation currently doesn't cover LLVM compilation, we probably still pay a significant price.

@nalimilan (Member):

We could take the fast path only when the number of rows is large, just like we do for threads. But it's not totally obvious why the fallback should have a lower overhead, so some experimentation is needed.

@bkamins (Member, Author) commented Apr 14, 2021

> But since precompilation currently doesn't cover LLVM compilation, we probably still pay a significant price.

This is the reason, I assume.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
@bkamins (Member, Author) commented Apr 14, 2021

> We could take the fast path only when the number of rows is large, just like we do for threads.

If I understand the situation correctly, the number of rows is not a compile-time constant, so compilation will always happen no matter how many rows we have and no matter what column types we have.

@bkamins (Member, Author) commented Apr 14, 2021

I have checked Julia 1.5.4, both single- and multi-threaded, on Win10, and all was OK. However, given the problems we had, I would merge this sooner rather than later and ask users to do a bit of testing on different platforms.

@nalimilan (Member):

> If I understand the situation correctly, the number of rows is not a compile-time constant, so compilation will always happen no matter how many rows we have and no matter what column types we have.

Yeah, but maybe there's a way to prevent Julia from compiling all functions that may be called until that's actually the case? Not sure.

@bkamins (Member, Author) commented Apr 14, 2021

> Yeah, but maybe there's a way to prevent Julia from compiling all functions that may be called until that's actually the case?

It would be possible if the compiler could prove that some function can never be called. If, e.g., DataFrame were type-stable and carried the number of rows in its type signature, like DataFrame{NROW, NamedTuple{COLS}}, then it would be possible.

What I was fighting for with despecialization is to make sure that - where possible - only one specialization gets compiled for each method (before despecialization, many were compiled for the various possible argument types).
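The despecialization pattern referred to here can be sketched with @nospecialize (a minimal illustration of the technique, not actual DataFrames.jl code; the function name is made up):

```julia
# With @nospecialize the compiler emits a single method body that
# treats `fun` as an untyped object, instead of compiling a fresh
# specialization for every anonymous function passed in.
function apply_transform(df, @nospecialize(fun))
    return fun(df)
end
```

Each x -> x literal has its own type, so without the annotation every call with a new anonymous function would trigger a new round of compilation.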

OK - I am merging this to allow for user testing. Thank you!

@bkamins bkamins merged commit 3b530e0 into main Apr 14, 2021
@bkamins bkamins deleted the bk/precompilation branch April 14, 2021 10:42

Successfully merging this pull request may close these issues.

Re-generate precompile statements before 1.0 release