
update precompilation statements #2718

Merged (3 commits) on Apr 14, 2021

Conversation

@bkamins (Member) commented Apr 13, 2021

Here is the impact:

0.22.7:

julia> using DataFrames

julia> @time df = DataFrame(some_col=1);
  0.066630 seconds (124.23 k allocations: 7.668 MiB)

julia> @time combine(df, :some_col => x -> x);
  0.748427 seconds (2.63 M allocations: 168.141 MiB, 8.86% gc time, 2.41% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.062810 seconds (80.87 k allocations: 5.105 MiB, 99.50% compilation time)

main:

julia> using DataFrames

julia> @time df = DataFrame(some_col=1);
  0.079420 seconds (114.59 k allocations: 6.918 MiB, 26.56% gc time, 99.84% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.459968 seconds (1.15 M allocations: 72.700 MiB, 4.77% gc time, 100.20% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.024984 seconds (6.70 k allocations: 446.336 KiB, 98.04% compilation time)

this PR:

julia> using DataFrames

julia> @time df = DataFrame(some_col=1);
  0.046273 seconds (114.70 k allocations: 6.926 MiB, 99.77% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.245429 seconds (213.39 k allocations: 12.944 MiB, 100.38% compilation time)

julia> @time combine(df, :some_col => x -> x);
  0.025664 seconds (6.70 k allocations: 446.336 KiB, 97.91% compilation time)

Of course more testing is welcome (and I will report additional results during the week).

CC @pdeffebach as this affects DataFramesMeta.jl

@bkamins bkamins requested a review from nalimilan April 13, 2021 14:46
@bkamins (Member, Author) commented Apr 13, 2021

@nalimilan - as you can see, after despecialization + hand editing the file is much smaller.
In particular, I removed all precompile statements that depended on specific column-name sets, e.g. [:x, :y] or [:a, :b, :c], which are common in tests but probably rare in practice. The only case I kept is :x1, which is auto-generated in aggregation.
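As an illustration of the kind of statement that was removed (a hypothetical example, not taken from the PR): column names can leak into method signatures through NamedTuple types, e.g. via the keyword-argument DataFrame constructor, so a statement like the following would only help data frames whose columns are literally :x and :y.

```julia
# Hypothetical example: the literal column names (:x, :y) are baked
# into the NamedTuple type, so this statement is useless for data
# frames with any other column names.
precompile(Tuple{Type{DataFrame}, NamedTuple{(:x, :y), Tuple{Vector{Int}, Vector{Int}}}})
```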

@bkamins bkamins added this to the 1.0 milestone Apr 13, 2021
@bkamins bkamins linked an issue Apr 13, 2021 that may be closed by this pull request
@nalimilan (Member) left a comment

Cool!

Comment on lines -10 to -14
# sed -i '/categorical/di' src/other/precompile_tmp.jl # Remove CategoricalArrays uses
# sed -i '/var"/d' src/other/precompile_tmp.jl # var"" is not supported on old Julia versions
# sed -i '/materialize!/d' src/other/precompile_tmp.jl # Work around an inference bug
# sed -i '/setindex_widen_up_to/d' src/other/precompile_tmp.jl # Not present on Julia 1.0
# sed -i '/restart_copyto_nonleaf!/d' src/other/precompile_tmp.jl # Not present on Julia 1.0
Member:

So you didn't use these at all? Any step that can limit the amount of hand editing is good to have.

Member (Author):

None of the signatures generated by SnoopCompile.jl matched any of these rules 😄. Things change fast. (Ah - maybe there were some var"" ones.)

But I see that something is failing on 1.0 - I will check what it is.

@bkamins (Member, Author) commented Apr 13, 2021

OK. So we have two problems. On Julia 1.0.5 we have the following error:

signal (11): Segmentation fault
in expression starting at /home/runner/work/DataFrames.jl/DataFrames.jl/test/join.jl:1017
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1191
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1774
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2162
find_inner_rows at /home/runner/work/DataFrames.jl/DataFrames.jl/src/join/core.jl:558
compose_inner_table at /home/runner/work/DataFrames.jl/DataFrames.jl/src/join/composer.jl:98

which seems to be a bug on the Julia side.

The other problem is on the 32-bit x86 architecture, where the hard-coded Int64 types in the generated statements cause failures (there Int is Int32).

What I did was:

  • change all Int64 to Int and UInt64 to UInt
  • disable precompilation when VERSION < v"1.6"

I also added notes on which editing steps were performed, for future reference.
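The first of these edits can be sketched as a short Julia snippet (an illustration only - the PR edited the generated file by hand; the path refers to the temporary file mentioned in the comments above):

```julia
# Make the generated precompile statements word-size independent.
# "UInt64" contains "Int64" as a substring, so a single replacement
# also rewrites UInt64 to UInt.
src = read("src/other/precompile_tmp.jl", String)
write("src/other/precompile_tmp.jl", replace(src, "Int64" => "Int"))
```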

What do you think?

CC @timholy - both the Julia 1.0 problem and the Int64 problem seem general to SnoopCompile.jl, so this might be something you want to have a look at. Thank you!
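For context, a typical SnoopCompile workflow of that era for regenerating the statements looked roughly like this (a sketch assuming SnoopCompile 2.x; consult the SnoopCompile docs for the exact API of the version you use):

```julia
using SnoopCompile

# Record methods whose inference took at least 10 ms while running a
# representative workload in a fresh session.
inf_timing = @snoopi tmax = 0.01 begin
    using DataFrames
    df = DataFrame(a = 1)
    combine(groupby(df, :a), :a => sum)
end

# Group the resulting signatures per package and write precompile
# files that can then be hand-edited and included in the source tree.
pc = SnoopCompile.parcel(inf_timing)
SnoopCompile.write("/tmp/precompile", pc)
```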

@bkamins (Member, Author) commented Apr 13, 2021

The timings for joins are as follows (they uniformly improve).

innerjoin

0.22.7:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time innerjoin(name, job, on = :ID);
  1.463330 seconds (2.63 M allocations: 149.568 MiB, 2.21% gc time, 99.95% compilation time)

julia> @time innerjoin(name, job, on = :ID);
  0.000135 seconds (185 allocations: 13.750 KiB)

this PR:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time innerjoin(name, job, on = :ID);
  0.918935 seconds (894.04 k allocations: 51.921 MiB, 3.93% gc time, 99.93% compilation time)

julia> @time innerjoin(name, job, on = :ID);
  0.000099 seconds (151 allocations: 11.625 KiB)

outerjoin

0.22.7:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time outerjoin(name, job, on = :ID);
  1.836445 seconds (3.38 M allocations: 192.316 MiB, 2.37% gc time, 99.94% compilation time)

julia> @time outerjoin(name, job, on = :ID);
  0.000109 seconds (196 allocations: 14.734 KiB)

this PR:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time outerjoin(name, job, on = :ID);
  1.116747 seconds (1.24 M allocations: 72.672 MiB, 3.09% gc time, 99.91% compilation time)

julia> @time outerjoin(name, job, on = :ID);
  0.000130 seconds (222 allocations: 16.453 KiB)

crossjoin

0.22.7:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time crossjoin(name, job, makeunique=true);
  0.731172 seconds (2.15 M allocations: 142.263 MiB, 6.85% gc time, 99.95% compilation time)

julia> @time crossjoin(name, job, makeunique=true);
  0.000060 seconds (50 allocations: 3.719 KiB)

this PR:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]);

julia> @time crossjoin(name, job, makeunique=true);
  0.523210 seconds (1.17 M allocations: 71.671 MiB, 7.16% gc time, 99.93% compilation time)

julia> @time crossjoin(name, job, makeunique=true);
  0.000058 seconds (47 allocations: 3.547 KiB)

@bkamins (Member, Author) commented Apr 13, 2021

A more complex select is also better.

0.22.7:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> @time select(name, :ID, :Name => ByRow(uppercase), );
  0.905749 seconds (3.04 M allocations: 192.974 MiB, 8.24% gc time)

julia> @time select(name, :ID, :Name => ByRow(uppercase), x -> x.Name);
  0.115613 seconds (216.37 k allocations: 13.594 MiB, 100.25% compilation time)

this PR:

julia> using DataFrames

julia> name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]);

julia> @time select(name, :ID, :Name => ByRow(uppercase), );
  0.309512 seconds (374.96 k allocations: 22.441 MiB, 99.79% compilation time)

julia> @time select(name, :ID, :Name => ByRow(uppercase), x -> x.Name);
  0.067501 seconds (92.51 k allocations: 5.735 MiB, 101.94% compilation time)

In reshaping we do not see an improvement, but we can work on it in the future:

0.22.7

julia> using DataFrames

julia> wide = DataFrame(id = 1:6,
                        a  = repeat([1:3;], inner = [2]),
                        b  = repeat([1:2;], inner = [3]),
                        c  = 1.0:6.0, d  = 1.0:6.0);

julia> @time long = stack(wide);
  1.674097 seconds (4.89 M allocations: 323.914 MiB, 5.07% gc time, 99.88% compilation time)

julia> @time long = stack(wide);
  0.000068 seconds (86 allocations: 7.406 KiB)

julia> @time unstack(long);
  1.604386 seconds (3.99 M allocations: 232.169 MiB, 3.06% gc time, 99.94% compilation time)

julia> @time unstack(long);
  0.000105 seconds (177 allocations: 16.102 KiB)

this PR

julia> using DataFrames

julia> wide = DataFrame(id = 1:6,
                        a  = repeat([1:3;], inner = [2]),
                        b  = repeat([1:2;], inner = [3]),
                        c  = 1.0:6.0, d  = 1.0:6.0);

julia> @time long = stack(wide);
  1.375234 seconds (3.46 M allocations: 206.148 MiB, 3.79% gc time, 99.86% compilation time)

julia> @time long = stack(wide);
  0.000077 seconds (87 allocations: 7.422 KiB)

julia> @time unstack(long);
  1.900927 seconds (3.22 M allocations: 185.532 MiB, 2.10% gc time, 99.89% compilation time)

julia> @time unstack(long);
  0.000155 seconds (247 allocations: 17.891 KiB)

@bkamins (Member, Author) commented Apr 13, 2021

@nalimilan - this is where we have a problem. In split-apply-combine the first call on the fast path is more expensive (the others are on par). Also, groupby is unfortunately more expensive. However, given the increased complexity we now have there, this probably cannot be helped:

fast path in split-apply-combine

0.22.7

julia> using DataFrames

julia> df = DataFrame(a=1);

julia> @time gdf = groupby(df, :a);
  0.350983 seconds (201.17 k allocations: 11.875 MiB, 6.47% gc time, 99.98% compilation time)

julia> @time gdf = groupby(df, :a);
  0.000018 seconds (24 allocations: 2.344 KiB)

julia> @time combine(gdf, :a => sum);
  1.127700 seconds (1.46 M allocations: 87.028 MiB, 2.59% gc time, 60.86% compilation time)

julia> @time combine(gdf, :a => sum);
  0.000092 seconds (159 allocations: 12.531 KiB)

this PR

julia> using DataFrames

julia> df = DataFrame(a=1);

julia> @time gdf = groupby(df, :a);
  0.806360 seconds (1.48 M allocations: 91.026 MiB, 4.77% gc time, 99.90% compilation time)

julia> @time gdf = groupby(df, :a);
  0.000052 seconds (54 allocations: 3.203 KiB)

julia> @time combine(gdf, :a => sum);
  1.657268 seconds (3.39 M allocations: 213.557 MiB, 2.80% gc time, 99.94% compilation time)

julia> @time combine(gdf, :a => sum);
  0.000134 seconds (200 allocations: 14.312 KiB)

normal path in split-apply-combine

0.22.7

julia> using DataFrames

julia> df = DataFrame(a=1,b=1);

julia> @time gdf = groupby(df, 1:2);
  0.477830 seconds (701.55 k allocations: 41.500 MiB, 5.44% gc time, 99.99% compilation time)

julia> @time gdf = groupby(df, 1:2);
  0.000020 seconds (23 allocations: 2.266 KiB)

julia> @time combine(gdf, :a => x -> x);
  1.701886 seconds (3.29 M allocations: 204.798 MiB, 3.20% gc time, 38.33% compilation time)

julia> @time combine(gdf, :a => x -> x);
  0.104231 seconds (256.41 k allocations: 15.384 MiB, 99.63% compilation time)

this PR

julia> using DataFrames

julia> df = DataFrame(a=1,b=1);

julia> @time gdf = groupby(df, 1:2);
  1.083430 seconds (2.75 M allocations: 164.270 MiB, 5.27% gc time, 99.92% compilation time)

julia> @time gdf = groupby(df, 1:2);
  0.000054 seconds (70 allocations: 3.719 KiB)

julia> @time combine(gdf, :a => x -> x);
  1.804282 seconds (3.68 M allocations: 230.970 MiB, 2.59% gc time, 100.00% compilation time)

julia> @time combine(gdf, :a => x -> x);
  0.083977 seconds (46.77 k allocations: 2.755 MiB, 99.17% compilation time)

Comment on lines 15 to 17
# * disabling precompilation on Julia older than 1.6
function precompile(all=false)
VERSION >= v"1.6" || return nothing
Member:

Maybe let's check that this works on 1.5? Better to avoid regressions for people who haven't moved to 1.6 yet.

Suggested change:
- # * disabling precompilation on Julia older than 1.6
- function precompile(all=false)
- VERSION >= v"1.6" || return nothing
+ # * disabling precompilation on Julia older than 1.5
+ function precompile(all=false)
+ VERSION >= v"1.5" || return nothing

Also, maybe the crash on 1.0.5 is due to a particular precompile call? IIRC the materialize! line above was there to remove a line that created issues on 1.0 too.

Member (Author):

Yes, but the difference was that materialize! was crashing the call to the precompile function. Here the crash happens after precompilation - during the actual function call the Julia compiler gets confused. I am OK with turning it on for 1.5 though. I will test whether everything works on 1.5.

@nalimilan (Member):

> @nalimilan - this is where we have a problem. In split-apply-combine the first call on the fast path is more expensive (the others are on par). Also, groupby is unfortunately more expensive. However, given the increased complexity we now have there, this probably cannot be helped.

Too bad. Yet I see several row_group_slots methods precompiled with IntegerRefPool arguments. But since precompilation currently doesn't cover LLVM compilation, we probably still pay a significant price.

@nalimilan (Member):

We could take the fast path only when the number of rows is large, just like we do for threads. But it's not totally obvious why the fallback should have a lower overhead, so some experimentation is needed.

@bkamins (Member, Author) commented Apr 14, 2021

> But since precompilation currently doesn't cover LLVM compilation, we probably still pay a significant price.

This is the reason, I assume.

Co-authored-by: Milan Bouchet-Valat <[email protected]>
@bkamins (Member, Author) commented Apr 14, 2021

> We could take the fast path only when the number of rows is large, just like we do for threads.

If I understand the situation correctly, the number of rows is not a compile-time constant, so compilation will always happen no matter how many rows we have and no matter what column types we have.

@bkamins (Member, Author) commented Apr 14, 2021

I have checked Julia 1.5.4, both single- and multi-threaded, on Win10, and all was OK. However, given the problems we had, I would merge this sooner rather than later and ask users to do a bit of testing on different platforms.

@nalimilan (Member):

> If I understand the situation correctly, the number of rows is not a compile-time constant, so compilation will always happen no matter how many rows we have and no matter what column types we have.

Yeah, but maybe there's a way to prevent Julia from compiling all functions that may be called until that's actually the case? Not sure.

@bkamins (Member, Author) commented Apr 14, 2021

> Yeah, but maybe there's a way to prevent Julia from compiling all functions that may be called until that's actually the case?

It would be possible if the compiler could prove that some function can never be called. If, e.g., DataFrame were type-stable and carried the number of rows in its type signature, like DataFrame{NROW, NamedTuple{COLS}}, then it would be possible.

What I was fighting for with despecialization is to make sure that - where possible - only one specialization gets compiled for each method (before despecialization, many were compiled for the various possible argument types).
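The despecialization pattern referred to here can be sketched with @nospecialize (a minimal illustration of the technique, not actual DataFrames.jl code; the function name is made up):

```julia
# With @nospecialize the compiler emits a single method body that
# treats `fun` as an untyped object, instead of compiling a fresh
# specialization for every anonymous function passed in.
function apply_transform(df, @nospecialize(fun))
    return fun(df)
end
```

Each x -> x literal has its own type, so without the annotation every call with a new anonymous function would trigger a new round of compilation.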

OK - I am merging this to allow for user testing. Thank you!

@bkamins bkamins merged commit 3b530e0 into main Apr 14, 2021
@bkamins bkamins deleted the bk/precompilation branch April 14, 2021 10:42

Successfully merging this pull request may close these issues.

Re-generate precompile statements before 1.0 release