-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explicit loop in _findall
to avoid allocations
#2771
Conversation
This is strange. @mbauman - broadcasting in these cases should be fast - right? The only potential benefit should be that we avoid extra checking of dimensions of the broadcasted objects, but no allocations should be made I think. Still we might want to remove it to reduce compilation latency, but I would like to understand the reason of the problem with broadcasting. |
As problem occurred when dealing with trues blocks, here are benchmarks that we see difference in: BD = OrderedDict(
"TFT small" => [trues(85); falses(100); trues(85)],
"FTFFT small" => [falses(64 + 32); trues(32); falses(128); trues(32)],
"TFTF small" => [falses(64); trues(64); falses(64); trues(64)],
"TFT small2" => [trues(64); falses(10); trues(100)],
"TFT Big" => [trues(8500); falses(100000); trues(65000)],
"FTFTFTF Big" => [falses(65000); trues(65000); falses(65000); trues(65000); falses(65000); trues(65000); falses(65000)],
"FTFR small" => [falses(85); trues(100); falses(65); rand([true, false], 20)],
"RT Big" => [BitVector(rand([true, false], 100000)) ; trues(100000)],
"FTFR Big" => [falses(65000); trues(65000); falses(65000); rand([true, false], 20000)],
"T256 R100" => [trues(256); rand([true, false], 100)],
);
for (l, B) in BD
println("Processing $l")
@show Base.findall(B) == DataFrames._findall(B)
@btime Base.findall($B)
@btime DataFrames._findall($B)
end This PR:
Main:
|
It is weird for me as I checked at some point number of allocations with |
|
_findall
to avoid allocations_findall
to avoid allocations
Thank you. I have reviewed it again and all looks good. |
Thanks, do we know what is the cause of allocations? |
No. It is very strange. If I use Revise.jl and modify the body of the function in an unrelated place then in your old code I go from:
to 1 allocation (as in your code with broadcasting replaced by a loop). So it seems that for some reason the compiler gets confused and is not generating optimal code. @vtjnash - would you be interested in reporting the details? (for our use case it is a minor issue so merged the PR, but maybe for some reason you would find it useful to have reproduced) |
While testing #2724 and looking at #2769 benchmarks I've seen that sometimes
_findall
allocates a lot and has some weird overhead in memory. After digging in, changingI[i:i+64-1] .= i1:(i1+64-1)
into
removed unnecessary allocations, and sped up execution a bit.
I'll post benchmarks tommorow.