Speed of filter #3208

Closed
gugatr0n1c opened this issue Oct 26, 2022 · 1 comment

gugatr0n1c commented Oct 26, 2022

Hello,

I am using Julia 1.8.2 and DataFrames 1.4.1.

Consider this code:

import Random
using BenchmarkTools
using DataFrames

# one million random 8-character strings and uniform Float64 values in [0, 1)
sdf = DataFrame()
sdf[!, "SOME_STR"] = [Random.randstring() for i in 1:1_000_000]
sdf[!, "SOME_FLT"] = [Random.rand() for i in 1:1_000_000]

@btime filter(row -> occursin("E8", row.SOME_STR), sdf); # ---> 126.400 ms
@btime sdf[occursin.("E8", sdf[!, "SOME_STR"]), :]; # ---> 56.947 ms

@btime filter(row -> row.SOME_FLT > 0.5, sdf); # ---> 126.400 ms
@btime sdf[sdf[!, "SOME_FLT"] .> 0.5, :]; # --->  4.093 ms

I prefer the filter style, but in some cases it is much slower. Am I doing something wrong, or does filter do much more under the hood, which causes this slowdown?

Cheers,
Lubo

Member

bkamins commented Oct 26, 2022

This is the standard way to do it with filter: pass a column => function pair, which performs the same as indexing:

julia> @btime sdf[occursin.("E8", sdf[!, "SOME_STR"]), :];
  50.419 ms (2000021 allocations: 122.24 MiB)

julia> @btime filter("SOME_STR" => contains("E8"), sdf);
  50.332 ms (2000023 allocations: 122.24 MiB)

julia> @btime sdf[sdf[!, "SOME_FLT"] .> 0.5, :];
  5.831 ms (23 allocations: 11.58 MiB)

julia> @btime filter("SOME_FLT" => >(0.5), sdf);
  5.871 ms (26 allocations: 11.58 MiB)

The row-wise style of filter you use (row -> ...) is supported because it is convenient, but it is not type stable, so it is expected to be slower.
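A minimal, package-free sketch of why the row-wise form is slower, under a stated simplification: it uses a hypothetical column store `Cols` (a `Dict{String,AbstractVector}`) as a stand-in for a DataFrame, which roughly likewise holds its columns in a container with an abstract element type. The names `Cols`, `filter_rows`, `mask`, and `filter_cols` are illustrative only, not DataFrames API. Looking a column up inside the per-row predicate leaves its concrete type unknown to the compiler on every call, whereas extracting the column once and passing it through a function barrier lets Julia compile a loop specialized for `Vector{Float64}`.

```julia
# Hypothetical stand-in for a DataFrame: columns stored by name in a container
# whose value type is abstract, so a lookup returns an AbstractVector of
# unknown concrete type.
const Cols = Dict{String,AbstractVector}

# Row-wise filtering (analogous to `row -> row.SOME_FLT > 0.5`):
# the predicate re-fetches the column for every row, so each element access
# and comparison is dispatched dynamically.
function filter_rows(pred, cols::Cols, n::Int)
    keep = falses(n)
    for i in 1:n
        keep[i] = pred(cols, i)
    end
    return keep
end

# Function barrier: when called, Julia specializes `mask` on the concrete type
# of `col` (e.g. Vector{Float64}), so this loop is type stable.
function mask(f, col::AbstractVector)
    out = falses(length(col))
    for (i, x) in enumerate(col)
        out[i] = f(x)
    end
    return out
end

# Column-wise filtering (analogous to `"SOME_FLT" => >(0.5)`):
# one dynamic lookup, then a single specialized pass over the column.
filter_cols(f, cols::Cols, name::String) = mask(f, cols[name])

# Usage: both approaches select the same rows.
n = 1_000
cols = Cols("SOME_FLT" => rand(n))
rowwise = filter_rows((c, i) -> c["SOME_FLT"][i] > 0.5, cols, n)
colwise = filter_cols(>(0.5), cols, "SOME_FLT")
```

Timing `filter_rows` against `filter_cols` with BenchmarkTools should show the same kind of gap as in the question, although the exact numbers will differ from the DataFrames benchmarks above.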

bkamins closed this as completed Oct 26, 2022