Parsing Float32 much slower than Float64 #117

Closed
jaakkor2 opened this issue Nov 27, 2017 · 4 comments

Comments

@jaakkor2

On CSV v0.2.0 and Julia v0.6.0, parsing Float32s is more than 2x slower than parsing Float64s. Also, if some of the column types are Int32, it seems to be faster to parse everything as Float64 and then convert to Int32.

```julia
using DataFrames, CSV, BenchmarkTools
aa = randn(Float32, 100_000, 10)
fn_debug = "debug.csv"
CSV.write(fn_debug, convert(DataFrame, aa))
types64_debug = [Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64]
types32_debug = [Float32, Float32, Float32, Float32, Float32, Float32, Float32, Float32, Float32, Float32]
@btime data_csv32 = CSV.read(fn_debug, types=types32_debug)
#  192.573 ms (8000519 allocations: 125.91 MiB)
@btime data_csv64 = CSV.read(fn_debug, types=types64_debug, nullable=false)
#  93.242 ms (1000522 allocations: 22.91 MiB)
@btime data_csv64 = CSV.read(fn_debug, types=types64_debug)
#  81.459 ms (1000519 allocations: 22.91 MiB)
```
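
The parse-then-convert workaround mentioned above can be sketched with Base functions only, independent of CSV.jl. This is a minimal illustration rather than what CSV.jl does internally: the helper name `parse_then_convert` is hypothetical, it assumes every field is a valid number, and it targets Julia 1.0 syntax. Going through Float64 rounds twice, so in rare cases the Float32 result may differ from parsing directly as Float32.

```julia
# Hypothetical helper (not part of CSV.jl): parse each field as Float64,
# then convert the result to the requested element type T.
function parse_then_convert(::Type{T}, fields::AbstractVector{<:AbstractString}) where {T<:Real}
    out = Vector{T}(undef, length(fields))
    for (i, s) in enumerate(fields)
        # convert throws InexactError if T is an integer type
        # and the parsed value is not integral.
        out[i] = convert(T, parse(Float64, s))
    end
    return out
end

floats = parse_then_convert(Float32, ["1.5", "-0.25", "3.0"])
ints = parse_then_convert(Int32, ["1", "42", "-7"])
```

Whether this is actually faster than parsing the target type directly depends on the CSV.jl and Julia versions in use, so it is worth benchmarking with `@btime` before relying on it.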
@nalimilan
Member

Thanks for the report. Smells like a type instability issue somewhere in the code.

@jaakkor2
Author

Closing the issue: on Julia v1.0.0 and CSV v0.3.0 there is no practical difference between parsing Float32s and Float64s.

```julia
julia> @btime data_csv32 = CSV.read(fn_debug, types=types32_debug)
  68.506 ms (1000781 allocations: 19.11 MiB)
julia> @btime data_csv64 = CSV.read(fn_debug, types=types64_debug, allowmissing=:none)
  66.618 ms (1000786 allocations: 22.92 MiB)
julia> @btime data_csv64 = CSV.read(fn_debug, types=types64_debug)
  66.417 ms (1000781 allocations: 22.92 MiB)
```

The CSV.write call in the original example errors; I opened issue #237 for that.

@jaakkor2
Author

On Julia v1.0.0, CSV v0.4.1 has regressed a lot: reading Float32s is about 18x slower than with v0.3.0.

```julia
julia> @btime data_csv32 = CSV.read(fn_debug, types=types32_debug)
  1.205 s (11567462 allocations: 213.71 MiB)
julia> @btime data_csv64 = CSV.read(fn_debug, types=types64_debug)
  112.069 ms (100336 allocations: 23.11 MiB)
```

@jaakkor2 jaakkor2 reopened this Sep 27, 2018
@quinnj
Member

quinnj commented Sep 27, 2018

It's not an issue with all floats, only full-precision ones (see comment here); closing as a duplicate of JuliaData/Parsers.jl#5.

@quinnj quinnj closed this as completed Sep 27, 2018