Poor readcsv speed #16015
Comments
This is a well-known problem. See https://github.com/JuliaDB/CSV.jl for a faster alternative. |
Thanks. So I'm closing. |
julia> @time df=CSV.read("AUDUSD-2014-01.csv");
0.168033 seconds (17.87 k allocations: 85.767 MB, 5.83% gc time)
That's much better... I wonder whether DataFrames.jl's readtable should use it? |
Oh... I did it wrong:
julia> df
89103612-element Array{UInt8,1}:
0x41
0x55
0x44
0x2f
0x55
0x53
0x44
0x2c
0x32
0x30
0x31
0x34
0x30
0x31
0x30
⋮
0x2e
0x38
0x37
0x35
0x33
0x31
0x2c
0x30
0x2e
0x38
0x37
0x35
0x37
0x34
0x0a |
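For what it's worth, the array shown above looks like the raw file contents rather than a parsed table. A minimal Python sketch that decodes the visible head and tail as ASCII (the byte values are copied from the output above; the variable names are only illustrative):

```python
# Byte values copied from the REPL output above (visible head and tail of the array).
head = bytes([0x41, 0x55, 0x44, 0x2f, 0x55, 0x53, 0x44, 0x2c,
              0x32, 0x30, 0x31, 0x34, 0x30, 0x31, 0x30])
tail = bytes([0x2e, 0x38, 0x37, 0x35, 0x33, 0x31, 0x2c,
              0x30, 0x2e, 0x38, 0x37, 0x35, 0x37, 0x34, 0x0a])

# Decode both fragments as ASCII text.
head_text = head.decode("ascii")
tail_text = tail.decode("ascii")

print(repr(head_text))  # 'AUD/USD,2014010'
print(repr(tail_text))  # '.87531,0.87574\n'
```

The head is the start of the first CSV record and the tail ends in a newline, and 89103612 bytes is about 85 MB, the size of the CSV file. So `df` here appears to hold the unparsed file as a byte vector, which would also explain the very short read time.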
See this blog post about how to use it: http://julialang.org/blog/2015/10/datastreams
Anyway, I think the plan is to use it when it's ready, but for now the data management packages are quite in flux. |
Thanks |
Correction: I was talking about DataFrames.jl; I'm not sure what the plan is for Julia Base. |
@ViralBShah Since you noted in the other issue that |
I should clarify that I didn't test pandas on my machine and was using 5 sec as the benchmark, but this is what I see. Perhaps if we can make this more efficient wrt GC, this could be faster.
|
Pandas is almost twice as fast as reported:
|
I get:
>>> timeit.timeit("df=pandas.read_csv('/Users/jacobquinn/Downloads/AUDUSD-2014-01.csv', names=['Symbol', 'Date', 'Bid', 'Ask'])",setup='import pandas',number=1)
1.7587840557098389
julia> @time CSV.read("/Users/jacobquinn/Downloads/AUDUSD-2014-01.csv"; header=["Symbol","Date","Bid","Ask"]);
1.809735 seconds (12.13 M allocations: 323.253 MB, 9.30% gc time)
(note this is CSV master) |
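Both numbers above come from a single run each, so they can be noisy (and in Julia a first call also includes compilation time). A minimal sketch of a repeated measurement in Python, assuming a small synthetic stand-in for the data: the rows, `read_rows`, and the sizes are made up for illustration, and it uses the standard-library `csv` module rather than pandas so that it runs anywhere.

```python
import csv
import io
import timeit

# Synthetic stand-in for the real file: same four column names, made-up values.
COLUMNS = ["Symbol", "Date", "Bid", "Ask"]
SAMPLE = "".join(f"AUD/USD,row{i},0.87531,0.87574\n" for i in range(10_000))

def read_rows(text):
    """Parse header-less CSV text into a list of dicts keyed by COLUMNS."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=COLUMNS))

rows = read_rows(SAMPLE)

# Repeat the measurement and keep the minimum, which is less sensitive to
# one-off effects than a single run.
timings = timeit.repeat(lambda: read_rows(SAMPLE), number=1, repeat=5)
best = min(timings)
print(f"{len(rows)} rows, best of {len(timings)} runs: {best:.4f} s")
```

The same idea applies to the real file: swap `read_rows(SAMPLE)` for the `pandas.read_csv(...)` call shown above, and on the Julia side run `@time` a second time so that compilation is excluded.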
Duplicate of #10428 |
Should we be asking people to start using |
Hello,
I'm trying to read one month of AUD/USD tick data.
Sample data can be found here
https://drive.google.com/file/d/0B8iUtWjZOTqla3ZZTC1FS0pkZXc/view?usp=sharing
see also pydata/pandas-datareader#153
AUDUSD-2014-01.zip is an 11M file and contains AUDUSD-2014-01.csv, which is an 85M file. That's not so big!
With Python / Pandas
With Julia / readcsv
It's even worse with DataFrames.jl readtable
see JuliaData/DataFrames.jl#942
Kind regards