readdlm() performance problems #10384
The comparison with the binary representation isn't really fair. Have you tried reading the CSV data using another program instead? That would give a better idea of the improvements that can be achieved in the Julia implementation. (FWIW, if you're looking for a good competitor, the R data.table package contains a fast CSV reader.) Other than that, have you tried specifying the types of the columns?
I converted to binary using …. This is Julia "Version 0.3.5 (2015-01-08 22:33 UTC)" on … (from …).
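The conversion code itself isn't preserved in the thread, but the binary round-trip presumably looks something like the following sketch (assumed, not the commenter's actual code, and written in current Julia syntax): raw `Float64`s written with `write` come back with `read!` in one bulk operation, with no text parsing at all, which is consistent with the 3-minute binary read reported elsewhere in the thread.

```julia
# Hedged sketch: dump a Float64 matrix as raw column-major bytes,
# then read it back without any parsing.
nrows, ncols = 1_000, 10          # small stand-in for the 42.5G file
A = rand(nrows, ncols)

binpath, io = mktemp()
write(io, A)                      # raw bytes, column-major order
close(io)

B = Array{Float64}(undef, nrows, ncols)
read!(binpath, B)                 # one bulk read; no per-field parsing
@assert A == B
```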
The real problem is whether the parser attempts to load the whole thing into memory. With a file that large, you either need a really big RAM machine or you need to do incremental parsing.
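Incremental parsing along those lines can be sketched in a few lines of Julia (current syntax; the function name `stream_colsums`, the example reduction, and the hard-coded 10-column `Float64` layout are illustrative assumptions, not code from the issue):

```julia
# Stream the file line by line, reducing as we go, instead of
# materializing the whole table in memory at once.
function stream_colsums(filename::AbstractString; delim=';')
    ncols = 10                         # assumed from the issue's description
    colsums = zeros(Float64, ncols)
    open(filename) do io
        for line in eachline(io)
            fields = split(line, delim)
            for j in 1:ncols
                colsums[j] += parse(Float64, fields[j])
            end
        end
    end
    return colsums
end
```

Peak memory here is one line plus the accumulator, regardless of file size, at the cost of only keeping whatever reduction you compute rather than the full matrix.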
My guess is the new GC in 0.4 will help a lot with this. |
Cc: @tanmaykm
@theran How much RAM does your computer have?
I tried a 900MB file and readdlm is about the same in 0.3 and 0.4. However, writedlm seems to be significantly slower in 0.4. Not good!
Ah, I think much of the writedlm regression is due to #8972. |
I think specifying the … would help. Also, since the column type is known, specifying the type to ….
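For illustration, here is the difference between the untyped and the typed call, using a tiny stand-in file (on current Julia, `readdlm` lives in the `DelimitedFiles` stdlib, whereas it was in Base at the time of this issue):

```julia
using DelimitedFiles   # stdlib on Julia >= 0.7; readdlm was in Base in 0.3/0.4

# Tiny stand-in for the 42.5G file from the issue.
path, io = mktemp()
println(io, join(1:10, ';'))
println(io, join(1:10, ';'))
close(io)

# Untyped: readdlm must inspect every cell to decide its type.
untyped = readdlm(path, ';')

# Typed: the third argument lets readdlm fill a Matrix{Float64}
# directly, skipping per-cell type inference.
typed = readdlm(path, ';', Float64)

@assert eltype(typed) == Float64
```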
We don't really need both this and #10428 open. The latter has more numbers and is specific to master, so let's close this one. |
@ViralBShah I think 1T but it might be 2T. (This is a fat node on my department's cluster.)
I have a 42.5G CSV file with 10 columns of doubles. When I try

readdlm(filename, ';')

julia eventually allocates over 150G and doesn't finish reading in 45 minutes or so. For comparison, on this machine, reading the same data from the binary representation takes around 3 minutes.