What is the best way to write large DataFrames efficiently and with high performance in Julia while minimizing memory usage? #3406

Closed
Ujjwal4CULS opened this issue Dec 6, 2023 · 4 comments

Comments

@Ujjwal4CULS

No description provided.

@bkamins
Member

bkamins commented Dec 6, 2023

Writing a data frame to disk is not part of DataFrames.jl functionality. I would assume that serializing it with the Serialization module uses the least memory when writing (I assume this is what you are asking about).
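For illustration, a minimal sketch of that suggestion, assuming a small made-up data frame and a temporary file path (both are placeholders, not from the original question). It uses only `serialize` and `deserialize` from the Serialization standard library. Note that files written this way are generally not guaranteed to be readable across different Julia or package versions, so this suits short-term storage better than archiving.

```julia
using DataFrames
using Serialization

# Placeholder data; substitute your own data frame.
df = DataFrame(id = 1:1_000, value = rand(1_000))

# Placeholder path in a fresh temporary directory.
path = joinpath(mktempdir(), "df.jls")

serialize(path, df)       # write the data frame to disk
df2 = deserialize(path)   # read it back
```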

@Ujjwal4CULS
Author

Which data format is the most efficient in Julia? The .arrow format consumes a lot of memory. For example, in R, the .fst format is considered the best for efficient memory usage and high performance. Similarly, in Julia, which format is optimal for writing DataFrames with high performance and memory efficiency?

@bkamins
Member

bkamins commented Dec 6, 2023

> Which data format is the most efficient in Julia?

There is no single format that is best in all aspects, so "the best" depends on many factors.
If you want to see a comparison of the performance of various formats, you can have a look here: https://github.com/bkamins/Julia-DataFrames-Tutorial/blob/master/04_loadsave.ipynb. Note that these benchmarks report:

  • write time
  • load time
  • size on disk
  • memory used to write
  • memory used to read

(and even here you can see that your question ends up being a 5-criteria problem)

Also, these results were obtained on a laptop using 1 thread. Benchmarks might look different if you want the best performance on a multi-core server.
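As a rough illustration of those five criteria, the sketch below measures them for a single format (the Serialization standard library) on a made-up data frame. The helper `measure_roundtrip` is hypothetical and is not taken from the linked tutorial. It also assumes that the bytes allocated as reported by `@timed` are an acceptable proxy for memory use, which is not the same thing as peak memory.

```julia
using DataFrames
using Serialization

# Hypothetical helper (not from the tutorial): collects the five criteria
# for one format, here the Serialization standard library.
function measure_roundtrip(df::DataFrame, path::AbstractString)
    w = @timed serialize(path, df)   # write time and bytes allocated
    r = @timed deserialize(path)     # load time and bytes allocated
    return (write_time   = w.time,          # seconds
            load_time    = r.time,          # seconds
            size_on_disk = filesize(path),  # bytes
            write_bytes  = w.bytes,         # bytes allocated while writing
            read_bytes   = r.bytes)         # bytes allocated while reading
end

# Placeholder data and path; substitute your own.
df = DataFrame(id = 1:100_000, value = rand(100_000))
path = joinpath(mktempdir(), "df.jls")

measure_roundtrip(df, path)          # first call includes compilation time
stats = measure_roundtrip(df, path)  # second call is more representative
println(stats)
```

Running the same helper with other writers and readers swapped in would give numbers that can be compared per criterion, though results will depend on the data, the machine, and the number of threads.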

Your question is essentially open-ended and unrelated to DataFrames.jl (it is a general Julia question). Such questions are welcome, but it is best to discuss them in an open-ended forum, as you might get the best advice there (I or other DataFrames.jl maintainers might not be aware of all the options). I recommend that you post it on https://discourse.julialang.org/.

@bkamins bkamins closed this as completed Dec 6, 2023
@Ujjwal4CULS
Author

Thank you very much for this great information.
