
Medium-term plan: Use XLSX.jl and LibXLS.jl instead of ExcelReaders.jl #26

Open
3 tasks
davidanthoff opened this issue Feb 20, 2019 · 1 comment

Comments

@davidanthoff
Member

The main benefit would be that we could get rid of the Python dependency that ExcelReaders.jl brings along and still support both old and new Excel file formats. The Python dependency has regularly caused deployment problems.

Stuff still todo:

  • Get LibXLS.jl into shape. I have sorted out the cross-building, and I have some local code that can read metadata from old-school Excel files, but there is still a lot to finish before this is ready.
  • Do some performance comparisons of ExcelReaders.jl and XLSX.jl, just to be sure (I don't really expect any real issues there).
  • Code things up here :)

CC @felipenoris

@hhaensel

I compared XLSX.jl with CSV.jl on a data set of approximately 160 MB and saw a huge performance difference:

julia> @time d1 = DataFrame(CSV.File("demo.csv"));
  0.263584 seconds (2.01 M allocations: 531.367 MiB)

julia> @time d2 = DataFrame(XLSX.readtable("demo.xlsx", 1)...)
100.617655 seconds (489.23 M allocations: 17.850 GiB, 28.54% gc time)

The allocation counts and memory footprint also differ by an impressive amount (489.23 M allocations and 17.850 GiB vs. 2.01 M allocations and 531.367 MiB). Maybe anonymous functions are being created inside a loop?
It would definitely be good to have a performant reader for XLSX files.
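When comparing readers like this, the first `@time` call in a fresh session also includes compilation, which may account for part of any gap. A minimal sketch, assuming only Base Julia, of timing a function once to warm it up and then again on compiled code. `warm_timing` and `parse_rows` are hypothetical names used for illustration only (they are not part of CSV.jl, XLSX.jl, or DataFrames.jl), and `parse_rows` merely stands in for a real file reader.

```julia
# Hypothetical helper (not from CSV.jl, XLSX.jl, or DataFrames.jl):
# separates the first call, which includes JIT compilation, from a
# steady-state call that reflects the compiled code's cost.
function warm_timing(f, args...)
    first_time = @elapsed f(args...)      # first call: includes compilation
    steady_time = @elapsed f(args...)     # second call: compiled code only
    steady_bytes = @allocated f(args...)  # bytes allocated by one compiled call
    return (first = first_time, steady = steady_time, bytes = steady_bytes)
end

# Hypothetical stand-in workload: parse a synthetic CSV-like string
# into one Vector{Float64} per line.
parse_rows(s) = [parse.(Float64, split(line, ',')) for line in split(s, '\n')]

data = join((join(rand(5), ',') for _ in 1:10_000), '\n')
t = warm_timing(parse_rows, data)
println("first call: ", t.first, " s; steady: ", t.steady, " s; ", t.bytes, " bytes")

# With the packages from the comment above installed, the same helper could
# wrap the real readers, e.g.
#   warm_timing(f -> DataFrame(XLSX.readtable(f, 1)...), "demo.xlsx")
```

Comparing the `steady` and `bytes` fields of two readers gives a fairer picture of their throughput than a single cold `@time` call; for more rigorous numbers, BenchmarkTools.jl repeats the measurement many times.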
