DataFrames with many columns are too slow (because of show()) #2739
@ronisbr I thought we had this issue resolved. Since we crop the output, we do not need to process all columns of the table, only as many as are needed up to the cropping point.
@ronisbr, should we transfer it to PrettyTables.jl?
This was very interesting! Indeed, we are only processing the columns that are printed. However, the code that handles the alignment regex was based on a dictionary, and the keys of a dictionary are not sorted. Hence, we have something like:
Merely checking whether a key refers to a printed column, in a dictionary with 10^5 keys, was what took so long. I now sort the keys first and break out of the loop once a key is past the last printed column. Can you please test against PrettyTables
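The sorted-keys-plus-early-break idea can be sketched as follows. This is a minimal illustration, not PrettyTables.jl's actual code: it assumes the per-column entries (for example, a list of alignment regexes per column) live in a `Dict{Int,V}` keyed by column index and that only the first `num_printed` columns are shown. The names `printed_entries`, `anchors`, and `num_printed` are hypothetical.

```julia
# Hypothetical sketch: return the dictionary entries that belong to printed
# columns (1:num_printed). Sorting the keys lets the loop stop at the first
# key beyond the last printed column, so per-key work is skipped for all
# columns that are never printed.
function printed_entries(d::Dict{Int,V}, num_printed::Int) where {V}
    result = Pair{Int,V}[]
    for k in sort!(collect(keys(d)))
        k > num_printed && break  # all remaining keys are larger; stop here
        push!(result, k => d[k])
    end
    return result
end

# Usage: one regex list per column for a very wide table.
anchors = Dict(j => [r"\."] for j in 1:100_000)
printed_entries(anchors, 3)  # entries for columns 1, 2 and 3 only
```

The sort still visits every key once, but that is cheap compared with doing real work per key; the early `break` is what bounds the expensive part by the number of printed columns.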
It is fast now, so I am closing it here (@sl-solution, please reopen if it is not resolved on your side).
I will tag a new version now! Thanks!
The problem does not seem to be resolved yet:
using DataFrames
df = DataFrame(rand(100, 10^5), :auto);
show(df) # ok
allowmissing!(df)
show(df) # not fixed
Indeed, I can reproduce it, so I am reopening the issue.
Hi @bkamins! It turns out that the problem is no longer inside PrettyTables.jl but in the call:
It takes too long to compute the compact type names shown in the column headers. You can see this by executing:
julia> df = DataFrame(rand(100, 10^5), :auto);
julia> allowmissing!(df);
julia> DataFrames.compacttype.(eltype.(eachcol(df)), 9)
I am not sure how we can solve this, because PrettyTables.jl needs to receive the header of the entire table. Maybe we can preallocate a vector and fill only the entries for columns that we are 100% sure will be printed. Ideas?
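The preallocation idea might look roughly like this. It is only a sketch under stated assumptions: only the leading `num_printed` columns are printed, and the table printer accepts empty placeholders for the remaining header entries, which the discussion does not confirm. `partial_header`, `compute`, and `num_printed` are hypothetical names; `compute` stands in for any per-column function such as a compact type name.

```julia
# Hypothetical sketch: build a header vector with one entry per column, but run
# the (possibly expensive) `compute` function only for the printed columns.
function partial_header(compute, coltypes::AbstractVector, num_printed::Int)
    header = fill("", length(coltypes))  # placeholders for unprinted columns
    for j in 1:min(num_printed, length(coltypes))
        header[j] = compute(coltypes[j])
    end
    return header
end

# Usage: 10^5 columns, but only 3 header entries are actually computed.
h = partial_header(string, fill(Float64, 100_000), 3)
```

The hard part, as noted above, is knowing the number of printed columns before the printer has done its cropping.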
I will fix it by memoization. |
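A generic memoization sketch of that idea follows; it is not the actual change made in DataFrames.jl. The assumption is that a wide table usually has only a few distinct element types, so caching the result by `(type, width)` means the expensive computation runs once per distinct key instead of once per column. `compact_type_name`, `compact_type_name_uncached`, `COMPACT_CACHE`, and `NCOMPUTED` are hypothetical names, and the truncation rule is a simple stand-in, not `compacttype`'s real behavior.

```julia
# Cache keyed by (element type, maximum width).
const COMPACT_CACHE = Dict{Tuple{Type,Int},String}()
const NCOMPUTED = Ref(0)  # counts cache misses, for illustration only

# Hypothetical stand-in for the expensive computation: the type name,
# truncated to at most `maxwidth` characters.
function compact_type_name_uncached(T::Type, maxwidth::Int)
    NCOMPUTED[] += 1
    s = string(T)
    return length(s) <= maxwidth ? s : first(s, max(maxwidth - 1, 0)) * "…"
end

# Memoized wrapper: compute on the first request, then reuse the cached string.
compact_type_name(T::Type, maxwidth::Int) =
    get!(() -> compact_type_name_uncached(T, maxwidth), COMPACT_CACHE, (T, maxwidth))

# Usage: 10^5 columns with only two distinct element types.
coltypes = [isodd(i) ? Union{Missing,Float64} : Union{Missing,Int} for i in 1:100_000]
header_types = compact_type_name.(coltypes, 9)
NCOMPUTED[]  # 2: one computation per distinct (type, width) key
```

A global cache like this grows with every new key; a real implementation might prefer a cache local to a single `show` call.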
I came across this issue in the following example: