Improved REPL printing for GroupedDataFrames #3107
Makes sense. I will wait for @ronisbr to review first, as he maintains all the display-related code 😄.
@ronisbr - could you please review this PR (especially how it interacts with PrettyTables.jl kwargs) while we are waiting for the HTML PR to be merged? Then I will review it. Thank you!
Sorry! I completely missed this one.
It looks good to me! I would just add tests for corner cases in displaysize. For example, if we are printing to a file, displaysize will take its values from the LINES and COLUMNS environment variables:
julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)
julia> ENV["LINES"] = -1
-1
julia> displaysize(io)
(-1, 80)
In this case, I suppose the data frames will be printed without limits, given how show works. However, it will be important to have tests for those cases with invalid options.
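To make such tests deterministic, a sketch like the following could reproduce the degenerate sizes without permanently changing ENV. It relies on Base's fallback where displaysize(io) reads LINES and COLUMNS for IO types without their own method, and on the :displaysize property of IOContext; fallback_displaysize is a hypothetical helper name, not part of DataFrames.jl.

```julia
# Hypothetical test helper: query the environment-based fallback of
# Base.displaysize for a plain IOBuffer; withenv restores ENV afterwards.
function fallback_displaysize(lines::AbstractString, cols::AbstractString)
    withenv("LINES" => lines, "COLUMNS" => cols) do
        displaysize(IOBuffer())
    end
end

# Degenerate height taken from the environment, as in the REPL session above.
bad_env = fallback_displaysize("-1", "80")

# Alternatively, pin a tiny display explicitly via IOContext, which does not
# depend on the environment at all.
tiny_io = IOContext(IOBuffer(), :displaysize => (3, 80))
tiny = displaysize(tiny_io)

println(bad_env)  # (-1, 80)
println(tiny)     # (3, 80)
```

Pinning the size through IOContext is probably the more robust choice for a test suite, since it cannot be affected by how CI sets LINES and COLUMNS.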
Another corner case that always gives me problems with limited displays is a display that is simply too small. What will happen if the display here has only 3 lines? Will the algorithm break at some point? I think h = div(v, 2) can return a number equal to or lower than 0, which will print the entire DataFrame. Can you check this, please?
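One possible policy, sketched below under the assumption that every group should keep at least one line, is to clamp the budget before halving it. In Julia, div(1, 2) and div(-1, 2) are both 0, which is exactly the situation raised above. split_height is a hypothetical helper, not the PR's code.

```julia
# Hypothetical helper: split a total height h between two groups,
# guaranteeing each group at least `minrows` lines even when h is tiny,
# zero, or negative (e.g. LINES=-1).
function split_height(h::Integer; minrows::Integer = 1)
    avail = max(h, 2 * minrows)  # never fall below the minimum budget
    h1 = cld(avail, 2)           # first group gets the extra line on ties
    h2 = avail - h1
    return (h1, h2)
end

println(div(1, 2), " ", div(-1, 2))  # 0 0: the unguarded halving problem
println(split_height(3))             # (2, 1)
println(split_height(-1))            # (1, 1)
```

Whether a non-positive height should instead mean "no limit" is a design decision; clamping simply guarantees the arithmetic never produces a zero or negative height.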
Hi @ronisbr, thanks for the comment!
@Jollywatt - unfortunately, due to the Tables.jl 1.8 release we needed to make some changes in the internals of DataFrames.jl. Can you please rebase (or merge, if rebasing is problematic) this PR against …
src/groupeddataframe/show.jl
print(io, "\nFirst Group ($nrows $rows): ")
join(io, identified_groups, ", ")
println(io)
(h, w) = get(io, :displaysize, displaysize(io))
@ronisbr - do you use the same pattern in PrettyTables.jl? (to ensure consistency)
@Jollywatt - shouldn't we test get(io, :limit, false) first, and only apply the logic you propose if the :limit property is set? And add tests for both cases.
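A minimal sketch of the gate suggested here, assuming a hypothetical helper limited_height that returns nothing when the output should not be truncated. It relies only on Base's get(io, key, default) for IO properties, which returns the default for IO objects that carry no properties (such as a bare IOBuffer, or stdout in the show(stdout, df) call below).

```julia
# Hypothetical helper: return the display height to use for truncation,
# or `nothing` when the IO does not request limited output.
function limited_height(io::IO)
    get(io, :limit, false) || return nothing  # :limit unset => print everything
    h, _ = displaysize(io)
    return h
end

plain = IOBuffer()                                   # like show(stdout, df)
limited = IOContext(IOBuffer(), :limit => true,      # like REPL display
                    :displaysize => (24, 80))

println(limited_height(plain))    # nothing
println(limited_height(limited))  # 24
```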
Note the following for standard printing of data frames:
julia> df = DataFrame(rand(20, 5), :auto) # display sets limit
20×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼───────────────────────────────────────────────────────
1 │ 0.799571 0.711126 0.0829037 0.153365 0.65724
2 │ 0.013505 0.516158 0.289836 0.772033 0.236464
3 │ 0.539347 0.866731 0.527391 0.763274 0.445696
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
18 │ 0.285917 0.83837 0.412981 0.283734 0.734432
19 │ 0.539428 0.281714 0.503087 0.236329 0.980165
20 │ 0.19463 0.818894 0.080396 0.274849 0.678159
14 rows omitted
julia> show(stdout, df) # stdout does not have limit
20×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────────
1 │ 0.799571 0.711126 0.0829037 0.153365 0.65724
2 │ 0.013505 0.516158 0.289836 0.772033 0.236464
3 │ 0.539347 0.866731 0.527391 0.763274 0.445696
4 │ 0.220234 0.94367 0.740741 0.141462 0.121854
5 │ 0.32305 0.267344 0.519681 0.331716 0.396574
6 │ 0.127974 0.282572 0.640234 0.841006 0.526737
7 │ 0.322138 0.925825 0.00683361 0.0276174 0.140838
8 │ 0.832967 0.744335 0.808838 0.516439 0.219536
9 │ 0.895046 0.853179 0.487567 0.81502 0.721893
10 │ 0.236999 0.481331 0.310607 0.0200533 0.00837921
11 │ 0.142544 0.310369 0.121286 0.781769 0.0948759
12 │ 0.923275 0.205102 0.097269 0.806569 0.273017
13 │ 0.177222 0.266212 0.965611 0.345097 0.745195
14 │ 0.157075 0.851299 0.121473 0.281857 0.1338
15 │ 0.938703 0.183045 0.769088 0.562905 0.0955595
16 │ 0.592258 0.387368 0.912301 0.292013 0.401606
17 │ 0.949321 0.0872963 0.421829 0.0586528 0.39613
18 │ 0.285917 0.83837 0.412981 0.283734 0.734432
19 │ 0.539428 0.281714 0.503087 0.236329 0.980165
20 │ 0.19463 0.818894 0.080396 0.274849 0.678159
julia> show(IOContext(stdout, :limit => true), df) # but you can manually enable limit
20×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼───────────────────────────────────────────────────────
1 │ 0.799571 0.711126 0.0829037 0.153365 0.65724
2 │ 0.013505 0.516158 0.289836 0.772033 0.236464
3 │ 0.539347 0.866731 0.527391 0.763274 0.445696
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
18 │ 0.285917 0.83837 0.412981 0.283734 0.734432
19 │ 0.539428 0.281714 0.503087 0.236329 0.980165
20 │ 0.19463 0.818894 0.080396 0.274849 0.678159
14 rows omitted
@ronisbr - do you use the same pattern in PrettyTables.jl? (to ensure consistency)
Yes, kind of. The idea is the same, but I only check :limit if the user did not pass a specific keyword to crop the output table.
Isn't this redundant? displaysize(io) already extracts the :displaysize property of io if it's set.
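For the IO types in Base the two spellings agree. The sketch below checks this for an IOContext carrying :displaysize and for a bare IOBuffer, where get falls back to its default. As the reply below notes, other IO types may define their own displaysize methods, so this is an observation about Base, not a general guarantee.

```julia
# With an IOContext, Base's displaysize already honours :displaysize,
# so both spellings give the same answer.
ctx = IOContext(IOBuffer(), :displaysize => (12, 60))
via_get = get(ctx, :displaysize, displaysize(ctx))
direct  = displaysize(ctx)

# A bare IOBuffer has no properties: `get` returns its default, which is
# displaysize(io) itself (here read from LINES/COLUMNS via withenv).
buf = IOBuffer()
plain_get, plain_direct = withenv("LINES" => "30", "COLUMNS" => "100") do
    (get(buf, :displaysize, displaysize(buf)), displaysize(buf))
end

println(via_get == direct)          # true
println(plain_get == plain_direct)  # true
```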
I think it does not have to, see e.g. https://github.com/JuliaLang/julia/blob/d5cde865f24d2c3e7041aee8ea464eb6b6045a2a/base/stream.jl#L564 (and custom targets could define similar rules).
OTOH, I do not see a situation when what is proposed now would be incorrect.
Neither DataFrames nor PrettyTables use that pattern AFAICT. Julia Base doesn't either. So it's better to be consistent.
DataFrames.jl uses this pattern in DataFrames.jl/src/abstractdataframe/io.jl, line 389 (at commit 8acb679):
tty_rows, tty_cols = get(io, :displaysize, displaysize(io))
Blame shows that I added it 3 years ago in #1761. The reason it was added is given in #1761 (comment) (you made the same remark there as here). @nalimilan - so what do we do?
@bkamins I hope I did that right: I synced my fork using the GitHub UI, which made a merge, not a rebase.
A merge should be OK.
src/groupeddataframe/show.jl
@@ -44,7 +44,7 @@ function Base.show(io::IO, gd::GroupedDataFrame;
h -= 2 # two lines are already used for header and gap between groups

h1 = h2 = h # display heights available for first and last groups
-if N > 1
+if !allrows && N > 1
@ronisbr - there is also an additional kwarg in PrettyTables.jl that governs the number of rows to be printed in a table. Can you please remind me of its name? We should check how passing it interacts with this PR.
One I can see for sure is display_size.
I should point out that the change to line 47 doesn't change any behaviour: arguments like allrows get forwarded to show methods later on. (All that !allrows && … does is save a few arithmetic operations when we know that we don't need to calculate heights.) So if there are other kwargs that override allrows, then it might be better to always set the default display_size, as before.
Earlier I did not suggest checking allrows, but rather checking get(io, :limit, false). However, looking at the code, maybe it is OK: if :limit is false, then the display size will be ignored in show for groups. Please just confirm that we have a test for this.
Oh, does :limit take precedence over allrows? I have added a single test for whether allrows=true works even when the display size is small, but I don't have tests where I set :limit directly.
does :limit take precedence over allrows?
It depends on how you define precedence. allrows will override :limit if allrows is set to true.
However, you can also have allrows=false and :limit=false. We should make sure that everything works OK even if we pass a custom display_size to show internally (I think it will, but let us make sure).
@ronisbr - there is also an additional kwarg in PrettyTables.jl that governs the number of rows to be printed in a table. Can you please remind me of its name? We should check how passing it interacts with this PR.
You can select the number of rows that will be printed if crop is set to :both or :vertical. In this case, the number of printed rows (including the header, table dividers, etc.) is selected based on the display size.
In summary, the PR mostly looks good (it works OK under standard settings). However, please carefully consider …
Also, the documentation needs to be updated to reflect the changed rules for printing of …
From my side, I do not see problems related to kwargs in PrettyTables. The arguments …
OK - so we can drop …
@nalimilan - do you have time to have a look at this PR (and, as usual, spot the tiny details that I usually miss; thank you!)?
@Jollywatt - do you have time to look at this PR and finalize it? Thank you! Soon we will have a feature freeze for the DataFrames.jl 1.4 release (after that, the PR would still be merged, but it would be released later).
@bkamins OK, done (I think)! Let me know if there’s anything else to do :)
Tests are failing.
I’ve been trying to debug those failures but haven’t gotten to the bottom of them. It seems like …
The docs build is failing.
Regarding the failing CI: @ronisbr - this is a bug in PrettyTables.jl. You use … The offending operation is: … where in the test … This is in … The issue is most likely some non-standard table printing heights that this PR passes to PrettyTables.jl. So the fix should have two parts: …
Oops, sorry about that! I assumed that the header is always rendered. However, given the huge rewrite of PrettyTables in v2, I forgot to add some checks. In this case, the number of rendered lines was set to the number of display lines (1), which is lower than the number of header lines (2). This must never happen. I fixed this bug in …
@Jollywatt by the way, the output will change. Notice that the header will always be printed, no matter how small the display is.
I think it is OK that the header is always printed.
It is the solution that leads to the fewest problems. Otherwise, the omitted cell summary would be "wrong", since we would also be omitting header cells. Anyway, it only matters when you are using a very small terminal, which should be a rare corner case.
Done! PrettyTables v2.1.1 is released. Can you please test it again?
Testing now (the docstrings probably still need to be fixed, but at least we will see whether the problem you addressed is fixed).
@Jollywatt - after the PrettyTables.jl changes, the docstrings and testset outputs need to be updated. Thank you!
…`allrows` show option is false.
… GroupedDataFrames, not only when `allrows` is false.
- Make grouped data frames fit in small text displays as well as possible
- To break ties, display the first group with one more row than the second
Co-authored-by: Milan Bouchet-Valat
Co-authored-by: Bogumił Kamiński
(A bug was fixed in PrettyTables which simplified the printing logic a bit.) Update doctests with grouped data frames.
@bkamins I’ve fixed the display logic and tests to work with PrettyTables 2.1.1, and fixed the doctests. Hopefully I haven’t missed anything :) (Apologies for the highly nonlinear git history…!)
Thank you! I need to review the PR from scratch before merging anyway, so it is not a problem that the history is complex.
Thank you!
This is a small enhancement to make GroupedDataFrames display so that they fit on one screen, like regular DataFrames with many rows.
Before, the first and last groups of a GroupedDataFrame would be displayed at full size; now, the two groups are made smaller (using the :displaysize property of IOContext) so that together they fill the available line height.
If the first group is smaller than half the available REPL height and the last group is larger, or vice versa, then the smaller group is shown in full and the larger one is squashed (displayed with omitted rows). This means that the group with more rows takes up more room visually, even after some of its rows are skipped.
See the "GroupedDataFrame displaysize test" test set for example output.