Use PrettyTables.jl as HTML backend #3096

Merged
merged 33 commits into from
Sep 23, 2022
Changes from 27 commits
898d730
Use PrettyTables.jl as HTML backend
ronisbr Jul 2, 2022
11e3c20
Limit the num of rows and cols based on ENV vars
ronisbr Jul 3, 2022
e7185fa
Wrap table inside a div
ronisbr Jul 3, 2022
4129e96
Redirect kwargs to PrettyTables.jl
ronisbr Aug 20, 2022
6688343
Update tests related to HTML backend
ronisbr Aug 20, 2022
adfe1b8
Fix test related to the text backend
ronisbr Aug 20, 2022
d7d3169
Do not allow kwargs rowid and title in show (HTML)
ronisbr Aug 21, 2022
0952200
Add support to truncate in HTML show
ronisbr Aug 21, 2022
2b19705
Update documentation
ronisbr Aug 21, 2022
ba81066
Add tests for DataFrameRow
ronisbr Aug 21, 2022
95c8816
Fix bug
ronisbr Aug 21, 2022
cdfa097
Update the code to the new PrettyTables interface
ronisbr Aug 27, 2022
98a849b
Add info about output customization in Jupyter
ronisbr Sep 4, 2022
ea7ec0c
Apply suggestions from code review
ronisbr Sep 4, 2022
da1d1f8
Apply suggestions from code review
bkamins Sep 4, 2022
6a9b814
Bump PrettyTables.jl version
ronisbr Sep 8, 2022
987844d
Add the requested tests
ronisbr Sep 10, 2022
2e9e5d4
Add the information asked by the reviewer
ronisbr Sep 13, 2022
384b7b9
Bump PrettyTables version to 2.1
ronisbr Sep 13, 2022
608f6ed
Add margin to the bottom of the table in HTML
ronisbr Sep 13, 2022
2d6f2db
Add tests for invalid data in ENV vars.
ronisbr Sep 14, 2022
e2b1da6
Apply suggestions from code review
ronisbr Sep 19, 2022
75c3604
Apply suggestions from code review
ronisbr Sep 19, 2022
d375572
Apply suggestions from code review
ronisbr Sep 19, 2022
63af286
Apply suggestions from code review
ronisbr Sep 19, 2022
b4f0e15
Apply suggestion from code reviewer
ronisbr Sep 19, 2022
5ca32f1
Change truncate to max_column_width
ronisbr Sep 20, 2022
2a4126a
Update src/abstractdataframe/io.jl
ronisbr Sep 21, 2022
5d64974
Change max_column_with type to AbstractString
ronisbr Sep 21, 2022
caf27a9
Add kwargs check in all types of text backend
ronisbr Sep 21, 2022
e8637b0
Improve test coverage
ronisbr Sep 21, 2022
42d51fe
Fix tests
ronisbr Sep 21, 2022
732a546
Apply suggestions from code review
bkamins Sep 23, 2022
2 changes: 1 addition & 1 deletion Project.toml
@@ -31,7 +31,7 @@ InvertedIndices = "1"
IteratorInterfaceExtensions = "0.1.1, 1"
Missings = "0.4.2, 1"
PooledArrays = "1.4.2"
PrettyTables = "0.12, 1"
PrettyTables = "2.1"
Reexport = "0.1, 0.2, 1"
ShiftedArrays = "1"
SortingAlgorithms = "0.1, 0.2, 0.3, 1"
26 changes: 17 additions & 9 deletions docs/src/man/getting_started.md
@@ -15,19 +15,27 @@ relevant variables into your current namespace.

!!! note

By default Jupyter Notebook will limit the number of rows and columns when displaying a data frame to roughly
fit the screen size (like in the REPL).

You can override this behavior by changing the values of the `ENV["COLUMNS"]` and `ENV["LINES"]`
variables to hold the maximum width and height of output in characters respectively.
By default DataFrames.jl limits the number of rows and columns when displaying a data frame in a Jupyter
Notebook to 25 and 100, respectively. You can override this behavior by changing the values of the
`ENV["DATAFRAMES_COLUMNS"]` and `ENV["DATAFRAMES_ROWS"]` variables to hold the maximum number of columns
and rows of the output. All columns or rows will be printed if those numbers are equal to or lower than 0.

Alternatively, you may want to set the maximum number of data frame rows to print to `100` and the maximum
output width in characters to `1000` for every Julia session using some Jupyter kernel file (numbers `100`
and `1000` are only examples and can be adjusted). In such case add a `"COLUMNS": "1000", "LINES": "100"`
entry to the `"env"` variable in this Jupyter kernel file.
See [here](https://jupyter-client.readthedocs.io/en/stable/kernels.html) for information about location
number of columns to print to `1000` for every Julia session using some Jupyter kernel file (numbers `100`
and `1000` are only examples and can be adjusted). In such a case add a
`"DATAFRAMES_COLUMNS": "1000", "DATAFRAMES_ROWS": "100"` entry to the `"env"` variable in this Jupyter kernel
file. See [here](https://jupyter-client.readthedocs.io/en/stable/kernels.html) for information about location
and specification of Jupyter kernels.

The package [PrettyTables.jl](https://github.com/ronisbr/PrettyTables.jl) renders the `DataFrame` in the
Jupyter notebook. Users can customize the output by passing keyword arguments `kwargs...` to the
function `show`: `show(stdout, MIME("text/html"), df; kwargs...)`, where `df` is the `DataFrame`. Any
argument supported by PrettyTables.jl in the HTML backend can be used here. Hence, for example, if the user
wants to change the color of all numbers smaller than 0 to red in Jupyter, they can execute:
`show(stdout, MIME("text/html"), df; highlighters = hl_lt(0, HtmlDecoration(color = "red")))` after
`using PrettyTables`. For more information about the available options, check
[PrettyTables.jl documentation](https://ronisbr.github.io/PrettyTables.jl/stable/man/usage/).

## The `DataFrame` Type

Objects of the `DataFrame` type represent a data table as a series of vectors,
274 changes: 150 additions & 124 deletions src/abstractdataframe/io.jl
@@ -101,8 +101,15 @@ Render a data frame to an I/O stream in MIME type `mime`.
Additionally selected MIME types support passing the following keyword arguments:
- MIME type `"text/plain"` accepts all listed keyword arguments and their behavior
is identical as for `show(::IO, ::AbstractDataFrame)`
- MIME type `"text/html"` accepts `summary` keyword argument which
allows to choose whether to print a brief string summary of the data frame.
- MIME type `"text/html"` accepts the following keywords:
- `eltypes::Bool = true`: Whether to print the column types under column names.
- `summary::Bool = true`: Whether to print a brief string summary of the data frame.
- `max_column_width::String = ""`: The maximum column width. It must be a string
containing a valid CSS length. For example, passing "100px" will limit the
width of all columns to 100 pixels. If empty, the columns will be rendered
without limits.
- `kwargs...`: Any keyword argument supported by the function `pretty_table`
of PrettyTables.jl can be passed here to customize the output.

# Examples
```jldoctest
@@ -126,9 +133,14 @@ julia> show(stdout, MIME("text/csv"), DataFrame(A=1:3, B=["x", "y", "z"]))
```
"""
Base.show(io::IO, mime::MIME, df::AbstractDataFrame)
Base.show(io::IO, mime::MIME"text/html", df::AbstractDataFrame;
summary::Bool=true, eltypes::Bool=true) =
_show(io, mime, df, summary=summary, eltypes=eltypes)
function Base.show(io::IO, mime::MIME"text/html", df::AbstractDataFrame;
summary::Bool=true, eltypes::Bool=true, max_column_width::String="",
kwargs...)
_verify_kwargs_for_html(; kwargs...)
return _show(io, mime, df; summary=summary, eltypes=eltypes,
max_column_width=max_column_width, kwargs...)
end

Base.show(io::IO, mime::MIME"text/latex", df::AbstractDataFrame; eltypes::Bool=true) =
_show(io, mime, df, eltypes=eltypes)
Base.show(io::IO, mime::MIME"text/csv", df::AbstractDataFrame) =
@@ -144,15 +156,6 @@ Base.show(io::IO, mime::MIME"text/plain", df::AbstractDataFrame; kwargs...) =
#
##############################################################################

function digitsep(value::Integer)
# Adapted from https://github.com/IainNZ/Humanize.jl
value = string(abs(value))
group_ends = reverse(collect(length(value):-3:1))
groups = [value[max(end_index - 2, 1):end_index]
for end_index in group_ends]
return join(groups, ',')
end

function html_escape(cell::AbstractString)
cell = replace(cell, "&"=>"&amp;")
cell = replace(cell, "<"=>"&lt;")
@@ -164,128 +167,140 @@ function html_escape(cell::AbstractString)
return cell
end

function _show(io::IO, ::MIME"text/html", df::AbstractDataFrame;
summary::Bool=true, eltypes::Bool=true, rowid::Union{Int, Nothing}=nothing)
function _show(io::IO,
::MIME"text/html",
df::AbstractDataFrame;
summary::Bool=true,
eltypes::Bool=true,
rowid::Union{Int, Nothing}=nothing,
title::String="",
max_column_width::String="",
kwargs...)
_check_consistency(df)

# we will pass around this buffer to avoid its reallocation in ourstrwidth
buffer = IOBuffer(Vector{UInt8}(undef, 80), read=true, write=true)
names_str = names(df)
types = Any[eltype(c) for c in eachcol(df)]
types_str = batch_compacttype(types, 9)
types_str_complete = batch_compacttype(types, 256)

if rowid !== nothing
if size(df, 2) == 0
rowid = nothing
elseif size(df, 1) != 1
throw(ArgumentError("rowid may be passed only with a single row data frame"))
# For consistency, if `kwargs` has `compact_printing`, we must use it.
compact_printing::Bool = get(kwargs, :compact_printing, get(io, :compact, true))

num_rows, num_cols = size(df)

# By default, we align the columns to the left unless they are numbers,
# which is checked in the following.
alignment = fill(:l, num_cols)

for i = 1:num_cols
type_i = nonmissingtype(types[i])

if type_i <: Number
alignment[i] = :r
end
end

mxrow, mxcol = size(df)
if get(io, :limit, false)
tty_rows, tty_cols = displaysize(io)
mxrow = min(mxrow, tty_rows)
maxwidths = getmaxwidths(df, io, 1:mxrow, 0:-1, :X, nothing, true, buffer, 0) .+ 2
mxcol = min(mxcol, searchsortedfirst(cumsum(maxwidths), tty_cols))
# Obtain the maximum number of rows and columns that we can print from
# environment variables.
mxrow = something(tryparse(Int, get(ENV, "DATAFRAMES_ROWS", "25")), 25)
mxcol = something(tryparse(Int, get(ENV, "DATAFRAMES_COLUMNS", "100")), 100)
else
mxrow = -1
mxcol = -1
end

cnames = _names(df)[1:mxcol]
write(io, "<div class=\"data-frame\">")
# Check if the user wants to display a summary about the DataFrame that is
# being printed. This will be shown using the `title` option of
# `pretty_table`.
if summary
write(io, "<p>$(digitsep(nrow(df))) rows × $(digitsep(ncol(df))) columns")
if mxcol < size(df, 2)
write(io, " (omitted printing of $(size(df, 2)-mxcol) columns)")
end
write(io, "</p>")
end
write(io, "<table class=\"data-frame\">")
write(io, "<thead>")
write(io, "<tr>")
write(io, "<th></th>")
for column_name in cnames
write(io, "<th>$(html_escape(String(column_name)))</th>")
end
write(io, "</tr>")
if eltypes
write(io, "<tr>")
write(io, "<th></th>")
# We put a longer string for the type into the title argument of the <th> element,
# which the users can hover over. The limit of 256 characters is arbitrary, but
# we want some maximum limit, since the types can sometimes get really-really long.
types = Any[eltype(df[!, idx]) for idx in 1:mxcol]
ct, ct_title = batch_compacttype(types, 9), batch_compacttype(types, 256)
for j in 1:mxcol
s = html_escape(ct[j])
title = html_escape(ct_title[j])
write(io, "<th title=\"$title\">$s</th>")
if isempty(title)
title = Base.summary(df)
end
write(io, "</tr>")
else
title = ""
end
write(io, "</thead>")
write(io, "<tbody>")
for row in 1:mxrow
write(io, "<tr>")
if rowid === nothing
write(io, "<th>$row</th>")

# If `rowid` is not `nothing`, then we are printing a data row. In this
# case, we will add this information using the row name column of
# PrettyTables.jl. Otherwise, we can just use the row number column.
if (rowid === nothing) || (ncol(df) == 0)
show_row_number::Bool = get(kwargs, :show_row_number, true)
row_labels = nothing

# If the column with row numbers is not shown, then we should not
# display a vertical line after the first column.
vlines = fill(1, show_row_number)
else
nrow(df) != 1 &&
throw(ArgumentError("rowid may be passed only with a single row data frame"))

# In this case, if the user does not want to show the row number, then
# we must hide the row name column, which is used to display the
# `rowid`.
if !get(kwargs, :show_row_number, true)
row_labels = nothing
vlines = Int[]
else
write(io, "<th>$rowid</th>")
row_labels = [string(rowid)]
vlines = Int[1]
end
for column_name in cnames
if isassigned(df[!, column_name], row)
cell_val = df[row, column_name]
if ismissing(cell_val)
write(io, "<td><em>missing</em></td>")
elseif cell_val isa Markdown.MD
write(io, "<td>")
show(io, "text/html", cell_val)
write(io, "</td>")
elseif cell_val isa SHOW_TABULAR_TYPES
write(io, "<td><em>")
cell = sprint(ourshow, cell_val, 0)
write(io, html_escape(cell))
write(io, "</em></td>")
else
cell = sprint(ourshow, cell_val, 0)
write(io, "<td>$(html_escape(cell))</td>")
end
else
write(io, "<td><em>#undef</em></td>")
end
end
write(io, "</tr>")
end
if size(df, 1) > mxrow
write(io, "<tr>")
write(io, "<th>&vellip;</th>")
for column_name in cnames
write(io, "<td>&vellip;</td>")
end
write(io, "</tr>")

show_row_number = false
end
write(io, "</tbody>")
write(io, "</table>")
write(io, "</div>")

pretty_table(io, df;
alignment = alignment,
backend = Val(:html),
compact_printing = compact_printing,
formatters = (_pretty_tables_general_formatter,),
header = (names_str, types_str),
header_alignment = :l,
header_cell_titles = (nothing, types_str_complete),
highlighters = (_PRETTY_TABLES_HTML_HIGHLIGHTER,),
max_num_of_columns = mxcol,
max_num_of_rows = mxrow,
maximum_columns_width = max_column_width,
minify = true,
row_label_column_title = "Row",
row_labels = row_labels,
row_number_alignment = :r,
row_number_column_title = "Row",
show_omitted_cell_summary = true,
show_row_number = show_row_number,
show_subheader = eltypes,
standalone = false,
table_class = "data-frame",
table_div_class = "data-frame",
table_style = _PRETTY_TABLES_HTML_TABLE_STYLE,
top_left_str = title,
top_right_str_decoration = HtmlDecoration(font_style = "italic"),
vcrop_mode = :middle,
wrap_table_in_div = true,
kwargs...)

return nothing
end

function Base.show(io::IO, mime::MIME"text/html", dfr::DataFrameRow;
summary::Bool=true, eltypes::Bool=true)
function Base.show(io::IO, mime::MIME"text/html", dfr::DataFrameRow; kwargs...)
_verify_kwargs_for_html(; kwargs...)
r, c = parentindices(dfr)
summary && write(io, "<p>DataFrameRow ($(length(dfr)) columns)</p>")
_show(io, mime, view(parent(dfr), [r], c), summary=false, eltypes=eltypes, rowid=r)
title = "DataFrameRow ($(length(dfr)) columns)"
_show(io, mime, view(parent(dfr), [r], c); rowid=r, title=title, kwargs...)
end

function Base.show(io::IO, mime::MIME"text/html", dfrs::DataFrameRows;
summary::Bool=true, eltypes::Bool=true)
function Base.show(io::IO, mime::MIME"text/html", dfrs::DataFrameRows; kwargs...)
_verify_kwargs_for_html(; kwargs...)
df = parent(dfrs)
summary && write(io, "<p>$(nrow(df))×$(ncol(df)) DataFrameRows</p>")
_show(io, mime, df, summary=false, eltypes=eltypes)
title = "$(nrow(df))×$(ncol(df)) DataFrameRows"
_show(io, mime, df; title=title, kwargs...)
end

function Base.show(io::IO, mime::MIME"text/html", dfcs::DataFrameColumns;
summary::Bool=true, eltypes::Bool=true)
function Base.show(io::IO, mime::MIME"text/html", dfcs::DataFrameColumns; kwargs...)
_verify_kwargs_for_html(; kwargs...)
df = parent(dfcs)
if summary
write(io, "<p>$(nrow(df))×$(ncol(df)) DataFrameColumns</p>")
end
_show(io, mime, df, summary=false, eltypes=eltypes)
title = "$(nrow(df))×$(ncol(df)) DataFrameColumns"
_show(io, mime, df; title=title, kwargs...)
end

function Base.show(io::IO, mime::MIME"text/html", gd::GroupedDataFrame)
@@ -298,31 +313,42 @@ function Base.show(io::IO, mime::MIME"text/html", gd::GroupedDataFrame)
nrows = size(gd[1], 1)
rows = nrows > 1 ? "rows" : "row"

identified_groups = [html_escape(string(col, " = ",
repr(first(gd[1][!, col]))))
identified_groups = [string(col, " = ", repr(first(gd[1][!, col])))
for col in gd.cols]

write(io, "<p><i>First Group ($nrows $rows): ")
join(io, identified_groups, ", ")
write(io, "</i></p>")
show(io, mime, gd[1], summary=false)
title = "First Group ($nrows $rows): " * join(identified_groups, ", ")
_show(io, mime, gd[1], title=title)
end
if N > 1
nrows = size(gd[N], 1)
rows = nrows > 1 ? "rows" : "row"

identified_groups = [html_escape(string(col, " = ",
repr(first(gd[N][!, col]))))
identified_groups = [string(col, " = ", repr(first(gd[N][!, col])))
for col in gd.cols]

write(io, "<p>&vellip;</p>")
write(io, "<p><i>Last Group ($nrows $rows): ")
join(io, identified_groups, ", ")
write(io, "</i></p>")
show(io, mime, gd[N], summary=false)
title = "Last Group ($nrows $rows): " * join(identified_groups, ", ")
_show(io, mime, gd[N], title=title)
end
end

# Internal function to verify the keywords in show functions using the HTML
# backend.
function _verify_kwargs_for_html(; kwargs...)
haskey(kwargs, :rowid) &&
throw(ArgumentError("Keyword argument `rowid` is reserved and must not be used."))

haskey(kwargs, :title) &&
throw(ArgumentError("Use the `top_left_str` keyword argument instead of `title` " *
"to change the label above the data frame."))

haskey(kwargs, :truncate) &&
throw(ArgumentError("`truncate` is not supported in HTML. " *
"Use `max_column_width` to limit the size of the columns in this case."))

return nothing
end

##############################################################################
#
# LaTeX output
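
Reviewer note (not part of the diff): the row and column limits in `_show` fall back to their defaults when the environment variables are missing or hold non-integer text. A minimal sketch of that fallback in isolation follows. The helper name `env_limit` and the plain `Dict` standing in for `ENV` are hypothetical and used only for illustration; the `something(tryparse(Int, get(...)), default)` expression is the one used in the diff above.

```julia
# Hypothetical helper (not in the PR): mirrors the expression used in `_show`
# to read a limit from an environment-like dictionary.
function env_limit(env::AbstractDict, key::AbstractString, default::Int)
    # `tryparse` returns `nothing` for invalid input, and `something`
    # then selects the default instead.
    return something(tryparse(Int, get(env, key, string(default))), default)
end

# A plain Dict stands in for `ENV` so the sketch does not touch the real environment.
env = Dict("DATAFRAMES_ROWS" => "50", "DATAFRAMES_COLUMNS" => "abc")

rows = env_limit(env, "DATAFRAMES_ROWS", 25)      # valid integer string is used
cols = env_limit(env, "DATAFRAMES_COLUMNS", 100)  # invalid text falls back to the default
unset = env_limit(Dict{String,String}(), "DATAFRAMES_ROWS", 25)  # missing key falls back
```

Passing `ENV` itself as the first argument should behave the same way, since `ENV` is an `AbstractDict` of strings.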