Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: to_html() with formatters=<list> and max_cols fixed #28183

Merged
merged 6 commits into from
Sep 16, 2019
Merged

BUG: to_html() with formatters=<list> and max_cols fixed #28183

merged 6 commits into from
Sep 16, 2019

Conversation

gabriellm1
Copy link
Contributor

@hugoecarl
@guipleite

@simonjayhawkins simonjayhawkins added Bug IO HTML read_html, to_html, Styler.apply, Styler.applymap labels Aug 29, 2019
@pep8speaks
Copy link

pep8speaks commented Aug 29, 2019

Hello @gabriellm1! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-09-11 00:08:07 UTC

@gabriellm1
Copy link
Contributor Author

Hey @simonjayhawkins , just fixed what you reviewed 👍

@@ -656,6 +656,10 @@ def _chk_truncate(self) -> None:
frame = concat(
(frame.iloc[:, :col_num], frame.iloc[:, -col_num:]), axis=1
)
# truncate formatter
if is_list_like(self.formatters) and self.formatters:
truncate_fmt = cast(List[Callable], self.formatters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity what complain was this giving? cast doesn't seem necessary given assignment back to self.formatters but I may be missing something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually a mypy complain, when running ci/code_checks.sh, about the possibility of self.formatters not been a list.
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm OK. I think this is somewhat misleading though as is_list_like will return True for Tuples as well, so restricting this to List might not be totally correct (though I can certainly see where one would think that...)

Does changing this annotation to Sequence[Callable] still pass Mypy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's passes! Although it's necessary to import Sequences in typing and change the concat operation from self.formatters = fmt[slice] + fmt[slice] to self.formatters = [*fmt[slice],*fmt[slice]] . May I make a commit with theses changes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm what type fails if you keep the code as is but just change annotation? Want to make sure if we change that we have appropriate test coverage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to avoid any / all conflicts between annotations, documentation and implementation

makes sense. At the moment we have a conflict between the documentation and the implementation. so that could be confusing matters.

Also, the wording for to_latex is slightly different from to_string and to_html.

they all share the same code.

maybe we should have a def _validate_formatters_kwarg(formatters): since I think the length check is important if this truncation is added.

at the moment, if the list is too short, it raises and if the list is too long it ignores the additional elements.

however, with the slicing from the start and end, weird things might happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gabriellm1 in summary..

we want to avoid the cast.

so for now we should probably go with if isinstance(self.formatters, (list, tuple)): instead of is_list_like(self.formatters) and self.formatters

this would be consistent with the check in _get_formatter

however, you would still need to do self.formatters = [*fmt[slice],*fmt[slice]] to satisfy mypy. I can't find a mypy issue for this case, otherwise we could have added a # type: ignore with the issue number.

I think validating the length of the list should be done in this PR, but updating the docs is outside the scope so could be done as a follow-up.

@WillAyd lmk if you disagree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast is gone. Now what about length validation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we only do the length validation (and don't check that the parameter is not a string or that the items are callables) in this PR, then a separate function is probably unnecessary.

so just raise with a message along the lines of "The 'formatters' parameter must be of length equal to the number of columns."?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I got it. The validation would happen in the same place I made the changes? And what did you mean with "raise with a message"?

pandas/io/formats/format.py Outdated Show resolved Hide resolved
@@ -656,6 +656,10 @@ def _chk_truncate(self) -> None:
frame = concat(
(frame.iloc[:, :col_num], frame.iloc[:, -col_num:]), axis=1
)
# truncate formatter
if is_list_like(self.formatters) and self.formatters:
truncate_fmt = cast(List[Callable], self.formatters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm OK. I think this is somewhat misleading though as is_list_like will return True for Tuples as well, so restricting this to List might not be totally correct (though I can certainly see where one would think that...)

Does changing this annotation to Sequence[Callable] still pass Mypy?

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor comment to hash out on annotation otherwise we can get this merged soon. Thanks for the PR

@@ -656,6 +656,10 @@ def _chk_truncate(self) -> None:
frame = concat(
(frame.iloc[:, :col_num], frame.iloc[:, -col_num:]), axis=1
)
# truncate formatter
if is_list_like(self.formatters) and self.formatters:
truncate_fmt = cast(List[Callable], self.formatters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm what type fails if you keep the code as is but just change annotation? Want to make sure if we change that we have appropriate test coverage

@TomAugspurger
Copy link
Contributor

Are we good here @WillAyd? I didn't follow the typing discussion.

@@ -656,6 +656,10 @@ def _chk_truncate(self) -> None:
frame = concat(
(frame.iloc[:, :col_num], frame.iloc[:, -col_num:]), axis=1
)
# truncate formatter
if is_list_like(self.formatters) and self.formatters:
truncate_fmt = cast(List[Callable], self.formatters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one of two things is wrong here, either:

  1. A Tuple is an allowable value in this block, in which case this would currently fail OR
  2. Our annotations are wrong an a Tuple isn't allowable any way

I think the problem here is point 1 above - can you verify that and if so add a test case? Definitely want to get rid of the cast unless absolutely necessary. Using cast like this can actually hide potential bugs, so need to be careful with it

@simonjayhawkins
Copy link
Member

formatters_type is a union of list and tuple (and dict) due to the isinstance check in _get_formatter. if that is changed to be less restrictive, the following could be a potential solution.

diff --git a/pandas/io/formats/format.py b/pandas/io/formats/format.py
index 5ce4bf6ea..0d3e57990 100644
--- a/pandas/io/formats/format.py
+++ b/pandas/io/formats/format.py
@@ -20,7 +20,9 @@ from typing import (
     Dict,
     Iterable,
     List,
+    Mapping,
     Optional,
+    Sequence,
     Tuple,
     Type,
     Union,
@@ -76,9 +78,7 @@ from pandas.io.formats.printing import adjoin, justify, pprint_thing
 if TYPE_CHECKING:
     from pandas import Series, DataFrame, Categorical
 
-formatters_type = Union[
-    List[Callable], Tuple[Callable, ...], Dict[Union[str, int], Callable]
-]
+formatters_type = Union[Sequence[Callable], Mapping[Union[str, int], Callable]]
 float_format_type = Union[str, Callable, "EngFormatter"]
 
 common_docstring = """
@@ -458,7 +458,7 @@ class TableFormatter:
         )
 
     def _get_formatter(self, i: Union[str, int]) -> Optional[Callable]:
-        if isinstance(self.formatters, (list, tuple)):
+        if isinstance(self.formatters, Sequence):
             if is_integer(i):
                 i = cast(int, i)
                 return self.formatters[i]
@@ -658,8 +658,8 @@ class DataFrameFormatter(TableFormatter):
                     (frame.iloc[:, :col_num], frame.iloc[:, -col_num:]), axis=1
                 )
                 # truncate formatter
-                if is_list_like(self.formatters) and self.formatters:
-                    truncate_fmt = cast(List[Callable], self.formatters)
+                if isinstance(self.formatters, Sequence):
+                    truncate_fmt = list(self.formatters)
                     self.formatters = truncate_fmt[:col_num] + truncate_fmt[-col_num:]
             self.tr_col_num = col_num
         if truncate_v:

in the docs, it states "List must be of length equal to the number of columns.". I don't think this is enforced and could causes issues with the truncated list if the length of the list (or sequence) doesn't match the number of columns.

@TomAugspurger
Copy link
Contributor

Sorry, I haven't followed the discussion here but CI is passing.

Is this good to go, or is there an outstanding item @simonjayhawkins (something to do with validating lengths)?

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Also slightly lost on back and forth but I think there's a follow up on validating the length of the input? Can someone open an issue for that?

@simonjayhawkins
Copy link
Member

@WillAyd @TomAugspurger The fix here slices from the end of the list of formatters. There is no check that the list of formatters is the same as the number of columns. So IMO this solution is incomplete, but i'll defer if you think this could be a follow-up.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 12, 2019

This raises on master

In [14]: df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

In [15]: df.to_html(formatters=['{}'.format])
IndexError: list index out of range

Does this PR introduce new cases where that will happen? With will happen with

In [15]: df.to_html(formatters=['{}'.format], max_cols=1)
Out[15]: '<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      ...

on your branch @gabriellm1?

If we aren't changing behavior here, then this still seems like a strict improvement and we'll do a followup.

@gabriellm1
Copy link
Contributor Author

Sorry if I got it in the wrong way. My branch happens exactly what you described. Can you explain what's the problem here? The first code should behave like the second one? If you guys want to resolve this here just explain what should I do or if you prefer another PR I can make it happen.

@simonjayhawkins
Copy link
Member

this better shows the incorrect behavior when the list is to short

image

and when the list is too long

image

@gabriellm1
Copy link
Contributor Author

Hm, I see. So we need to check the size of formatters and raise an error when size is different?

@TomAugspurger
Copy link
Contributor

Hm, I see. So we need to check the size of formatters and raise an error when size is different?

I think we need to define a behavior. Does the length of formatters need to match the number of original columns, or the number of columns that actually get formatted (which is I think min(len(df.columns), max_cols)?

@simonjayhawkins
Copy link
Member

Hm, I see. So we need to check the size of formatters and raise an error when size is different?

I think we need to define a behavior. Does the length of formatters need to match the number of original columns, or the number of columns that actually get formatted (which is I think min(len(df.columns), max_cols)?

If the length matches the number of displayed columns the truncation added here would be unnecessary.

It would also be confusing for users, as i think the number of displayed columns in to_string (shares this code) depends on the terminal width and column contents?

so I would assume that the length of the formatters list should match the number of columns in the dataframe, as currently stated (but not enforced) in the api docs.

@gabriellm1
Copy link
Contributor Author

So which direction are we going?

@TomAugspurger
Copy link
Contributor

Opened #28469 as a followup. I think we're good here.

@TomAugspurger TomAugspurger merged commit 0c67f13 into pandas-dev:master Sep 16, 2019
@TomAugspurger
Copy link
Contributor

Thanks @gabriellm1!

@gabriellm1
Copy link
Contributor Author

Thanks for the assist guys !

@gabriellm1 gabriellm1 deleted the Bug-issue-25955 branch October 19, 2019 01:22
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO HTML read_html, to_html, Styler.apply, Styler.applymap
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug in to_html() with formatters=<list> and max_cols
5 participants