BUG: to_html() with formatters=<list> and max_cols fixed #28183
Co-Authored-By: Simon Hawkins <[email protected]>
Hello @gabriellm1! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-09-11 00:08:07 UTC
Hey @simonjayhawkins, just fixed what you reviewed 👍
pandas/io/formats/format.py
Outdated
@@ -656,6 +656,10 @@ def _chk_truncate(self) -> None:
                 frame = concat(
                     (frame.iloc[:, :col_num], frame.iloc[:, -col_num:]), axis=1
                 )
+                # truncate formatter
+                if is_list_like(self.formatters) and self.formatters:
+                    truncate_fmt = cast(List[Callable], self.formatters)
Just out of curiosity, what complaint was this giving? cast doesn't seem necessary given the assignment back to self.formatters, but I may be missing something.
Hmm OK. I think this is somewhat misleading though, as is_list_like will return True for Tuples as well, so restricting this to List might not be totally correct (though I can certainly see where one would think that...)
Does changing this annotation to Sequence[Callable] still pass Mypy?
It passes! Although it's necessary to import Sequence from typing and change the concat operation from self.formatters = fmt[slice] + fmt[slice] to self.formatters = [*fmt[slice],*fmt[slice]]. May I make a commit with these changes?
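A minimal sketch of why the concatenation has to change, using a hypothetical standalone helper `truncate_formatters` rather than the actual pandas method. Once the value is annotated as `Sequence[Callable]`, mypy rejects `fmt[:n] + fmt[-n:]` because `Sequence` does not declare `__add__`; unpacking both slices into a new list type-checks and behaves the same for lists and tuples.

```python
from typing import Callable, List, Sequence


def truncate_formatters(fmt: Sequence[Callable], col_num: int) -> List[Callable]:
    """Keep the first and last ``col_num`` formatters (hypothetical helper)."""
    # ``fmt[:col_num] + fmt[-col_num:]`` would be flagged by mypy here, since
    # ``Sequence`` has no ``__add__``; unpacking builds a plain list instead.
    return [*fmt[:col_num], *fmt[-col_num:]]


# The result is a list whether the input is a list or a tuple.
from_list = truncate_formatters([str.upper, str.lower, str.title, str.strip], 1)
from_tuple = truncate_formatters((str.upper, str.lower, str.title, str.strip), 1)
```

Returning a fresh list also avoids mutating whatever sequence the caller passed in.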
Hmm what type fails if you keep the code as is but just change annotation? Want to make sure if we change that we have appropriate test coverage
Just want to avoid any / all conflicts between annotations, documentation and implementation
makes sense. At the moment we have a conflict between the documentation and the implementation. so that could be confusing matters.
Also, the wording for to_latex is slightly different from to_string and to_html.
they all share the same code.
maybe we should have a def _validate_formatters_kwarg(formatters):
since I think the length check is important if this truncation is added.
at the moment, if the list is too short, it raises and if the list is too long it ignores the additional elements.
however, with the slicing from the start and end, weird things might happen.
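To make the "weird things" concrete, here is a small sketch under stated assumptions: strings stand in for formatter callables, the helper `truncate` is hypothetical, and `col_num = 2` columns are kept on each side of a six-column frame. It only illustrates how head-and-tail slicing pairs formatters with columns when the lengths differ.

```python
columns = ["A", "B", "C", "D", "E", "F"]
col_num = 2  # columns kept on each side after horizontal truncation


def truncate(seq, n):
    """Keep the first and last ``n`` items (hypothetical stand-in for the PR's slicing)."""
    return [*seq[:n], *seq[-n:]]


shown = truncate(columns, col_num)

# Strings stand in for formatter callables; "fA" is meant for column "A", etc.
matching = truncate(["fA", "fB", "fC", "fD", "fE", "fF"], col_num)
too_short = truncate(["fA", "fB", "fC"], col_num)
too_long = truncate(["fA", "fB", "fC", "fD", "fE", "fF", "fG"], col_num)

# Pair each displayed column with the formatter it would end up using.
pairs_matching = list(zip(shown, matching))
pairs_too_short = list(zip(shown, too_short))
pairs_too_long = list(zip(shown, too_long))
```

With a list that is too short, column "E" is silently paired with "fB" instead of raising, and with a list that is too long, "E" is paired with "fF". Neither case produces an error after truncation, which is why a length check matters here.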
@gabriellm1 in summary..
we want to avoid the cast.
so for now we should probably go with if isinstance(self.formatters, (list, tuple)): instead of is_list_like(self.formatters) and self.formatters
this would be consistent with the check in _get_formatter
however, you would still need to do self.formatters = [*fmt[slice],*fmt[slice]] to satisfy mypy. I can't find a mypy issue for this case, otherwise we could have added a # type: ignore with the issue number.
I think validating the length of the list should be done in this PR, but updating the docs is outside the scope so could be done as a follow-up.
@WillAyd lmk if you disagree.
Cast is gone. Now what about length validation?
if we only do the length validation (and don't check that the parameter is not a string or that the items are callables) in this PR, then a separate function is probably unnecessary.
so just raise with a message along the lines of "The 'formatters' parameter must be of length equal to the number of columns."?
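"Raise with a message" could look roughly like the sketch below. The function name `validate_formatters_length`, its signature, and the choice of `ValueError` are assumptions made for illustration; only the message text comes from the suggestion above, and this is not the code that was merged.

```python
from typing import Callable, Sequence


def validate_formatters_length(
    formatters: Sequence[Callable], n_columns: int
) -> None:
    """Raise if a list-like ``formatters`` lacks one entry per column.

    Hypothetical helper; the exception type is an assumption.
    """
    if len(formatters) != n_columns:
        raise ValueError(
            "The 'formatters' parameter must be of length equal to "
            "the number of columns."
        )


# Two formatters for two columns passes silently.
validate_formatters_length([str.upper, str.lower], 2)

# One formatter for two columns is rejected before any slicing happens.
try:
    validate_formatters_length([str.upper], 2)
    rejected = False
except ValueError:
    rejected = True
```

Running the check before truncation means the comparison is against the number of columns in the frame, not the number of displayed columns.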
I don't think I got it. The validation would happen in the same place I made the changes? And what did you mean with "raise with a message"?
Just minor comment to hash out on annotation otherwise we can get this merged soon. Thanks for the PR
Are we good here @WillAyd? I didn't follow the typing discussion.
So one of two things is wrong here, either:
- A Tuple is an allowable value in this block, in which case this would currently fail OR
- Our annotations are wrong and a Tuple isn't allowable anyway
I think the problem here is point 1 above - can you verify that and if so add a test case? Definitely want to get rid of the cast unless absolutely necessary. Using cast like this can actually hide potential bugs, so we need to be careful with it.
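A short standalone illustration of how `cast` can hide a bug, not taken from the pandas code base: `typing.cast` does nothing at runtime, so casting a tuple to `List[Callable]` satisfies a type checker while the object remains a tuple.

```python
from typing import Callable, List, cast

formatters = (str.upper, str.lower)  # a tuple, not a list

# cast() performs no conversion or check at runtime; it returns its argument.
as_list = cast(List[Callable], formatters)

try:
    as_list.append(str.title)  # accepted by a type checker, fails when run
    append_failed = False
except AttributeError:
    append_failed = True
```

Replacing the cast with an explicit conversion such as `list(formatters)` makes the runtime type agree with the annotation.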
diff --git a/pandas/io/formats/format.py b/pandas/io/formats/format.py
index 5ce4bf6ea..0d3e57990 100644
--- a/pandas/io/formats/format.py
+++ b/pandas/io/formats/format.py
@@ -20,7 +20,9 @@ from typing import (
     Dict,
     Iterable,
     List,
+    Mapping,
     Optional,
+    Sequence,
     Tuple,
     Type,
     Union,
@@ -76,9 +78,7 @@ from pandas.io.formats.printing import adjoin, justify, pprint_thing
 if TYPE_CHECKING:
     from pandas import Series, DataFrame, Categorical

-formatters_type = Union[
-    List[Callable], Tuple[Callable, ...], Dict[Union[str, int], Callable]
-]
+formatters_type = Union[Sequence[Callable], Mapping[Union[str, int], Callable]]
 float_format_type = Union[str, Callable, "EngFormatter"]

 common_docstring = """
@@ -458,7 +458,7 @@ class TableFormatter:
             )

     def _get_formatter(self, i: Union[str, int]) -> Optional[Callable]:
-        if isinstance(self.formatters, (list, tuple)):
+        if isinstance(self.formatters, Sequence):
             if is_integer(i):
                 i = cast(int, i)
                 return self.formatters[i]
@@ -658,8 +658,8 @@ class DataFrameFormatter(TableFormatter):
                     (frame.iloc[:, :col_num], frame.iloc[:, -col_num:]), axis=1
                 )
                 # truncate formatter
-                if is_list_like(self.formatters) and self.formatters:
-                    truncate_fmt = cast(List[Callable], self.formatters)
+                if isinstance(self.formatters, Sequence):
+                    truncate_fmt = list(self.formatters)
                     self.formatters = truncate_fmt[:col_num] + truncate_fmt[-col_num:]
             self.tr_col_num = col_num
         if truncate_v:

In the docs, it states "List must be of length equal to the number of columns." I don't think this is enforced, and it could cause issues with the truncated list if the length of the list (or sequence) doesn't match the number of columns.
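One thing worth checking about an `isinstance(..., Sequence)` test like the one in the diff above is which inputs it admits. The sketch below uses `collections.abc` (an assumption; the diff imports `Sequence` from `typing`) and a hypothetical `candidates` dictionary to show that lists and tuples qualify, dicts do not, and plain strings do.

```python
from collections.abc import Mapping, Sequence

# Hypothetical sample inputs a caller might pass as ``formatters``.
candidates = {
    "list": [str.upper],
    "tuple": (str.upper,),
    "dict": {"A": str.upper},
    "str": "upper",
}

# Which inputs take the Sequence branch, and which count as a Mapping.
is_sequence = {name: isinstance(value, Sequence) for name, value in candidates.items()}
is_mapping = {name: isinstance(value, Mapping) for name, value in candidates.items()}
```

Because a `str` is also a `Sequence`, a stricter validation would have to exclude strings explicitly, which matches the earlier remark about checking that the parameter is not a string.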
Sorry, I haven't followed the discussion here but CI is passing. Is this good to go, or is there an outstanding item @simonjayhawkins (something to do with validating lengths)?
lgtm. Also slightly lost on back and forth but I think there's a follow up on validating the length of the input? Can someone open an issue for that?
@WillAyd @TomAugspurger The fix here slices from the end of the list of formatters. There is no check that the length of the list of formatters is the same as the number of columns. So IMO this solution is incomplete, but I'll defer if you think this could be a follow-up.
This raises on master

In [14]: df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
In [15]: df.to_html(formatters=['{}'.format])
IndexError: list index out of range

Does this PR introduce new cases where that will happen? What will happen with

In [15]: df.to_html(formatters=['{}'.format], max_cols=1)
Out[15]: '<table border="1" class="dataframe">\n <thead>\n <tr style="text-align: right;">\n ...

on your branch @gabriellm1? If we aren't changing behavior here, then this still seems like a strict improvement and we'll do a followup.
Sorry if I misunderstood. On my branch, exactly what you described happens. Can you explain what the problem is here? Should the first snippet behave like the second one? If you want to resolve this here, just explain what I should do, or if you prefer another PR I can open one.
Hm, I see. So we need to check the size of formatters and raise an error when the size is different?
I think we need to define a behavior. Does the length of |
If the length matches the number of displayed columns, the truncation added here would be unnecessary. It would also be confusing for users, as I think the number of displayed columns in to_string (which shares this code) depends on the terminal width and column contents. So I would assume that the length of the formatters list should match the number of columns in the dataframe, as currently stated (but not enforced) in the API docs.
So which direction are we going?
Opened #28469 as a followup. I think we're good here.
Thanks @gabriellm1!
Thanks for the assist guys!
…28183) * BUG: issue-25955 fixed
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
@hugoecarl
@guipleite