-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add built-in function for Styler to format the text displayed for missing values #29118
ENH: Add built-in function for Styler to format the text displayed for missing values #29118
Conversation
…ing values As described in GH pandas-dev#28358, user who wants to control how NA values are printed while applying styles to the output will have to implement their own formatter. (so that the underlying data will not change and can be used for styling) Since the behavior is common in styling (for reports etc.), suggest to add this shortcut function to enable users format their NA values as something like '--' or 'Not Available' easily. example usage: `df.style.highlight_max().format_null('--')`
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -110,6 +110,7 @@ Other enhancements | |||
- :meth:`DataFrame.to_json` now accepts an ``indent`` integer argument to enable pretty printing of JSON output (:issue:`12004`) | |||
- :meth:`read_stata` can read Stata 119 dta files. (:issue:`28250`) | |||
- Added ``encoding`` argument to :func:`DataFrame.to_html` for non-ascii text (:issue:`28663`) | |||
- :meth:`Styler.format_null` is now added into the built-in functions to help formatting missing values (:issue:`28358`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u add this into the user guide as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like the name format_nans
better, to be similar to fillna
, hasnans
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll be glad to!
@topper-123 My understanding is that the "Style" tag is refers to code style, linting stuff, not the Styler class. Do I have this backwards? |
The doc has been added, please have a look! 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been moving away from null and to NA. So I think this should be called format_na
if anything.
I'm a bit concerned about the implementation. I suspect this would be better handled as an argument to Styler
and a method like set_na_format
(see how uuid is handled). Then the
default_display_func` would check if the value is NA and use the na format if it's NA.
pandas/io/formats/style.py
Outdated
------- | ||
self : Styler | ||
""" | ||
self.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it will overwrite the formatting of a previously applied formatter for non-NA values. Something like
df.style.format("hi-{}".format).format_null()
is that the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @TomAugspurger, I like the name format_na
! and yes, I was intended to make an overwriting implementation. Actually, I have considered the set_*
approach as you, but it seems confusing for the case:
df.style.format('{:.2%}').set_na_format('-') # got 'nan%' instead of '-'
I've got a new idea, how about interface like this?
.format_na('-', subset=['col1','col2'])
.format('{:.2%}', na_rep='-', subset=['col3','col4'])
And the docstring for format_na
rephrase to:
Format the text display value using default formatter but represent nan as `na_rep`.
For more advanced formatting, use Styler.format() with your custom formatter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may not have been clear about my concern. It's fine that na_format
overwrites the formatting for NA values. I'm concerned tht it overwrites the formatting for non-NA values. In my .format("hi-{}".format).format_na('NA')
example, the NA values should be formatted as 'NA' and the non-NA values should be formatted as hi-<value>
. But I suspect that right now the non-NA formatting is lost (though perhaps it's not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about adding an na_rep
to the .format
function... That's probably fine. I think it'd still be useful for users to have a way to control the default NA formatting at the table level.
But if we add an na_rep
to format
, then we wouldn't need a new format_na
method, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, so a setting at the table level: self.na_rep
and the def format(self, formatter, subset=None):
becomes
def format(self, formatter=None, subset=None, na_rep=None):
drop .format_na('-')
, use .format(na_rep='-')
instead, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that sounds correct. I'm not sure what the default should be, but probably just None (no special formatting for NA values).
…r.format method Change the implementation to integrate with the original `.format()` method by `na_rep` parameter Add a new table-wise default `na_rep` setting, which can be set through the new `.set_na_rep()` method Also enhanced the `.highlight_null()` method to be able to use `subset` parameter Add a few user guide examples and test cases
Revision done, please help to review, thanks! :D |
I think this PR can also resolve #21527 (ENH: Styler display of NaN / null Values) |
Hi @TomAugspurger, would you be available to review this PR? thanks! 😉 |
Most likely not for a couple weeks.
…On Tue, Oct 29, 2019 at 9:31 AM Max Chen ***@***.***> wrote:
Hi @TomAugspurger <https://github.com/TomAugspurger>, would you be
available to review this PR? thanks! 😉
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#29118?email_source=notifications&email_token=AAKAOIR4YVQVJA2WYY3VZPDQRBCNZA5CNFSM4JCVRYEKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECQWUEY#issuecomment-547449363>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKAOISPN5GHSCPUW5BP4YLQRBCNZANCNFSM4JCVRYEA>
.
|
Ouch. 😂 |
pandas/io/formats/style.py
Outdated
@@ -416,16 +423,20 @@ def format_attr(pair): | |||
table_attributes=table_attr, | |||
) | |||
|
|||
def format(self, formatter, subset=None): | |||
def format(self, formatter=None, subset=None, na_rep=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the point of changing formatter
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you William!
Making formatter
optional can enable this syntax .format(na_rep="-")
for "only" formatting NA display value. otherwise will have to use .format(None, na_rep="-")
-- less intuitive.
pandas/io/formats/style.py
Outdated
@@ -891,6 +908,23 @@ def set_table_styles(self, table_styles): | |||
self.table_styles = table_styles | |||
return self | |||
|
|||
def set_na_rep(self, na_rep): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you annotate this function? I think should be
def set_na_rep(self, na_rep): | |
def set_na_rep(self, na_rep: str) -> "Styler": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem! will do.
pandas/io/formats/style.py
Outdated
@@ -935,19 +969,21 @@ def _highlight_null(v, null_color): | |||
"background-color: {color}".format(color=null_color) if pd.isna(v) else "" | |||
) | |||
|
|||
def highlight_null(self, null_color="red"): | |||
def highlight_null(self, null_color="red", subset=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this related to na_rep
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually not related. 😂
Just making it consistent with highlight_min
and highlight_max
, also for completeness of missing values styling and formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea but would prefer if you take this out then. We try to keep PRs to changing one thing at a time as it helps speed up review process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can always do as a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that is a great advice!
…ortcut-for-missing-values-formatting-gh28358 sync with the latest upstream/master
1. keep formatter as mandatory in `.format` method 2. annotate the new method `.set_na_rep` 3. remove changes in `.highlight_null` to another PR 4. minor refinement to the whats new and user guide
Revision done, please have a look. 😁 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Can you add
- A test exercising passing na_rep to Styler,
Styler(df, na_rep="-")
- A test ensuring that non-numeric NA values are handled correctly? At least NaT for datetime data.
pandas/io/formats/style.py
Outdated
@@ -71,6 +71,11 @@ class Styler: | |||
The ``id`` takes the form ``T_<uuid>_row<num_row>_col<num_col>`` | |||
where ``<uuid>`` is the unique identifier, ``<num_row>`` is the row | |||
number and ``<num_col>`` is the column number. | |||
na_rep : str or None, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be str, optional
.
pandas/io/formats/style.py
Outdated
@@ -1480,3 +1518,13 @@ def _maybe_wrap_formatter(formatter): | |||
"instead".format(formatter=formatter) | |||
) | |||
raise TypeError(msg) | |||
|
|||
|
|||
def _maybe_wrap_na_formatter(formatter, na_rep): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be folded into _maybe_wrap_formatter
? I'm finding the multiple layers of wrapping hard to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, all done!
…ortcut-for-missing-values-formatting-gh28358 sync with the latest upstream/master
Hi Tom, William, please have a look at the revision when you are available, thanks~ |
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -114,6 +114,8 @@ Other enhancements | |||
- Added ``encoding`` argument to :meth:`DataFrame.to_string` for non-ascii text (:issue:`28766`) | |||
- Added ``encoding`` argument to :func:`DataFrame.to_html` for non-ascii text (:issue:`28663`) | |||
- :meth:`Styler.background_gradient` now accepts ``vmin`` and ``vmax`` arguments (:issue:`12145`) | |||
- :class:`Styler` added :meth:`Styler.set_na_rep` method to set default missing values representation for the entire table. | |||
:meth:`Styler.format` added the ``na_rep`` parameter to help format the missing values (:issue:`21527`, :issue:`28358`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just this note is fine (can delete the line above)
pandas/io/formats/style.py
Outdated
@@ -126,6 +131,7 @@ def __init__( | |||
caption=None, | |||
table_attributes=None, | |||
cell_ids=True, | |||
na_rep=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you annotate this? I think Optional[str]
pandas/io/formats/style.py
Outdated
@@ -416,16 +425,22 @@ def format_attr(pair): | |||
table_attributes=table_attr, | |||
) | |||
|
|||
def format(self, formatter, subset=None): | |||
def format(self, formatter, subset=None, na_rep=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotation here as well
pandas/io/formats/style.py
Outdated
@@ -1487,14 +1523,22 @@ def _get_level_lengths(index, hidden_elements=None): | |||
return non_zero_lengths | |||
|
|||
|
|||
def _maybe_wrap_formatter(formatter): | |||
def _maybe_wrap_formatter(formatter, na_rep): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
elif isinstance(na_rep, str): | ||
return lambda x: na_rep if pd.isna(x) else formatter_func(x) | ||
else: | ||
msg = "Expected a string, got {na_rep} instead".format(na_rep=na_rep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test case that hits this?
After adding the annotation for
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Minor issue with the new test rest of the comments are optional and can be done as follow ups
pandas/io/formats/style.py
Outdated
self.ctx = defaultdict(list) | ||
self._todo = [] | ||
self.ctx = defaultdict(list) # type: DefaultDict[Tuple[int, int], List[str]] | ||
self._todo = [] # type: List[Tuple[Callable, Tuple, Dict]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add sub-types for Callable, Tuple, Dict here?
@@ -416,16 +427,22 @@ def format_attr(pair): | |||
table_attributes=table_attr, | |||
) | |||
|
|||
def format(self, formatter, subset=None): | |||
def format(self, formatter, subset=None, na_rep: Optional[str] = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required here but if you wanted to in a follow up annotate other arguments would be super helpful
def test_format_with_bad_na_rep(self): | ||
# GH 21527 28358 | ||
df = pd.DataFrame([[None, None], [1.1, 1.2]], columns=["A", "B"]) | ||
df.style.format(None, na_rep=-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of marking this as fail should use pytest.raises as a context manager (you'll find other examples throughout tests)
…ortcut-for-missing-values-formatting-gh28358
Thanks @WillAyd, I have revised using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK last edit - I know you want to refine types further later but in the interim can you convert to the Py36 syntax? We have #29741 outstanding which is doing that for existing types, so want to use that in new PRs as well
Hi @WillAyd, revision done for py36 type syntax, please have a look :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR this lgtm. @TomAugspurger care to look?
Thanks @immaxchen! |
…ndexing-1row-df * upstream/master: (185 commits) ENH: add BooleanArray extension array (pandas-dev#29555) DOC: Add link to dev calendar and meeting notes (pandas-dev#29737) ENH: Add built-in function for Styler to format the text displayed for missing values (pandas-dev#29118) DEPR: remove statsmodels/seaborn compat shims (pandas-dev#29822) DEPR: remove Index.summary (pandas-dev#29807) DEPR: passing an int to read_excel use_cols (pandas-dev#29795) STY: fstrings in io.pytables (pandas-dev#29758) BUG: Fix melt with mixed int/str columns (pandas-dev#29792) TST: add test for ffill/bfill for non unique multilevel (pandas-dev#29763) Changed description of parse_dates in read_excel(). (pandas-dev#29796) BUG: pivot_table not returning correct type when margin=True and aggfunc='mean' (pandas-dev#28248) REF: Create _lib/window directory (pandas-dev#29817) Fixed small mistake (pandas-dev#29815) minor cleanups (pandas-dev#29798) DEPR: enforce deprecations in core.internals (pandas-dev#29723) add test for unused level raises KeyError (pandas-dev#29760) Add documentation linking to sqlalchemy (pandas-dev#29373) io/parsers: ensure decimal is str on PythonParser (pandas-dev#29743) Reenabled no-unused-function (pandas-dev#29767) CLN:F-string in pandas/_libs/tslibs/*.pyx (pandas-dev#29775) ... # Conflicts: # pandas/tests/frame/indexing/test_indexing.py
…r missing values (pandas-dev#29118) * Add built-in funcion for Styler to format the text displayed for missing values As described in GH pandas-dev#28358, user who wants to control how NA values are printed while applying styles to the output will have to implement their own formatter. (so that the underlying data will not change and can be used for styling)
…r missing values (pandas-dev#29118) * Add built-in funcion for Styler to format the text displayed for missing values As described in GH pandas-dev#28358, user who wants to control how NA values are printed while applying styles to the output will have to implement their own formatter. (so that the underlying data will not change and can be used for styling)
As described in the issues, user who wants to control how NA values are printed
while applying styles to the output will have to implement their own formatter.
(so that the underlying data will not change and can be used for styling)
Since the behavior is common in styling (for reports etc.), suggest to add this
shortcut function to enable users format their NA values as something like '--'
or 'Not Available' easily.
EDIT:
.format()
usingna_rep
argumentna_rep
setting, which can be set through.set_na_rep()
example usage:
df.style.highlight_max().format(None, na_rep="-")
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff