ENH: Add built-in function for Styler to format the text displayed for missing values #29118

immaxchen · 2019-10-20T18:49:38Z

As described in the linked issues, a user who wants to control how NA values are
printed while applying styles to the output has to implement their own formatter
(so that the underlying data does not change and can still be used for styling).

Since this need is common in styling (for reports etc.), I suggest adding this
shortcut function so that users can easily format their NA values as something
like '--' or 'Not Available'.

EDIT:

- Changed the implementation to integrate with the original .format() via the na_rep argument
- Added a new table-wide default na_rep setting, which can be set through .set_na_rep()
- Minor fix: the formatter should be wrapped outside of the locs loop
- Added a few user guide examples and test cases

example usage: df.style.highlight_max().format(None, na_rep="-")

closes ENH: Styler display of NaN / null Values #21527 and passing string value to pandas.DataFrame.fillna() & pandas.PivotTable(fill_value) 'breaks' pandas.DataFrame.style.highlight_* inside jupyter notebook #28358
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

…ing values As described in GH pandas-dev#28358, user who wants to control how NA values are printed while applying styles to the output will have to implement their own formatter. (so that the underlying data will not change and can be used for styling) Since the behavior is common in styling (for reports etc.), suggest to add this shortcut function to enable users format their NA values as something like '--' or 'Not Available' easily. example usage: `df.style.highlight_max().format_null('--')`

jreback · 2019-10-20T18:57:43Z

doc/source/whatsnew/v1.0.0.rst

@@ -110,6 +110,7 @@ Other enhancements
 - :meth:`DataFrame.to_json` now accepts an ``indent`` integer argument to enable pretty printing of JSON output (:issue:`12004`)
 - :meth:`read_stata` can read Stata 119 dta files. (:issue:`28250`)
 - Added ``encoding`` argument to :func:`DataFrame.to_html` for non-ascii text (:issue:`28663`)
+- :meth:`Styler.format_null` is now added into the built-in functions to help formatting missing values (:issue:`28358`)


Can you add this to the user guide as well?

I'd like the name format_nans better, to be similar to fillna, hasnans etc.

@jreback?

Sure, I'll be glad to!

jbrockmendel · 2019-10-21T01:19:57Z

@topper-123 My understanding is that the "Style" tag refers to code style and linting, not the Styler class. Do I have this backwards?

immaxchen · 2019-10-21T17:07:29Z

The doc has been added, please have a look! 😄
How about the method name? Have we reached a conclusion?

TomAugspurger

We've been moving away from null and to NA. So I think this should be called format_na if anything.

I'm a bit concerned about the implementation. I suspect this would be better handled as an argument to Styler plus a method like set_na_format (see how uuid is handled). Then the default_display_func would check whether the value is NA and, if so, use the NA format.

TomAugspurger · 2019-10-21T17:46:55Z

pandas/io/formats/style.py

+        -------
+        self : Styler
+        """
+        self.format(


This looks like it will overwrite the formatting that a previously applied formatter set for non-NA values, for example in something like

df.style.format("hi-{}".format).format_null()

Is that the case?

Thanks @TomAugspurger, I like the name format_na! And yes, I intended the implementation to overwrite. I did consider the set_* approach you describe, but it seems confusing in this case:

df.style.format('{:.2%}').set_na_format('-') # got 'nan%' instead of '-'
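The 'nan%' text comes from plain Python string formatting rather than from anything Styler-specific: the percent presentation type appends '%' to the textual form of NaN. A minimal standard-library check (no pandas involved, variable names are only illustrative):

```python
# Plain Python string formatting; this does not touch pandas or Styler.
nan = float("nan")

percent = "{:.2%}".format(0.1234)  # an ordinary value
missing = "{:.2%}".format(nan)     # NaN goes through the same format spec

print(percent)  # 12.34%
print(missing)  # nan%
```

Any table-level NA setting therefore has to intercept the value before a user-supplied format string is applied to it.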

I've got a new idea: how about an interface like this?

.format_na('-', subset=['col1','col2']) .format('{:.2%}', na_rep='-', subset=['col3','col4'])

And the docstring for format_na could be rephrased to:

Format the text display value using the default formatter but represent nan as `na_rep`. For more advanced formatting, use Styler.format() with your custom formatter.

I may not have been clear about my concern. It's fine that na_format overwrites the formatting for NA values. I'm concerned that it overwrites the formatting for non-NA values. In my .format("hi-{}".format).format_na('NA') example, the NA values should be formatted as 'NA' and the non-NA values should be formatted as hi-<value>. But I suspect that right now the non-NA formatting is lost (though perhaps it's not).
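The behaviour asked for here can be sketched without pandas: wrap the existing formatter so that only NA values are replaced and everything else still passes through it. The helper names `is_na` and `with_na_rep` are hypothetical, and the NA check is deliberately simplified (None or a float NaN) rather than pandas' own.

```python
import math


def is_na(value):
    """Simplified NA check for this sketch: None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def with_na_rep(formatter, na_rep):
    """Wrap `formatter` so NA values render as `na_rep`; other values are untouched."""
    def wrapped(value):
        return na_rep if is_na(value) else formatter(value)
    return wrapped


# The earlier "hi-{}" formatter keeps working for non-NA values.
display = with_na_rep("hi-{}".format, "NA")
rendered = [display(v) for v in [1.5, float("nan"), None]]
print(rendered)  # ['hi-1.5', 'NA', 'NA']
```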

Not sure about adding an na_rep to the .format function... That's probably fine. I think it'd still be useful for users to have a way to control the default NA formatting at the table level.

But if we add an na_rep to format, then we wouldn't need a new format_na method, right?

Sounds good. So there would be a setting at the table level, self.na_rep,
and the signature def format(self, formatter, subset=None): becomes
def format(self, formatter=None, subset=None, na_rep=None):
Then we drop .format_na('-') and use .format(na_rep='-') instead, right?

I think that sounds correct. I'm not sure what the default should be, but probably just None (no special formatting for NA values).
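To make the two levels concrete, here is a rough, pandas-free sketch of a table-level default combined with a per-call na_rep. `MiniStyler` is a hypothetical stand-in, not the Styler class, and the fallback rule used here (a per-call na_rep wins, otherwise the table-level value applies) is an assumption of the sketch; the thread does not spell out the resolution order.

```python
import math
from typing import Callable, List, Optional


def is_na(value) -> bool:
    """Simplified NA check for this sketch: None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class MiniStyler:
    """Hypothetical stand-in for Styler; it only models NA display resolution."""

    def __init__(self, na_rep: Optional[str] = None) -> None:
        self.na_rep = na_rep  # table-level default, None means no special handling
        self._display = self._wrap(str, None)

    def set_na_rep(self, na_rep: str) -> "MiniStyler":
        self.na_rep = na_rep
        return self

    def format(self, formatter: Callable, na_rep: Optional[str] = None) -> "MiniStyler":
        self._display = self._wrap(formatter, na_rep)
        return self

    def _wrap(self, formatter: Callable, na_rep: Optional[str]) -> Callable:
        def display(value):
            # Assumed rule: per-call na_rep first, then the table-level default.
            rep = na_rep if na_rep is not None else self.na_rep
            if rep is not None and is_na(value):
                return rep
            return formatter(value)
        return display

    def render(self, values) -> List[str]:
        return [self._display(v) for v in values]


values = [0.5, float("nan")]
default_only = MiniStyler(na_rep="-").render(values)
overridden = MiniStyler(na_rep="-").format("{:.0%}".format, na_rep="n/a").render(values)
unset = MiniStyler().format("{:.0%}".format).render(values)
print(default_only, overridden, unset)
```

With a default of None, the last call shows the "no special formatting" case: the NaN simply flows through the user's format string.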

…r.format method Change the implementation to integrate with the original `.format()` method by `na_rep` parameter Add a new table-wise default `na_rep` setting, which can be set through the new `.set_na_rep()` method Also enhanced the `.highlight_null()` method to be able to use `subset` parameter Add a few user guide examples and test cases

immaxchen · 2019-10-23T12:50:45Z

Revision done, please help to review, thanks! :D

immaxchen · 2019-10-27T14:35:13Z

I think this PR can also resolve #21527 (ENH: Styler display of NaN / null Values)

immaxchen · 2019-10-29T14:31:36Z

Hi @TomAugspurger, would you be available to review this PR? thanks! 😉

TomAugspurger · 2019-10-29T16:57:55Z

Most likely not for a couple weeks.

immaxchen · 2019-10-30T14:46:50Z

Ouch. 😂
@jreback can you help to review? 😁

WillAyd · 2019-10-30T16:24:57Z

pandas/io/formats/style.py

@@ -416,16 +423,20 @@ def format_attr(pair):
            table_attributes=table_attr,
        )

-    def format(self, formatter, subset=None):
+    def format(self, formatter=None, subset=None, na_rep=None):


What was the point of changing formatter here?

Thank you William!
Making formatter optional enables the syntax .format(na_rep="-") for formatting only the NA display value. Otherwise one has to write .format(None, na_rep="-"), which is less intuitive.

WillAyd · 2019-10-30T16:44:17Z

pandas/io/formats/style.py

@@ -891,6 +908,23 @@ def set_table_styles(self, table_styles):
        self.table_styles = table_styles
        return self

+    def set_na_rep(self, na_rep):


Can you annotate this function? I think it should be:

Suggested change

-    def set_na_rep(self, na_rep):
+    def set_na_rep(self, na_rep: str) -> "Styler":

No problem! will do.

WillAyd · 2019-10-30T16:45:05Z

pandas/io/formats/style.py

@@ -935,19 +969,21 @@ def _highlight_null(v, null_color):
            "background-color: {color}".format(color=null_color) if pd.isna(v) else ""
        )

-    def highlight_null(self, null_color="red"):
+    def highlight_null(self, null_color="red", subset=None):


How is this related to na_rep?

Actually not related. 😂
Just making it consistent with highlight_min and highlight_max, also for completeness of missing values styling and formatting.

Nice idea, but I would prefer that you take this out then. We try to keep each PR to one change at a time, as it helps speed up the review process.

Can always do as a follow up PR

Thanks, that is great advice!

…ortcut-for-missing-values-formatting-gh28358 sync with the latest upstream/master

1. keep formatter as mandatory in `.format` method 2. annotate the new method `.set_na_rep` 3. remove changes in `.highlight_null` to another PR 4. minor refinement to the whats new and user guide

immaxchen · 2019-11-02T16:40:38Z

Revision done, please have a look. 😁

TomAugspurger

Thanks. Can you add:

- A test exercising passing na_rep to Styler, e.g. Styler(df, na_rep="-")
- A test ensuring that non-numeric NA values are handled correctly? At least NaT for datetime data.

TomAugspurger · 2019-11-03T13:01:51Z

pandas/io/formats/style.py

@@ -71,6 +71,11 @@ class Styler:
        The ``id`` takes the form ``T_<uuid>_row<num_row>_col<num_col>``
        where ``<uuid>`` is the unique identifier, ``<num_row>`` is the row
        number and ``<num_col>`` is the column number.
+    na_rep : str or None, default None


I think this should be str, optional.

TomAugspurger · 2019-11-03T13:04:26Z

pandas/io/formats/style.py

@@ -1480,3 +1518,13 @@ def _maybe_wrap_formatter(formatter):
            "instead".format(formatter=formatter)
        )
        raise TypeError(msg)
+
+
+def _maybe_wrap_na_formatter(formatter, na_rep):


Can this be folded into _maybe_wrap_formatter? I'm finding the multiple layers of wrapping hard to follow.

No problem, all done!

…ortcut-for-missing-values-formatting-gh28358 sync with the latest upstream/master

immaxchen · 2019-11-06T14:48:52Z

Hi Tom, William, please have a look at the revision when you are available, thanks~

WillAyd · 2019-11-09T19:23:34Z

doc/source/whatsnew/v1.0.0.rst

@@ -114,6 +114,8 @@ Other enhancements
 - Added ``encoding`` argument to :meth:`DataFrame.to_string` for non-ascii text (:issue:`28766`)
 - Added ``encoding`` argument to :func:`DataFrame.to_html` for non-ascii text (:issue:`28663`)
 - :meth:`Styler.background_gradient` now accepts ``vmin`` and ``vmax`` arguments (:issue:`12145`)
+- :class:`Styler` added :meth:`Styler.set_na_rep` method to set default missing values representation for the entire table.
+  :meth:`Styler.format` added the ``na_rep`` parameter to help format the missing values (:issue:`21527`, :issue:`28358`)


I think just this note is fine (can delete the line above)

WillAyd · 2019-11-09T19:23:56Z

pandas/io/formats/style.py

@@ -126,6 +131,7 @@ def __init__(
        caption=None,
        table_attributes=None,
        cell_ids=True,
+        na_rep=None,


Can you annotate this? I think Optional[str]

WillAyd · 2019-11-09T19:24:15Z

pandas/io/formats/style.py

@@ -416,16 +425,22 @@ def format_attr(pair):
            table_attributes=table_attr,
        )

-    def format(self, formatter, subset=None):
+    def format(self, formatter, subset=None, na_rep=None):


annotation here as well

WillAyd · 2019-11-09T19:24:39Z

pandas/io/formats/style.py

@@ -1487,14 +1523,22 @@ def _get_level_lengths(index, hidden_elements=None):
    return non_zero_lengths


-def _maybe_wrap_formatter(formatter):
+def _maybe_wrap_formatter(formatter, na_rep):


WillAyd · 2019-11-09T19:35:41Z

pandas/io/formats/style.py

+    elif isinstance(na_rep, str):
+        return lambda x: na_rep if pd.isna(x) else formatter_func(x)
+    else:
+        msg = "Expected a string, got {na_rep} instead".format(na_rep=na_rep)


Can you add a test case that hits this?
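For readers following the hunk above, this is a pandas-free sketch of what a folded wrapper with an na_rep type check might look like, and of the input that reaches the final else branch. The function name `maybe_wrap_formatter` and the simplified `is_na` helper are assumptions for illustration; only the na_rep branches mirror the lines quoted in the diff.

```python
import math
from typing import Callable, Optional, Union


def is_na(value) -> bool:
    """Simplified NA check for this sketch: None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def maybe_wrap_formatter(formatter: Union[str, Callable], na_rep: Optional[str]) -> Callable:
    # Normalise the formatter: a template string becomes its bound .format method.
    if isinstance(formatter, str):
        formatter_func = formatter.format
    elif callable(formatter):
        formatter_func = formatter
    else:
        raise TypeError(f"Expected a template string or callable, got {formatter} instead")

    # Then layer the NA handling on top, validating na_rep's type.
    if na_rep is None:
        return formatter_func
    elif isinstance(na_rep, str):
        return lambda x: na_rep if is_na(x) else formatter_func(x)
    else:
        raise TypeError(f"Expected a string, got {na_rep} instead")


ok = maybe_wrap_formatter("{:.1f}", "-")
print(ok(2.0), ok(None))

# A non-string na_rep such as -1 hits the final else branch.
try:
    maybe_wrap_formatter("{:.1f}", -1)
except TypeError as err:
    message = str(err)
    print(message)
```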

immaxchen · 2019-11-12T14:37:13Z

After adding the annotation for na_rep, mypy complained as below, so I have no choice but to add more annotations. 😂

pandas/io/formats/style.py:136: error: Need type annotation for 'ctx'
pandas/io/formats/style.py:137: error: Need type annotation for '_todo' (hint: "_todo: List[<type>] = ...")
pandas/io/formats/style.py:158: error: Need type annotation for 'hidden_columns' (hint: "hidden_columns: List[<type>] = ...")
pandas/io/formats/style.py:173: error: Need type annotation for '_display_funcs'
pandas/io/formats/style.py:470: error: "None" not callable

WillAyd

Looks good. There is a minor issue with the new test; the rest of the comments are optional and can be done as follow-ups.

WillAyd · 2019-11-18T01:52:03Z

pandas/io/formats/style.py

-        self.ctx = defaultdict(list)
-        self._todo = []
+        self.ctx = defaultdict(list)  # type: DefaultDict[Tuple[int, int], List[str]]
+        self._todo = []  # type: List[Tuple[Callable, Tuple, Dict]]


Is it possible to add sub-types for Callable, Tuple, Dict here?

WillAyd · 2019-11-18T01:52:42Z

pandas/io/formats/style.py

@@ -416,16 +427,22 @@ def format_attr(pair):
            table_attributes=table_attr,
        )

-    def format(self, formatter, subset=None):
+    def format(self, formatter, subset=None, na_rep: Optional[str] = None):


Not required here, but if you wanted to annotate the other arguments in a follow-up, that would be super helpful.

WillAyd · 2019-11-18T01:54:33Z

pandas/tests/io/formats/test_style.py

+    def test_format_with_bad_na_rep(self):
+        # GH 21527 28358
+        df = pd.DataFrame([[None, None], [1.1, 1.2]], columns=["A", "B"])
+        df.style.format(None, na_rep=-1)


Instead of marking this as xfail, you should use pytest.raises as a context manager (you'll find other examples throughout the tests).

…ortcut-for-missing-values-formatting-gh28358

immaxchen · 2019-11-21T15:55:25Z

Thanks @WillAyd, I have revised the test to use pytest.raises and resolved the merge conflict. Since this PR is getting quite lengthy, I will try to add annotations in follow-ups.

WillAyd

OK last edit - I know you want to refine types further later but in the interim can you convert to the Py36 syntax? We have #29741 outstanding which is doing that for existing types, so want to use that in new PRs as well
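As an illustration of the requested conversion, the type comments from the earlier hunk can be rewritten as Python 3.6 variable annotations. The class names `Before` and `After` are hypothetical and exist only to hold the two spellings side by side; the attribute types are copied from the diff.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple


class Before:
    """Type-comment style, as in the earlier revision of the diff."""

    def __init__(self):
        self.ctx = defaultdict(list)  # type: DefaultDict[Tuple[int, int], List[str]]
        self._todo = []  # type: List[Tuple[Callable, Tuple, Dict]]


class After:
    """Python 3.6 variable-annotation style for the same attributes."""

    def __init__(self) -> None:
        self.ctx: DefaultDict[Tuple[int, int], List[str]] = defaultdict(list)
        self._todo: List[Tuple[Callable, Tuple, Dict]] = []


# Runtime behaviour is identical; only the annotation syntax differs.
holder = After()
holder.ctx[(0, 0)].append("color: red")
print(dict(holder.ctx))
```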

immaxchen · 2019-11-24T15:41:55Z

Hi @WillAyd, revision done for py36 type syntax, please have a look :)

WillAyd

Nice PR this lgtm. @TomAugspurger care to look?

TomAugspurger · 2019-11-25T13:54:28Z

Thanks @immaxchen!

jreback reviewed Oct 20, 2019

topper-123 changed the title ~~Add built-in function for Styler to format the text displayed for missing values~~ ENH: Add built-in function for Styler to format the text displayed for missing values Oct 20, 2019

topper-123 added the Code Style (code style, linting, code_checks) and Enhancement labels Oct 20, 2019

Add Styler.format_null into user_guide and reference Doc

01632ce

TomAugspurger reviewed Oct 21, 2019

simonjayhawkins added the IO HTML label (read_html, to_html, Styler.apply, Styler.applymap) and removed the Code Style label (code style, linting, code_checks) Oct 21, 2019

immaxchen added 3 commits October 23, 2019 07:36

resolve merge conflicts

7a5dd65

add more tests for styling with NA values

da3cb43

immaxchen requested a review from TomAugspurger October 24, 2019 12:28

WillAyd requested changes Oct 30, 2019

immaxchen added 2 commits November 2, 2019 21:46

Merge remote-tracking branch 'upstream/master' into styler-builtin-sh…

bdfff98

…ortcut-for-missing-values-formatting-gh28358 sync with the latest upstream/master

revision based on the requested changes

b86bdc6

1. keep formatter as mandatory in `.format` method 2. annotate the new method `.set_na_rep` 3. remove changes in `.highlight_null` to another PR 4. minor refinement to the whats new and user guide

TomAugspurger reviewed Nov 3, 2019

immaxchen added 2 commits November 3, 2019 23:16

add tests, enhance doc string and formatter wrapping

a1e9a9e

Merge remote-tracking branch 'upstream/master' into styler-builtin-sh…

def71c9

…ortcut-for-missing-values-formatting-gh28358 sync with the latest upstream/master

resolve merge conflict

af396b1

immaxchen requested a review from TomAugspurger November 9, 2019 09:31

immaxchen requested a review from WillAyd November 9, 2019 09:31

WillAyd requested changes Nov 9, 2019

add type-hint and xfail test

3d4cfd0

immaxchen requested a review from WillAyd November 17, 2019 07:02

WillAyd requested changes Nov 18, 2019

immaxchen added 2 commits November 19, 2019 22:25

revise test using pytest.raises

bd99db9

Merge remote-tracking branch 'upstream/master' into styler-builtin-sh…

346eee6

…ortcut-for-missing-values-formatting-gh28358

WillAyd requested changes Nov 21, 2019

using py36 syntax for annotations

7935359

WillAyd approved these changes Nov 24, 2019

TomAugspurger approved these changes Nov 25, 2019

TomAugspurger merged commit cc3daa6 into pandas-dev:master Nov 25, 2019

TomAugspurger added this to the 1.0 milestone Nov 25, 2019

This was referenced Jan 27, 2020

ENH: Styler.highlight_null to accept subset argument #31345

Closed

ENH: Styler.highlight_null can accepts subset argument #31350

Merged
