Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Validate keyword arguments to fillna #19684

Merged
merged 10 commits into from
Feb 22, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -570,6 +570,7 @@ Other API Changes
- Set operations (union, difference...) on :class:`IntervalIndex` with incompatible index types will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`19329`)
- :class:`DateOffset` objects render more simply, e.g. "<DateOffset: days=1>" instead of "<DateOffset: kwds={'days': 1}>" (:issue:`19403`)
- :func:`pandas.merge` provides a more informative error message when trying to merge on timezone-aware and timezone-naive columns (:issue:`15800`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a datetimelike section for other api changes now

- ``Categorical.fillna`` now validates its ``value`` and ``method`` keyword arguments. It now raises when both or none are specified, matching the behavior of :meth:`Series.fillna` (:issue:`19682`)

.. _whatsnew_0230.deprecations:

Expand Down
5 changes: 4 additions & 1 deletion pandas/core/arrays/categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@
Appender, cache_readonly, deprecate_kwarg, Substitution)

from pandas.io.formats.terminal import get_terminal_size
from pandas.util._validators import validate_bool_kwarg
from pandas.util._validators import validate_bool_kwarg, validate_fillna_kwargs
from pandas.core.config import get_option

from .base import ExtensionArray
Expand Down Expand Up @@ -1607,6 +1607,9 @@ def fillna(self, value=None, method=None, limit=None):
-------
filled : Categorical with NA/NaN filled
"""
value, method = validate_fillna_kwargs(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, I added this keyword since it's possible have a tuple / list for a category.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add a test for this? (I mean filling a categorical with such categories, so the purpose of validate_scalar_dict_value=False is exercised)

value, method, validate_scalar_dict_value=False
)

if value is None:
value = np.nan
Expand Down
12 changes: 3 additions & 9 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@
import pandas.core.nanops as nanops
from pandas.util._decorators import (Appender, Substitution,
deprecate_kwarg)
from pandas.util._validators import validate_bool_kwarg
from pandas.util._validators import validate_bool_kwarg, validate_fillna_kwargs
from pandas.core import config

# goal is to be able to define the docs close to function, while still being
Expand Down Expand Up @@ -4697,10 +4697,8 @@ def infer_objects(self):
def fillna(self, value=None, method=None, axis=None, inplace=False,
limit=None, downcast=None):
inplace = validate_bool_kwarg(inplace, 'inplace')
value, method = validate_fillna_kwargs(value, method)

if isinstance(value, (list, tuple)):
raise TypeError('"value" parameter must be a scalar or dict, but '
'you passed a "{0}"'.format(type(value).__name__))
self._consolidate_inplace()

# set the default here, so functions examining the signaure
Expand All @@ -4711,8 +4709,7 @@ def fillna(self, value=None, method=None, axis=None, inplace=False,
method = missing.clean_fill_method(method)
from pandas import DataFrame
if value is None:
if method is None:
raise ValueError('must specify a fill method or value')

if self._is_mixed_type and axis == 1:
if inplace:
raise NotImplementedError()
Expand Down Expand Up @@ -4746,9 +4743,6 @@ def fillna(self, value=None, method=None, axis=None, inplace=False,
coerce=True,
downcast=downcast)
else:
if method is not None:
raise ValueError('cannot specify both a fill method and value')

if len(self._get_axis(axis)) == 0:
return self

Expand Down
16 changes: 16 additions & 0 deletions pandas/tests/categorical/test_missing.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# -*- coding: utf-8 -*-

import numpy as np
import pytest

import pandas.util.testing as tm
from pandas import (Categorical, Index, isna)
Expand Down Expand Up @@ -53,3 +54,18 @@ def test_set_item_nan(self):

exp = Categorical([1, np.nan, 3], categories=[1, 2, 3])
tm.assert_categorical_equal(cat, exp)

@pytest.mark.parametrize('fillna_kwargs, msg', [
(dict(value=1, method='ffill'),
"Cannot specify both 'value' and 'method'."),
(dict(),
"Must specify a fill 'value' or 'method'."),
(dict(method='bad'),
"Invalid fill method. Expecting .* bad"),
])
def test_fillna_raises(self, fillna_kwargs, msg):
# https://github.com/pandas-dev/pandas/issues/19682
cat = Categorical([1, 2, 3])

with tm.assert_raises_regex(ValueError, msg):
cat.fillna(**fillna_kwargs)
36 changes: 36 additions & 0 deletions pandas/util/_validators.py
Original file line number Diff line number Diff line change
Expand Up @@ -320,3 +320,39 @@ def validate_axis_style_args(data, args, kwargs, arg_name, method_name):
msg = "Cannot specify all of '{}', 'index', 'columns'."
raise TypeError(msg.format(arg_name))
return out


def validate_fillna_kwargs(value, method, validate_scalar_dict_value=True):
"""Validate the keyword arguments to 'fillna'.

This checks that exactly one of 'value' and 'method' is specified.
If 'method' is specified, this validates that it's a valid method.

Parameters
----------
value, method : object
The 'value' and 'method' keyword arguments for 'fillna'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to validate_scalar_dict_value in the docstring?

validate_scalar_dict_value : bool, default True
Whether to validate that 'value' is a scalar or dict; specifically
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: replace the semicolon with a comma.

that it is not a list or tuple.

Returns
-------
value, method : object
"""
from pandas.core.missing import clean_fill_method

if value is None and method is None:
raise ValueError("Must specify a fill 'value' or 'method'.")
elif value is None and method is not None:
method = clean_fill_method(method)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra line here (make consistent across this function)

elif value is not None and method is None:
if validate_scalar_dict_value and isinstance(value, (list, tuple)):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check doesn't seem especially useful / correct given the error message.

There are many things that are not scalars or dicts, but also not lists or tuples, so we would fail to hit this TypeError.

But I was trying to keep this as a straight refactor for NDFrame.fillna, with the only behavior change being Categorical.fillna.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont' think this is odd to want to validate that a scalar can be only a certain type. maybe would change this

validate_scalar=lambda x: is_scalar(x) or is_dict_like(x)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this parameter name is terrible, ideally would like to change this

Copy link
Contributor Author

@TomAugspurger TomAugspurger Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. Please don't call a change by a contributor "terrible". It's not helpful.
  2. The error message says "'value' parameter must be a scalar or dict". That seems to match the parameter name pretty well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parameter name is terrible, obviously NOT the contributor, which I didn't say or imply at all. It needs to be changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name is certainly descriptive in what it does, and it is for an internal method, so it's not much of a problem it is that long. Why bother for the rest?

raise TypeError('"value" parameter must be a scalar or dict, but '
'you passed a "{0}"'.format(type(value).__name__))

elif value is not None and method is not None:
raise ValueError("Cannot specify both 'value' and 'method'.")

return value, method