Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Depr] raise_on_error kwarg with errors kwarg in astype#14878 #14967

Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.20.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -247,7 +247,7 @@ Deprecations
- ``Index.repeat()`` and ``MultiIndex.repeat()`` have deprecated the ``n`` parameter in favor of ``repeats`` (:issue:`12662`)
- ``Categorical.searchsorted()`` and ``Series.searchsorted()`` have deprecated the ``v`` parameter in favor of ``value`` (:issue:`12662`)
- ``TimedeltaIndex.searchsorted()``, ``DatetimeIndex.searchsorted()``, and ``PeriodIndex.searchsorted()`` have deprecated the ``key`` parameter in favor of ``value`` (:issue:`12662`)

- ``DataFrame.astype()`` has deprecated the ``raise_on_error`` parameter in favor of ``errors`` (:issue:`14878`)



Expand Down
19 changes: 14 additions & 5 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -3061,7 +3061,9 @@ def blocks(self):
"""Internal property, property synonym for as_blocks()"""
return self.as_blocks()

def astype(self, dtype, copy=True, raise_on_error=True, **kwargs):
@deprecate_kwarg(old_arg_name='raise_on_error', new_arg_name='errors',
mapping={True: 'raise', False: 'ignore'})
def astype(self, dtype, copy=True, errors='raise', **kwargs):
"""
Cast object to input numpy.dtype
Return a copy when copy = True (be really careful with this!)
Expand All @@ -3073,7 +3075,14 @@ def astype(self, dtype, copy=True, raise_on_error=True, **kwargs):
the same type. Alternatively, use {col: dtype, ...}, where col is a
column label and dtype is a numpy.dtype or Python type to cast one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this in here and just say DEPRECATED.

@jorisvandenbossche IIRC that is our convention?

add a versionadded tag 0.20.0 for errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, can leave it (but put it at the end)

or more of the DataFrame's columns to column-specific types.
raise_on_error : raise on invalid input
raise_on_error : raise on invalid input. DEPRECATED use ``errors``
instead
errors : {'raise', 'ignore'}, default 'raise'
- ``raise`` : allow exceptions to be raised on invalid input
- ``ignore`` : suppress raising exceptions on invalid input
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe specify what happens of there is an error? (original is returned)


.. versionadded:: 0.20.0

kwargs : keyword arguments to pass on to the constructor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need a blank line before kwargs (to make the sub-list work)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just checked and the sublist is rendered fine with, or without a line between ignore
and kwargs. I can add an extra line if that is the convention,


Returns
Expand All @@ -3086,7 +3095,7 @@ def astype(self, dtype, copy=True, raise_on_error=True, **kwargs):
raise KeyError('Only the Series name can be used for '
'the key in Series dtype mappings.')
new_type = list(dtype.values())[0]
return self.astype(new_type, copy, raise_on_error, **kwargs)
return self.astype(new_type, copy, errors, **kwargs)
elif self.ndim > 2:
raise NotImplementedError(
'astype() only accepts a dtype arg of type dict when '
Expand All @@ -3107,8 +3116,8 @@ def astype(self, dtype, copy=True, raise_on_error=True, **kwargs):
return concat(results, axis=1, copy=False)

# else, only a single dtype is given
new_data = self._data.astype(dtype=dtype, copy=copy,
raise_on_error=raise_on_error, **kwargs)
new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
**kwargs)
return self._constructor(new_data).__finalize__(self)

def copy(self, deep=True):
Expand Down
20 changes: 13 additions & 7 deletions pandas/core/internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -455,17 +455,23 @@ def downcast(self, dtypes=None, mgr=None):

return blocks

def astype(self, dtype, copy=False, raise_on_error=True, values=None,
**kwargs):
return self._astype(dtype, copy=copy, raise_on_error=raise_on_error,
values=values, **kwargs)
def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
return self._astype(dtype, copy=copy, errors=errors, values=values,
**kwargs)

def _astype(self, dtype, copy=False, raise_on_error=True, values=None,
def _astype(self, dtype, copy=False, errors='raise', values=None,
klass=None, mgr=None, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check the errors in ['raise', 'ignore'] at the beginning of the function and raise a ValueError otherwise (and add a test for this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, there are two 'public' astype(...) methods:

  • NDFrame.astype(...) in pandas/core/generic.py
  • Block.astype(...) in pandas/core/internals.py

In addition there is a 'protected' Block._astype(...) method in
pandas/core/internals.py. Should I only put the checks in the 'public'
methods?

Bearing in mind that the raise_on_error kwarg is going to be deprecated for
DataFrame.where() and replaced with the errors kwarg it would make sense to
put the code that checks the validity of the arguments in one place. Do we have
any existing code where we put such validity checking functions/methods?

I notice that both NDFrame & Block inherit from PandasObject but, I'm not
sure that this is the correct thing to do. Should I put a function in
pandas/core/base.py?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block is completely internal
u can put the check there

we have centralized checks but no need in this case

out in _astype as that's where it's actually used
parameter validation is best down where it's actually

"""
Coerce to the new type (if copy=True, return a new copy)
raise on an except if raise == True
"""
errors_legal_values = ('raise', 'ignore')

if errors not in errors_legal_values:
invalid_arg = "Expected value of kwarg 'errors' to be one of %s. "\
"Supplied value is '%s'" % (', '.join("'%s'" % arg for arg in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style comment: can you do the string continuation with putting ( ) around it instead of the \. And can you use ´.format(..)´ instead of % (can be on a separate line)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I'll change the output message to:

Expected value of kwarg 'errors' to be one of ['raise', 'ignore']. Supplied value 'True'

This will do away with the need for the messy formatting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@m-charlton can you change this to the {} formatting syntax?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

errors_legal_values), errors)
raise ValueError(invalid_arg)

# may need to convert to categorical
# this is only called for non-categoricals
Expand Down Expand Up @@ -507,7 +513,7 @@ def _astype(self, dtype, copy=False, raise_on_error=True, values=None,
newb = make_block(values, placement=self.mgr_locs, dtype=dtype,
klass=klass)
except:
if raise_on_error is True:
if errors == 'raise':
raise
newb = self.copy() if copy else self

Expand Down Expand Up @@ -2147,7 +2153,7 @@ def take_nd(self, indexer, axis=0, new_mgr_locs=None, fill_tuple=None):

return self.make_block_same_class(new_values, new_mgr_locs)

def _astype(self, dtype, copy=False, raise_on_error=True, values=None,
def _astype(self, dtype, copy=False, errors='raise', values=None,
klass=None, mgr=None):
"""
Coerce to the new type (if copy=True, return a new copy)
Expand Down
4 changes: 2 additions & 2 deletions pandas/tests/frame/test_dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -357,15 +357,15 @@ def test_astype_with_exclude_string(self):
df = self.frame.copy()
expected = self.frame.astype(int)
df['string'] = 'foo'
casted = df.astype(int, raise_on_error=False)
casted = df.astype(int, errors='ignore')

expected['string'] = 'foo'
assert_frame_equal(casted, expected)

df = self.frame.copy()
expected = self.frame.astype(np.int32)
df['string'] = 'foo'
casted = df.astype(np.int32, raise_on_error=False)
casted = df.astype(np.int32, errors='ignore')

expected['string'] = 'foo'
assert_frame_equal(casted, expected)
Expand Down
9 changes: 8 additions & 1 deletion pandas/tests/test_internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -553,7 +553,7 @@ def test_astype(self):
'e: f4; f: f2; g: f8')
for t in ['float16', 'float32', 'float64', 'int32', 'int64']:
t = np.dtype(t)
tmgr = mgr.astype(t, raise_on_error=False)
tmgr = mgr.astype(t, errors='ignore')
self.assertEqual(tmgr.get('c').dtype.type, t)
self.assertEqual(tmgr.get('e').dtype.type, t)
self.assertEqual(tmgr.get('f').dtype.type, t)
Expand All @@ -566,6 +566,13 @@ def test_astype(self):
else:
self.assertEqual(tmgr.get('d').dtype.type, t)

def test_illegal_arg_for_errors_in_astype(self):
""" ValueError exception raised when illegal value used for errors """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you turn this into a normal comment (with #)? Otherwise the test name does not show up in the nosetests output.
You can also add the issue number

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also move the test to the high-level astype tests for frame/series

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at the existing test cases in pandas/tests/series &
pandas/tests/frame and can't find a natural place to put this test.

I was thinking of adding a new file called say test_astype.py in
pandas/tests/frame, as there is an existing test_apply.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no just grep for astype and you will see lots of tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this test is fine here.

Copy link
Member

@jorisvandenbossche jorisvandenbossche Jan 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@m-charlton astype tests are located in pandas/tests/frame/test_dtypes.py (you actually updated some tests there)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

mgr = create_mgr('a,b,c: i8')

with self.assertRaises(ValueError):
mgr.astype(np.float64, errors=True)

def test_convert(self):
def _compare(old_mgr, new_mgr):
""" compare the blocks, numeric compare ==, object don't """
Expand Down