Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Raise when casting NaT to int #28492

Merged
merged 18 commits into from
Jan 1, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -781,6 +781,7 @@ Datetimelike
- Bug in :class:`Timestamp` subtraction when subtracting a :class:`Timestamp` from a ``np.datetime64`` object incorrectly raising ``TypeError`` (:issue:`28286`)
- Addition and subtraction of integer or integer-dtype arrays with :class:`Timestamp` will now raise ``NullFrequencyError`` instead of ``ValueError`` (:issue:`28268`)
- Bug in :class:`Series` and :class:`DataFrame` with integer dtype failing to raise ``TypeError`` when adding or subtracting a ``np.datetime64`` object (:issue:`28080`)
- Bug in :meth:`Series.astype`, :meth:`Index.astype`, and :meth:`DataFrame.astype` failing to handle ``NaT`` when casting to an integer dtype (:issue:`28492`)
- Bug in :class:`Week` with ``weekday`` incorrectly raising ``AttributeError`` instead of ``TypeError`` when adding or subtracting an invalid type (:issue:`28530`)
- Bug in :class:`DataFrame` arithmetic operations when operating with a :class:`Series` with dtype `'timedelta64[ns]'` (:issue:`28049`)
- Bug in :func:`pandas.core.groupby.generic.SeriesGroupBy.apply` raising ``ValueError`` when a column in the original DataFrame is a datetime and the column labels are not standard integers (:issue:`28247`)
Expand Down
4 changes: 4 additions & 0 deletions pandas/core/dtypes/cast.py
Original file line number Diff line number Diff line change
Expand Up @@ -823,6 +823,8 @@ def astype_nansafe(arr, dtype, copy: bool = True, skipna: bool = False):
if is_object_dtype(dtype):
return tslib.ints_to_pydatetime(arr.view(np.int64))
elif dtype == np.int64:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this likely needs to be is_integer_dtype(dtype)

Copy link
Member Author

@dsaxton dsaxton Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might actually need to be np.int64 since above we check for datetime64 (getting some weird test failures after this change). Looking at the numpy documentation it seems the output of view is "unpredictable" when the precision differs between the data types (the shape of the output can even change).

if isna(arr).any():
raise ValueError("Cannot convert NaT values to integer")
return arr.view(dtype)

# allow frequency conversions
Expand All @@ -835,6 +837,8 @@ def astype_nansafe(arr, dtype, copy: bool = True, skipna: bool = False):
if is_object_dtype(dtype):
return tslibs.ints_to_pytimedelta(arr.view(np.int64))
elif dtype == np.int64:
if isna(arr).any():
raise ValueError("Cannot convert NaT values to integer")
return arr.view(dtype)

if dtype not in [_INT64_DTYPE, _TD_DTYPE]:
Expand Down
41 changes: 41 additions & 0 deletions pandas/tests/dtypes/test_common.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

import pandas.util._test_decorators as td

from pandas.core.dtypes.cast import astype_nansafe
import pandas.core.dtypes.common as com
from pandas.core.dtypes.dtypes import (
CategoricalDtype,
Expand All @@ -13,6 +14,7 @@
IntervalDtype,
PeriodDtype,
)
from pandas.core.dtypes.missing import isna

import pandas as pd
from pandas.conftest import (
Expand Down Expand Up @@ -721,3 +723,42 @@ def test__get_dtype_fails(input_param, expected_error_message):
)
def test__is_dtype_type(input_param, result):
assert com._is_dtype_type(input_param, lambda tipo: tipo == result)


@pytest.mark.parametrize("val", [np.datetime64("NaT"), np.timedelta64("NaT")])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an additional test for when dtype == object (which is the other path that hits M8 and m8)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a test separate from test_astype_nansafe to check that NaT is still null after casting to object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@pytest.mark.parametrize("typ", [np.int64])
def test_astype_nansafe(val, typ):
arr = np.array([val])

msg = "Cannot convert NaT values to integer"
with pytest.raises(ValueError, match=msg):
astype_nansafe(arr, dtype=typ)


@pytest.mark.parametrize("from_type", [np.datetime64, np.timedelta64])
@pytest.mark.parametrize(
"to_type",
[
np.uint8,
np.uint16,
np.uint32,
np.int8,
np.int16,
np.int32,
np.float16,
np.float32,
],
)
def test_astype_datetime64_bad_dtype_raises(from_type, to_type):
arr = np.array([from_type("2018")])

with pytest.raises(TypeError, match="cannot astype"):
astype_nansafe(arr, dtype=to_type)


@pytest.mark.parametrize("from_type", [np.datetime64, np.timedelta64])
def test_astype_object_preserves_datetime_na(from_type):
arr = np.array([from_type("NaT")])
result = astype_nansafe(arr, dtype="object")

assert isna(result)[0]