Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Raise when casting NaT to int #28492

Merged
merged 18 commits into from
Jan 1, 2020
Merged

BUG: Raise when casting NaT to int #28492

merged 18 commits into from
Jan 1, 2020

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Sep 18, 2019

  • tests added / passed
  • passes black pandas
  • whatsnew entry

Fixes a bug in astype_nansafe where NaT was ignored when casting a datetime or timedelta to int. I put the test in pandas/tests/dtypes/test_common.py since I couldn't find another place where astype_nansafe was tested. Also adds various other tests for astype_nansafe.

Related: #28438

pandas/core/dtypes/cast.py Outdated Show resolved Hide resolved
@@ -696,6 +698,8 @@ def astype_nansafe(arr, dtype, copy=True, skipna=False):
if is_object_dtype(dtype):
return tslib.ints_to_pydatetime(arr.view(np.int64))
elif dtype == np.int64:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this likely needs to be is_integer_dtype(dtype)

Copy link
Member Author

@dsaxton dsaxton Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might actually need to be np.int64 since above we check for datetime64 (getting some weird test failures after this change). Looking at the numpy documentation it seems the output of view is "unpredictable" when the precision differs between the data types (the shape of the output can even change).

@@ -731,3 +732,11 @@ def test__is_dtype_type_sparse():
result = np.dtype("int32")
assert com._is_dtype_type(ser, lambda tipo: tipo == result)
assert com._is_dtype_type(ser.dtype, lambda tipo: tipo == result)


def test__nansafe_nat_to_int():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to

test_astype_nansafe; can you parameterize this on all ints at the very least. ideally we would have more testing for this (but can be followup)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the other (non-int64) dtypes we end up hitting the TypeError here (related to the above comment):

raise TypeError(

Would you recommend breaking these out into a separate test?

pandas/tests/dtypes/test_common.py Outdated Show resolved Hide resolved
@jreback jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Sep 18, 2019
@jreback
Copy link
Contributor

jreback commented Sep 18, 2019

is there a user visible level that hits this?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 18, 2019

This hits both Series.astype, Index.astype, and DataFrame.astype

In [6]: a = [pd.Timestamp('2000'), pd.NaT]

In [7]: pd.Series(a).astype(int)
Out[7]:
0     946684800000000000
1   -9223372036854775808
dtype: int64

In [8]: pd.Index(a).astype(int)
Out[8]: Int64Index([946684800000000000, -9223372036854775808], dtype='int64')

In [10]: pd.DataFrame({"a": a}).astype(int)
Out[10]:
                     a
0   946684800000000000
1 -9223372036854775808

In addition to the astype_nansafe unit tests, we'd ideally also test these user facing ones.

And we should double check: we want this behavior right? Some people may be relying on this behavior (treating pd.NaT as iNaT). This feels like a bigger change than https://github.com/pandas-dev/pandas/pull/28438/files. I might prefer a warning first that this will change (though how would people work around it?).

@@ -706,3 +707,13 @@ def test__get_dtype_fails(input_param, expected_error_message):
)
def test__is_dtype_type(input_param, result):
assert com._is_dtype_type(input_param, lambda tipo: tipo == result)


@pytest.mark.parametrize("val", [np.datetime64("NaT"), np.timedelta64("NaT")])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an additional test for when dtype == object (which is the other path that hits M8 and m8)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a test separate from test_astype_nansafe to check that NaT is still null after casting to object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@TomAugspurger
Copy link
Contributor

Did we decide on whether we're going to deprecate this behavior before changing it? IMO, ew should

@dsaxton
Copy link
Member Author

dsaxton commented Sep 20, 2019

Did we decide on whether we're going to deprecate this behavior before changing it? IMO, ew should

I'd defer to you guys on that one. If we deprecate, the change would be to use a deprecation warning instead of an error and xfail the tests?

@jreback jreback added this to the 1.0 milestone Sep 26, 2019
@jreback
Copy link
Contributor

jreback commented Sep 26, 2019

Did we decide on whether we're going to deprecate this behavior before changing it? IMO, ew should

@TomAugspurger you feel strongly about this? I don't think its a big deal, but we could deprecate.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 26, 2019 via email

@dsaxton
Copy link
Member Author

dsaxton commented Sep 26, 2019

Given how unusual the current behavior is, I'd be surprised if anyone was relying on it. I'd say it's more likely to be a source of unexpected bugs.

@WillAyd
Copy link
Member

WillAyd commented Oct 11, 2019

Is there a legitimate use case for using astype that we know of? The rules here wouldn't even match numpy right?

Does seem kind of buggy so I think OK without deprecation too

@WillAyd
Copy link
Member

WillAyd commented Nov 7, 2019

@dsaxton can you merge master to be sure? I think we can move forward with this as is, unless @TomAugspurger you feel otherwise

@jreback
Copy link
Contributor

jreback commented Nov 20, 2019

can you merge master

@WillAyd
Copy link
Member

WillAyd commented Dec 17, 2019

@jreback all green - OK to merge?

@jreback
Copy link
Contributor

jreback commented Dec 27, 2019

can you merge master and we'll look again

@jreback
Copy link
Contributor

jreback commented Jan 1, 2020

lgtm. @jbrockmendel if you'd have a glance (and merge if ok)

@jbrockmendel jbrockmendel merged commit 27b713b into pandas-dev:master Jan 1, 2020
@jbrockmendel
Copy link
Member

thanks @dsaxton

@dsaxton dsaxton deleted the cast-nat branch January 1, 2020 18:52
hweecat pushed a commit to hweecat/pandas that referenced this pull request Jan 1, 2020
* BUG: Raise when casting NaT to int

* Add release note

* Add PR number

* Use isna

* Parametrize test

* Check all integer dtypes

* Fix merge

* Edit release note

* Check for np.int64

* Handle timedelta64

* Add astype_nansafe datetime tests

* Add test for NaT object casting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants