Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: convert_datetime64 parameter in to_records() #18902

Merged
merged 6 commits into from
Apr 14, 2018

Conversation

reidy-p
Copy link
Contributor

@reidy-p reidy-p commented Dec 21, 2017

As noted in the original issue, the underlying NumPy bug seems to be fixed so this parameter may no longer be needed.

I also changed the default value for convert_datetime64 from True to False and only give a FutureWarning when convert_datetime64=True is passed but I'm not sure if this is more appropriate than leaving True as the default and always giving a FutureWarning.

@@ -216,6 +216,7 @@ Deprecations
- ``Series.valid`` is deprecated. Use :meth:`Series.dropna` instead (:issue:`18800`).
- :func:`read_excel` has deprecated the ``skip_footer`` parameter. Use ``skipfooter`` instead (:issue:`18836`)
- The ``is_copy`` attribute is deprecated and will be removed in a future version (:issue:`18801`).
- The ``convert_datetime64`` parameter has been deprecated and the default value is now ``False`` in :func:`to_records` as the NumPy bug motivating this parameter has been resolved (:issue:`18160`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should :func:`to_records` actually be :func:`DataFrame.to_records`?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. @jschendel : I believe you are correct

  2. @reidy-p : This sentence is a run-on. Break it up into two sentences (and add comma's at the conjunctions).

@gfyoung gfyoung added Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions labels Dec 22, 2017
@gfyoung
Copy link
Member

gfyoung commented Dec 22, 2017

@reidy-p : I'm a little surprised to see that you didn't have to change any tests with this deprecation, as it is also an API change. If this passes, it means coverage for this function is not that great.

Perhaps we should take the opportunity to write a few more tests...

@reidy-p
Copy link
Contributor Author

reidy-p commented Dec 22, 2017

@gfyoung yeah I was surprised too. I'll take a look at the existing tests in more detail and see if I can add some more.

@codecov
Copy link

codecov bot commented Dec 22, 2017

Codecov Report

Merging #18902 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18902      +/-   ##
==========================================
- Coverage   91.84%   91.81%   -0.03%     
==========================================
  Files         153      153              
  Lines       49275    49277       +2     
==========================================
- Hits        45255    45245      -10     
- Misses       4020     4032      +12
Flag Coverage Δ
#multiple 90.2% <100%> (-0.03%) ⬇️
#single 41.91% <33.33%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.16% <100%> (ø) ⬆️
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3885ced...4a1b1ba. Read the comment docs.

@reidy-p reidy-p force-pushed the to_records_datetimeindex branch from 8202f2c to 6a31ce3 Compare December 22, 2017 09:54
@@ -1187,7 +1187,7 @@ def from_records(cls, data, index=None, exclude=None, columns=None,

return cls(mgr)

def to_records(self, index=True, convert_datetime64=True):
def to_records(self, index=True, convert_datetime64=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the default value None.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reidy-p did you see my comment here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I changed the default to None but was then asked to change it to False instead in a later review.

I think there are at least two options for handling the deprecation of the convert_datetime64 parameter:

  1. Change default from True to None. Give a warning if convert_datetime64 is not None. If convert_datetime64 is None then set convert_datetime64=True inside the function to avoid breaking existing code.

  2. Change default from True to False. Give a warning if convert_datetime64=True.

Which option is better?

@jorisvandenbossche I think you recommended Option 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think we need to make this None, then can detect that a user is passing this. Yes we are also completely changing the behavior of this, but we really want to remove this parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in-line with @jorisvandenbossche comments. We want to warn if a user is passing this at ALL, and we are changing the default. (so its now False if the parameter is not specified). can you reword as appropriate. ping when ready.

@@ -1196,14 +1196,23 @@ def to_records(self, index=True, convert_datetime64=True):
----------
index : boolean, default True
Include index in resulting record array, stored in 'index' field
convert_datetime64 : boolean, default True
convert_datetime64 : boolean, default False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean, optional

Whether to convert the index to datetime.datetime if it is a
DatetimeIndex

Returns
-------
y : recarray
"""

if convert_datetime64:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not None. We are going to completely remove this parameter, so we want to warn if its passed at all.

with tm.assert_produces_warning(FutureWarning):
expected = df.index[0]
result = df.to_records(convert_datetime64=True)['index'][0]
assert expected == result

rs = df.to_records(convert_datetime64=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will need to catch the warning on the following as well

@@ -216,6 +216,7 @@ Deprecations
- ``Series.valid`` is deprecated. Use :meth:`Series.dropna` instead (:issue:`18800`).
- :func:`read_excel` has deprecated the ``skip_footer`` parameter. Use ``skipfooter`` instead (:issue:`18836`)
- The ``is_copy`` attribute is deprecated and will be removed in a future version (:issue:`18801`).
- The ``convert_datetime64`` parameter in :func:`DataFrame.to_records` has been deprecated and the default value is now ``False``. The NumPy bug motivating this parameter has been resolved (:issue:`18160`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say it will be removed

@reidy-p reidy-p force-pushed the to_records_datetimeindex branch 2 times, most recently from 5b4bc9c to cb0dfd5 Compare December 27, 2017 12:37
@@ -1196,14 +1196,23 @@ def to_records(self, index=True, convert_datetime64=True):
----------
index : boolean, default True
Include index in resulting record array, stored in 'index' field
convert_datetime64 : boolean, default True
convert_datetime64 : boolean, optional
.. deprecated:: 0.23.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, but it still defaults to True (its just we are getting rid of it).

"deprecated and will be removed in a future "
"version",
FutureWarning, stacklevel=2)

if index:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else:
    convert_datetime64 = True


rs = df.to_records(convert_datetime64=False)
assert rs['index'][0] == df.index.values[0]
with tm.assert_produces_warning(FutureWarning):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an example where we don't pass it, so it gets defaulted to True.

@reidy-p reidy-p force-pushed the to_records_datetimeindex branch from c13b109 to ef6cf9b Compare December 27, 2017 21:30
@reidy-p reidy-p force-pushed the to_records_datetimeindex branch from ef6cf9b to 00b10db Compare January 12, 2018 21:13
@jreback jreback added this to the 0.23.0 milestone Jan 13, 2018
Whether to convert the index to datetime.datetime if it is a
DatetimeIndex

Returns
-------
y : recarray
"""

if convert_datetime64 is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche how should we handle this here. We really want to change the default to False in the future, but need True to make this backwards compat for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on 2nd thought, let's just change this to default to False. We have to change it now or later anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have to None as a default to detect the case that people now did convert_datetime64=False explicitly. Otherwise they don't see the deprecation warning, and their code will break once we remove the keyword.

@jreback
Copy link
Contributor

jreback commented Feb 24, 2018

can you rebase

@reidy-p reidy-p force-pushed the to_records_datetimeindex branch 3 times, most recently from d9c9d7c to eca4757 Compare March 1, 2018 22:00
Whether to convert the index to datetime.datetime if it is a
DatetimeIndex

Returns
-------
y : recarray
"""

if convert_datetime64 is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on 2nd thought, let's just change this to default to False. We have to change it now or later anyways.

result = df.to_records(convert_datetime64=True)['index'][0]
assert expected == result

expected = df.index[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move comment above expected

assert expected == result

with tm.assert_produces_warning(FutureWarning):
rs = df.to_records(convert_datetime64=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test with True as well

@reidy-p reidy-p force-pushed the to_records_datetimeindex branch from eca4757 to 94ee32b Compare March 3, 2018 12:42
@reidy-p reidy-p force-pushed the to_records_datetimeindex branch from 94ee32b to 32606c5 Compare March 17, 2018 23:12
@reidy-p reidy-p force-pushed the to_records_datetimeindex branch from 32606c5 to f4c6190 Compare March 25, 2018 16:12
@jorisvandenbossche
Copy link
Member

(moving this to the main thread)

Which option is best depends on what the end goal here is.
Do we actually want to remove the option altogether in the future? In that case, we always need to warn if the keyword is manually specified, so we need a default of None to be able to catch this (which can then mean an actual default of True or False, which one of the two is independent from having None as the default)

Given the deprecation warning, it seems we actually want to remove it. So I think we need a default of None (and then best give a different warning in case True or False was specified).

But having a default of None, does not mean that if it is None, it should mean convert_datetime64=True as you indicate in your comment above. We can still actually change the behaviour at the same time if we want (the deprecation warning is currently for the keyword, not for the actual behaviour change I think?)

We could also delay the actual change in behaviour and first warn that it will change in the future (your option 1).
But given this is such a corner case and almost never the desired behaviour, I am not sure it is needed in this case.

@reidy-p reidy-p force-pushed the to_records_datetimeindex branch 4 times, most recently from eb0b207 to fd0f114 Compare April 1, 2018 13:09
@@ -1187,7 +1187,7 @@ def from_records(cls, data, index=None, exclude=None, columns=None,

return cls(mgr)

def to_records(self, index=True, convert_datetime64=True):
def to_records(self, index=True, convert_datetime64=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think we need to make this None, then can detect that a user is passing this. Yes we are also completely changing the behavior of this, but we really want to remove this parameter.

@@ -1187,7 +1187,7 @@ def from_records(cls, data, index=None, exclude=None, columns=None,

return cls(mgr)

def to_records(self, index=True, convert_datetime64=True):
def to_records(self, index=True, convert_datetime64=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in-line with @jorisvandenbossche comments. We want to warn if a user is passing this at ALL, and we are changing the default. (so its now False if the parameter is not specified). can you reword as appropriate. ping when ready.

@jreback
Copy link
Contributor

jreback commented Apr 14, 2018

if you can update @reidy-p

@reidy-p reidy-p force-pushed the to_records_datetimeindex branch 2 times, most recently from b0cc2be to 42d75e0 Compare April 14, 2018 16:18
@reidy-p reidy-p force-pushed the to_records_datetimeindex branch from 42d75e0 to 4a1b1ba Compare April 14, 2018 17:19
@reidy-p
Copy link
Contributor Author

reidy-p commented Apr 14, 2018

@jreback I think this is ready but would be good if you could double-check the tests to make sure they make sense.

@jreback jreback merged commit d5d5a71 into pandas-dev:master Apr 14, 2018
@jreback
Copy link
Contributor

jreback commented Apr 14, 2018

lgtm. thanks @reidy-p

@reidy-p reidy-p deleted the to_records_datetimeindex branch April 15, 2018 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.to_records() converts DatetimeIndex to datetime object, not np.datetime64
5 participants