Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Masking and overflow checks for datetimeindex and timedeltaindex ops #18020

Merged
merged 17 commits into from
Nov 4, 2017

Conversation

jbrockmendel
Copy link
Member

There are a bunch of new tests (not obvious what the appropriate place is for a WIP test matrix like this, pls advise). The ones that will fail under master are test_timedeltaindex_add_timestamp_nat_masking and test_datetimeindex_sub_timestamp_overflow

Start filling out a test matrix of arithmetic ops.

# - timezone-aware variants
# - object-dtype, categorical dtype
# - PeriodIndex
# - consistency with .map(...) ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have a look thru test_ops, there is lots of coverage for things like this already (or maybe test_base). don't create a giant matrix, rather parametrize as much as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look. After adding tests for #7996 (separate branch/PR), this class gets pretty huge. So yah, parameterization sounds nice.

(Also it looks like tests in this module don't get run, so that needs changing anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also it looks like tests in this module don't get run, so that needs changing anyway).

sure they do, classes inherit from this. Pls pls pls don't create a huge matrix of tests w/o looking thru the existing. we cover quite a bit of this already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls pls pls don't create a huge matrix of tests w/o looking thru the existing. we cover quite a bit of this already.

Message received. Worrying about correctness first, brevity later.

@codecov
Copy link

codecov bot commented Oct 29, 2017

Codecov Report

Merging #18020 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18020      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50123    50124       +1     
==========================================
- Hits        45741    45733       -8     
- Misses       4382     4391       +9
Flag Coverage Δ
#multiple 89.05% <100%> (ø) ⬆️
#single 40.32% <20%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/timedeltas.py 91.19% <100%> (ø) ⬆️
pandas/core/indexes/datetimelike.py 97.1% <100%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.51% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3268f4e...4dbdfd7. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

@jbrockmendel
Copy link
Member Author

OK, just removed the new file, took the two tests that currently fail on master and moved them into existing datetime and timedelta test files.

@jbrockmendel jbrockmendel mentioned this pull request Oct 30, 2017
@jorisvandenbossche
Copy link
Member

Does this also fix #7996 ?

@gfyoung gfyoung added Bug Timedelta Timedelta data type labels Oct 30, 2017
@jbrockmendel
Copy link
Member Author

Does this also fix #7996 ?

No, but it does fix a related bug that probably belongs in the same issue

In [3]: s = pd.Series(pd.date_range('20130101',periods=3))
In [4]: dti = pd.DatetimeIndex(s)

In [5]: s-np.datetime64('2013-01-01')
Out[5]: 
0   15705 days 23:59:59.999984
1   15706 days 23:59:59.999984
2   15707 days 23:59:59.999984
dtype: timedelta64[ns]

In [6]: dti - np.datetime64('2013-01-01')
Out[6]: DatetimeIndex(['1970-01-01', '1970-01-02', '1970-01-03'], dtype='datetime64[ns]', freq=None)

@jbrockmendel
Copy link
Member Author

Rebased and pushed; hoping that magically fixes the CI errors

@jbrockmendel
Copy link
Member Author

AFAICT the test failures here were caused because of fragility in test_NaT_methods and TestTimestamp.test_to_datetime_depr. Specifically, a test that this PR added (until moments ago) called ts.to_datetime() instead of ts.to_pydatetime(). This triggered a FutureWarning, which resulted in FutureWarning not being called in the tests that specifically check for it.

I think the immediate issue is now fixed, but ideally test_to_datetime_depr and test_NaT_methods would be isolated enough not to depend on test ordering.

@jreback jreback added this to the 0.21.1 milestone Oct 31, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a whatsnew note. 0.21.1 is fine.

@@ -447,6 +447,40 @@ def f():
t - offset
pytest.raises(OverflowError, f)

def test_datetimeindex_sub_timestamp_overflow(self):
dtimax = pd.to_datetime(['now', pd.Timestamp.max])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue for these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no issue for this; I noticed it when tracking down the TimedeltaIndex bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure there is, the PR number!


for variant in ts_neg_variants + ts_pos_variants:
res = tdinat + variant
assert res[1] is pd.NaT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check sub as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also would check both add/sub for the reverse, e.g. variant + tdinat (and -)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well assert fully, e.g.

tm.assert_index_equal(pd.TimedeltaIndex(['NaT', 'NaT']))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do the full add/sub/radd/rsub matrix... that was kind of what I started out with. The question becomes where to put it, since arithmetic tests are scattered about. My preference would be new test modules test_arithmetic in each of indexes.timedeltas, indexes.datetimes, and indexes.periods where I can a) put these new tests and b) collect the arithmetic tests that are currently scattered about.

See discussion in #18026, #18036.

But for now I'll just edit the contents of the tests already introduced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well assert fully, e.g.

tm.assert_index_equal(pd.TimedeltaIndex(['NaT', 'NaT']))

The first entry is not NaT, will not be constant across all of the variants (though the variants could be split into two groups over which it should be unchanging)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_arithmetic sounds like a good name. key is to share code as much as possible, via fixtures / parametrization / inheritence. We ideally want these objects to act as similar as possible, so keeping special cases to a minimum is important.

note before we do this, I think splitting out the tz-aware tests to its own hierarchy should be done.

see #17583 and #17694

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for now I'll just edit the contents of the tests already introduced.

Actually as I look at this, I'd much rather close out this bug fix and follow up with the Do It Right approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly. bug fixes are good. separate, self-contained refactorings to make things more readable are better!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. After the current deluge of clears up, I'll circle back to this.


expected = pd.Timestamp.min.value - tsneg.value
for variant in ts_neg_variants:
res = dtimin - variant
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as below, assert fully the result type.

@jbrockmendel
Copy link
Member Author

Where did we land on this? My preference is to get this bug-fix in and worry about the rest in #18049+followups.

@jreback
Copy link
Contributor

jreback commented Nov 1, 2017

needs a rebase

@jbrockmendel
Copy link
Member Author

Just rebased. The two new tests are unchanged, just moved to the appropriate locations in test_arithmetic.

@jreback
Copy link
Contributor

jreback commented Nov 4, 2017

looks fine, tiny doc change. ping on green.

@@ -57,6 +57,8 @@ Documentation Changes
Bug Fixes
~~~~~~~~~
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
- Bug in :class:`TimedeltaIndex` subtraction could incorrectly overflow when `NaT` is present (:issue:`17791`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks around NaT

@@ -57,6 +57,8 @@ Documentation Changes
Bug Fixes
~~~~~~~~~
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
- Bug in :class:`TimedeltaIndex` subtraction could incorrectly overflow when `NaT` is present (:issue:`17791`)
- Bug in :class:`DatetimeIndex` subtraction could fail to overflow (:issue:`18020`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you expand this a bit (e.g. subtracting what)

@jreback
Copy link
Contributor

jreback commented Nov 4, 2017

ping on green.

@jbrockmendel
Copy link
Member Author

ping

@jreback jreback merged commit 8388a47 into pandas-dev:master Nov 4, 2017
@jreback
Copy link
Contributor

jreback commented Nov 4, 2017

thanks!

@jbrockmendel jbrockmendel deleted the arith branch December 8, 2017 19:40
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 11, 2017
TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Dec 11, 2017
TomAugspurger pushed a commit that referenced this pull request Dec 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NaT in TimedeltaIndex + Timestamp overflows
5 participants