Masking and overflow checks for datetimeindex and timedeltaindex ops #18020
WIP filling out a test matrix of arithmetic ops closes pandas-dev#17991
pandas/tests/indexes/datetimelike.py
# - timezone-aware variants
# - object-dtype, categorical dtype
# - PeriodIndex
# - consistency with .map(...) ?
have a look thru test_ops, there is lots of coverage for things like this already (or maybe test_base). don't create a giant matrix, rather parametrize as much as possible.
I'll take a look. After adding tests for #7996 (separate branch/PR), this class gets pretty huge. So yah, parameterization sounds nice.
(Also it looks like tests in this module don't get run, so that needs changing anyway).
(Also it looks like tests in this module don't get run, so that needs changing anyway).
sure they do, classes inherit from this. Pls pls pls don't create a huge matrix of tests w/o looking thru the existing. we cover quite a bit of this already.
Pls pls pls don't create a huge matrix of tests w/o looking thru the existing. we cover quite a bit of this already.
Message received. Worrying about correctness first, brevity later.
Codecov Report
@@ Coverage Diff @@
## master #18020 +/- ##
==========================================
- Coverage 91.25% 91.23% -0.02%
==========================================
Files 163 163
Lines 50123 50124 +1
==========================================
- Hits 45741 45733 -8
- Misses 4382 4391 +9
Continue to review full report at Codecov.
see comments
OK, just removed the new file, took the two tests that currently fail on master and moved them into existing datetime and timedelta test files.
Does this also fix #7996?
No, but it does fix a related bug that probably belongs in the same issue.
Rebased and pushed; hoping that magically fixes the CI errors.
AFAICT the test failures here were caused because of fragility in I think the immediate issue is now fixed, but ideally
needs a whatsnew note. 0.21.1 is fine.
@@ -447,6 +447,40 @@ def f():
            t - offset
        pytest.raises(OverflowError, f)

    def test_datetimeindex_sub_timestamp_overflow(self):
        dtimax = pd.to_datetime(['now', pd.Timestamp.max])
add the issue for these
There is no issue for this; I noticed it when tracking down the TimedeltaIndex bug.
sure there is, the PR number!
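For readers following along, here is a rough sketch of the failure mode that test_datetimeindex_sub_timestamp_overflow guards against. It is plain Python rather than pandas code: `wrap_int64` and `checked_sub` are hypothetical helpers, and `ts_neg` is an illustrative stand-in for a pre-1970 timestamp (about 1700-01-01, in nanoseconds since the epoch). The only pandas fact assumed is that `pd.Timestamp.max.value` equals the largest int64.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def wrap_int64(x):
    """Simulate unchecked int64 arithmetic (two's-complement wraparound)."""
    return (x - INT64_MIN) % 2**64 + INT64_MIN


def checked_sub(a, b):
    """Subtract and raise if the exact result does not fit in int64."""
    result = a - b
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("Overflow in int64 subtraction")
    return result


ts_max = INT64_MAX                 # same integer as pd.Timestamp.max.value
ts_neg = -8_520_336_000 * 10**9    # illustrative: about 1700-01-01 in ns

# Unchecked: the difference silently wraps to a negative "timedelta".
wrapped = wrap_int64(ts_max - ts_neg)
print(wrapped < 0)

# Checked: the same subtraction is reported as an overflow.
try:
    checked_sub(ts_max, ts_neg)
    raised = False
except OverflowError:
    raised = True
print(raised)
```

The point is only that an unchecked int64 subtraction produces a plausible-looking but wrong value, which is why the test expects an OverflowError instead.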
for variant in ts_neg_variants + ts_pos_variants:
    res = tdinat + variant
    assert res[1] is pd.NaT
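A minimal sketch of why masking matters for this test, assuming only that pandas stores NaT as the minimum int64 value. `checked_add` below is a hypothetical pure-Python helper, not the pandas implementation, and `tdinat` and `ts_neg` are illustrative integer stand-ins for the objects in the test.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
NAT = INT64_MIN  # assumption: NaT is stored as the minimum int64


def checked_add(values, scalar, mask=None):
    """Add scalar to each value with an int64 overflow check.

    Masked slots are skipped by the check and stay NaT in the output.
    """
    if mask is None:
        mask = [False] * len(values)
    out = []
    for value, masked in zip(values, mask):
        if masked:
            out.append(NAT)
            continue
        total = value + scalar
        if not INT64_MIN <= total <= INT64_MAX:
            raise OverflowError("Overflow in int64 addition")
        out.append(total)
    return out


one_day = 86_400 * 10**9           # one day in ns
tdinat = [one_day, NAT]            # stand-in for a TimedeltaIndex with NaT
ts_neg = -8_520_336_000 * 10**9    # illustrative negative timestamp in ns

# Without a mask, NaT + negative value trips the overflow check spuriously.
try:
    checked_add(tdinat, ts_neg)
    unmasked_raised = False
except OverflowError:
    unmasked_raised = True

# With the NaT slot masked, the real entry is computed and NaT propagates.
masked_result = checked_add(tdinat, ts_neg, mask=[v == NAT for v in tdinat])
print(unmasked_raised, masked_result[1] == NAT)
```

In this sketch the NaT slot never takes part in the arithmetic, so it can neither overflow nor turn into a bogus value.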
check sub as well
also would check both add/sub for the reverse, e.g. variant + tdinat (and -)
might as well assert fully, e.g.
tm.assert_index_equal(pd.TimedeltaIndex(['NaT', 'NaT']))
Happy to do the full add/sub/radd/rsub matrix... that was kind of what I started out with. The question becomes where to put it, since arithmetic tests are scattered about. My preference would be new test modules test_arithmetic in each of indexes.timedeltas, indexes.datetimes, and indexes.periods where I can a) put these new tests and b) collect the arithmetic tests that are currently scattered about.
See discussion in #18026, #18036.
But for now I'll just edit the contents of the tests already introduced.
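To make the add/sub/radd/rsub matrix concrete, here is a small sketch that writes the cases as data instead of as separate hand-written tests. It deliberately uses the stdlib `datetime` and `timedelta` as stand-ins for `Timestamp` and `TimedeltaIndex` so that it runs without pandas; `cases` and `run_case` are hypothetical names, and the expected pandas results are not claimed here.

```python
import operator
from datetime import datetime, timedelta

td = timedelta(days=1)
ts = datetime(2017, 1, 1)

# (name from td's point of view, op, left, right, expected value or exception)
cases = [
    ("add", operator.add, td, ts, datetime(2017, 1, 2)),
    ("radd", operator.add, ts, td, datetime(2017, 1, 2)),
    ("rsub", operator.sub, ts, td, datetime(2016, 12, 31)),
    ("sub", operator.sub, td, ts, TypeError),  # timedelta - datetime is undefined
]


def run_case(op, left, right, expected):
    """Return True if op(left, right) matches the expected value or exception."""
    if isinstance(expected, type) and issubclass(expected, Exception):
        try:
            op(left, right)
        except expected:
            return True
        return False
    return op(left, right) == expected


results = {name: run_case(op, left, right, expected)
           for name, op, left, right, expected in cases}
print(results)
```

A list shaped like `cases` can be passed to `pytest.mark.parametrize`, so each operator/operand combination shows up as its own test without growing a giant hand-written matrix.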
might as well assert fully, e.g.
tm.assert_index_equal(pd.TimedeltaIndex(['NaT', 'NaT']))
The first entry is not NaT, and will not be constant across all of the variants (though the variants could be split into two groups over which it should be unchanging).
test_arithmetic sounds like a good name. key is to share code as much as possible, via fixtures / parametrization / inheritance. We ideally want these objects to act as similar as possible, so keeping special cases to a minimum is important.
note before we do this, I think splitting out the tz-aware tests to its own hierarchy should be done.
But for now I'll just edit the contents of the tests already introduced.
Actually as I look at this, I'd much rather close out this bug fix and follow up with the Do It Right approach.
certainly. bug fixes are good. separate, self-contained refactorings to make things more readable are better!
Great. After the current deluge of clears up, I'll circle back to this.
expected = pd.Timestamp.min.value - tsneg.value
for variant in ts_neg_variants:
    res = dtimin - variant
same comment as below, assert fully the result type.
Where did we land on this? My preference is to get this bug-fix in and worry about the rest in #18049+followups.
needs a rebase
Just rebased. The two new tests are unchanged, just moved to the appropriate locations in test_arithmetic.
looks fine, tiny doc change. ping on green.
doc/source/whatsnew/v0.21.1.txt
@@ -57,6 +57,8 @@ Documentation Changes
Bug Fixes
~~~~~~~~~
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
- Bug in :class:`TimedeltaIndex` subtraction could incorrectly overflow when `NaT` is present (:issue:`17791`)
double backticks around NaT
doc/source/whatsnew/v0.21.1.txt
@@ -57,6 +57,8 @@ Documentation Changes
Bug Fixes
~~~~~~~~~
- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
- Bug in :class:`TimedeltaIndex` subtraction could incorrectly overflow when `NaT` is present (:issue:`17791`)
- Bug in :class:`DatetimeIndex` subtraction could fail to overflow (:issue:`18020`)
can you expand this a bit (e.g. subtracting what)
ping on green.
ping
thanks!
…andas-dev#18020) closes pandas-dev#17991 (cherry picked from commit 8388a47)
There are a bunch of new tests (not obvious what the appropriate place is for a WIP test matrix like this, pls advise). The ones that will fail under master are test_timedeltaindex_add_timestamp_nat_masking and test_datetimeindex_sub_timestamp_overflow.
Start filling out a test matrix of arithmetic ops.
git diff upstream/master -u -- "*.py" | flake8 --diff