BUG: Fix IntervalIndex constructor inconsistencies #18424
                 'some kind, 5 was passed')
-        with pytest.raises(TypeError, message=msg):
+        with tm.assert_raises_regex(TypeError, msg):
For some reason pytest.raises didn't appear to actually be checking the message, so I switched to tm.assert_raises_regex.
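A minimal sketch of the difference between "an exception was raised" and "an exception with this message was raised". It uses the standard library's unittest.TestCase.assertRaisesRegex as a stand-in for tm.assert_raises_regex, and make_interval is a hypothetical function invented for the illustration, not pandas code. For what it's worth, the message argument of pytest.raises was only the text shown when no exception is raised; the keyword that checks the exception text is match.

```python
import unittest


def make_interval(left, right):
    """Hypothetical constructor that validates its inputs (illustration only)."""
    if left > right:
        raise ValueError("left side of interval must be <= right side")
    return (left, right)


class TestMessageIsChecked(unittest.TestCase):
    def test_matching_message(self):
        # Passes: both the exception type and the message pattern match.
        with self.assertRaisesRegex(ValueError, "must be <= right side"):
            make_interval(10, 3)

    def test_wrong_message_is_detected(self):
        # The inner check fails with AssertionError because the pattern
        # does not match the real message, even though the type is right.
        with self.assertRaises(AssertionError):
            with self.assertRaisesRegex(ValueError, "some unrelated text"):
                make_interval(10, 3)
```

The second test is the useful one: a check that only looks at the exception type would let a wrong message slip through silently.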
             IntervalIndex.from_intervals([Interval(0, 1),
                                           Interval(1, 2, closed='left')])
-
-        with pytest.raises(ValueError, message=msg):
-            IntervalIndex.from_arrays([0, 10], [3, 5])
This test was in the wrong section: it raised a ValueError but should have been under the "decreasing breaks/arrays" commented section. There is already a relevant test there, so I deleted this one.
@@ -1054,10 +1111,6 @@ def test_constructor_coverage(self):
                                    end=end.to_pydatetime())
         tm.assert_index_equal(result, expected)
-
-        result = pd.interval_range(start=start.tz_localize('UTC'),
-                                   end=end.tz_localize('UTC'))
-        tm.assert_index_equal(result, expected)
Deleted this check because it's invalid; it just didn't show up as invalid until these fixes went in and fixed an unrelated issue. Here expected doesn't have any tz-related info, but start.tz_localize('UTC')/end.tz_localize('UTC') adds tz info, so the dtype for result is tz-aware and not the same as expected.
Thinking about it now, I should add a tz aware related test, but will wait until there are other comments on this PR.
yep this is broken
In [3]: pd.interval_range(start=start.tz_localize('UTC'),
...: end=end.tz_localize('UTC'))
...:
Out[3]:
IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03], (2017-01-03, 2017-01-04], (2017-01-04, 2017-01-05], (2017-01-05, 2017-01-06] ... (2017-01-10, 2017-01-11], (2017-01-11, 2017-01-12], (2017-01-12, 2017-01-13], (2017-01-13, 2017-01-14], (2017-01-14, 2017-01-15]]
closed='right',
dtype='interval[datetime64[ns]]')
In [4]: pd.interval_range(start=start.tz_localize('UTC'),
...: end=end.tz_localize('UTC')).left
...:
Out[4]:
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
'2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
'2017-01-09', '2017-01-10', '2017-01-11', '2017-01-12',
'2017-01-13', '2017-01-14'],
dtype='datetime64[ns]', freq=None)
also need to validate that left/right have the same tz (if constructed that way)
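A minimal sketch of what such a check could look like, written against the standard library datetime module rather than pandas internals. validate_same_tz is a hypothetical helper, and comparing UTC offsets is a simplifying assumption (two distinct zones can share an offset at a given instant).

```python
from datetime import datetime, timedelta, timezone


def validate_same_tz(start, end):
    """Hypothetical helper: require both endpoints to agree on timezone.

    Returns the shared tzinfo, or None when both endpoints are tz-naive.
    """
    start_offset = start.utcoffset()  # None for tz-naive datetimes
    end_offset = end.utcoffset()
    # Reject mixed naive/aware input symmetrically, whichever side is naive.
    if (start_offset is None) != (end_offset is None):
        raise TypeError("Cannot mix tz-naive and tz-aware endpoints")
    # Simplification: compare UTC offsets instead of zone identities.
    if start_offset != end_offset:
        raise TypeError(
            f"Endpoints must share a timezone, got {start.tzinfo} and {end.tzinfo}"
        )
    return start.tzinfo
```

Checking both orderings in one place keeps the behavior independent of which endpoint happens to carry the tz.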
The changes in this commit seem to resolve tz related issues. Not super well versed on the tz aware code, but looks right to me:
In [2]: dt1 = pd.Timestamp('2017-01-01', tz='US/Eastern')
In [3]: dt2 = pd.Timestamp('2017-01-04', tz='US/Eastern')
In [4]: dt2_bad = pd.Timestamp('2017-01-04', tz='UTC')
In [5]: pd.interval_range(dt1, dt2)
Out[5]:
IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03], (2017-01-03, 2017-01-04]]
closed='right',
dtype='interval[datetime64[ns, US/Eastern]]')
In [6]: pd.interval_range(dt1, dt2).left
Out[6]:
DatetimeIndex(['2017-01-01 00:00:00-05:00', '2017-01-02 00:00:00-05:00',
'2017-01-03 00:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq='D')
In [7]: pd.interval_range(dt1, dt2_bad)
---------------------------------------------------------------------------
AssertionError: Inputs must both have the same timezone, US/Eastern != UTC
During handling of the above exception, another exception occurred:
TypeError: Start and end cannot both be tz-aware with different timezones
I think it was just a matter of making sure the dates were handled properly, and the validation follows from deferring to date_range as part of the creation process.
hmmm...the only issue appears to be an inconsistency when one date is tz aware but the other isn't:
In [2]: dt1_tz = pd.Timestamp('2017-01-01', tz='US/Eastern')
In [3]: dt1_no_tz = pd.Timestamp('2017-01-01')
In [4]: dt2_tz = pd.Timestamp('2017-01-04', tz='US/Eastern')
In [5]: dt2_no_tz = pd.Timestamp('2017-01-04')
In [6]: pd.interval_range(dt1_tz, dt2_no_tz)
Out[6]:
IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03], (2017-01-03, 2017-01-04]]
closed='right',
dtype='interval[datetime64[ns, US/Eastern]]')
In [7]: pd.interval_range(dt1_no_tz, dt2_tz)
---------------------------------------------------------------------------
AssertionError: Inputs must both have the same timezone, None != US/Eastern
During handling of the above exception, another exception occurred:
TypeError: Start and end cannot both be tz-aware with different timezones
Again, this seems to be an underlying issue with date_range and DatetimeIndex though:
In [8]: pd.date_range(dt1_tz, dt2_no_tz)
Out[8]:
DatetimeIndex(['2017-01-01 00:00:00-05:00', '2017-01-02 00:00:00-05:00',
'2017-01-03 00:00:00-05:00', '2017-01-04 00:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq='D')
In [9]: pd.DatetimeIndex(start=dt1_tz, end=dt2_no_tz, freq='D')
Out[9]:
DatetimeIndex(['2017-01-01 00:00:00-05:00', '2017-01-02 00:00:00-05:00',
'2017-01-03 00:00:00-05:00', '2017-01-04 00:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq='D')
In [10]: pd.date_range(dt1_no_tz, dt2_tz)
---------------------------------------------------------------------------
AssertionError: Inputs must both have the same timezone, None != US/Eastern
During handling of the above exception, another exception occurred:
TypeError: Start and end cannot both be tz-aware with different timezones
In [11]: pd.DatetimeIndex(start=dt1_no_tz, end=dt2_tz, freq='D')
---------------------------------------------------------------------------
AssertionError: Inputs must both have the same timezone, None != US/Eastern
During handling of the above exception, another exception occurred:
TypeError: Start and end cannot both be tz-aware with different timezones
Will open a new issue for this.
yes
Codecov Report
@@            Coverage Diff             @@
##           master   #18424      +/-   ##
==========================================
- Coverage   91.35%   91.34%   -0.02%
==========================================
  Files         163      163
  Lines       49691    49700       +9
==========================================
+ Hits        45397    45398       +1
- Misses       4294     4302       +8
lgtm. @jschendel ready?
@jreback : this should be ready to merge. Was going to add tz aware related tests, but there are still a few tz aware things that are broken (e.g.
sgtm.
thanks @jschendel nice PRs. keep em coming!
git diff upstream/master -u -- "*.py" | flake8 --diff
Regarding item 3) in the issue, which deals with the dtype of IntervalIndex([]): I implemented this so that the default behavior is to have object dtype. However, if the empty data has a specific dtype, e.g. np.array([], dtype='int64'), it will use that dtype instead.
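The rule described above can be sketched in a few lines of NumPy. This is only an illustration of the decision logic, not the code in this PR: empty_subtype is a hypothetical helper name, and the fallback for non-empty untyped input (letting NumPy infer the dtype) is an assumption added to make the function total.

```python
import numpy as np


def empty_subtype(data):
    """Hypothetical helper mirroring the empty-input dtype rule (not pandas code)."""
    # Input that already carries a dtype keeps it, even when it is empty.
    dtype = getattr(data, "dtype", None)
    if dtype is not None:
        return np.dtype(dtype)
    # A plain empty container has no type information: fall back to object.
    if len(data) == 0:
        return np.dtype(object)
    # Assumption for the sketch: non-empty untyped input is inferred by NumPy.
    return np.asarray(data).dtype


print(empty_subtype([]))
print(empty_subtype(np.array([], dtype="int64")))
```

The point of checking for an existing dtype first is that an explicitly typed empty array is a deliberate choice by the caller, whereas a bare [] says nothing about the intended element type.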