Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix IntervalIndex constructor inconsistencies #18424

Merged
merged 1 commit into from
Nov 24, 2017

Conversation

jschendel
Copy link
Member

@jschendel jschendel commented Nov 22, 2017

Regarding item 3) in the issue, which deals with the dtype of IntervalIndex([]): I implemented this so that the default behavior is to have object dtype. However, if the empty data has a specific dtype, e.g. np.array([], dtype='int64'), it will use that dtype instead.

'some kind, 5 was passed')
with pytest.raises(TypeError, message=msg):
with tm.assert_raises_regex(TypeError, msg):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason pytest.raises didn't appear to actually be checking the message, so I switched to tm.assert_raises_regex.

IntervalIndex.from_intervals([Interval(0, 1),
Interval(1, 2, closed='left')])

with pytest.raises(ValueError, message=msg):
IntervalIndex.from_arrays([0, 10], [3, 5])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was in the wrong section; raised a ValueError but should have been under the "decreasing breaks/arrays" commented section. Already have a relevant test there, so deleted.

@@ -1054,10 +1111,6 @@ def test_constructor_coverage(self):
end=end.to_pydatetime())
tm.assert_index_equal(result, expected)

result = pd.interval_range(start=start.tz_localize('UTC'),
end=end.tz_localize('UTC'))
tm.assert_index_equal(result, expected)
Copy link
Member Author

@jschendel jschendel Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted this check because it's invalid; just didn't appear as so until these fixes went in and fixed an unrelated issue. Here expected doesn't have any tz related info, but start.tz_localize('UTC')/end.tz_localize('UTC') adds tz info, so the dtype for result is tz aware, and not the same as expected.

Copy link
Member Author

@jschendel jschendel Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it now, I should add a tz aware related test, but will wait until there are other comments on this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep this is broken

In [3]: pd.interval_range(start=start.tz_localize('UTC'),
   ...:                   end=end.tz_localize('UTC'))
   ...: 
Out[3]: 
IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03], (2017-01-03, 2017-01-04], (2017-01-04, 2017-01-05], (2017-01-05, 2017-01-06] ... (2017-01-10, 2017-01-11], (2017-01-11, 2017-01-12], (2017-01-12, 2017-01-13], (2017-01-13, 2017-01-14], (2017-01-14, 2017-01-15]]
              closed='right',
              dtype='interval[datetime64[ns]]')

In [4]: pd.interval_range(start=start.tz_localize('UTC'),
   ...:                   end=end.tz_localize('UTC')).left
   ...:                   
Out[4]: 
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10', '2017-01-11', '2017-01-12',
               '2017-01-13', '2017-01-14'],
              dtype='datetime64[ns]', freq=None)

also need to validate that left/right have the same tz (if constructed that way)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this commit seem to resolve tz related issues. Not super well versed on the tz aware code, but looks right to me:

In [2]: dt1 = pd.Timestamp('2017-01-01', tz='US/Eastern')

In [3]: dt2 = pd.Timestamp('2017-01-04', tz='US/Eastern')

In [4]: dt2_bad = pd.Timestamp('2017-01-04', tz='UTC')

In [5]: pd.interval_range(dt1, dt2)
Out[5]:
IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03], (2017-01-03, 2017-01-04]]
              closed='right',
              dtype='interval[datetime64[ns, US/Eastern]]')

In [6]: pd.interval_range(dt1, dt2).left
Out[6]:
DatetimeIndex(['2017-01-01 00:00:00-05:00', '2017-01-02 00:00:00-05:00',
               '2017-01-03 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')

In [7]: pd.interval_range(dt1, dt2_bad)
---------------------------------------------------------------------------
AssertionError: Inputs must both have the same timezone, US/Eastern != UTC

During handling of the above exception, another exception occurred:

TypeError: Start and end cannot both be tz-aware with different timezones

I think it was just a matter of making sure the dates were handled properly, and the validation follows from deferring to date_range as part of the creation process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm...the only issue appears to be an inconsistency when one date is tz aware but the other isn't:

In [2]: dt1_tz = pd.Timestamp('2017-01-01', tz='US/Eastern')

In [3]: dt1_no_tz = pd.Timestamp('2017-01-01')

In [4]: dt2_tz = pd.Timestamp('2017-01-04', tz='US/Eastern')

In [5]: dt2_no_tz = pd.Timestamp('2017-01-04')

In [6]: pd.interval_range(dt1_tz, dt2_no_tz)
Out[6]:
IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03], (2017-01-03, 2017-01-04]]
              closed='right',
              dtype='interval[datetime64[ns, US/Eastern]]')

In [7]: pd.interval_range(dt1_no_tz, dt2_tz)
---------------------------------------------------------------------------
AssertionError: Inputs must both have the same timezone, None != US/Eastern

During handling of the above exception, another exception occurred:

TypeError: Start and end cannot both be tz-aware with different timezones

Again, seems to be an underlying issue with date_range and DatetimeIndex though:

In [8]: pd.date_range(dt1_tz, dt2_no_tz)
Out[8]:
DatetimeIndex(['2017-01-01 00:00:00-05:00', '2017-01-02 00:00:00-05:00',
               '2017-01-03 00:00:00-05:00', '2017-01-04 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')

In [9]: pd.DatetimeIndex(start=dt1_tz, end=dt2_no_tz, freq='D')
Out[9]:
DatetimeIndex(['2017-01-01 00:00:00-05:00', '2017-01-02 00:00:00-05:00',
               '2017-01-03 00:00:00-05:00', '2017-01-04 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')

In [10]: pd.date_range(dt1_no_tz, dt2_tz)
---------------------------------------------------------------------------
AssertionError: Inputs must both have the same timezone, None != US/Eastern

During handling of the above exception, another exception occurred:

TypeError: Start and end cannot both be tz-aware with different timezones

In [11]: pd.DatetimeIndex(start=dt1_no_tz, end=dt2_tz, freq='D')
---------------------------------------------------------------------------
AssertionError: Inputs must both have the same timezone, None != US/Eastern

During handling of the above exception, another exception occurred:

TypeError: Start and end cannot both be tz-aware with different timezones

Will open a new issue for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@jschendel jschendel force-pushed the iv-idx-constructor-consistency branch from 09dac70 to fbb8b71 Compare November 22, 2017 09:55
@jreback jreback added Bug Interval Interval data type labels Nov 22, 2017
@jschendel jschendel force-pushed the iv-idx-constructor-consistency branch from fbb8b71 to 42c15fa Compare November 22, 2017 23:36
@codecov
Copy link

codecov bot commented Nov 23, 2017

Codecov Report

Merging #18424 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18424      +/-   ##
==========================================
- Coverage   91.35%   91.34%   -0.02%     
==========================================
  Files         163      163              
  Lines       49691    49700       +9     
==========================================
+ Hits        45397    45398       +1     
- Misses       4294     4302       +8
Flag Coverage Δ
#multiple 89.14% <100%> (ø) ⬆️
#single 39.67% <11.53%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/interval.py 92.64% <100%> (+0.11%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/dtypes/cast.py 88.59% <0%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fedc503...42c15fa. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Nov 23, 2017

lgtm. @jschendel ready?

@jreback jreback added this to the 0.22.0 milestone Nov 23, 2017
@jschendel
Copy link
Member Author

@jreback : this should be ready to merge. Was going to add tz aware related tests, but there are still a few tz aware things that are broken (e.g. IntervalIndex.mid) that should probably be done in a separate PR to keep things clean.

@jreback
Copy link
Contributor

jreback commented Nov 24, 2017

@jreback : this should be ready to merge. Was going to add tz aware related tests, but there are still a few tz aware things that are broken (e.g. IntervalIndex.mid) that should probably be done in a separate PR to keep things clean.

sgtm.

@jreback jreback merged commit de4b384 into pandas-dev:master Nov 24, 2017
@jreback
Copy link
Contributor

jreback commented Nov 24, 2017

thanks @jschendel nice PRs. keep em coming!

@jschendel jschendel deleted the iv-idx-constructor-consistency branch November 24, 2017 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: IntervalIndex constructor inconsistencies
2 participants