Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Let IntervalIndex constructor override inferred closed #21584

Merged
merged 1 commit into from
Jun 27, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -194,6 +194,13 @@ Strings
-
-

Interval
^^^^^^^^

- Bug in the :class:`IntervalIndex` constructor where the ``closed`` parameter did not always override the inferred ``closed`` (:issue:`19370`)
-
-

Indexing
^^^^^^^^

Expand Down
19 changes: 15 additions & 4 deletions pandas/_libs/interval.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -335,11 +335,17 @@ cdef class Interval(IntervalMixin):

@cython.wraparound(False)
@cython.boundscheck(False)
cpdef intervals_to_interval_bounds(ndarray intervals):
cpdef intervals_to_interval_bounds(ndarray intervals,
bint validate_closed=True):
"""
Parameters
----------
intervals: ndarray object array of Intervals / nulls
intervals : ndarray
object array of Intervals / nulls

validate_closed: boolean, default True
boolean indicating if all intervals must be closed on the same side.
Mismatching closed will raise if True, else return None for closed.

Returns
-------
Expand All @@ -353,6 +359,7 @@ cpdef intervals_to_interval_bounds(ndarray intervals):
object closed = None, interval
int64_t n = len(intervals)
ndarray left, right
bint seen_closed = False

left = np.empty(n, dtype=intervals.dtype)
right = np.empty(n, dtype=intervals.dtype)
Expand All @@ -370,10 +377,14 @@ cpdef intervals_to_interval_bounds(ndarray intervals):

left[i] = interval.left
right[i] = interval.right
if closed is None:
if not seen_closed:
seen_closed = True
closed = interval.closed
elif closed != interval.closed:
raise ValueError('intervals must all be closed on the same side')
closed = None
if validate_closed:
msg = 'intervals must all be closed on the same side'
raise ValueError(msg)

return left, right, closed

Expand Down
14 changes: 3 additions & 11 deletions pandas/core/indexes/interval.py
Original file line number Diff line number Diff line change
Expand Up @@ -233,24 +233,16 @@ def __new__(cls, data, closed=None, dtype=None, copy=False,
if isinstance(data, IntervalIndex):
left = data.left
right = data.right
closed = data.closed
closed = closed or data.closed
else:

# don't allow scalars
if is_scalar(data):
cls._scalar_data_error(data)

data = maybe_convert_platform_interval(data)
left, right, infer_closed = intervals_to_interval_bounds(data)

if (com._all_not_none(closed, infer_closed) and
closed != infer_closed):
# GH 18421
msg = ("conflicting values for closed: constructor got "
"'{closed}', inferred from data '{infer_closed}'"
.format(closed=closed, infer_closed=infer_closed))
raise ValueError(msg)

left, right, infer_closed = intervals_to_interval_bounds(
data, validate_closed=closed is None)
closed = closed or infer_closed

return cls._simple_new(left, right, closed, name, copy=copy,
Expand Down
26 changes: 19 additions & 7 deletions pandas/tests/indexes/interval/test_construction.py
Original file line number Diff line number Diff line change
Expand Up @@ -317,13 +317,7 @@ def test_generic_errors(self, constructor):
pass

def test_constructor_errors(self, constructor):
# mismatched closed inferred from intervals vs constructor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think this is an invalid interval as its closed on both sides?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my logic here is that if there's a conflict, then the user had to explicitly pass the conflicting value to the constructor since the default for closed is None. Being done explicitly seems to indicate (at least a degree of) user intention, so seems reasonable to defer to the user passed value if there's a conflict, as opposed to forcing preprocessing onto the user.

Can certainly see an argument for raising in this case though, so can modify to make the behavior consistent in regards to raising if that's the preferred behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so thinking of the scenario where you have a list of Intervals and you are then overriding to make it uniform, but this actually changes what this represents; the user doesn't even know this. maybe a warning would be in order? ISince the override is explicit the user 'knows what they are doing', but do they really?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure. I think it would be OK to override without warning or error, but not really sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not opposed to adding a warning; will do that within the next day. Shouldn't be much work and will be easy enough to remove if we decide we don't want it later.

Another use case for overriding is to change closed for an existing IntervalIndex. A little bit different than the list of intervals case, but I think we'd want the constructor to have consistent behavior for list of intervals input and IntervalIndex input.

On master the closed parameter is ignored for IntervalIndex input:

In [2]: index = pd.interval_range(0, 3, closed='both')

In [3]: pd.IntervalIndex(index, closed='neither')
Out[3]:
IntervalIndex([[0, 1], [1, 2], [2, 3]]
              closed='both',
              dtype='interval[int64]')

And the best workaround (to my knowledge) is to deconstruct and pass to one of the from_* methods, which seems unnecessarily tedious, e.g.

In [4]: pd.IntervalIndex.from_arrays(index.left, index.right, closed='neither')
Out[4]:
IntervalIndex([(0, 1), (1, 2), (2, 3)]
              closed='neither',
              dtype='interval[int64]')

In [5]: pd.IntervalIndex.from_tuples(index.to_tuples(), closed='neither')
Out[5]:
IntervalIndex([(0, 1), (1, 2), (2, 3)]
              closed='neither',
              dtype='interval[int64]')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe think about adding a set_closed method (similar to how we have accessors for names/labels/levels and so on on Indexes), though in the construtor is fine as well.

ok so merge this as is and do followup? (either way ok)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so merge this as is and do followup?

Merging as is. To be clear, for the followup are you referring to adding warnings, or just the set_closed?

ivs = [Interval(0, 1, closed='both'), Interval(1, 2, closed='both')]
msg = 'conflicting values for closed'
with tm.assert_raises_regex(ValueError, msg):
constructor(ivs, closed='neither')

# mismatched closed within intervals
# mismatched closed within intervals with no constructor override
ivs = [Interval(0, 1, closed='right'), Interval(2, 3, closed='left')]
msg = 'intervals must all be closed on the same side'
with tm.assert_raises_regex(ValueError, msg):
Expand All @@ -341,6 +335,24 @@ def test_constructor_errors(self, constructor):
with tm.assert_raises_regex(TypeError, msg):
constructor([0, 1])

@pytest.mark.parametrize('data, closed', [
([], 'both'),
([np.nan, np.nan], 'neither'),
([Interval(0, 3, closed='neither'),
Interval(2, 5, closed='neither')], 'left'),
([Interval(0, 3, closed='left'),
Interval(2, 5, closed='right')], 'neither'),
(IntervalIndex.from_breaks(range(5), closed='both'), 'right')])
def test_override_inferred_closed(self, constructor, data, closed):
# GH 19370
if isinstance(data, IntervalIndex):
tuples = data.to_tuples()
else:
tuples = [(iv.left, iv.right) if notna(iv) else iv for iv in data]
expected = IntervalIndex.from_tuples(tuples, closed=closed)
result = constructor(data, closed=closed)
tm.assert_index_equal(result, expected)


class TestFromIntervals(TestClassConstructors):
"""
Expand Down