BUG: Allow IntervalIndex to be constructed from categorical data with appropriate dtype #21254

jschendel · 2018-05-30T06:49:31Z

closes IntervalIndex does not accept CategoricalIndex (of interval dtype) #21243
closes IntervalIndex.from_arrays/from_breaks does not accept categorical data with valid dtype #21253
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Added this to 0.23.1 since it's a regression and the fix is a minor change outside the IntervalIndex class. Not opposed to pushing to 0.24.0 if backporting this could be problematic.
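For readers skimming the thread, here is a minimal sketch of the kind of construction this PR is meant to allow. It assumes a pandas version that includes the fix (0.23.1 or later); the variable names are illustrative and not taken from the PR's tests.

```python
import pandas as pd

# Categorical data whose categories are intervals, as in the linked issues.
cat = pd.Categorical([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(0, 1)])
cat_index = pd.CategoricalIndex(cat)

# Assumption: with the fix applied, the IntervalIndex constructor accepts
# both the Categorical and the CategoricalIndex instead of raising.
from_cat = pd.IntervalIndex(cat)
from_cat_index = pd.IntervalIndex(cat_index)

print(type(from_cat).__name__, type(from_cat_index).__name__)
```

Both calls go through the explicit IntervalIndex constructor, so they do not depend on how the generic Index constructor dispatches.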

jschendel · 2018-05-30T06:55:00Z

pandas/tests/indexes/interval/test_construction.py

+        # GH 21243/21253
+        if isinstance(constructor, partial) and constructor.func is Index:
+            # Index is defined to create CategoricalIndex from categorical data
+            pytest.skip()


This is being skipped due to the following behavior:

In [2]: cat = pd.Categorical([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(0, 1)])

In [3]: pd.Index(cat, dtype='interval')
Out[3]: CategoricalIndex([(0, 1], (1, 2], (0, 1]], categories=[(0, 1], (1, 2]], ordered=False, dtype='category')

This happens because the Index code is structured so that categorical takes precedence over interval:

pandas/pandas/core/indexes/base.py

Lines 262 to 273 in c85ab08

    # categorical
    if is_categorical_dtype(data) or is_categorical_dtype(dtype):
        from .category import CategoricalIndex
        return CategoricalIndex(data, dtype=dtype, copy=copy, name=name,
                                **kwargs)

    # interval
    if is_interval_dtype(data) or is_interval_dtype(dtype):
        from .interval import IntervalIndex
        closed = kwargs.get('closed', None)
        return IntervalIndex(data, dtype=dtype, name=name, copy=copy,
                             closed=closed)

The code above could be restructured so that the dtype argument, if present, takes precedence over the type of data. That seems more sensible than the current approach for this corner case, but I'm on the fence about it.
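A rough sketch of the "dtype first" ordering suggested here, written without pandas so it stays self-contained. The function name `pick_index_class` and the string kinds are hypothetical stand-ins for the real dtype checks; this is not how the Index constructor is implemented.

```python
from typing import Optional

# Hypothetical mapping from a dtype "kind" to the specialized index class name.
SPECIALIZED = {"category": "CategoricalIndex", "interval": "IntervalIndex"}


def pick_index_class(data_kind: str, dtype_kind: Optional[str] = None) -> str:
    """Choose an index class, letting an explicit dtype win over the data."""
    # 1. If the caller passed a dtype, dispatch on it and ignore the data's kind.
    if dtype_kind is not None:
        return SPECIALIZED.get(dtype_kind, "Index")
    # 2. Otherwise fall back to inferring from the data itself.
    return SPECIALIZED.get(data_kind, "Index")


# Categorical data with an explicit interval dtype: the dtype decides.
print(pick_index_class("category", "interval"))
# Categorical data with no dtype: the data decides.
print(pick_index_class("category"))
```

Under this ordering, the corner case from the session above (categorical data with dtype='interval') would resolve to the interval branch rather than the categorical one.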

codecov · 2018-05-30T07:37:41Z

Codecov Report

Merging #21254 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21254      +/-   ##
==========================================
+ Coverage   91.84%   91.84%   +<.01%     
==========================================
  Files         153      153              
  Lines       49538    49540       +2     
==========================================
+ Hits        45499    45501       +2     
  Misses       4039     4039

Flag         Coverage Δ
#multiple    90.24% <100%> (ø) ⬆️
#single      41.87% <50%> (ø) ⬆️

Impacted Files                     Coverage Δ
pandas/core/indexes/interval.py    93.16% <100%> (+0.02%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c85ab08...770fe10.

jreback

Can you replicate the test from #21243 in test_tile? Otherwise LGTM.

jreback · 2018-05-30T10:39:15Z

pandas/tests/indexes/interval/test_construction.py

+        # GH 21243/21253
+        if isinstance(constructor, partial) and constructor.func is Index:
+            # Index is defined to create CategoricalIndex from categorical data
+            pytest.skip()


I see. OK, can open an issue about this, but yes, I agree it should infer from a passed dtype first, before switching on the type of the data.

jreback · 2018-06-04T21:28:55Z

thanks @jschendel

jschendel added the Regression, Categorical, Interval, and Needs Backport labels May 30, 2018

jschendel added this to the 0.23.1 milestone May 30, 2018

jschendel commented May 30, 2018

jreback requested changes May 30, 2018

jschendel mentioned this pull request Jun 4, 2018

BUG/API: Index constructor does not enforce specified dtype #21311

Closed

jreback approved these changes Jun 4, 2018

jreback merged commit 686f604 into pandas-dev:master Jun 4, 2018

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 12, 2018

BUG: Allow IntervalIndex to be constructed from categorical data with appropriate dtype (pandas-dev#21254) (cherry picked from commit 686f604)

c65c124

TomAugspurger pushed a commit that referenced this pull request Jun 12, 2018

BUG: Allow IntervalIndex to be constructed from categorical data with appropriate dtype (#21254) (cherry picked from commit 686f604)

34ab282

TomAugspurger removed the Needs Backport label Jun 12, 2018

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018

BUG: Allow IntervalIndex to be constructed from categorical data with appropriate dtype (pandas-dev#21254)

1e34f45

jschendel deleted the ii-from-cat branch June 22, 2018 17:14
