Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Allow IntervalIndex to be constructed from categorical data with appropriate dtype #21254

Merged
merged 1 commit into from
Jun 4, 2018

Conversation

jschendel
Copy link
Member

Added this to 0.23.1 since it's a regression and the fix is a minor change outside the IntervalIndex class. Not opposed to pushing to 0.24.0 if backporting this could be problematic.

@jschendel jschendel added Regression Functionality that used to work in a prior pandas version Categorical Categorical Data Type Interval Interval data type Needs Backport labels May 30, 2018
@jschendel jschendel added this to the 0.23.1 milestone May 30, 2018
# GH 21243/21253
if isinstance(constructor, partial) and constructor.func is Index:
# Index is defined to create CategoricalIndex from categorical data
pytest.skip()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being skipped due to the following behavior:

In [2]: cat = pd.Categorical([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(0, 1)])

In [3]: pd.Index(cat, dtype='interval')
Out[3]: CategoricalIndex([(0, 1], (1, 2], (0, 1]], categories=[(0, 1], (1, 2]], ordered=False, dtype='category')

This happens because the Index code is structured so that categorical takes precedence over interval:

# categorical
if is_categorical_dtype(data) or is_categorical_dtype(dtype):
from .category import CategoricalIndex
return CategoricalIndex(data, dtype=dtype, copy=copy, name=name,
**kwargs)
# interval
if is_interval_dtype(data) or is_interval_dtype(dtype):
from .interval import IntervalIndex
closed = kwargs.get('closed', None)
return IntervalIndex(data, dtype=dtype, name=name, copy=copy,
closed=closed)

The code above could be restructured so that the dtype argument, if present, takes precedence over the type of data. Seems like that would be more sensible than the current approach for this corner case, but on the fence about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I c. ok can open an issue about this, but yes I would agree should infer with a passed dtype first before switching on the type of the data.

@codecov
Copy link

codecov bot commented May 30, 2018

Codecov Report

Merging #21254 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21254      +/-   ##
==========================================
+ Coverage   91.84%   91.84%   +<.01%     
==========================================
  Files         153      153              
  Lines       49538    49540       +2     
==========================================
+ Hits        45499    45501       +2     
  Misses       4039     4039
Flag Coverage Δ
#multiple 90.24% <100%> (ø) ⬆️
#single 41.87% <50%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/interval.py 93.16% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c85ab08...770fe10. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can yoou replicate the test in #21243 in test_tile, otherwise lgtm.

# GH 21243/21253
if isinstance(constructor, partial) and constructor.func is Index:
# Index is defined to create CategoricalIndex from categorical data
pytest.skip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I c. ok can open an issue about this, but yes I would agree should infer with a passed dtype first before switching on the type of the data.

@jreback jreback merged commit 686f604 into pandas-dev:master Jun 4, 2018
@jreback
Copy link
Contributor

jreback commented Jun 4, 2018

thanks @jschendel

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 12, 2018
TomAugspurger pushed a commit that referenced this pull request Jun 12, 2018
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
@jschendel jschendel deleted the ii-from-cat branch June 22, 2018 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Interval Interval data type Regression Functionality that used to work in a prior pandas version
Projects
None yet
3 participants