BUG: Allow IntervalIndex to be constructed from categorical data with appropriate dtype #21254
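The title concerns building an `IntervalIndex` from categorical data whose categories are `Interval` objects. The following is a workaround sketch, not the change made in this PR. It converts the `Categorical` to an object array of `Interval`s with `np.asarray` before constructing the index, so it does not depend on whether a given pandas version accepts `Categorical` input to `IntervalIndex` directly.

```python
import numpy as np
import pandas as pd

# Categorical data whose categories are Interval objects, matching the
# example used later in this thread.
cat = pd.Categorical([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(0, 1)])

# np.asarray densifies the Categorical into an object ndarray holding one
# Interval per element, in the original order; IntervalIndex accepts that.
idx = pd.IntervalIndex(np.asarray(cat))
```

Missing values in the `Categorical` become `NaN` in the object array, which `IntervalIndex` also accepts.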
```python
# GH 21243/21253
if isinstance(constructor, partial) and constructor.func is Index:
    # Index is defined to create CategoricalIndex from categorical data
    pytest.skip()
```
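The skip condition relies on `functools.partial` objects exposing the wrapped callable as `.func`. A small sketch of how that check separates an `Index`-based constructor from the others, assuming a hypothetical list of parametrized constructors (the real test's parametrization is not shown in this thread):

```python
from functools import partial

import pandas as pd

# Hypothetical constructors a parametrized test might receive.
constructors = [
    pd.IntervalIndex,
    partial(pd.Index, dtype="interval"),
]


def is_index_partial(constructor):
    """Return True when `constructor` is functools.partial(pd.Index, ...)."""
    # Plain classes such as IntervalIndex are not partial instances, so the
    # `.func` attribute is only read when the isinstance check passes.
    return isinstance(constructor, partial) and constructor.func is pd.Index


skipped = [is_index_partial(c) for c in constructors]
```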
The `Index` case is skipped because of the following behavior:
```
In [2]: cat = pd.Categorical([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(0, 1)])

In [3]: pd.Index(cat, dtype='interval')
Out[3]: CategoricalIndex([(0, 1], (1, 2], (0, 1]], categories=[(0, 1], (1, 2]], ordered=False, dtype='category')
```
This happens because the `Index` code is structured so that categorical takes precedence over interval (`pandas/pandas/core/indexes/base.py`, lines 262 to 273 in c85ab08):
```python
# categorical
if is_categorical_dtype(data) or is_categorical_dtype(dtype):
    from .category import CategoricalIndex
    return CategoricalIndex(data, dtype=dtype, copy=copy, name=name,
                            **kwargs)

# interval
if is_interval_dtype(data) or is_interval_dtype(dtype):
    from .interval import IntervalIndex
    closed = kwargs.get('closed', None)
    return IntervalIndex(data, dtype=dtype, name=name, copy=copy,
                         closed=closed)
```
The code above could be restructured so that the `dtype` argument, if present, takes precedence over the type of `data`. That seems more sensible than the current approach for this corner case, but I'm on the fence about it.
I see. OK, we can open an issue about this, but yes, I agree: we should infer from a passed `dtype` first before switching on the type of the data.
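A minimal sketch of that ordering, assuming a hypothetical helper `choose_index_kind` that only reports which branch would be taken (it is not the `Index.__new__` code and builds no index). An explicit `dtype` is normalized with `pandas.api.types.pandas_dtype` and checked first; the dtype of `data` is consulted only when no `dtype` was passed. It uses `isinstance` checks on the dtype classes rather than the `is_*_dtype` helpers from the excerpt, since recent pandas versions deprecate some of those helpers.

```python
import pandas as pd
from pandas.api.types import pandas_dtype


def choose_index_kind(data, dtype=None):
    """Hypothetical helper: report which Index branch would be taken.

    A passed ``dtype`` is inspected first; the dtype of ``data`` is only
    consulted when no ``dtype`` was given.
    """
    if dtype is not None:
        # Normalize strings such as 'interval' or 'category' to dtype objects.
        dtype = pandas_dtype(dtype)
        if isinstance(dtype, pd.IntervalDtype):
            return "interval"
        if isinstance(dtype, pd.CategoricalDtype):
            return "categorical"
        return "other"

    # No explicit dtype: fall back to the dtype carried by the data itself.
    data_dtype = getattr(data, "dtype", None)
    if isinstance(data_dtype, pd.CategoricalDtype):
        return "categorical"
    if isinstance(data_dtype, pd.IntervalDtype):
        return "interval"
    return "other"


cat = pd.Categorical([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(0, 1)])
kind_with_dtype = choose_index_kind(cat, dtype="interval")
kind_without_dtype = choose_index_kind(cat)
```

Under this ordering, the `pd.Index(cat, dtype='interval')` call from the example above would be routed to the interval branch, while `pd.Index(cat)` with no `dtype` would still produce a `CategoricalIndex`.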
Codecov Report
```diff
@@            Coverage Diff             @@
##           master   #21254      +/-   ##
==========================================
+ Coverage   91.84%   91.84%   +<.01%
==========================================
  Files         153      153
  Lines       49538    49540       +2
==========================================
+ Hits        45499    45501       +2
  Misses       4039     4039
```
Can you replicate the test in #21243 in test_tile? Otherwise LGTM.
Thanks @jschendel
… appropriate dtype (pandas-dev#21254) (cherry picked from commit 686f604)
… appropriate dtype (pandas-dev#21254)
`git diff upstream/master -u -- "*.py" | flake8 --diff`
Added this to 0.23.1 since it's a regression and the fix is a minor change outside the `IntervalIndex` class. Not opposed to pushing to 0.24.0 if backporting this could be problematic.