BUG: Fix KeyError in merge on CategoricalIndex #20777

Merged

Conversation

fjetter
Member

@fjetter fjetter commented Apr 21, 2018

For categorical indices, a KeyError is raised when the name of the index level is used as the merge key.

Example:

import pandas as pd

left = pd.DataFrame(
    {"left_data": [1, 2]},
    index=pd.CategoricalIndex(data=["A", "B"], categories=["A", "B"], name='index_col')
)
right = pd.DataFrame(
    {"right_data": [1.0, 2.0]},
    index=pd.CategoricalIndex(data=["A", "B"], categories=["A", "B"], name='index_col')
)
result = left.merge(right, on=['index_col'])

With this fix, however, the behavior of the test test_merge_datetime_index(self, klass) changed. IMHO the behavior in this PR is more consistent, since it is the same for all input types, but I'm not sure what the intended behavior is, and I couldn't find a section in the documentation that explains this code path.
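
For comparison, the same two frames can be joined on their index with the explicit index flags instead of naming the index level in `on`. This is only a sketch: `left_index` and `right_index` are existing `DataFrame.merge` parameters, but whether this route avoided the KeyError on the affected versions is an assumption that is not confirmed anywhere in this conversation.

```python
import pandas as pd

# Same frames as in the example above, sharing one categorical index.
index = pd.CategoricalIndex(["A", "B"], categories=["A", "B"], name="index_col")
left = pd.DataFrame({"left_data": [1, 2]}, index=index)
right = pd.DataFrame({"right_data": [1.0, 2.0]}, index=index)

# Join on the index itself rather than looking up "index_col" by name.
result = left.merge(right, left_index=True, right_index=True)
print(result)
```

Both spellings are meant to describe the same inner join on `index_col`; the `on=['index_col']` form is the one this PR fixes.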

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jreback
Contributor

jreback commented Apr 21, 2018

Can you show a short example of what you are trying to do?

@fjetter
Member Author

fjetter commented Apr 21, 2018

The following code raises a KeyError if the index is of categorical type, but works for all other index types:

import pandas as pd

left = pd.DataFrame(
    {"left_data": [1, 2]},
    index=pd.CategoricalIndex(data=["A", "B"], categories=["A", "B"], name='index_col')
)
right = pd.DataFrame(
    {"right_data": [1.0, 2.0]},
    index=pd.CategoricalIndex(data=["A", "B"], categories=["A", "B"], name='index_col')
)
result = left.merge(right, on=['index_col'])
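
To make the "categorical fails, everything else works" claim easy to check on a given pandas version, the same merge can be run over several index types. This is a rough sketch: the helper name `try_merge_on_index_name` and the choice of index types are illustrative assumptions, not taken from the PR. On a version containing this fix, every entry is expected to report "ok".

```python
import pandas as pd


def try_merge_on_index_name(index):
    """Merge two frames on the name of their shared index; report any KeyError."""
    left = pd.DataFrame({"left_data": [1, 2]}, index=index)
    right = pd.DataFrame({"right_data": [1.0, 2.0]}, index=index)
    try:
        left.merge(right, on=["index_col"])
    except KeyError as exc:
        return f"KeyError: {exc}"
    return "ok"


# Hypothetical selection of index types; each carries the name used in `on`.
indexes = [
    pd.Index(["A", "B"], name="index_col"),
    pd.RangeIndex(2, name="index_col"),
    pd.DatetimeIndex(["2018-01-01", "2018-01-02"], name="index_col"),
    pd.CategoricalIndex(["A", "B"], categories=["A", "B"], name="index_col"),
]

outcomes = {type(idx).__name__: try_merge_on_index_name(idx) for idx in indexes}
for kind, outcome in outcomes.items():
    print(f"{kind}: {outcome}")
```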

I updated the title and the description

@fjetter fjetter changed the title from "BUG: Fix group key inference for CategoricalIndex" to "BUG: Fix KeyError in merge on CategoricalIndex" Apr 21, 2018
Contributor

@jreback jreback left a comment

If you can fix up the formatting, I will have another look. Please add a whatsnew note as well (under the reshaping bug fixes).



@pytest.mark.parametrize('index',
[
Contributor

Write it like

@pytest.mark.parametrize(
    'index',
    [

so that it is closer to the left margin; then each of the indexes can be written more simply. Also, don't use the data kwarg, e.g.
`Index(['A', 'B'], name='index_col')`
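
Put together, the suggested layout might look like the sketch below. The list of index types and the assertions are assumptions made for illustration; the final test is not shown in this conversation. Only the parameter name `index`, the test name `test_merge_index_types`, and `name='index_col'` come from the diff snippets here.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    'index',
    [
        # Hypothetical index types; each is built positionally, without `data=`.
        pd.CategoricalIndex(['A', 'B'], categories=['A', 'B'], name='index_col'),
        pd.RangeIndex(2, name='index_col'),
        pd.DatetimeIndex(['2018-01-01', '2018-01-02'], name='index_col'),
    ],
)
def test_merge_index_types(index):
    left = pd.DataFrame({'left_data': [1, 2]}, index=index)
    right = pd.DataFrame({'right_data': [1.0, 2.0]}, index=index)

    result = left.merge(right, on=['index_col'])

    # Assumed expectation: both data columns survive with their rows aligned.
    assert result['left_data'].tolist() == [1, 2]
    assert result['right_data'].tolist() == [1.0, 2.0]
```

Keeping the decorator arguments at one indentation level leaves room for each index constructor on a single line, which seems to be the point of the review comment.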

name='index_col'),
])
def test_merge_index_types(index):
left = DataFrame(
Contributor

should be a 1-liner

@jreback jreback added the Bug and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Apr 21, 2018
@fjetter fjetter force-pushed the bugfix/groupby_categorical_index_level branch from 48c7e4b to 7189f5a Compare April 22, 2018 15:52
@codecov

codecov bot commented Apr 22, 2018

Codecov Report

Merging #20777 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20777      +/-   ##
==========================================
+ Coverage   91.81%   91.85%   +0.03%     
==========================================
  Files         153      153              
  Lines       49471    49310     -161     
==========================================
- Hits        45422    45292     -130     
+ Misses       4049     4018      -31
Flag                                  Coverage Δ
#multiple                             90.24% <100%> (+0.03%) ⬆️
#single                               41.89% <50%> (+0.04%) ⬆️

Impacted Files                        Coverage Δ
pandas/core/reshape/merge.py          94.25% <ø> (ø) ⬆️
pandas/core/algorithms.py             94.38% <100%> (-0.11%) ⬇️
pandas/core/indexing.py               93.08% <0%> (-0.48%) ⬇️
pandas/core/strings.py                98.32% <0%> (-0.31%) ⬇️
pandas/core/dtypes/base.py            91.89% <0%> (-0.22%) ⬇️
pandas/core/dtypes/cast.py            87.85% <0%> (-0.21%) ⬇️
pandas/core/indexes/api.py            98.78% <0%> (-0.15%) ⬇️
pandas/core/series.py                 93.9% <0%> (-0.13%) ⬇️
pandas/core/dtypes/missing.py         92.85% <0%> (-0.09%) ⬇️
pandas/core/indexes/datetimelike.py   96.72% <0%> (-0.08%) ⬇️
... and 31 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d3d3352...0664858.

@fjetter fjetter force-pushed the bugfix/groupby_categorical_index_level branch from b7b9124 to a152f3e Compare April 23, 2018 06:20
@fjetter
Member Author

fjetter commented May 3, 2018

@jreback I fixed the formatting and added a changelog entry for the bug I fixed. Since I wasn't sure whether the other change in behavior is intended, I left it out for now. I can add another entry once this is settled.

@jreback jreback added this to the 0.23.0 milestone May 3, 2018
@jreback jreback merged commit 21f5fb1 into pandas-dev:master May 3, 2018
@jreback
Contributor

jreback commented May 3, 2018

thanks @fjetter nice patch!
