ENH: Allow for join between two multi-index dataframe instances #20356

Merged: 25 commits, Nov 15, 2018

@harisbal harisbal commented Mar 15, 2018

closes #16162
closes #6360

Allow joining on multiple levels for multi-indexed DataFrame instances.
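
As a minimal sketch of what this enables (not taken from the PR's tests; the frame contents below are made up for illustration, and a pandas version that includes this change is assumed), two frames whose MultiIndexes share only some level names can be joined directly on the shared levels:

```python
import pandas as pd

# Illustrative data only: the two frames share the levels
# 'Origin' and 'Destination'; each also has one level of its own.
left = pd.DataFrame(
    {"Origin": ["A", "A", "B"],
     "Destination": ["B", "C", "C"],
     "Period": ["AM", "AM", "PM"],
     "Trips": [10, 20, 30]}
).set_index(["Origin", "Destination", "Period"])

right = pd.DataFrame(
    {"Origin": ["A", "B", "C"],
     "Destination": ["B", "C", "D"],
     "LinkType": ["a", "b", "a"],
     "Distance": [100, 90, 75]}
).set_index(["Origin", "Destination", "LinkType"])

# Join on the overlapping index levels; rows without a match on
# (Origin, Destination) are dropped because how="inner".
result = left.join(right, how="inner")
print(result)
```

Here the shared levels (Origin, Destination) act as the join keys, and the non-shared levels (Period, LinkType) are carried over into the index of the result.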

@harisbal harisbal changed the title from "ENH: Allow for join between to multi-index dataframe instances" to "ENH: Allow for join between two multi-index dataframe instances" Mar 15, 2018
@jreback jreback added the Enhancement, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and API Design labels and removed the API Design label Mar 15, 2018
pandas/tests/reshape/merge/test_merge.py
'LinkType', 'Distance'])
.set_index(['Origin', 'Destination', 'Period', 'LinkType']))

def f():

Contributor

are there other error conditions to test?

Contributor Author

Contributor

can you show a mini-example?

Contributor Author

@harisbal harisbal Apr 7, 2018

@jreback Here is an example of a join on 2 multilevel indexed (same levels) dfs using two different methods

  1. pd.merge(df1.reset_index(), df2.reset_index(),...)
  2. df1.join(df2)

The results differ. Do you think that's an issue? I'm facing a similar issue when I try a multi-level join.

import numpy as np
import pandas as pd

join_type = 'left'

# Both frames are indexed by (Origin, Destination);
# every Destination value is NaN.
left_multi = (
    pd.DataFrame(
        dict(Origin=['A', 'A', 'B', 'B', 'C'],
             Destination=[np.nan] * 5,
             Trips=[1987, 3647, 2470, 4296, 4444]),
        columns=['Origin', 'Destination', 'Trips'])
    .set_index(['Origin', 'Destination']))

right_multi = (
    pd.DataFrame(
        dict(Origin=['A', 'A', 'B', 'B', 'C', 'C', 'E'],
             Destination=[np.nan] * 7,
             Distance=[100, 80, 90, 80, 75, 35, 55]),
        columns=['Origin', 'Destination', 'Distance'])
    .set_index(['Origin', 'Destination']))

on_cols = ['Origin', 'Destination']
idx_cols = ['Origin', 'Destination']

# Method 1: merge on columns after resetting the index,
# then restore the index
expected = (pd.merge(left_multi.reset_index(),
                     right_multi.reset_index(),
                     how=join_type, on=on_cols)
            .set_index(idx_cols)
            .sort_index())

# Method 2: join directly on the MultiIndex
result = left_multi.join(right_multi, how=join_type).sort_index()

print(expected)
print(result)


# Inject -1 in the labels list where a join was not possible
# IOW indexer[i]=-1
labels = [restore_labels[i] if i != -1 else -1 for i in indexer]

Contributor

this should be a set operation on the arrays i think

Contributor Author

Sorry @jreback but I'm not sure what you mean
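
One possible reading of this suggestion, offered as a sketch and not as the code that was eventually merged, is to replace the Python-level list comprehension with a single vectorized array operation. The arrays below are made-up stand-ins for `restore_labels` and `indexer`:

```python
import numpy as np

# Hypothetical stand-ins for the arrays in the reviewed snippet.
restore_labels = np.array([0, 1, 2, 1])
indexer = np.array([0, -1, 3, 2])   # -1 marks "no join was possible"

# Original approach: a Python-level loop over the indexer.
labels_loop = [restore_labels[i] if i != -1 else -1 for i in indexer]

# Vectorized equivalent: take the labels at the indexer positions,
# then overwrite the positions where the indexer is -1 with -1.
labels_vec = np.where(indexer == -1, -1, restore_labels[indexer])

print(labels_vec.tolist())
```

Both forms produce the same labels; the vectorized one avoids iterating over the indexer in Python.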

Contributor Author

@harisbal harisbal Nov 4, 2018

codecov bot commented Mar 15, 2018

Codecov Report

Merging #20356 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20356      +/-   ##
==========================================
+ Coverage   92.24%   92.25%   +<.01%     
==========================================
  Files         161      161              
  Lines       51339    51376      +37     
==========================================
+ Hits        47360    47397      +37     
  Misses       3979     3979

Flag        Coverage Δ
#multiple   90.64% <100%> (ø) ⬆️
#single     42.31% <1.75%> (-0.03%) ⬇️

Impacted Files                  Coverage Δ
pandas/core/reshape/merge.py    94.24% <100%> (+0.23%) ⬆️
pandas/core/indexes/base.py     96.48% <100%> (+0.01%) ⬆️

@harisbal harisbal force-pushed the multi-index-join branch 2 times, most recently from 310bf7a to a6c9733 (March 17, 2018 18:22)

pep8speaks commented Mar 17, 2018

Hello @harisbal! Thanks for updating the PR.

Comment last updated on November 11, 2018 at 04:31 Hours UTC

shenker commented Sep 14, 2018

Any progress on this?

Contributor

jreback commented Sep 18, 2018

sorry, let me take a look. i know this has been outstanding for quite some time.

Contributor

@jreback jreback left a comment

can you rebase and let's get this in

@harisbal
Contributor Author

I'll take a look asap. Cheers

@harisbal harisbal force-pushed the multi-index-join branch 2 times, most recently from de6c469 to 50c90cc (September 19, 2018 15:56)
@harisbal harisbal force-pushed the multi-index-join branch 2 times, most recently from 6bd10f4 to 5689f0a (September 19, 2018 16:14)
@harisbal
Contributor Author

Any idea why pandas-dev.pandas failed?

Contributor

@jreback jreback left a comment

looks good, @jorisvandenbossche @TomAugspurger if you'd have a look

…ex-join

# Conflicts:
#	doc/source/whatsnew/v0.24.0.txt
#	pandas/core/reshape/merge.py
#	pandas/tests/reshape/merge/test_multi.py
@TomAugspurger
Contributor

How's this looking? I haven't checked on the changes in a while, but CI is passing.

Contributor

jreback commented Nov 14, 2018

@TomAugspurger I had some more comments. let me have a look again.

Contributor

jreback commented Nov 14, 2018

@harisbal can you merge master

@TomAugspurger this lgtm. let's merge and can followup on any small issues.

@TomAugspurger
Contributor

Merged master. Ping on green.

@harisbal
Contributor Author

Shall I try to merge again?

@TomAugspurger
Contributor

I restarted that crashed worker. I haven't seen that failure before.

Contributor

TomAugspurger commented Nov 15, 2018

All green. Merging.

Thanks!

@TomAugspurger TomAugspurger merged commit 88cbce3 into pandas-dev:master Nov 15, 2018
@harisbal
Contributor Author

@jreback @TomAugspurger @WillAyd Thank you so much for everything!!
Sincere apologies for putting you to so much trouble with this PR, I really appreciate your help.
Hopefully my next PR will be smoother.

thoo added a commit to thoo/pandas that referenced this pull request Nov 15, 2018
* upstream/master:
  BUG: to_html misses truncation indicators (...) when index=False (pandas-dev#22786)
  API/DEPR: replace "raise_conflict" with "errors" for df.update (pandas-dev#23657)
  BUG: Append DataFrame to Series with dateutil timezone (pandas-dev#23685)
  CLN/CI: Catch that stderr-warning! (pandas-dev#23706)
  ENH: Allow for join between two multi-index dataframe instances (pandas-dev#20356)
  Ensure Index._data is an ndarray (pandas-dev#23628)
  DOC: flake8-per-pr for windows users (pandas-dev#23707)
  DOC: Handle exceptions when computing contributors. (pandas-dev#23714)
  DOC: Validate space before colon docstring parameters pandas-dev#23483 (pandas-dev#23506)
  BUG-22984 Fix truncation of DataFrame representations (pandas-dev#22987)
Successfully merging this pull request may close these issues.

ENH: merge multi-index with a multi-index
6 participants