ENH: Multi-level join on multi-indexes #16162

Closed
wants to merge 221 commits into from
221 commits
613312e
Rebase
Jan 28, 2018
a0045c8
Merge remote-tracking branch 'pandas-dev/master'
Jan 28, 2018
e58292f
Regression in make_block_same_class (tests failing for new fastparque…
minggli Jan 29, 2018
b00eeb3
TST: fix test for MultiIndexPyIntEngine on 32 bit (#19440)
toobaz Jan 29, 2018
ab51851
Misc typos (#19430)
luzpaz Jan 29, 2018
e4e7255
Change Future to DeprecationWarning for make_block_same_class (#19442)
jorisvandenbossche Jan 29, 2018
22228c7
catch PerformanceWarning (#19446)
jbrockmendel Jan 29, 2018
60a8218
CI: pin pymysql<0.8.0 (#19461)
jreback Jan 30, 2018
f6c492e
TST: fix (other check of) test for MultiIndexPyIntEngine on 32 bit (#…
toobaz Jan 30, 2018
ae64e59
remove reference to deprecated .ix from 10min.rst (#19452)
timhoffm Jan 30, 2018
d4d3b33
remove unused (#19466)
jbrockmendel Jan 30, 2018
e25a475
setup.py fixup, closes #19467 (#19472)
jbrockmendel Jan 31, 2018
74cf2dd
Centralize Arithmetic Tests (#19471)
jbrockmendel Jan 31, 2018
69f1bdc
implement bits of numpy_helper in cython where possible (#19450)
jbrockmendel Jan 31, 2018
1c64d7d
[#7292] BUG: asfreq / pct_change strange behavior (#19410)
minggli Jan 31, 2018
73c8d23
DEPR: Deprecate from_items (#18529)
reidy-p Jan 31, 2018
d32f0c2
BUG: Fixed accessor for Categorical[Datetime] (#19469)
TomAugspurger Jan 31, 2018
1785010
DOC: Spellcheck of categorical.rst and visualization.rst (#19428)
tommyod Jan 31, 2018
c4bf26c
DEPR/CLN: Remove pd.rolling_*, pd.expanding* and pd.ewm* (#18723)
topper-123 Feb 1, 2018
9bd1bc5
Organize, Split, Parametrize timezones/timestamps tests (#19473)
jbrockmendel Feb 1, 2018
c5da136
implement test_scalar_compat (#19479)
jbrockmendel Feb 1, 2018
7232932
Refactor out libwriters, fix references to Timestamp, Timedelta (#19413)
jbrockmendel Feb 1, 2018
23cfb38
PERF: remove use of Panel & perf in rolling corr/cov (#19257)
jreback Feb 1, 2018
05400a1
TST: fix up pandas_datareader downstream tests (#19490)
jreback Feb 1, 2018
e295435
BUG: fix issue with concat creating SparseFrame if not all series are…
hexgnu Feb 1, 2018
fba0737
updated hist documentation (#19366)
mitchnegus Feb 1, 2018
826390e
CLN: GH19404 Changing function signature to match logic (#19425)
Feb 1, 2018
a11f48d
ENH limit_area added to interpolate1d
WBare Feb 1, 2018
c753b3f
BUG: Fix problem with SparseDataFrame not persisting to csv (#19441)
hexgnu Feb 1, 2018
073188b
Added E741 to flake8 config (#19496)
TomAugspurger Feb 2, 2018
129a6b8
implement timedeltas.test_scalar_compat (#19503)
jbrockmendel Feb 2, 2018
4e0a32d
Continue de-nesting core.ops (#19448)
jbrockmendel Feb 2, 2018
14ad618
Make DateOffset.kwds a property (#19403)
jbrockmendel Feb 2, 2018
0160927
Fix DTI comparison with None, datetime.date (#19301)
jbrockmendel Feb 2, 2018
5c29123
DOC: Exposed arguments in plot.kde (#19229)
tommyod Feb 2, 2018
f2873e9
ENH: Array Interface and Categorical internals Refactor (#19268)
TomAugspurger Feb 2, 2018
14d5bd1
ERR: Better error msg when merging on tz-aware and tz-naive columns (…
reidy-p Feb 3, 2018
da6f51e
DOC: Spellcheck of enhancingperf.rst (#19516)
tommyod Feb 3, 2018
c2adaf7
TST: Remove duplicate TimdeltaIndex tests (#19509)
jschendel Feb 4, 2018
90f59e9
Frame specific parts of #19504 (#19512)
jbrockmendel Feb 4, 2018
c5c59fa
split Timestamp tests off of 19504 (#19511)
jbrockmendel Feb 4, 2018
98c5fea
ops cleanup, named functions instead of lambdas (#19515)
jbrockmendel Feb 4, 2018
ac941cc
DOC: Improve replace docstring (#18100)
reidy-p Feb 4, 2018
6f302c6
DOC: minor groupby and resampler improvements (#19514)
topper-123 Feb 4, 2018
d45afd9
DEPR: Changing default of str.extract(expand=False) to str.extract(ex…
datapythonista Feb 5, 2018
68d6c0b
TST: Remove legacy instances of _multiprocess_can_split_ (#19536)
jschendel Feb 5, 2018
9c25d3c
remove unused calendar options from period_helper (#19534)
jbrockmendel Feb 5, 2018
4ee165c
BUG: groupby with resample using on parameter errors when selecting c…
discort Feb 5, 2018
181fea4
TST: Fix makeIntIndex, benchmark get loc
toobaz Feb 5, 2018
ad9e205
DOC: Fix typo in example (#19537)
ppflrs Feb 5, 2018
e4ddbaf
BUG: don't assume series is length > 0
hexgnu Feb 6, 2018
d7dcac2
TST: fix and test index division by zero
jbrockmendel Feb 6, 2018
cc1b1e7
DOC: Remove repeated duplicated word (#19546)
GuessWhoSamFoo Feb 6, 2018
09c6317
centralize and split frame division tests (#19527)
jbrockmendel Feb 6, 2018
9985077
Fix parsing corner case closes #19382 (#19529)
jbrockmendel Feb 6, 2018
8862812
Collect Series timezone tests (#19541)
jbrockmendel Feb 6, 2018
bc1cd6d
DOC/ERR: better error message on no common merge keys (#19427)
swyoon Feb 6, 2018
325df9f
BUGFIX - AttributeError raised in StataReader.value_labels() (#19510)
miker985 Feb 6, 2018
25c2f08
separate DatetimeIndex timezone tests (#19545)
jbrockmendel Feb 6, 2018
5fa85e9
BUG: Fix ts precision issue with groupby and NaT (#19526)
jbandlow Feb 6, 2018
390aa9d
Cleaned up return of _get_cython_function (#19561)
WillAyd Feb 7, 2018
1c824e6
DEPR/CLN: fix from_items deprecation warnings (#19559)
reidy-p Feb 7, 2018
88455cb
Implement get_day_of_year, tests (#19555)
jbrockmendel Feb 7, 2018
0359bd6
fixed bug in df.aggregate passing non-existent columns (#19552)
discort Feb 7, 2018
e5fa17c
ERR: raise KeyError on invalid column name in aggregate (#19566)
jreback Feb 7, 2018
7a19781
Frame ops prelims - de-duplicate, remove unused kwargs (#19522)
jbrockmendel Feb 7, 2018
2e45a27
API/BUG: .apply will correctly infer output shape when axis=1 (#18577)
jreback Feb 7, 2018
a44efdb
BUG: Fixes rounding error in Timestamp.floor() (#19240)
cbertinato Feb 7, 2018
a10f2e0
DOC: some clean-up of the apply docs (follow-up #18577) (#19573)
jorisvandenbossche Feb 8, 2018
b1e3422
Remove duplicated logic from period_helper (#19540)
jbrockmendel Feb 8, 2018
f9db3b5
CI: Run ASV on Travis for failed benchmarks (#19236)
mroeschke Feb 8, 2018
6c88f53
BUG: Fixed merge on dtype equal categories (#19553)
TomAugspurger Feb 8, 2018
e30498a
PERF: Correct signature for group_nth / group_object (#19579)
TomAugspurger Feb 8, 2018
c259dad
DOC: doc/source/indexing.rst says pd.df.ix is deprecated, show warnin…
xpvpc Feb 8, 2018
845a74a
Simplify argument passing in period_helper (#19550)
jbrockmendel Feb 8, 2018
e0c6c25
separate numeric tests so we can isolate division by zero (#19336)
jbrockmendel Feb 8, 2018
91c76cc
Bug: adds support for unary plus (#19297)
deniederhut Feb 8, 2018
51c976c
Ignore warnings when reading pickle files (#19580)
TomAugspurger Feb 9, 2018
f523886
ENH: added an optional css id to `<table>` tags created by `frame.to_…
samghelms Feb 9, 2018
b5d4128
CI: Fixed NumPy pinning in conda-build (#19575)
TomAugspurger Feb 9, 2018
7246381
API: Default ExtensionArray.astype (#19604)
TomAugspurger Feb 9, 2018
5ea49ef
PERF: Cythonize Groupby Rank (#19481)
WillAyd Feb 10, 2018
94a696a
Consolidate nth / last object Groupby Implementations (#19610)
WillAyd Feb 10, 2018
58c8009
Revert "Consolidate nth / last object Groupby Implementations (#19610)"
jreback Feb 10, 2018
b98e595
ENH: df.assign accepting dependent **kwargs (#14207) (#18852)
datajanko Feb 10, 2018
fbdd613
Fix left join turning into outer join (#19624)
elrubio Feb 10, 2018
12b9c0c
function for frequently repeated tzconversion code (#19625)
jbrockmendel Feb 10, 2018
8488572
API: Allow ordered=None in CategoricalDtype (#18889)
jschendel Feb 10, 2018
0a10bbe
order of exceptions in array_to_datetime (#19621)
jbrockmendel Feb 10, 2018
308558c
Consolidated Groupby nth / last object Templates (#19635)
WillAyd Feb 10, 2018
d07884d
Continue porting period_helper to cython (#19608)
jbrockmendel Feb 10, 2018
5d17b20
fix overflows in Timestamp.tz_localize near boundaries (#19626)
jbrockmendel Feb 11, 2018
8433c0e
move shift_months test to test_arithmetic (#19636)
jbrockmendel Feb 11, 2018
8dffb15
move libfreqs and liboffsets tests to test_tslibs, move parsing tests…
jbrockmendel Feb 11, 2018
b4cdff8
Fix uncaught OutOfBounds in array_to_datetime (#19612)
jbrockmendel Feb 11, 2018
cb480ab
test_astype portion of #19627 (#19637)
jbrockmendel Feb 11, 2018
8bbb469
move timedelta test_astype test (#19639)
jbrockmendel Feb 11, 2018
c9334fe
Organize PeriodIndex tests (#19641)
jbrockmendel Feb 11, 2018
c416fea
TST: Add to_csv test when writing the single column CSV (#19091)
Licht-T Feb 11, 2018
82f011b
TST: set multi_statement flag for pymysql tests (#19619)
jorisvandenbossche Feb 11, 2018
067984a
move array_to_datetime timests (#19640)
jbrockmendel Feb 12, 2018
aa97648
BUG: assign doesnt cast SparseDataFrame to DataFrame (#19178)
hexgnu Feb 12, 2018
49a016b
TST: placement of network error catching in s3 tests (#19645)
jreback Feb 13, 2018
89a5df2
De-duplicate masking/fallback logic in ops (#19613)
jbrockmendel Feb 13, 2018
541b5e5
REF: Internal / External values (#19558)
TomAugspurger Feb 13, 2018
eb52d99
DOC: ignore Panel deprecation warnings during doc build (#19663)
jorisvandenbossche Feb 13, 2018
2134e52
DOC: fix IPython spelling (#19683)
Carreau Feb 13, 2018
16fc751
Explicitly set dtype of np.lexsort in group_rank (#19679)
WillAyd Feb 13, 2018
e42b61f
BUG: Do not round DatetimeIndex nanosecond precision when iterating (…
mroeschke Feb 14, 2018
491801e
COMPAT-18589: Supporting axis in Series.rename (#18923)
AaronCritchley Feb 14, 2018
93bfede
Performance increase rolling min max (#19549)
hexgnu Feb 14, 2018
984d068
tests for tslibs.conversion and tslibs.timezones (#19642)
jbrockmendel Feb 14, 2018
eed6647
Spellchecked io.rst (#19660)
tommyod Feb 14, 2018
408773d
CI: Move conda build and ASV check to cron job (#19698)
TomAugspurger Feb 14, 2018
e654b81
GroupBy Rank SegFault Fix - astype instead of view (#19701)
WillAyd Feb 14, 2018
e7a26b0
DOC: Ambiguous description in to_parquet engine documentation (#19669)
giba0 Feb 15, 2018
196596a
ENH: groupby().is_monotonic_increasing #17015 (#17453)
No-Stream Feb 15, 2018
bdd6a33
DOC: improve docs to clarify MultiIndex indexing (#19507)
cbrnr Feb 15, 2018
6a6f897
add missing args, make kwarg explicit (#19691)
jbrockmendel Feb 15, 2018
0cde46b
remove usages of _get_na_value (#19692)
jbrockmendel Feb 16, 2018
335314a
TST: Parametrize PeriodIndex tests (#19659)
jbrockmendel Feb 17, 2018
6173edf
DOC: Updated tutorials with additional info, new version and added so…
tommyod Feb 18, 2018
383f7ea
collect index formatting tests (#19661)
jbrockmendel Feb 18, 2018
563367f
finish off tests.tseries.test_timezones (#19739)
jbrockmendel Feb 18, 2018
c0f761d
dispatch frame methods to series versions instead of re-implementing …
jbrockmendel Feb 18, 2018
de9e867
Removed if...else for K > 1 (#19734)
WillAyd Feb 18, 2018
1f4484c
Dispatch categorical Series ops to Categorical (#19582)
jbrockmendel Feb 18, 2018
84a0e23
DOC/BLD: Pinning sphinx to 1.5, as 1.7 has been released and it's inc…
datapythonista Feb 18, 2018
d27bd54
DOC: correct merge_asof example (#19737)
ZhuBaohe Feb 18, 2018
0ca6680
FIX: const-correctness in numpy helpers (#19749)
tacaswell Feb 18, 2018
52be57d
DOC/BLD: update vendored IPython.sphinxext version (#19765)
jorisvandenbossche Feb 19, 2018
e9bb374
add test for numpy ops, esp. nanmin/max bug for np<1.13 (#19753)
topper-123 Feb 19, 2018
9e8794c
DOC: correct Period.strftime exsample (#19758)
ZhuBaohe Feb 19, 2018
15232fd
ENH: fake http proxy in case of --skip-network testing (#19757)
yarikoptic Feb 19, 2018
505bf5e
DOC: correct Panel.apply exsample (#19766)
ZhuBaohe Feb 19, 2018
81e2f76
split off scalar tests to submodules (#19752)
jbrockmendel Feb 19, 2018
ea14495
CI: remove PIP & old conda build in favor of pandas-ci buildsx (#19775)
jreback Feb 19, 2018
f83893c
ENH: implement Timedelta.__mod__ and __divmod__ (#19755)
jbrockmendel Feb 19, 2018
5b931a2
Fix Timedelta floordiv, rfloordiv with offset, fix td64 return types …
jbrockmendel Feb 19, 2018
30f9b18
Reduce redirection in ops (#19649)
jbrockmendel Feb 19, 2018
63fc36a
Fix the non cython build for cpp extensions (#19707)
hexgnu Feb 19, 2018
b419650
DOC: whatsnew typo cleanup
jreback Feb 20, 2018
df4fd45
DOC: fix various warnings and errors in the docs (from deprecations/a…
jorisvandenbossche Feb 20, 2018
c116584
Split+Parametrize Timedelta tests (#19736)
jbrockmendel Feb 20, 2018
e075e3b
BUG: GH19458 fixes precision issue in TimeDelta.total_seconds() (#19783)
mikekutzma Feb 20, 2018
0ea6a5a
DOC: whatsnew cleanups
jreback Feb 20, 2018
c0bd94f
DOC: typos in whatsnew
jreback Feb 20, 2018
cca6300
DOC: added a reference to DataFrame assign in concatenate section of …
obilodeau Feb 20, 2018
1e3ff82
Sparse Ops Cleanup (#19782)
jbrockmendel Feb 21, 2018
a60e325
BUG: fix Period.asfreq conversion near datetime(1, 1, 1) (#19650)
jbrockmendel Feb 21, 2018
e8e925b
DOC: correct Series.searchsorted example (#19784)
ZhuBaohe Feb 21, 2018
5f82d60
DOC: Edit installation instructions for clarity. (#19798)
antquinonez Feb 21, 2018
7077fe9
BF: Skip test_read_excel_parse_dates if no xlwt which is used in to_e…
yarikoptic Feb 21, 2018
e7e1712
DEPR: Add deprecation warning for factorize() order keyword (#19751)
EricChea Feb 21, 2018
440fc8d
BUG: drop_duplicates not raising KeyError on missing key (#19730)
NoahTheDuke Feb 21, 2018
c767f22
ASV: excel asv occasional failure (#19811)
jreback Feb 21, 2018
db4c8e9
DOC: Add example of how to preserve order of columns with usecols. (#…
EricChea Feb 21, 2018
8875ecb
TST: move more series tests to test_arithmetic (#19794)
jbrockmendel Feb 21, 2018
e975455
Fix name setting in DTI/TDI __add__ and __sub__ (#19744)
jbrockmendel Feb 21, 2018
0bfda02
parametrize a whole mess of tests (#19785)
jbrockmendel Feb 22, 2018
4726b84
DEPR: remove pandas.core.common is_* (#19769)
jreback Feb 22, 2018
4ea1508
DOC: Clarify and add fill_value example in arithmetic ops (#19675)
HagaiHargil Feb 22, 2018
51350bc
DOC: added plotting module to the api reference docs (#19780)
mehemken Feb 22, 2018
891ee92
API: Validate keyword arguments to fillna (#19684)
TomAugspurger Feb 22, 2018
49bfc0b
Fix Index __mul__-like ops with timedelta scalars (#19333)
jbrockmendel Feb 22, 2018
11c14e1
DOC: Improving code quality of doc/make.py, PEP-8, refactoring and re…
datapythonista Feb 22, 2018
ab48369
DOC: RangeIndex as default index (#19781)
max-sixty Feb 22, 2018
613983b
Update df.to_stata() docstring (#19818)
kylebarron Feb 22, 2018
ea382a7
DOC: correct Series.reset_index example (#19832)
ZhuBaohe Feb 22, 2018
69eac1e
implement add_offset_array for PeriodIndex (#19826)
jbrockmendel Feb 22, 2018
f3836c4
ENH: Add columns parameter to from_dict (#19802)
reidy-p Feb 22, 2018
c660e2a
fix Timedelta.__mul__(NaT) (#19819)
jbrockmendel Feb 22, 2018
a31d2ad
Fix rfloordiv return type, un-xfail Timedelta tests (#19820)
jbrockmendel Feb 22, 2018
c1dda28
BUG: Fix qcut with NaT present (#19833)
jschendel Feb 22, 2018
290f410
CI: Align pep8speaks config with setup.cfg (#19841)
jorisvandenbossche Feb 22, 2018
80d6ccb
DOC: Making doc/source/conf.py pass PEP-8, and added to lint (#19839)
datapythonista Feb 22, 2018
c320854
Let initialisation from dicts use insertion order for py>=36, part I …
topper-123 Feb 23, 2018
c05f632
BUG: Fix MultiIndex .loc with all numpy arrays (#19772)
spacesphere Feb 23, 2018
8466004
DOC: correct min_count param docstring (#19836)
ZhuBaohe Feb 23, 2018
2c657dd
Continue porting period_helper; fix leftover asfreq bug (#19834)
jbrockmendel Feb 23, 2018
d5c6167
BUG: fix index op names and pinning (#19723)
jbrockmendel Feb 23, 2018
4242a0e
DOC: Spellcheck of gotchas.rst (FAQ page) (#19747)
tommyod Feb 23, 2018
0468afe
ENH: Allow storing ExtensionArrays in containers (#19520)
TomAugspurger Feb 23, 2018
5c41b2d
Separate TimedeltaIndex mul/div tests (#19848)
jbrockmendel Feb 23, 2018
c5631bb
DOC: misc. typos (#19876)
luzpaz Feb 24, 2018
eb60dae
DOC: remove deprecated from_items from dsintro docs (#19837)
jorisvandenbossche Feb 24, 2018
f8a3e72
De-duplicate add_offset_array methods (#19835)
jbrockmendel Feb 24, 2018
ab0bcfc
Let initialisation from dicts use insertion order for python >= 3.6 (…
topper-123 Feb 24, 2018
f001b70
BUG: fix Series constructor for scalar and Categorical dtype (#19717)
cbertinato Feb 24, 2018
5d0f3d5
Raise OptionError instead of KeyError in __getattr__. Fixes #19789. …
jayfoad Feb 24, 2018
fbc8d72
Keep subclassing in apply (#19823)
jaumebonet Feb 24, 2018
faf595e
REF: Base class for all extension tests (#19863)
TomAugspurger Feb 24, 2018
92299cd
DOC: Updated links to 2 tutorials in tutorials.rst (#19857)
tommyod Feb 24, 2018
0057ee2
templatize timedelta arith ops (#19871)
jbrockmendel Feb 24, 2018
06088a8
COMPAT: fixup decimal extension for indexing compat (#19882)
jreback Feb 24, 2018
ffa89a6
CI: pin jemalloc=4.5.0.poast for 2.7 build per (#19888)
jreback Feb 25, 2018
08732e0
Cythonized GroupBy Fill (#19673)
WillAyd Feb 25, 2018
e5981d1
Fixed pct_change with 'fill_method' returning NaN instead of 0 (#19875)
WillAyd Feb 25, 2018
a00f41a
Use pandas_datetimestruct instead of date_info (#19874)
jbrockmendel Feb 25, 2018
86dfeae
Fix+test DTI/TDI/PI add/sub with ndarray[datetime64/timedelta64] (#19…
jbrockmendel Feb 25, 2018
21bc4d5
Fixed issue with leftover test.json file (#19879)
jjames34 Feb 25, 2018
b9149b0
ENH: ISO8601-compliant datetime string conversion in `iterrows()` and…
minggli Feb 25, 2018
48785e6
parameterize test_pct_change_periods_freq (#19897)
minggli Feb 25, 2018
6003ff6
DOC: Make API reference intro section concise (#19846)
Feb 26, 2018
769e4c2
DOC/BLD: unpin sphinx to use sphinx 1.7 (#19687)
jorisvandenbossche Feb 26, 2018
51093fc
DOC: fix numpydoc section titles in misc plotting docstrings (#19899)
jorisvandenbossche Feb 26, 2018
eed8092
DOC: small typo fix (#19901)
ColCarroll Feb 26, 2018
e3c5467
cleanup order of operations kludges (#19895)
jbrockmendel Feb 26, 2018
d9f57a4
make ops.add_foo take just class (#19828)
jbrockmendel Feb 26, 2018
3471271
Test Decorators and Better Pytest Integration in 'test_excel' (#19829)
WillAyd Feb 26, 2018
5508704
BUG: Fix Series constructor for Categorical with index (#19714)
cbertinato Feb 27, 2018
69f0e8b
CLN: Remove Series._from_array (#19893)
jaumebonet Feb 27, 2018
25fc828
DOC fix incorrect example in DataFrame.to_dict docstring. Close GH198…
Feb 27, 2018
2d10b35
handle NaT add/sub in one place (#19903)
jbrockmendel Feb 27, 2018
892dd3d
ASV: Added seek to buffer to fix xlwt asv failure (#19926)
WillAyd Feb 27, 2018
dc4bf8a
TST: Debug flaky plotting test (#19925)
TomAugspurger Feb 27, 2018
4a4f3f2
Rebase
Jan 28, 2018
f00a2d1
Merge remote-tracking branch 'pandas-dev/master'
Feb 28, 2018
593d6cb
Rebase
harisbal Mar 11, 2018
4708db0
Merge branch 'master' of https://github.com/pandas-dev/pandas
Mar 11, 2018
5da2e44
Merge branch 'master' into multi-index-merge
Mar 11, 2018
124 changes: 68 additions & 56 deletions doc/source/merging.rst
@@ -31,10 +31,10 @@ operations.
Concatenating objects
---------------------

The :func:`~pandas.concat` function (in the main pandas namespace) does all of
the heavy lifting of performing concatenation operations along an axis while
performing optional set logic (union or intersection) of the indexes (if any) on
the other axes. Note that I say "if any" because there is only a single possible
axis of concatenation for Series.

Before diving into all of the details of ``concat`` and what it can do, here is
@@ -109,9 +109,9 @@ some configurable handling of "what to do with the other axes":
to the actual data concatenation.
- ``copy`` : boolean, default True. If False, do not copy data unnecessarily.

Without a little bit of context many of these arguments don't make much sense.
Let's revisit the above example. Suppose we wanted to associate specific keys
with each of the pieces of the chopped up DataFrame. We can do this using the
``keys`` argument:

.. ipython:: python
@@ -138,9 +138,9 @@ It's not a stretch to see how this can be very useful. More detail on this
functionality below.

.. note::
It is worth noting that :func:`~pandas.concat` (and therefore
:meth:`~DataFrame.append`) makes a full copy of the data, and that constantly
reusing this function can create a significant performance hit. If you need
to use the operation over several datasets, use a list comprehension.

::
@@ -153,7 +153,7 @@ Set logic on the other axes
~~~~~~~~~~~~~~~~~~~~~~~~~~~

When gluing together multiple DataFrames, you have a choice of how to handle
the other axes (other than the one being concatenated). This can be done in
the following three ways:

- Take the (sorted) union of them all, ``join='outer'``. This is the default
@@ -216,8 +216,8 @@ DataFrame:
Concatenating using ``append``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Useful shortcuts to :func:`~pandas.concat` are the :meth:`~DataFrame.append`
instance methods on ``Series`` and ``DataFrame``. These methods actually predated
``concat``. They concatenate along ``axis=0``, namely the index:

.. ipython:: python
@@ -263,8 +263,8 @@ need to be:

.. note::

Unlike the :py:meth:`~list.append` method, which appends to the original list
and returns ``None``, :meth:`~DataFrame.append` here **does not** modify
``df1`` and returns its copy with ``df2`` appended.
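
The contrast drawn in the note can be sketched as follows. This is an illustration under assumptions: ``df1`` and ``df2`` are made-up frames, and the sketch calls :func:`~pandas.concat` (which the text above says ``append`` relies on) rather than ``DataFrame.append`` itself, because ``DataFrame.append`` was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical frames, used only for illustration.
df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame({"A": [3]})

# list.append mutates the list in place and returns None.
items = [1, 2]
returned = items.append(3)

# pd.concat builds a new object; df1 itself is left untouched.
combined = pd.concat([df1, df2], ignore_index=True)

print(returned)       # None
print(len(df1))       # df1 still has its original two rows
print(combined)
```

The result has to be bound to a name (here ``combined``); discarding the return value loses the concatenated data.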

.. _merging.ignore_index:
@@ -362,9 +362,9 @@ Passing ``ignore_index=True`` will drop all name references.
More concatenating with group keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A fairly common use of the ``keys`` argument is to override the column names
when creating a new ``DataFrame`` based on existing ``Series``.
Notice how the default behaviour is to let the resulting ``DataFrame``
inherit the parent ``Series``' name, when these existed.
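
A minimal sketch of that behaviour, assuming two small made-up ``Series`` (the names ``s1``, ``s2``, ``"red"`` and ``"blue"`` are illustrative only):

```python
import pandas as pd

s1 = pd.Series([0, 1, 2], name="foo")   # named Series
s2 = pd.Series([3, 4, 5])               # unnamed Series

# Default: the named Series keeps its name as the column label,
# while the unnamed one receives an integer label.
inherited = pd.concat([s1, s2], axis=1)

# With keys, the supplied labels override the Series names.
overridden = pd.concat([s1, s2], axis=1, keys=["red", "blue"])

print(inherited.columns.tolist())
print(overridden.columns.tolist())
```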

.. ipython:: python
@@ -460,7 +460,7 @@ Appending rows to a DataFrame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While not especially efficient (since a new object must be created), you can
append a single row to a ``DataFrame`` by passing a ``Series`` or dict to
``append``, which returns a new ``DataFrame`` as above.

.. ipython:: python
@@ -505,15 +505,15 @@ pandas has full-featured, **high performance** in-memory join operations
idiomatically very similar to relational databases like SQL. These methods
perform significantly better (in some cases well over an order of magnitude
better) than other open source implementations (like ``base::merge.data.frame``
in R). The reason for this is careful algorithmic design and the internal layout
of the data in ``DataFrame``.

See the :ref:`cookbook<cookbook.merge>` for some advanced strategies.

Users who are familiar with SQL but new to pandas might be interested in a
:ref:`comparison with SQL<compare_with_sql.join>`.

pandas provides a single function, :func:`~pandas.merge`, as the entry point for
all standard database join operations between ``DataFrame`` objects:

::
@@ -582,7 +582,7 @@ and ``right`` is a subclass of DataFrame, the return type will still be
``DataFrame``.

``merge`` is a function in the pandas namespace, and it is also available as a
``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling
``DataFrame`` being implicitly considered the left object in the join.

The related :meth:`~DataFrame.join` method uses ``merge`` internally for the
@@ -594,7 +594,7 @@ Brief primer on merge methods (relational algebra)

Experienced users of relational databases like SQL will be familiar with the
terminology used to describe join operations between two SQL-table like
structures (``DataFrame`` objects). There are several cases to consider which
are very important to understand:

- **one-to-one** joins: for example when joining two ``DataFrame`` objects on
@@ -634,8 +634,8 @@ key combination:
labels=['left', 'right'], vertical=False);
plt.close('all');

Here is a more complicated example with multiple join keys. Only the keys
appearing in ``left`` and ``right`` are present (the intersection), since
``how='inner'`` by default.
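
As a rough sketch of a merge on multiple keys (the frames and column values below are invented for illustration), only key pairs present in both inputs survive the default inner join:

```python
import pandas as pd

left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                      "key2": ["K0", "K0", "K0", "K0"],
                      "C": ["C0", "C1", "C2", "C3"]})

# how='inner' is the default, so only (key1, key2) pairs found in both
# frames are kept; ('K1', 'K0') occurs twice on the right and therefore
# yields two output rows.
result = pd.merge(left, right, on=["key1", "key2"])
print(result)
```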

.. ipython:: python
@@ -751,13 +751,13 @@ Checking for duplicate keys

.. versionadded:: 0.21.0

Users can use the ``validate`` argument to automatically check whether there
are unexpected duplicates in their merge keys. Key uniqueness is checked before
merge operations and so should protect against memory overflows. Checking key
uniqueness is also a good way to ensure user data structures are as expected.

In the following example, there are duplicate values of ``B`` in the right
``DataFrame``. As this is not a one-to-one merge -- as specified in the
``validate`` argument -- an exception will be raised.


@@ -770,11 +770,11 @@ In the following example, there are duplicate values of ``B`` in the right

In [53]: result = pd.merge(left, right, on='B', how='outer', validate="one_to_one")
...
MergeError: Merge keys are not unique in right dataset; not a one-to-one merge

If the user is aware of the duplicates in the right ``DataFrame`` but wants to
ensure there are no duplicates in the left DataFrame, one can use the
``validate='one_to_many'`` argument instead, which will not raise an exception.
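
Both outcomes can be sketched side by side. The frames here are assumptions chosen so that ``B`` is unique on the left and duplicated on the right; ``MergeError`` is imported from ``pandas.errors``:

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: B is unique in left, duplicated in right.
left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})
right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})

# one_to_one requires unique keys on both sides, so this raises.
try:
    pd.merge(left, right, on="B", how="outer", validate="one_to_one")
    raised = False
except MergeError:
    raised = True

# one_to_many only requires unique keys on the left, so this succeeds.
checked = pd.merge(left, right, on="B", how="outer", validate="one_to_many")

print(raised)
print(checked)
```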

.. ipython:: python

@@ -786,8 +786,8 @@ ensure there are no duplicates in the left DataFrame, one can use the
The merge indicator
~~~~~~~~~~~~~~~~~~~

:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a
Categorical-type column called ``_merge`` will be added to the output object
that takes on values:

=================================== ================
@@ -895,7 +895,7 @@ Joining on index
~~~~~~~~~~~~~~~~

:meth:`DataFrame.join` is a convenient method for combining the columns of two
potentially differently-indexed ``DataFrames`` into a single result
``DataFrame``. Here is a very basic example:

.. ipython:: python
@@ -975,9 +975,9 @@ indexes:
Joining key columns on an index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column
or multiple column names, which specifies that the passed ``DataFrame`` is to be
aligned on that column in the ``DataFrame``. These two function calls are
completely equivalent:

::
@@ -987,7 +987,7 @@ completely equivalent:
how='left', sort=False)

Obviously you can choose whichever form you find more convenient. For
many-to-one joins (where one of the ``DataFrame`` objects is already indexed by the
join key), using ``join`` may be more convenient. Here is a simple example:

.. ipython:: python
@@ -1125,20 +1125,25 @@ This is equivalent but less verbose and more memory efficient / faster than this
Joining with two multi-indexes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-This is not implemented via ``join`` at-the-moment, however it can be done using
-the following code.
+As of pandas 0.23.1, :meth:`DataFrame.join` can be used to join multi-indexed ``DataFrame`` instances on the overlapping index levels:

.. ipython:: python

-index = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
+index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
('K1', 'X2')],
names=['key', 'X'])
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']},
-index=index)
+index=index_left)
+
+index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
+('K2', 'Y2'), ('K2', 'Y3')],
+names=['key', 'Y'])
+right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
+'D': ['D0', 'D1', 'D2', 'D3']},
+index=index_right)

-result = pd.merge(left.reset_index(), right.reset_index(),
-on=['key'], how='inner').set_index(['key','X','Y'])
+left.join(right)

.. ipython:: python
:suppress:
@@ -1148,6 +1153,13 @@ the following code.
labels=['left', 'right'], vertical=False);
plt.close('all');

+For earlier versions it can be done using the following.
+
+.. ipython:: python
+
+pd.merge(left.reset_index(), right.reset_index(),
+on=['key'], how='inner').set_index(['key','X','Y'])
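
For reference, the two approaches can be put together in one self-contained sketch. It reuses the ``left`` and ``right`` frames from the diff and assumes a pandas release in which ``DataFrame.join`` accepts partially overlapping ``MultiIndex`` levels; on releases without that support, only the ``merge``-based form applies.

```python
import pandas as pd

index_left = pd.MultiIndex.from_tuples(
    [("K0", "X0"), ("K0", "X1"), ("K1", "X2")], names=["key", "X"])
left = pd.DataFrame({"A": ["A0", "A1", "A2"],
                     "B": ["B0", "B1", "B2"]}, index=index_left)

index_right = pd.MultiIndex.from_tuples(
    [("K0", "Y0"), ("K1", "Y1"), ("K2", "Y2"), ("K2", "Y3")],
    names=["key", "Y"])
right = pd.DataFrame({"C": ["C0", "C1", "C2", "C3"],
                      "D": ["D0", "D1", "D2", "D3"]}, index=index_right)

# Join on the shared level 'key' (assumes multi-level join support).
joined = left.join(right)

# Version-independent form: move the index levels into columns, merge on
# 'key', then rebuild the three-level index.
workaround = pd.merge(left.reset_index(), right.reset_index(),
                      on=["key"], how="inner").set_index(["key", "X", "Y"])

print(joined)
print(workaround)
```

With this data every ``key`` on the left also appears on the right, so the default left join and the inner merge select the same three rows.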

.. _merging.merge_on_columns_and_levels:

Merging on a combination of columns and index levels
@@ -1254,7 +1266,7 @@ similarly.
Joining multiple DataFrame or Panel objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

to join them together on their indexes.

.. ipython:: python
@@ -1276,7 +1288,7 @@ Merging together values within Series or DataFrame columns
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another fairly common situation is to have two like-indexed (or similarly
indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in
one object from values for matching indices in the other. Here is an example:

.. ipython:: python
@@ -1301,7 +1313,7 @@ For this, use the :meth:`~DataFrame.combine_first` method:
plt.close('all');

Note that this method only takes values from the right ``DataFrame`` if they are
missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`,
alters non-NA values in place:

.. ipython:: python
@@ -1353,15 +1365,15 @@ Merging AsOf

.. versionadded:: 0.19.0

A :func:`merge_asof` is similar to an ordered left-join except that we match on
nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``,
we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less
than or equal to the left's key. Both DataFrames must be sorted by the key.

Optionally an asof merge can perform a group-wise merge. This matches the
``by`` key equally, in addition to the nearest match on the ``on`` key.
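
The matching rule can be sketched on a pair of tiny frames. The data and the names ``toy_trades`` and ``toy_quotes`` are made up for this illustration; integer ``time`` values stand in for timestamps.

```python
import pandas as pd

# Both frames must already be sorted by the ``on`` key ('time').
toy_trades = pd.DataFrame({'time': [1, 5, 10],
                           'ticker': ['A', 'B', 'A'],
                           'price': [10.0, 20.0, 11.0]})
toy_quotes = pd.DataFrame({'time': [1, 2, 3, 7],
                           'ticker': ['A', 'B', 'A', 'A'],
                           'bid': [9.9, 19.8, 10.1, 10.5]})

# For each trade, take the most recent quote for the same ticker whose
# time is at or before the trade time.
matched = pd.merge_asof(toy_trades, toy_quotes, on='time', by='ticker')

# Excluding exact matches means the trade at time 1 has no earlier quote
# for ticker 'A', so its ``bid`` is missing.
strict = pd.merge_asof(toy_trades, toy_quotes, on='time', by='ticker',
                       allow_exact_matches=False)

print(matched)
print(strict)
```

Here the ``by='ticker'`` argument restricts each trade to quotes for the same ticker before the nearest-key search on ``time`` is applied.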

For example, we might have ``trades`` and ``quotes`` and we want to ``asof``
merge them.

.. ipython:: python
@@ -1420,8 +1432,8 @@ We only asof within ``2ms`` between the quote time and the trade time.
by='ticker',
tolerance=pd.Timedelta('2ms'))

We only asof within ``10ms`` between the quote time and the trade time and we
exclude exact matches on time. Note that though we exclude the exact matches
(of the quotes), prior quotes **do** propagate to that point in time.

.. ipython:: python
34 changes: 34 additions & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -59,6 +59,40 @@ The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a dtyp
pd.get_dummies(df, columns=['c']).dtypes
pd.get_dummies(df, columns=['c'], dtype=bool).dtypes

.. _whatsnew_0230.enhancements.join_with_two_multiindexes:

Joining with two multi-indexes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As of Pandas 0.23.1, :meth:`DataFrame.join` can be used to join multi-indexed ``DataFrame`` instances on the overlapping index levels.

See the :ref:`Merge, join, and concatenate
<merging.Join_with_two_multi_indexes>` documentation section.

.. ipython:: python

   index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
                                           ('K1', 'X2')],
                                          names=['key', 'X'])
   left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                        'B': ['B0', 'B1', 'B2']},
                       index=index_left)

   index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
                                            ('K2', 'Y2'), ('K2', 'Y3')],
                                           names=['key', 'Y'])
   right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                         'D': ['D0', 'D1', 'D2', 'D3']},
                        index=index_right)

   left.join(right)

For earlier versions, the same result can be obtained with the following:

.. ipython:: python

   pd.merge(left.reset_index(), right.reset_index(),
            on=['key'], how='inner').set_index(['key', 'X', 'Y'])

.. _whatsnew_0230.enhancements.merge_on_columns_and_levels:
