Problem description
If I want to merge two DataFrames, where column 'name' of a matches the index named 'name' of b, with the old syntax I needed to write a.merge(b, left_on='name', right_index=True). This produces a new DataFrame with the index of a retained as the index, the shared column/index retained as a column, and the other columns of a and b also retained (as shown in the expected output).
With recent versions, however, I can just use a.merge(b, on='name'), but this drops the index of a (it merges in column mode instead of index mode).
The documentation states (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html):
The join is done on columns or indexes. If joining columns on columns, the
DataFrame indexes will be ignored. Otherwise if joining indexes on indexes
or indexes on a column or columns, the index will be passed on.
Perhaps this could be clarified in the docs, but the behaviour is at least inconsistent. To me it would be more intuitive to perform an index merge internally in this case.
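The original code sample did not survive on this page, so here is a minimal sketch of the three merge spellings discussed above. The frames, the column names left_val and right_val, and the index values are assumptions made up for illustration; they are not the reporter's data.

```python
import pandas as pd

# Hypothetical frames (assumed, not the reporter's data).
# `a` gets a non-default index so it is easy to see when that index is dropped.
a = pd.DataFrame(
    {"name": ["w", "x", "y", "z"], "left_val": [1, 2, 3, 4]},
    index=[10, 11, 15, 21],
)
# `b` is indexed by "name".
b = pd.DataFrame(
    {"right_val": [7, 6, 4, 2]},
    index=pd.Index(["w", "x", "y", "z"], name="name"),
)

# Columns on columns: both indexes are ignored.
cols_on_cols = a.merge(b.reset_index(), on="name")

# Column on index, spelled out: the index of `a` is passed on.
col_on_index = a.merge(b, left_on="name", right_index=True)

# Shorthand: "name" is a column in `a` and an index level in `b`.
shorthand = a.merge(b, on="name")

results = [
    ("columns on columns", cols_on_cols),
    ("left_on + right_index", col_on_index),
    ("on='name'", shorthand),
]
for label, result in results:
    # Print the resulting index so the three spellings can be compared.
    print(label, "->", result.index.tolist())
```

If the behaviour described in this report holds for the installed pandas version, the second call should keep the index of a, while the first and third should come back with a fresh default index.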
Expected Output
b c a
10 7 a
11 6 b
15 4 c
21 2 d
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : 2.4.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None