pd.Concat with empty series drops series.name and series.index.name #24685

ericksc · 2019-01-09T16:40:38Z

Concat with empty Series

Using the following code:

import pandas as pd

# Create an empty Series
empty_s = pd.Series()
s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha'))
s2 = pd.concat([empty_s, s1])
print('Using empty series')
print(s2)

# Create an empty Series with a name and a named index
empty_s = pd.Series(name='beta', index=pd.Index([], name='alpha'))
s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha'))
s2 = pd.concat([empty_s, s1])
print('Using empty series with index and name defined')
print(s2)

pd.show_versions()

Problem description

The code above tests what happens when an empty Series is concatenated with a fully defined Series.
Expected result:
The same Series, with its name and index name preserved.

With pandas 0.17 this works as expected, but from pandas 0.18 onward the result loses the name and the index name.

It only works when the empty Series is created with the same name and index name set explicitly (the second case in the code above).

Same result with pandas 0.23.4
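
One possible workaround, sketched here under the assumption that empty inputs can simply be skipped, is to filter them out before calling pd.concat. The helper name concat_skipping_empty is hypothetical and not part of pandas; it only illustrates the idea.

```python
import pandas as pd


def concat_skipping_empty(parts):
    """Concatenate Series, ignoring empty ones (hypothetical helper, not a pandas API)."""
    non_empty = [s for s in parts if not s.empty]
    if not non_empty:
        # Only empty inputs were given: return a copy of the first one.
        return parts[0].copy()
    return pd.concat(non_empty)


empty_s = pd.Series(dtype=object)
s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha'))

# Only s1 reaches pd.concat, so there is no conflicting name to reconcile.
s2 = concat_skipping_empty([empty_s, s1])
print(s2.name, s2.index.name)
```

With a single non-empty input left, the name and index name of s1 carry over unchanged. Whether skipping empty inputs is acceptable depends on the use case, for example if the dtype of the empty Series matters.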

jreback · 2019-01-09T17:08:47Z

pls try with 0.23.4 , master if you can

ericksc · 2019-01-09T17:42:24Z

pls try with 0.23.4 , master if you can

With pandas 0.23.4 I get the same result. I have updated the problem description.

mroeschke · 2019-01-09T18:09:08Z

I can also reproduce on master

In [19]: empty_s = pd.Series()
    ...: s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha'))
    ...: s2 = pd.concat([empty_s, s1])

In [20]: s1
Out[20]:
alpha
X    a
Y    b
Name: beta, dtype: object

In [21]: s2
Out[21]:
X    a
Y    b
dtype: object

In [22]: pd.__version__
Out[22]: '0.24.0.dev0+1557.gdecc8ce56'

avant1 · 2019-07-12T11:19:01Z

It looks like a DataFrame's index name is also lost after concatenating it with an empty DataFrame:

import pandas as pd
df = pd.DataFrame([['a', 5]], columns=['whatever', 'id']).set_index('id')
empty_df = pd.DataFrame(columns=['type']) #note: at least one column in empty DF is required
result = pd.concat([empty_df, df], sort=False)

print(result)

#  type whatever
#5  NaN        a

Everything works as expected if empty DataFrame has no columns:

import pandas as pd
df = pd.DataFrame([['a', 5]], columns=['whatever', 'id']).set_index('id')
empty_df = pd.DataFrame(columns=[]) # no columns here
result = pd.concat([empty_df, df], sort=False)

print(result)

#   whatever
#id         
#5         a
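
If the index name has to survive regardless, one option is to record it before concatenating and restore it afterwards with rename_axis. This is only a sketch of a workaround using the same frames as above, not a statement about what pd.concat is supposed to do.

```python
import pandas as pd

df = pd.DataFrame([['a', 5]], columns=['whatever', 'id']).set_index('id')
empty_df = pd.DataFrame(columns=['type'])

# Remember the index name before concatenating, then put it back afterwards.
index_name = df.index.name
result = pd.concat([empty_df, df], sort=False).rename_axis(index_name)

print(result.index.name)
```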

Installed versions

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.12.1.el7.x86_64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: None
pip: 19.1.1
setuptools: 20.10.1
Cython: None
numpy: 1.16.3
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

johannes-mueller · 2020-03-20T13:07:15Z

Probably the same issue in another context: Series.name is lost when the Series object is put into a DataFrame. Even if Series.name is set after the Series has been put into the DataFrame, it is lost as soon as any manipulation of the DataFrame occurs, even one on a different column. In contrast to Series.name, Index.name is preserved.

Sample code

import pandas as pd

idx = pd.Index(['x', 'y', 'z'], name='my_index')
a = pd.Series([1,2,3], name='series_a', index=idx)
b = pd.Series([4,5,6], name='series_b', index=idx)

df = pd.DataFrame({'a': a})
assert idx.name == df.index.name       # succeeds
assert a.index.name == df.a.index.name # succeeds
df.a.name = a.name                     # shouldn't be necessary, but obviously is
assert a.name == df.a.name             # fails if the previous line is not present

df['b'] = b
assert idx.name == df.index.name       # succeeds
assert b.index.name == df.b.index.name # succeeds
df.a.name = a.name                     # shouldn't be necessary, but obviously is
assert a.name == df.a.name             # fails if the previous line is not present

df['b'] *= 2
assert idx.name == df.index.name       # succeeds
assert b.index.name == df.b.index.name # succeeds
assert a.name == df.a.name             # fails
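
For comparison, a small sketch of how column labels relate to Series names, reusing the same idx, a and b. When a DataFrame is built from a dict, the dict key becomes the column label and a column read back as a Series is named after that label. Building the frame with pd.concat along the columns uses the Series names as labels instead.

```python
import pandas as pd

idx = pd.Index(['x', 'y', 'z'], name='my_index')
a = pd.Series([1, 2, 3], name='series_a', index=idx)
b = pd.Series([4, 5, 6], name='series_b', index=idx)

# Built from a dict: the key 'a' is the column label, so the column's name is 'a'.
df = pd.DataFrame({'a': a})
print(df['a'].name)

# Built with concat along the columns: each Series name becomes the column label.
wide = pd.concat([a, b], axis=1)
print(list(wide.columns), wide['series_a'].name, wide.index.name)
```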

INSTALLED VERSIONS
------------------
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-42-lowlatency
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8

pandas : 1.0.2
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200309
Cython : 0.29.15
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : 0.15.0
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None

phofl · 2020-09-12T02:41:26Z

In #11082 this was declared a bug. Should we close this issue or has that changed?

jreback · 2020-09-12T20:29:39Z

#11082 was fully patched

In [5]: empty_s = pd.Series(dtype=object) 
   ...: s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha')) 
   ...: pd.concat([empty_s, s1])                                                                                                                                 
Out[5]: 
X    a
Y    b
dtype: object

looks ok to me

So I would review the postings here, but I think they might already be fixed (this may need a test).

phofl · 2020-09-12T21:24:06Z

The poster proposed that the result should keep the name of the non-empty Series, which is exactly the opposite of what #11082 established; that is why I am asking. We already have tests checking that the name is dropped.
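
The two cases from the original report can be put side by side in a short sketch. It assumes the behaviour discussed in this thread: the result keeps a name only when all inputs agree on it.

```python
import pandas as pd

s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha'))

# Unnamed empty Series: the names disagree (None vs 'beta'), so the name is dropped.
unnamed = pd.Series(dtype=object)
dropped = pd.concat([unnamed, s1])

# Empty Series carrying the same name and index name: both are kept.
named = pd.Series(dtype=object, name='beta', index=pd.Index([], name='alpha'))
kept = pd.concat([named, s1])

print(dropped.name)
print(kept.name, kept.index.name)
```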

mroeschke · 2021-06-25T05:10:41Z

Since a test already checks that the name is dropped, it looks like we can close this issue.

mroeschke added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 9, 2019

johannes-mueller mentioned this issue Mar 20, 2020

_metadata items of subclassed pd.Series are not propagated into corresponding SubclassedDataFrame #32860

Open

mroeschke closed this as completed Jun 25, 2021
