pd.Concat with empty series drops series.name and series.index.name #24685

Closed
ericksc opened this issue Jan 9, 2019 · 9 comments
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode


@ericksc

ericksc commented Jan 9, 2019

Concat with empty Series

Using the following code:

import pandas as pd

# Create an empty Series
empty_s = pd.Series()
s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha'))
s2 = pd.concat([empty_s, s1])
print('Using empty series')
print(s2)

# Create an empty Series with the name and index name defined
empty_s = pd.Series(name='beta', index=pd.Index([], name='alpha'))
s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha'))
s2 = pd.concat([empty_s, s1])
print('Using empty series with index and name defined')
print(s2)

pd.show_versions()

Problem description

The code above checks what happens when an empty Series is concatenated with a fully defined Series.
Expected result: the same Series, keeping the expected name and index name.

With pandas 0.17 this works as expected, but from pandas 0.18 onward the name and index name are dropped.

It only works when the empty Series is created with the name and index name explicitly set.

Same result with pandas 0.23.4

@jreback
Contributor

jreback commented Jan 9, 2019

Please try with 0.23.4, and master if you can.

@ericksc
Author

ericksc commented Jan 9, 2019

> pls try with 0.23.4, master if you can

With pandas 0.23.4 I get the same result; I've updated the problem description.

@mroeschke
Member

I can also reproduce on master

In [19]: empty_s = pd.Series()
    ...: s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha'))
    ...: s2 = pd.concat([empty_s, s1])

In [20]: s1
Out[20]:
alpha
X    a
Y    b
Name: beta, dtype: object

In [21]: s2
Out[21]:
X    a
Y    b
dtype: object

In [22]: pd.__version__
Out[22]: '0.24.0.dev0+1557.gdecc8ce56'
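Why both names disappear can be sketched as a simple agreement rule: `pd.concat` keeps a Series name only if every input has the same name, and `Index.append` keeps an index name only if every index has the same name. The helper `consensus_name` below is hypothetical and only mimics that rule for illustration; it is not pandas' internal code. The behavior shown is what I'd expect from recent pandas releases (1.x/2.x), but treat it as an assumption for your version.

```python
import pandas as pd


def consensus_name(objs):
    """Hypothetical helper: return the shared .name if all inputs agree, else None.

    Mimics how pd.concat / Index.append pick a result name; illustration only.
    """
    names = {obj.name for obj in objs}
    return names.pop() if len(names) == 1 else None


empty_s = pd.Series(dtype=object)  # name=None, index name=None
s1 = pd.Series(["a", "b"], name="beta", index=pd.Index(["X", "Y"], name="alpha"))

result = pd.concat([empty_s, s1])
# The empty Series has no name, so the names disagree and both are dropped.
print(consensus_name([empty_s, s1]), result.name)  # None None
print(consensus_name([empty_s.index, s1.index]), result.index.name)  # None None

# Giving the empty Series the same name and index name restores agreement.
empty_named = pd.Series(dtype=object, name="beta", index=pd.Index([], name="alpha"))
result2 = pd.concat([empty_named, s1])
print(result2.name, result2.index.name)  # beta alpha
```

This also explains why the second example in the original report keeps both names: once the empty Series carries `name='beta'` and an index named `'alpha'`, there is nothing to disagree about.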

@mroeschke mroeschke added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 9, 2019
@avant1

avant1 commented Jul 12, 2019

It looks like a DataFrame's index name is also lost after concatenating it with an empty DataFrame:

import pandas as pd
df = pd.DataFrame([['a', 5]], columns=['whatever', 'id']).set_index('id')
empty_df = pd.DataFrame(columns=['type'])  # note: the empty DataFrame needs at least one column to reproduce this
result = pd.concat([empty_df, df], sort=False)

print(result)

#  type whatever
#5  NaN        a

Everything works as expected if the empty DataFrame has no columns:

import pandas as pd
df = pd.DataFrame([['a', 5]], columns=['whatever', 'id']).set_index('id')
empty_df = pd.DataFrame(columns=[]) # no columns here
result = pd.concat([empty_df, df], sort=False)

print(result)

#   whatever
#id         
#5         a
Installed versions
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.12.1.el7.x86_64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: None
pip: 19.1.1
setuptools: 20.10.1
Cython: None
numpy: 1.16.3
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
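For the DataFrame case above, the same index-name agreement rule seems to apply: the empty frame's unnamed index disagrees with `'id'`, so the name is dropped (the column-less empty frame appears to be skipped entirely, which is why it "works"). A minimal workaround sketch, assuming you control how the empty frame is built, is to give it an empty index with the same name (and, as an extra assumption, the same integer dtype) so the names agree:

```python
import pandas as pd

df = pd.DataFrame([["a", 5]], columns=["whatever", "id"]).set_index("id")

# Assumption: build the empty frame with an index named "id" (int64, like df's),
# so both indexes agree on the name and concat has no reason to drop it.
empty_df = pd.DataFrame(columns=["type"], index=pd.Index([], dtype="int64", name="id"))

result = pd.concat([empty_df, df], sort=False)
print(result.index.name)     # id
print(list(result.columns))  # ['type', 'whatever']
```

Newer pandas versions may emit a FutureWarning about empty entries when determining column dtypes here; that does not affect the index name.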

@johannes-mueller
Contributor

Probably the same issue in another context: Series.name is lost when the Series object is put into a DataFrame. Even if Series.name is set after it has been put into the DataFrame, it is lost as soon as any manipulation of the DataFrame occurs (even on a different column). In contrast to Series.name, Index.name is preserved.

Sample code

import pandas as pd

idx = pd.Index(['x', 'y', 'z'], name='my_index')
a = pd.Series([1,2,3], name='series_a', index=idx)
b = pd.Series([4,5,6], name='series_b', index=idx)

df = pd.DataFrame({'a': a})
assert idx.name == df.index.name       # succeeds
assert a.index.name == df.a.index.name # succeeds
df.a.name = a.name                     # shouldn't be necessary, but obviously is
assert a.name == df.a.name             # fails if the previous line is not present

df['b'] = b
assert idx.name == df.index.name       # succeeds
assert b.index.name == df.b.index.name # succeeds
df.a.name = a.name                     # shouldn't be necessary, but obviously is
assert a.name == df.a.name             # fails if the previous line is not present

df['b'] *= 2
assert idx.name == df.index.name       # succeeds
assert b.index.name == df.b.index.name # succeeds
assert a.name == df.a.name             # fails
INSTALLED VERSIONS
------------------
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-42-lowlatency
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8

pandas : 1.0.2
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200309
Cython : 0.29.15
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : 0.15.0
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None
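One way to read the example above (a sketch, not a change to pandas): when a Series is stored in a DataFrame, the column label becomes the `.name` of the Series you get back via `df[...]` or attribute access, so `pd.DataFrame({'a': a})` yields `df.a.name == 'a'` rather than `'series_a'`. If the goal is to keep the original names, one option is to use them as the column labels, for example with `pd.concat(..., axis=1)`, which labels columns by Series name:

```python
import pandas as pd

idx = pd.Index(["x", "y", "z"], name="my_index")
a = pd.Series([1, 2, 3], name="series_a", index=idx)
b = pd.Series([4, 5, 6], name="series_b", index=idx)

# Column labels come from the Series names, so each column's Series keeps
# its name without re-assigning .name after every DataFrame operation.
df = pd.concat([a, b], axis=1)
df["series_b"] *= 2

print(list(df.columns))     # ['series_a', 'series_b']
print(df["series_a"].name)  # series_a
print(df.index.name)        # my_index
```

`pd.DataFrame({a.name: a, b.name: b})` should give the same column labels; the point is that the label, not a separately stored name, is what survives.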

@phofl
Member

phofl commented Sep 12, 2020

In #11082 this was declared a bug. Should we close this issue or has that changed?

@jreback
Contributor

jreback commented Sep 12, 2020

#11082 was fully patched

In [5]: empty_s = pd.Series(dtype=object)
   ...: s1 = pd.Series(['a', 'b'], name='beta', index=pd.Index(['X', 'Y'], name='alpha'))
   ...: pd.concat([empty_s, s1])
Out[5]: 
X    a
Y    b
dtype: object

Looks OK to me.

So I would review the other examples posted here, but I think they might be fixed (this may just need a test).

@phofl
Member

phofl commented Sep 12, 2020

The poster proposed that the result should keep the name of the non-empty Series, which is exactly the opposite of #11082; that is why I am asking. We already have tests checking that the name is dropped.
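Since dropping the name is the intended behavior, callers who want the non-empty Series' name and index name can skip the empty inputs themselves before concatenating. The helper `concat_nonempty` below is hypothetical (not a pandas API); it is a minimal sketch assuming the empty inputs contribute no rows you need:

```python
import pandas as pd


def concat_nonempty(objs):
    """Hypothetical helper: concatenate only the non-empty Series.

    Empty inputs add no rows, but their (missing) names still take part in the
    name agreement check; skipping them lets the remaining names survive.
    Falls back to a plain pd.concat if every input is empty.
    """
    objs = list(objs)
    non_empty = [s for s in objs if len(s) > 0]
    return pd.concat(non_empty if non_empty else objs)


empty_s = pd.Series(dtype=object)
s1 = pd.Series(["a", "b"], name="beta", index=pd.Index(["X", "Y"], name="alpha"))

out = concat_nonempty([empty_s, s1])
print(out.name, out.index.name)  # beta alpha
```

Filtering is a design choice on the caller's side: an empty Series can still influence the result dtype, so dropping it may change dtypes in edge cases.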

@mroeschke
Member

Since a test already checks that the name is dropped, it looks like we can close this issue.
