Inconsistent result shape from DataFrame.apply depending on whether a date column is present #14370

KevinGrealish · 2016-10-07T04:33:12Z

A small, complete example of the issue

import pandas as pd
df1 = pd.DataFrame.from_items([('A', [1,2,3]), ('B', [4, 5, 6])])
df2 = pd.DataFrame.from_items([('A', [pd.datetime(1970,1,1), pd.datetime(1970,1,1), pd.datetime(1970,1,1)]), ('B', [4, 5, 6])])
f = lambda row: [row.B + 3]
r1 = df1.apply(f, axis=1)
r2 = df2.apply(f, axis=1)
# bug: r1 and r2 are different and different in shape.
# expect: r1 and r2 to be the same values and shape.
r1
r2
# a clue to the difference:
(df1._is_mixed_type,df1._is_datelike_mixed_type)
(df2._is_mixed_type,df2._is_datelike_mixed_type)
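The same clue is visible through public attributes, without the private `_is_mixed_type` flags. This is a minimal sketch, assuming a pandas version where `DataFrame` is built from a dict (the frames below mirror `df1` and `df2` but are not constructed with `from_items`). It only shows that a row taken from the all-integer frame keeps an integer dtype, while a row taken from the frame with a date column falls back to `object`; it does not claim to show the internal code path `apply` takes.

```python
from datetime import datetime

import pandas as pd

# Same data as df1/df2 above, built from dicts (an assumption for illustration).
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [datetime(1970, 1, 1)] * 3, "B": [4, 5, 6]})

# A single row is a Series; its dtype has to hold every column's value.
row1_dtype = df1.iloc[0].dtype  # all columns are integers
row2_dtype = df2.iloc[0].dtype  # datetime + integer do not share a numeric dtype

print(df1.dtypes, row1_dtype, sep="\n")
print(df2.dtypes, row2_dtype, sep="\n")
```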

Expected Output

0    [7]
1    [8]
2    [9]

Output of `pd.show_versions()`

## INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 20.3
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None


jreback · 2016-10-07T10:35:57Z

This is irrespective of other dtypes. You are returning a 2 element list, so naturally pandas will try to coerce to the original shape as it's compatible. A returned list is not labeled, so that is the only thing pandas can do. You can do this if you want. I also suppose more/better documentation is possible.

In [8]: df1.apply(lambda row: Series([row.B + 3], ['result']), axis=1)
Out[8]: 
   result
0       7
1       8
2       9
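For readers on a newer pandas, the shape can also be requested explicitly. This is a minimal sketch, assuming pandas 0.23 or later, where `DataFrame.apply` accepts a `result_type` argument. The frames are built from dicts rather than `from_items`, and `f` is written as a named function; both are substitutions for illustration, not part of the original report.

```python
from datetime import datetime

import pandas as pd

# Assumed equivalents of df1/df2 from the report, built from dicts.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [datetime(1970, 1, 1)] * 3, "B": [4, 5, 6]})


def f(row):
    # Returns an unlabeled one-element list, as in the report.
    return [row.B + 3]


# "reduce" asks for a Series whose cells hold the returned lists.
r1 = df1.apply(f, axis=1, result_type="reduce")
r2 = df2.apply(f, axis=1, result_type="reduce")

# "expand" asks for the list elements to be spread into columns instead.
expanded = df1.apply(f, axis=1, result_type="expand")

print(r1)
print(r2)
print(expanded)
```

Stating `result_type` makes the intent independent of what the other columns contain.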

KevinGrealish · 2016-10-07T17:22:03Z

Jeff, are you saying it's by design that r1 and r2 come out different (By Design), or are you saying they should be the same (Won't Fix)?

KevinGrealish · 2016-10-07T17:46:51Z

In my case, f is a function that takes a string and returns a list of strings, and it needs to be applied to each row, i.e. I want the result cells to have lists in them. I simplified this in the example, but here is a better repro. If what I was doing is underspecified to pandas, I want it to error all the time, not just when another column happens to be a date. It cost me a lot of time trying to figure out why this only worked some of the time:

import pandas as pd
df1 = pd.DataFrame.from_items([('A', [1,2,3]), ('B', ["ABCD", "EFGH", "IJKL"])])
df2 = pd.DataFrame.from_items([('A', [pd.datetime(1970,1,1), pd.datetime(1970,1,1), pd.datetime(1970,1,1)]), ('B', ["ABCD", "EFGH", "IJKL"])])
f = lambda row: [char for char in row.B]
r1 = df1.apply(f, axis=1)  # works
r2 = df2.apply(f, axis=1)  # fails
# bug: apply when date present crashes.
# expect: r1 and r2 to be the same values and shape. r1 has the expected value.

The problem is that I now have code that crashes only when a date column is present. Your workaround of using a Series:

f = lambda row: pd.Series([[char for char in row.B]], ["result"])
r1 = df1.apply(f, axis=1)
r2 = df2.apply(f, axis=1)
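When the function only needs one column, as `f` does here with `B`, another option is to map over that column directly so the other columns never enter the computation. This is a sketch under that assumption, with frames built from dicts rather than `from_items`; it is not something proposed in the thread.

```python
from datetime import datetime

import pandas as pd

# Assumed equivalents of the second repro's frames, built from dicts.
words = ["ABCD", "EFGH", "IJKL"]
df1 = pd.DataFrame({"A": [1, 2, 3], "B": words})
df2 = pd.DataFrame({"A": [datetime(1970, 1, 1)] * 3, "B": words})

# Series.map calls list() on each string, giving one list of characters per cell.
r1 = df1["B"].map(list)
r2 = df2["B"].map(list)

print(r1)
print(r2)
```

Both results are Series of lists with the original index, whether or not column `A` holds dates.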

jreback closed this as completed Oct 7, 2016

jreback added the Reshaping and Usage Question labels Oct 7, 2016

jreback added this to the No action milestone Oct 7, 2016

KevinGrealish changed the title ~~inconsistant result shape from dataframe.apply depending on is a date column is present~~ inconsistant result shape from dataframe.apply depending on if a date column is present Oct 7, 2016

jreback mentioned this issue Nov 7, 2016

groupby apply #14605

Closed

jreback mentioned this issue Mar 9, 2017

DOC: Dataframe.apply does not always return a Series when reduce=True #15628

Closed

jreback mentioned this issue Mar 23, 2017

DataFrame.apply() returns DataFrame unexpectedly when length of list stored in cell matches DF-dimensions #15788

Closed

jreback added the Apply label Sep 20, 2017
