DataFrame.apply() raises ValueError when output df is different size than input df #17437

Closed
jennirinker opened this issue Sep 5, 2017 · 4 comments · Fixed by #18577
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Comments

@jennirinker

jennirinker commented Sep 5, 2017

MWE

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))  # dummy array
df1 = df.apply(np.fft.fft, axis=0)  # works
print(df1.shape)  # for testing
df2 = df.apply(np.fft.rfft, axis=0)  # breaks
print(df2.shape)  # for testing

Problem description

I would like to take a DataFrame of time series and apply the real-fft along the columns, but it seems that DataFrame.apply only works if the function to be applied returns output that is the same size as the input.

Expected Output

I expect the code block above to run without error and produce the following output:

(10, 2)
(6, 2)

Alternatively, raising an error that tells the user "output array size must match input array size" would be fine, if we want to restrict apply to working only for functions that return the same-size arrays as the inputs.
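
A possible workaround, sketched here under the assumption that every column has the same length and the same transform applies to all of them: bypass DataFrame.apply, run np.fft.rfft on the underlying array, and rebuild the frame. The names values and spectrum are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2))  # same dummy data as the MWE

# Transform the underlying array directly; rfft of 10 real samples
# returns 10 // 2 + 1 = 6 complex coefficients per column.
values = np.fft.rfft(df.to_numpy(), axis=0)

# Rebuild a DataFrame, keeping the original column labels.
spectrum = pd.DataFrame(values, columns=df.columns)
print(spectrum.shape)  # (6, 2)
```

The row index of the result is a fresh RangeIndex, since the original index no longer lines up with the shorter output.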

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.2
pytest: 3.2.1
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.6.2
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Sep 6, 2017

see commentary #15628

this is a thorny issue. someone would have to dig in and see what, if anything, could be done. .apply tries to infer the output shape and is not always successful.

jreback added the Difficulty Advanced and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Sep 6, 2017
jreback added this to the Next Major Release milestone Sep 6, 2017
@jennirinker
Author

Yeah, I figured that this was non-trivial. I suppose the easiest fix would be to just update the documentation so that it explicitly states that the function passed to .apply must return a vector of the same length as the input. A slightly less trivial fix could be to apply the function to the first column and use the resulting size to initialize the new DataFrame. But I'm not sure if this works with the underlying logic, nor am I certain how much overhead that might add to the function.
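
The idea of sizing the result from the first column can be sketched roughly as follows. This is a minimal illustration, not how pandas implements .apply; apply_columnwise is a hypothetical helper, and it assumes the function returns a 1-D array of the same length and dtype for every column.

```python
import numpy as np
import pandas as pd


def apply_columnwise(df, func):
    """Hypothetical helper (not part of pandas): apply func to each column,
    sizing the output from the result for the first column."""
    if df.shape[1] == 0:
        return df.copy()

    # Evaluate the first column to learn the output length and dtype.
    first = np.asarray(func(df.iloc[:, 0].to_numpy()))
    out = np.empty((first.shape[0], df.shape[1]), dtype=first.dtype)
    out[:, 0] = first

    # Fill the remaining columns; a result of a different length
    # raises a ValueError from NumPy on assignment.
    for j in range(1, df.shape[1]):
        out[:, j] = func(df.iloc[:, j].to_numpy())

    return pd.DataFrame(out, columns=df.columns)


df = pd.DataFrame(np.random.rand(10, 2))
print(apply_columnwise(df, np.fft.fft).shape)   # (10, 2)
print(apply_columnwise(df, np.fft.rfft).shape)  # (6, 2)
```

The extra cost is limited to one preallocation, because the first column's result is reused rather than computed twice.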

@jreback
Contributor

jreback commented Sep 6, 2017

no, this needs some debugging. can you trace both cases and see where things go wrong? basically, step through them.

@jennirinker
Author

I can try, but it won't be for a few days at least.

jreback modified the milestones: Next Major Release, 0.22.0 Nov 30, 2017
jorisvandenbossche pushed a commit that referenced this issue Feb 7, 2018