ENH/API: rolling_apply to pass frames to the rolled function (rather than ndarrays) #5071

Closed
jreback opened this issue Oct 1, 2013 · 10 comments · Fixed by #20584
Labels
API Design Enhancement Groupby Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@jreback
Contributor

jreback commented Oct 1, 2013

Returning a Series: http://stackoverflow.com/questions/19121854/using-rolling-apply-on-a-dataframe-object

Returning a Scalar: http://stackoverflow.com/questions/21040766/python-pandas-rolling-apply-two-column-input-into-function/21045831#21045831

@jseabold
Contributor

jseabold commented Jan 9, 2014

+1

I was just trying to do something similar. It would be nice if rolling_apply and expanding_apply had an option to work over the whole DataFrame. It doesn't even have to pass frames; it could just roll over the whole 0 axis instead of one series at a time.
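One workaround with the `.rolling` API, sketched here under the assumption of a reasonably recent pandas: roll over a Series of integer row positions with `raw=True`, and use each window of positions to slice the whole frame with `iloc`. `whole_frame_stat` is a hypothetical stand-in for any function that needs every column of the window at once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})


def whole_frame_stat(frame):
    # Hypothetical multi-column function: sum of every cell in the window.
    return frame.to_numpy().sum()


# Roll over row positions 0..n-1. With raw=True each window arrives as a
# float ndarray of positions; cast it back to int and slice *all* columns.
positions = pd.Series(np.arange(len(df)), index=df.index)
result = positions.rolling(2).apply(
    lambda pos: whole_frame_stat(df.iloc[pos.astype(int)]), raw=True
)
```

The rolled function runs once per window instead of once per column. The trick relies on the position Series never containing NaN, so no window is dropped by NaN handling, and the result is aligned on `df.index`.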

@ghost

ghost commented Jan 9, 2014

That sounds equivalent to the split-apply(-combine) approach of groupby, only pandas doesn't currently provide that sort of split function.

related #4059
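pandas has no built-in split function for rolling windows, but one can be hand-rolled. A minimal sketch, where `split_windows` is a hypothetical helper: stack a copy of every full trailing window with `pd.concat`, keyed by the index label of the window's last row, then let `groupby(level=0).apply` do the apply and combine steps.

```python
import pandas as pd


def split_windows(df, window):
    # Hypothetical "split" step: one copy of every full trailing window of rows,
    # keyed by the index label of the window's last row.
    pieces = {df.index[end]: df.iloc[end - window + 1:end + 1]
              for end in range(window - 1, len(df))}
    return pd.concat(pieces)  # dict keys become the outer index level


df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})
stacked = split_windows(df, 2)

# Apply + combine: each group is a 2-row DataFrame holding both columns.
result = stacked.groupby(level=0).apply(lambda w: (w["A"] * w["B"]).sum())
```

The cost is memory: each row is copied up to `window` times, so the stacked frame has roughly `len(df) * window` rows.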

@twiecki
Contributor

twiecki commented Jun 22, 2015

Just ran into the same issue.

@frankz-ai

same issue here

@max-sixty
Contributor

max-sixty commented Apr 22, 2016

@jreback What's the best way to do this?

If I try and change the _apply method on _Rolling to take pandas objects rather than numpy arrays, a few of the standard functions fail (e.g. _zsqrt):

```
...
return _zsqrt(algos.roll_var(arg, window, minp, ddof))
TypeError: Argument 'input' has incorrect type (expected numpy.ndarray, got Series)
```

Could this be done in roll_generic? Or with an additional path, other than the standard _apply, for user-supplied functions? Neither seems that compelling.

@jreback
Contributor Author

jreback commented Apr 22, 2016

So just to have an example

```
In [32]: df = DataFrame({'A' : np.random.randn(5), 'B' : np.random.randint(0,10,size=5)})

In [33]: def f(x):
    print type(x)
    return x.sum()
   ....: 

In [34]: df.rolling(2).apply(f)
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
Out[34]: 
          A     B
0       NaN   NaN
1 -0.414646  15.0
2  1.007150   8.0
3  1.822979   2.0
4  0.884894   4.0
```
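For comparison, later pandas releases add a `raw` argument to `Rolling.apply`. The sketch below, assuming a pandas version that has it, reruns the example above with fixed data in place of the random columns and records what the rolled function receives.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.5, -1.0, 2.0, 0.25, 1.0], "B": [3, 7, 1, 1, 3]})

seen = []


def f(x):
    seen.append(type(x))  # record what the rolled function receives
    return x.sum()


# raw=True: each window is passed as a 1-D numpy.ndarray, one column at a time.
as_arrays = df.rolling(2).apply(f, raw=True)
array_types = set(seen)

seen.clear()
# raw=False: each window is passed as a pandas Series (keeping its index),
# but still one column at a time, never the whole DataFrame.
as_series = df.rolling(2).apply(f, raw=False)
series_types = set(seen)
```

Even with `raw=False`, each window is a single column as a Series rather than a DataFrame, so a function that needs several columns at once still requires a workaround.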

The issue is that you need to pass a constructed object to algos.roll_generic (or maybe a new function) which does the windowing.

here
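A pure-Python sketch of what a `roll_generic` that takes an object might look like. `roll_generic_obj` is hypothetical, not a pandas function: it computes trailing-window bounds and calls `func` on `obj.iloc[start:end]`, so the same loop works for a Series or for a DataFrame with all of its columns. It counts rows for `minp`, a simplification compared with pandas, which counts non-NaN observations.

```python
import numpy as np
import pandas as pd


def roll_generic_obj(obj, window, minp, func):
    # Hypothetical stand-in for a roll_generic that accepts an object instead
    # of an ndarray. obj may be a Series or a DataFrame; func receives
    # obj.iloc[start:end] for each trailing window, so it keeps the index and,
    # for a DataFrame, every column.
    n = len(obj)
    ends = np.arange(1, n + 1)             # exclusive end of each window
    starts = np.maximum(ends - window, 0)  # inclusive start of each window
    out = np.full(n, np.nan)
    for i, (s, e) in enumerate(zip(starts, ends)):
        # Simplification: minp counts rows here, not non-NaN observations.
        if e - s >= minp:
            out[i] = func(obj.iloc[s:e])
    return pd.Series(out, index=obj.index)


s = pd.Series([1.0, 2.0, 3.0, 4.0], index=list("wxyz"))
series_result = roll_generic_obj(s, window=2, minp=2, func=lambda w: w.sum())

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [4.0, 3.0, 2.0, 1.0]})
frame_result = roll_generic_obj(
    df, window=2, minp=2, func=lambda w: (w["A"] - w["B"]).abs().max()
)
```

Slicing a pandas object per window is likely much slower than slicing an ndarray, which is one reason to keep the built-in reductions on the array path and use an object path only for user-supplied functions.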

@max-sixty
Contributor

Is this doable with roll_generic? It seems that requires an array:

```
In [28]: series=pd.Series(range(10),dtype='float64')

In [29]: roll_generic(series, win=2, minp=2, offset=0, func=lambda x: x.sum(), args=[], kwargs={})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-3ec0f9465dad> in <module>()
----> 1 roll_generic(series, win=2, minp=2, offset=0, func=lambda x: x.sum(), args=[], kwargs={})

TypeError: Argument 'input' has incorrect type (expected numpy.ndarray, got Series)
```

Does that mean we need a parallel function which operates on Series?

I could imagine having a function that generated the groups; then it would actually be a groupby. But I haven't thought it through enough, and performance may be an issue.

@jreback
Contributor Author

jreback commented Apr 22, 2016

No, you have to change roll_generic to take an object.

Doing it with GroupBy is a whole separate idea. I may do that, but it's orthogonal (and the reason is different from this one).

@max-sixty
Contributor

OK, I haven't worked with Cython before, and I'm not sure how it handles non-numpy arrays, but I can have a go. Probably won't have immediate results.

@citynorman

Almost 3 years and it's still an issue :'(
```python
import pandas as pd
import numpy as np

def distance_sum(df):
    print df
    df['norm1']=df.ix[:,0]/df.ix[0,0]
    df['norm2']=df.ix[:,1]/df.ix[0,1]
    return np.sum(np.square(df['norm1']-df['norm2']))

df=pd.DataFrame({'a':np.array([1,2,3]),'b':np.array([10,20,30])})
df.rolling(center=False,window=2).apply(distance_sum)
```

```
AttributeError                            Traceback (most recent call last)
in <module>()
      9 
     10 df=pd.DataFrame({'a':np.array([1,2,3]),'b':np.array([10,20,30])})
---> 11 df.rolling(center=False,window=2).apply(distance_sum)

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in __getattr__(self, name)
   2358                 return self[name]
   2359             raise AttributeError("'%s' object has no attribute '%s'" %
-> 2360                                  (type(self).__name__, name))
   2361 
   2362     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'rolling'
```

OR


```
AttributeError                            Traceback (most recent call last)
in <module>()
     14 
     15 t=pd.DataFrame({'a':a,'b':b})
---> 16 t.rolling(center=False,window=2).apply(test_distance_sum)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in apply(self, func, args, kwargs)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in apply(self, func, args, kwargs)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in _apply(self, func, name, window, center, check_minp, how, **kwargs)

/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.pyc in apply_along_axis(func1d, axis, arr, *args, **kwargs)
     89             outshape = asarray(arr.shape).take(indlist)
     90             i.put(indlist, ind)
---> 91             res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
     92             # if res is a number, then we have a smaller output array
     93             if isscalar(res):

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in calc(x)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in f(arg, window, min_periods)

pandas/algos.pyx in pandas.algos.roll_generic (pandas/algos.c:51577)()

in test_distance_sum(df)
      9 def test_distance_sum(df):
     10     print df
---> 11     df['pxnorm1']=df.ix[:,0]/df.ix[0,0]
     12     df['pxnorm2']=df.ix[:,1]/df.ix[0,1]
     13     return np.mean(df)#np.sum(np.square(df['pxnorm1']-df['pxnorm2']))

AttributeError: 'numpy.ndarray' object has no attribute 'ix'
```
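The second traceback shows why `distance_sum` fails: the rolled function received a `numpy.ndarray`, which has no `.ix` (and `.ix` itself has since been removed from pandas). A sketch of the function rewritten with `.iloc` and without adding columns to the window, driven over whole-row windows by an explicit loop. The data is a modified copy of the example frame (`a` ends in 4 instead of 3) so the result is not identically zero.

```python
import numpy as np
import pandas as pd


def distance_sum(frame):
    # Normalize the first two columns by their first value in the window.
    # Working on local Series avoids adding columns to (mutating) the slice.
    norm1 = frame.iloc[:, 0] / frame.iloc[0, 0]
    norm2 = frame.iloc[:, 1] / frame.iloc[0, 1]
    return np.sum(np.square(norm1 - norm2))


# Illustrative data: 'a' ends in 4 instead of 3 so the result is not all zeros.
df = pd.DataFrame({"a": np.array([1, 2, 4]), "b": np.array([10, 20, 30])})
window = 2

# Call distance_sum on each full window of rows; the first window - 1 rows
# have no complete window and are left as NaN.
values = [np.nan] * (window - 1) + [
    distance_sum(df.iloc[end - window + 1:end + 1])
    for end in range(window - 1, len(df))
]
looped = pd.Series(values, index=df.index)

# For window=2 only there is a vectorized equivalent: the window's first row
# contributes 0 and its second row contributes (a_t/a_{t-1} - b_t/b_{t-1})**2.
vectorized = (df["a"] / df["a"].shift(1) - df["b"] / df["b"].shift(1)) ** 2
```

The vectorized line is only a cross-check for `window=2`; for wider windows the loop, or another whole-frame workaround, is still needed.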

@jreback jreback modified the milestones: Interesting Issues, Next Major Release Sep 11, 2017
@jreback jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017
jreback added a commit to jreback/pandas that referenced this issue Apr 2, 2018
jreback added a commit to jreback/pandas that referenced this issue Apr 2, 2018
jreback added a commit to jreback/pandas that referenced this issue Apr 10, 2018
jreback added a commit to jreback/pandas that referenced this issue Apr 12, 2018
jreback added a commit to jreback/pandas that referenced this issue Apr 13, 2018
jreback added a commit to jreback/pandas that referenced this issue Apr 14, 2018
jreback added a commit to jreback/pandas that referenced this issue Apr 15, 2018
jreback added a commit to jreback/pandas that referenced this issue Apr 16, 2018
@jreback jreback modified the milestones: Next Major Release, 0.23.0 Apr 16, 2018
6 participants