If I use `groupby.cumsum` or other scan or segmented shift operations from Python for feature engineering, my results are sorted by key (but in the original order within each key). This makes it challenging to add the resulting data back into the original dataframe as a column, which is a common use case when creating lagged or scan-based features. I'd like to be able to return results in the original row order.
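As a point of comparison, here is a minimal sketch of the requested behavior using pandas as a stand-in (an assumption for illustration; the issue itself concerns the GPU dataframe library, and the column names `key`, `val`, `val_cumsum`, and `val_lag1` are hypothetical). In pandas, grouped scans and shifts come back aligned to the original rows, so they can be assigned directly as new columns.

```python
import pandas as pd

# Toy frame whose keys are interleaved, so key order differs from row order.
df = pd.DataFrame({"key": ["b", "a", "b", "a", "b"], "val": [1, 2, 3, 4, 5]})

# pandas returns grouped scan results aligned to the original index,
# so they can be attached as columns without any reordering.
df["val_cumsum"] = df.groupby("key")["val"].cumsum()

# A segmented shift (lag of 1 within each key) behaves the same way.
df["val_lag1"] = df.groupby("key")["val"].shift(1)

print(df)
```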
Adding results from `shift` (segmented shift) and `cumsum|max|etc.` operations as new columns in the original dataframe could otherwise require running one boolean masking + setitem operation per unique key in the groupby, which would not scale well.
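One way to avoid the per-key masking loop is a single scatter back to the original positions. The sketch below uses NumPy to illustrate the idea and is not the library's implementation. It assumes the key-sorted result lists groups in ascending key order and keeps the original row order within each key (as described above), which is exactly the permutation a stable argsort of the keys produces. The helper name `restore_row_order` is hypothetical.

```python
import numpy as np

def restore_row_order(keys, key_sorted_result):
    """Scatter a key-sorted result back to the original row positions.

    Assumes groups appear in ascending key order and rows keep their
    original relative order within each key (a stable sort by key).
    """
    order = np.argsort(keys, kind="stable")  # row i of the sorted result came from row order[i]
    out = np.empty_like(key_sorted_result)
    out[order] = key_sorted_result           # one scatter, independent of the number of keys
    return out

keys = np.array([1, 0, 1, 0, 1])
vals = np.array([1, 2, 3, 4, 5])

# Simulate a grouped cumsum that is returned sorted by key.
order = np.argsort(keys, kind="stable")
sorted_keys, sorted_vals = keys[order], vals[order]
is_start = np.r_[True, sorted_keys[1:] != sorted_keys[:-1]]  # first row of each group
running = np.cumsum(sorted_vals)
group_offset = (running - sorted_vals)[is_start]             # running total before each group
key_sorted_cumsum = running - group_offset[np.cumsum(is_start) - 1]

row_order_cumsum = restore_row_order(keys, key_sorted_cumsum)
print(row_order_cumsum)
```

The scatter costs one sort plus one indexed assignment regardless of how many unique keys there are, whereas the masking approach does one pass over the data per key.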
I remember `groupby.shift` was implemented prior to the discussion I had with @shwina about the order of return values. Later features (e.g. `groupby.fillna`) preserve the order of the index and were done without libcudf support.