[FEA] Groupby shift #7183

beckernick · 2021-01-21T16:05:01Z (edited by randerzander)

I'd like to be able to use `shift` on groupby Series and DataFrame objects. Today, I can do this in pandas but not in cudf.

```python
import pandas as pd
import dask.dataframe as dd
import cudf

pdf = pd.DataFrame({
    "a": [0,1,0,1,1,0],
    "b": range(6),
    "c": ["a","b","c","d","e","f"]
})
gdf = cudf.from_pandas(pdf)

print(pdf.groupby("a").b.shift(1))
gdf.groupby("a").b.shift(1)
```

```
0    NaN
1    NaN
2    0.0
3    1.0
4    3.0
5    2.0
Name: b, dtype: float64
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-2e03f810ad06> in <module>
     11 
     12 print(pdf.groupby("a").b.shift(1))
---> 13 gdf.groupby("a").b.shift(1)

/raid/nicholasb/miniconda3/envs/rapids-gpu-bdb-20210120/lib/python3.7/site-packages/cudf/core/groupby/groupby.py in __getattribute__(self, key)
     61     def __getattribute__(self, key):
     62         try:
---> 63             return super().__getattribute__(key)
     64         except AttributeError:
     65             if key in libgroupby._GROUPBY_AGGS:

AttributeError: 'SeriesGroupBy' object has no attribute 'shift'
```

```
conda list | grep "rapids\|blazing\|dask\|distr\|pandas"
# packages in environment at /raid/nicholasb/miniconda3/envs/rapids-gpu-bdb-20210120:
blazingsql                0.18.0a0                 pypi_0    pypi
cudf                      0.18.0a210120   cuda_10.2_py37_g02e25b6f3d_183    rapidsai-nightly
cuml                      0.18.0a210120   cuda10.2_py37_g816bb6506_79    rapidsai-nightly
dask                      2021.1.0           pyhd8ed1ab_0    conda-forge
dask-core                 2021.1.0           pyhd8ed1ab_0    conda-forge
dask-cuda                 0.18.0a201211           py37_39    http://conda-mirror.gpuci.io/rapidsai-nightly
dask-cudf                 0.18.0a210120   py37_g02e25b6f3d_183    http://conda-mirror.gpuci.io/rapidsai-nightly
distributed               2021.1.0         py37h89c1867_1    conda-forge
faiss-proc                1.0.0                      cuda    http://conda-mirror.gpuci.io/rapidsai-nightly
libcudf                   0.18.0a210120   cuda10.2_g02e25b6f3d_183    rapidsai-nightly
libcuml                   0.18.0a210120   cuda10.2_g816bb6506_79    rapidsai-nightly
libcumlprims              0.18.0a201203   cuda10.2_gff080f3_0    http://conda-mirror.gpuci.io/rapidsai-nightly
librmm                    0.18.0a210120   cuda10.2_gce99588_23    rapidsai-nightly
pandas                    1.1.5            py37hdc94413_0    conda-forge
rmm                       0.18.0a210120   cuda_10.2_py37_gce99588_23    http://conda-mirror.gpuci.io/rapidsai-nightly
ucx                       1.9.0+gcd9efd3       cuda10.2_0    http://conda-mirror.gpuci.io/rapidsai-nightly
ucx-proc                  1.0.0                       gpu    http://conda-mirror.gpuci.io/rapidsai-nightly
ucx-py                    0.18.0a210120   py37_gcd9efd3_10    http://conda-mirror.gpuci.io/rapidsai-nightly
```
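For reference, the pandas output above can be reproduced with a small plain-Python sketch. Each row takes the value from `periods` rows earlier within its own group. A row with no such predecessor gets a missing value, and the result stays aligned with the input rows. This is an illustrative sketch only, not pandas' or cudf's implementation. The helper name `group_shift` is hypothetical, and `None` stands in for `NaN`.

```python
from collections import defaultdict

def group_shift(keys, values, periods=1, fill_value=None):
    """Hypothetical helper (not a pandas/cudf API): shift `values` by
    `periods` positions within each group of `keys`.

    The result is aligned with the input rows. Rows whose source position
    falls outside their group get `fill_value`.
    """
    # Collect the row positions of each group, keeping original order.
    positions = defaultdict(list)
    for i, k in enumerate(keys):
        positions[k].append(i)

    result = [fill_value] * len(values)
    for rows in positions.values():
        for j, row in enumerate(rows):
            src = j - periods  # position of the source row inside this group
            if 0 <= src < len(rows):
                result[row] = values[rows[src]]
    return result

a = [0, 1, 0, 1, 1, 0]
b = list(range(6))
print(group_shift(a, b, periods=1))
# [None, None, 0, 1, 3, 2]
```

A negative `periods` pulls values from later rows in the same group, because the same bounds check applies in both directions.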


github-actions · 2021-02-26T16:26:19Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

randerzander · 2021-03-03T01:41:46Z

Still a valid [FEA]

minhlong94 · 2021-03-09T03:50:01Z

This is a very useful feature. Should be implemented. +1

taureandyernv · 2021-03-31T19:35:48Z

@harrism @beckernick , this feature request was mentioned in https://stackoverflow.com/questions/66863973/cudf-an-alternative-of-pandas-groupby-shift. Definitely seems like there is demand :)

kkraus14 · 2021-03-31T19:38:57Z

This is planned for 0.20.

Part 1 (libcudf side) of #7183

This PR adds a `groupby::shift` API that performs group-based shifts. The main difference between a regular `shift` and `groupby::shift` is that values are clipped at group boundaries and `<NA>` (or the given fill value) is introduced there. Example:

```
key = [1, 1, 1, 1, 2, 2, 2]
val = [3, 4, 5, 6, 7, 8, 9]
offset = 2
fill_value = <NA> # No fill for boundary values
result = [<NA>, <NA>, 3, 4, <NA>, <NA>, 7]
```

```
key = [1, 1, 1, 1, 2, 2, 2]
val = [3, 4, 5, 6, 7, 8, 9]
offset = 2
fill_value = 42 # Fill 42 for boundary values
result = [42, 42, 3, 4, 42, 42, 7]
```

Implementation notes: the current implementation is based on `copy_if_else`. Here `lhs` is the segmented values iterator with an offset, and `rhs` is a constant iterator to the fill scalar.

Authors:
- Michael Wang (https://github.com/isVoid)

Approvers:
- Mike Wendt (https://github.com/mike-wendt)
- Nghia Truong (https://github.com/ttnghia)
- Keith Kraus (https://github.com/kkraus14)
- Karthikeyan (https://github.com/karthikeyann)
- Mark Harris (https://github.com/harrism)

URL: #7910
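The two examples above can be reproduced with a plain-Python sketch of the same per-row selection that the implementation note describes. For each row, the shifted value is taken when the source row lies in the same contiguous group; otherwise the fill value is taken. This is not libcudf's CUDA code. The function name `segmented_shift` is hypothetical, rows with equal keys are assumed to be contiguous (as in the examples), and `None` stands in for `<NA>`.

```python
def segmented_shift(keys, values, offset, fill_value=None):
    """Hypothetical helper (not a libcudf API): shift `values` by `offset`
    within contiguous runs of equal `keys`.

    Assumes rows with the same key are contiguous (e.g. already sorted by
    key). `None` stands in for <NA>.
    """
    n = len(keys)
    # For each row, compute the [start, end) bounds of its group.
    starts = [0] * n
    ends = [n] * n
    for i in range(1, n):
        starts[i] = i if keys[i] != keys[i - 1] else starts[i - 1]
    for i in range(n - 2, -1, -1):
        ends[i] = i + 1 if keys[i] != keys[i + 1] else ends[i + 1]

    # Per-row select, in the spirit of copy_if_else: take the offset value
    # ("lhs") if its source row is in the same group, else the fill ("rhs").
    result = []
    for i in range(n):
        src = i - offset
        in_group = starts[i] <= src < ends[i]
        result.append(values[src] if in_group else fill_value)
    return result

key = [1, 1, 1, 1, 2, 2, 2]
val = [3, 4, 5, 6, 7, 8, 9]
print(segmented_shift(key, val, 2))      # [None, None, 3, 4, None, None, 7]
print(segmented_shift(key, val, 2, 42))  # [42, 42, 3, 4, 42, 42, 7]
```

Precomputing group bounds keeps the per-row decision independent of its neighbours. That independence is what makes the selection straightforward to run as one parallel pass.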

github-actions · 2021-04-30T23:04:31Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Closes #7183, follow-up of #7910

This PR:
- refactors the existing libcudf `groupby::shift` API, which only takes a single column, to accept multiple columns.
- adds Cython and Python bindings for `groupby.shift`.

Example Python usage:

```
>>> df = cudf.DataFrame({"a":[1,2,1,2,2], "b":["x", "y", "z", "42", "7"]})
>>> df.groupby("a").shift(1)
      b
a
1  <NA>
1     x
2  <NA>
2     y
2    42
```

Minor refactors:
- adds a `use_thread` parameter to `dataset_generator.rand_dataframe` to expose the thread pool config.

Authors:
- Michael Wang (https://github.com/isVoid)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Robert Maynard (https://github.com/robertmaynard)
- Ashwin Srinath (https://github.com/shwina)
- Keith Kraus (https://github.com/kkraus14)
- Karthikeyan (https://github.com/karthikeyann)
- Christopher Harris (https://github.com/cwharris)

URL: #8131
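The multi-column behaviour in the example output can be illustrated with a plain-Python sketch that shifts every column of a group together. It assumes the layout the example output shows: rows ordered by sorted group key, with the original row order kept inside each group. This is not cudf's implementation. The function name `grouped_shift` is hypothetical, and `None` stands in for `<NA>`.

```python
from collections import defaultdict

def grouped_shift(keys, columns, periods=1, fill_value=None):
    """Hypothetical helper (not a cudf API): shift every column in `columns`
    by `periods` within groups of `keys`.

    Returns (key, row) pairs ordered by sorted key, keeping the original row
    order within each group, mirroring the layout of the example output.
    """
    groups = defaultdict(list)
    for i, k in enumerate(keys):
        groups[k].append(i)

    out = []
    for k in sorted(groups):
        rows = groups[k]
        for j in range(len(rows)):
            src = j - periods  # source position inside this group
            if 0 <= src < len(rows):
                row = tuple(col[rows[src]] for col in columns)
            else:
                row = tuple(fill_value for _ in columns)
            out.append((k, row))
    return out

a = [1, 2, 1, 2, 2]
b = ["x", "y", "z", "42", "7"]
for key, row in grouped_shift(a, [b], periods=1):
    print(key, row[0])
# 1 None
# 1 x
# 2 None
# 2 y
# 2 42
```

Passing several columns (for example `[b, c]`) shifts them with one shared set of group boundaries. This loosely corresponds to the refactor above that lets one `groupby::shift` call handle multiple columns.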

beckernick added the feature request and Needs Triage labels Jan 21, 2021

kkraus14 added the Python and libcudf labels and removed the Needs Triage label Jan 27, 2021

github-actions bot added the inactive-30d label Feb 26, 2021

randerzander removed the inactive-30d label Mar 3, 2021

harrism assigned karthikeyann Mar 30, 2021

harrism assigned isVoid and unassigned karthikeyann Mar 31, 2021

isVoid mentioned this issue Apr 8, 2021: Add groupby::shift API #7910 (Merged)

github-actions bot added the inactive-30d label Apr 30, 2021

isVoid mentioned this issue May 1, 2021: Groupby.shift c++ API refactor and python binding #8131 (Merged)

gabrielspmoreira mentioned this issue May 4, 2021: [FEA] Add window functions NVIDIA-Merlin/NVTabular#740 (Open)

rapids-bot bot closed this as completed in #8131 May 26, 2021
