[FEA] Groupby cumulative count #1296

Closed
beckernick opened this issue Mar 26, 2019 · 8 comments · Fixed by #7759
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.

Comments

@beckernick
Member

beckernick commented Mar 26, 2019

Is your feature request related to a problem? Please describe.
As a cuDF user, I want to assign numbers to each observation of a group reflecting its order of occurrence in the group.

The equivalent in the pandas API doc is here.

Describe the solution you'd like
I'd like to be able to call df.groupby(col).cumcount() and return a column of the same size containing the numberings described above.
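To make the requested semantics concrete, here is a minimal pure-Python sketch of what a cumulative count per group computes. It is a reference for the expected output only, not cuDF's implementation; the helper name `cumcount_reference` is made up for illustration.

```python
from collections import defaultdict


def cumcount_reference(keys):
    """Number each row by its order of occurrence within its group (0-based)."""
    seen = defaultdict(int)  # group key -> number of rows seen so far
    out = []
    for key in keys:
        out.append(seen[key])  # rows already seen in this group
        seen[key] += 1
    return out


keys = ["a", "a", "b", "a", "b"]
print(cumcount_reference(keys))  # [0, 1, 0, 2, 1]
```

The output has one entry per input row, in input order, which matches the "column of the same size" requested above.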

@beckernick beckernick added feature request New feature or request Needs Triage Need team to review and classify labels Mar 26, 2019
@kkraus14 kkraus14 added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Apr 8, 2019
@harrism
Member

harrism commented Apr 16, 2019

Implementation is basically the same as groupby.cumsum. Will move to 0.8 along with #1298
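One way to read the relationship to cumsum: a cumulative count is a grouped cumulative sum over a column of ones, shifted to start at 0. The sketch below illustrates that in plain Python; both helper names are hypothetical and this is not the libcudf code.

```python
from collections import defaultdict


def grouped_cumsum(keys, values):
    """Inclusive running sum of values within each group, in input order."""
    running = defaultdict(int)  # group key -> running total
    out = []
    for key, value in zip(keys, values):
        running[key] += value
        out.append(running[key])
    return out


def cumcount_via_cumsum(keys):
    """Cumulative count expressed as a grouped cumsum over ones, minus one."""
    ones = [1] * len(keys)
    return [total - 1 for total in grouped_cumsum(keys, ones)]


print(grouped_cumsum(["x", "y", "x"], [5, 1, 2]))  # [5, 1, 7]
print(cumcount_via_cumsum(["x", "y", "x", "x"]))   # [0, 0, 1, 2]
```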

@daxiongshu

Is this the same function as the following code?

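from numba import cuda  # provides threadIdx / blockDim inside the kernel

def get_order_in_group(y, order):
    # Threads stride over the rows of one group; i is the row's position
    # within that group, so order[i] = i numbers the rows from 0.
    for i in range(cuda.threadIdx.x, len(y), cuda.blockDim.x):
        order[i] = i

# self.tpb is presumably the threads-per-block setting from the notebook's class.
got = df.groupby(['y']).apply_grouped(get_order_in_group,
                                      incols=['y'],
                                      outcols={'order': 'int32'},
                                      tpb=self.tpb)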

The full notebook link, cell 15.

@karthikeyann
Contributor

karthikeyann commented Feb 12, 2021

This operation requires the output to be in the same order as the input at the column level, is that right?
I ask because the order within a group alone is simply the sequence from 0 to size(group) - 1.

@kkraus14
Collaborator

> This operation requires the output to be in the same order as the input at the column level, is that right?
> I ask because the order within a group alone is simply the sequence from 0 to size(group) - 1.

It only requires the output order within groups to be stable. The ordering of the grouping keys is not required.
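As an illustration of that point, the sketch below uses a stable sort to bring each group's rows together, numbers them within each run, and scatters the numbers back to the original row positions. The function name and the `descending` flag are hypothetical; the only property relied on is that Python's `sorted` is stable, including with `reverse=True`.

```python
def cumcount_sorted(keys, descending=False):
    """Cumulative count via a stable sort; key order does not affect the result."""
    n = len(keys)
    # Stable sort: rows with equal keys keep their original relative order.
    order = sorted(range(n), key=lambda i: keys[i], reverse=descending)
    out = [0] * n
    prev_key = None
    position = 0
    for rank, row in enumerate(order):
        if rank == 0 or keys[row] != prev_key:
            position = 0  # a new group starts here
        out[row] = position  # scatter back to the row's original position
        position += 1
        prev_key = keys[row]
    return out


keys = ["b", "a", "b", "a", "a"]
print(cumcount_sorted(keys))                   # [0, 0, 1, 1, 2]
print(cumcount_sorted(keys, descending=True))  # [0, 0, 1, 1, 2]
```

Sorting the keys ascending or descending gives the same result, because only the relative order of rows inside each group matters.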

@karthikeyann karthikeyann self-assigned this Mar 16, 2021
rapids-bot bot pushed a commit that referenced this issue Mar 23, 2021
Adds support for groupby scan operations.

Addresses part of:
- #1298 (cumsum)
- #1296 (cumcount)

Scan aggregations added:
- sum
- min
- max
- count

Authors:
  - Karthikeyan (@karthikeyann)
  - Michael Wang (@isVoid)

Approvers:
  - Vukasin Milovanovic (@vuule)
  - Jake Hemstad (@jrhemstad)
  - Nghia Truong (@ttnghia)
  - David (@davidwendt)

URL: #7387
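For readers unfamiliar with the term, a groupby scan produces one output per input row, where each output is the running aggregate of that row's group so far. The following is a minimal pure-Python sketch of the idea for the aggregations listed in the commit above; `grouped_scan` is a hypothetical helper, not the libcudf API, and the running count shown here is 1-based.

```python
import operator


def grouped_scan(keys, values, op):
    """Inclusive scan of values within each group, using binary operator op."""
    running = {}  # group key -> running aggregate
    out = []
    for key, value in zip(keys, values):
        running[key] = value if key not in running else op(running[key], value)
        out.append(running[key])
    return out


keys = ["a", "b", "a", "b", "a"]
values = [3, 1, 2, 5, 4]

print(grouped_scan(keys, values, operator.add))  # [3, 1, 5, 6, 9]
print(grouped_scan(keys, values, min))           # [3, 1, 2, 1, 2]
print(grouped_scan(keys, values, max))           # [3, 1, 3, 5, 4]
# Running count as a sum over ones; subtract 1 for a 0-based cumcount.
print(grouped_scan(keys, [1] * len(keys), operator.add))  # [1, 1, 2, 2, 3]
```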
@harrism
Member

harrism commented Mar 24, 2021

@karthikeyann is this implemented as part of #7387?

@karthikeyann
Contributor

karthikeyann commented Mar 25, 2021

The libcudf part is implemented.
The Cython and Python code is not done yet.

@kkraus14
Collaborator

@karthikeyann I unassigned you since I figured someone else would do the Cython / Python code, but if you're tackling it, please reassign yourself.

@karthikeyann
Contributor

I am already working on it.

@karthikeyann karthikeyann self-assigned this Mar 30, 2021
rapids-bot bot pushed a commit that referenced this issue Apr 22, 2021
closes #1296 Groupby cumulative count 
closes #1298 Groupby cumulative sum 

- [x] Add cython code for groupby scan (cannot mix reduce aggs and scan aggs)
- [x] Add python code for groupby scan functions - cumsum, cummin, cummax, cumcount, groupby.agg()
- [x] unit tests

Authors:
  - Karthikeyan (https://github.com/karthikeyann)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Keith Kraus (https://github.com/kkraus14)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #7759
5 participants