[FEA] .describe() after DataFrameGroupBy #7990

Closed
HanwGeek opened this issue Apr 17, 2021 · 5 comments · Fixed by #8179
Labels: feature request (New feature or request), Python (Affects Python cuDF API)

Comments

@HanwGeek

Is your feature request related to a problem? Please describe.

I want to get describe() information after a .groupby() operation, but I get:

AttributeError: 'DataFrameGroupBy' object has no attribute 'describe'

Describe the solution you'd like
When I execute this:

df.groupby(['col1', 'col2'])[['col3', 'col4']].describe()

I would like to get the same results as pandas.
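For reference, here is a minimal sketch of what the requested call returns in pandas. The frame and its values are made up for illustration; only the column names `col1` to `col4` come from the request above. It runs on plain pandas, so it shows the target behavior, not cuDF's.

```python
import pandas as pd

# Hypothetical data; only the column names come from the request.
df = pd.DataFrame({
    "col1": ["x", "x", "y", "y"],
    "col2": ["p", "p", "q", "q"],
    "col3": [1.0, 3.0, 5.0, 9.0],
    "col4": [10, 20, 30, 50],
})

# One row per (col1, col2) group; the columns are a two-level
# MultiIndex of (original column, statistic).
summary = df.groupby(["col1", "col2"])[["col3", "col4"]].describe()

print(summary.columns.nlevels)
print(list(summary["col3"].columns))
print(summary.loc[("x", "p"), ("col3", "mean")])
```

Each selected column contributes eight statistics (count, mean, std, min, 25%, 50%, 75%, max), so two columns yield sixteen result columns.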

Thank you very much! Feel free to make any comment about this issue.

@HanwGeek added the Needs Triage and feature request labels Apr 17, 2021
@kkraus14 added the Python label and removed the Needs Triage label Apr 20, 2021
@kkraus14
Collaborator

Looks like this is the equivalent of running DataFrameGroupBy.agg(['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']) and I believe we support all of these aggregations already.
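A minimal sketch of that workaround, written against pandas so it runs without a GPU. In pandas, `'25%'`, `'50%'`, and `'75%'` are output labels and not aggregation names, so the sketch assumes small named quantile helpers (the helper `q` is hypothetical, not part of either library). Whether cuDF accepts the same callables is not confirmed in this thread.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b"],
    "col3": [1.0, 2.0, 3.0, 10.0, 20.0],
})

def q(p):
    """Build a quantile aggregation whose name matches describe()'s label."""
    def _quantile(s):
        return s.quantile(p)
    _quantile.__name__ = f"{round(p * 100)}%"
    return _quantile

# Spell out the eight statistics that describe() reports.
manual = df.groupby("col1")["col3"].agg(
    ["count", "mean", "std", "min", q(0.25), q(0.50), q(0.75), "max"]
)

# Built-in summary to compare against.
described = df.groupby("col1")["col3"].describe()

print(manual)
print(described)
```

The two frames hold the same values; the only difference in this sketch is that `count` is an integer in the manual version and a float in the `describe()` output.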

@HanwGeek
Author

Thank you for your reply! I think these aggregations satisfy my needs, so I will close this issue.

@kkraus14 reopened this Apr 21, 2021
@kkraus14
Collaborator

Reopening this as we should still support this feature. Glad to hear there's a nice workaround though!

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@quasiben
Member

This is being actively worked on here: #8179

@rapids-bot closed this as completed in #8179 Jun 7, 2021
@rapids-bot pushed a commit that referenced this issue Jun 7, 2021
This PR implements functionality to generate summary statistics for a `DataFrame.groupby()` operation
via the `.describe()` method, similar to pandas.


```
>>> import pandas as pd
>>> pdf = pd.DataFrame({"Speed": [380.0, 370.0, 24.0, 26.0], "Score": [50, 30, 90, 80]})
>>> pdf
   Speed  Score
0  380.0     50
1  370.0     30
2   24.0     90
3   26.0     80
>>> pdf.groupby('Score').describe()
      Speed
      count   mean std    min    25%    50%    75%    max
Score                                                    
30      1.0  370.0 NaN  370.0  370.0  370.0  370.0  370.0
50      1.0  380.0 NaN  380.0  380.0  380.0  380.0  380.0
80      1.0   26.0 NaN   26.0   26.0   26.0   26.0   26.0
90      1.0   24.0 NaN   24.0   24.0   24.0   24.0   24.0


>>> import cudf
>>> gdf = cudf.from_pandas(pdf)
>>> gdf.groupby('Score').describe()
       count   mean   std    min    25%    50%    75%    max
Score                                                       
30         1  370.0  <NA>  370.0  370.0  370.0  370.0  370.0
50         1  380.0  <NA>  380.0  380.0  380.0  380.0  380.0
80         1   26.0  <NA>   26.0   26.0   26.0   26.0   26.0
90         1   24.0  <NA>   24.0   24.0   24.0   24.0   24.0

```


Fixes: #7990

Authors:
  - Sheilah Kirui (https://github.com/skirui-source)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Ashwin Srinath (https://github.com/shwina)
  - Michael Wang (https://github.com/isVoid)
  - Christopher Harris (https://github.com/cwharris)

URL: #8179