Commit
This PR adds summary statistics for `DataFrame.groupby()` operations via the `.describe()` method, similar to Pandas.

```
>>> import pandas as pd
>>> pdf = pd.DataFrame({"Speed": [380.0, 370.0, 24.0, 26.0], "Score": [50, 30, 90, 80]})
>>> pdf
   Speed  Score
0  380.0     50
1  370.0     30
2   24.0     90
3   26.0     80
>>> pdf.groupby('Score').describe()
      Speed
      count   mean  std    min    25%    50%    75%    max
Score
30      1.0  370.0  NaN  370.0  370.0  370.0  370.0  370.0
50      1.0  380.0  NaN  380.0  380.0  380.0  380.0  380.0
80      1.0   26.0  NaN   26.0   26.0   26.0   26.0   26.0
90      1.0   24.0  NaN   24.0   24.0   24.0   24.0   24.0
>>> import cudf
>>> gdf = cudf.from_pandas(pdf)
>>> gdf.groupby('Score').describe()
      count   mean   std    min    25%    50%    75%    max
Score
30        1  370.0  <NA>  370.0  370.0  370.0  370.0  370.0
50        1  380.0  <NA>  380.0  380.0  380.0  380.0  380.0
80        1   26.0  <NA>   26.0   26.0   26.0   26.0   26.0
90        1   24.0  <NA>   24.0   24.0   24.0   24.0   24.0
```

Fixes: #7990

Authors:
  - Sheilah Kirui (https://github.com/skirui-source)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Ashwin Srinath (https://github.com/shwina)
  - Michael Wang (https://github.com/isVoid)
  - Christopher Harris (https://github.com/cwharris)

URL: #8179
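For readers unfamiliar with what the summary columns contain, the sketch below assembles the same eight statistics by hand from individual groupby aggregations. It uses pandas only and is an illustration of the output's meaning, not the cuDF implementation from this PR; the variable names `grouped` and `manual` are hypothetical.

```python
import pandas as pd

pdf = pd.DataFrame({"Speed": [380.0, 370.0, 24.0, 26.0], "Score": [50, 30, 90, 80]})

# Group the single value column by the key used in the example above.
grouped = pdf.groupby("Score")["Speed"]

# Build each describe() column from a separate groupby aggregation.
manual = pd.DataFrame({
    "count": grouped.count().astype("float64"),
    "mean": grouped.mean(),
    "std": grouped.std(),            # NaN for groups with a single row
    "min": grouped.min(),
    "25%": grouped.quantile(0.25),
    "50%": grouped.quantile(0.50),
    "75%": grouped.quantile(0.75),
    "max": grouped.max(),
})

print(manual)
```

Each group here has exactly one row, which is why every quantile equals the row's value and the standard deviation is missing.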