[REVIEW] Create agg() function for dataframes #6483

Merged: 44 commits into rapidsai:branch-0.17 from the aggfordataframe branch, Dec 4, 2020

@skirui-source (Contributor) commented Oct 9, 2020

Closes #5247

Adds agg function for DataFrame

@skirui-source requested a review from a team as a code owner October 9, 2020 23:18

@GPUtester (Collaborator)

Please update the changelog in order to start CI tests.

codecov bot commented Oct 10, 2020

Codecov Report

Merging #6483 (5a8fbc1) into branch-0.17 (5336301) will decrease coverage by 0.01%.
The diff coverage is 80.00%.

@@               Coverage Diff               @@
##           branch-0.17    #6483      +/-   ##
===============================================
- Coverage        81.96%   81.95%   -0.02%     
===============================================
  Files               96       96              
  Lines            16177    16294     +117     
===============================================
+ Hits             13260    13353      +93     
- Misses            2917     2941      +24     
Impacted Files                        Coverage Δ
python/cudf/cudf/core/dataframe.py    90.77% <80.00%> (-0.38%) ⬇️
python/cudf/cudf/io/json.py           96.55% <0.00%> (-3.45%) ⬇️
python/cudf/cudf/io/avro.py           78.57% <0.00%> (-3.25%) ⬇️
python/cudf/cudf/io/csv.py            94.00% <0.00%> (-1.75%) ⬇️
python/cudf/cudf/utils/ioutils.py     78.71% <0.00%> (-1.18%) ⬇️
python/cudf/cudf/io/orc.py            88.40% <0.00%> (-0.99%) ⬇️
python/cudf/cudf/utils/dtypes.py      89.10% <0.00%> (+0.38%) ⬆️
python/cudf/cudf/io/parquet.py        92.12% <0.00%> (+0.45%) ⬆️
python/cudf/cudf/utils/utils.py       85.35% <0.00%> (+0.67%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@skirui-source requested a review from isVoid October 12, 2020 22:05

@isVoid (Contributor) commented Oct 12, 2020

As we discussed, pandas' DataFrame.agg promotes the aggregated results to a common dtype:

>>> pdf = pd.DataFrame({"a":[1,2,3], "b":[3.0, 4.0, 5.0], "c":[True, True, False]})
>>> pdf.agg("sum")
a     6.0
b    12.0
c     2.0
dtype: float64

>>> pdf = pd.DataFrame({"a":[1,2,3], "b":[3, 4, 5], "c":[True, True, False]})
>>> pdf.agg("sum")
a     6
b    12
c     2
dtype: int64

I would suggest using np.find_common_type() to determine the promoted dtype. Later, when #6415 merges, you could switch to cudf.utils.dtypes.find_common_type.
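
The promotion described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes the result dtype can be approximated by promoting the input column dtypes, and it uses np.result_type in place of np.find_common_type, which newer NumPy releases deprecate and NumPy 2.0 removes. The helper name promoted_dtype is hypothetical.

```python
import numpy as np
import pandas as pd


def promoted_dtype(df: pd.DataFrame) -> np.dtype:
    """Return the common NumPy dtype of all columns in df (hypothetical helper)."""
    # np.result_type applies NumPy's promotion rules across every column dtype.
    return np.result_type(*df.dtypes)


# The two frames from the comment above.
mixed = pd.DataFrame({"a": [1, 2, 3], "b": [3.0, 4.0, 5.0], "c": [True, True, False]})
ints = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5], "c": [True, True, False]})

print(promoted_dtype(mixed))  # float64
print(promoted_dtype(ints))   # int64
```

Promoting the input dtypes is only an approximation. A reduction that changes the dtype on its own (for example, summing a frame made up solely of boolean columns) would need the per-column result dtypes promoted instead.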

@kkraus14 added the "2 - In Progress" (Currently a work in progress) and "Python" (Affects Python cuDF API) labels Oct 13, 2020
@kkraus14 changed the base branch from branch-0.16 to branch-0.17 October 13, 2020 02:01

@kkraus14 (Collaborator)

FYI, I changed the base branch to branch-0.17, so you'll likely need to merge in upstream.

@skirui-source changed the title from "[WIP] Create agg() function for dataframes" to "[REVIEW] Create agg() function for dataframes" Oct 21, 2020
@isVoid added the "3 - Ready for Review" (Ready for review by team) label and removed the "2 - In Progress" (Currently a work in progress) label Oct 22, 2020
@kkraus14 added the "5 - Ready to Merge" (Testing and reviews complete, ready to merge) and "6 - Okay to Auto-Merge" labels and removed the "3 - Ready for Review" (Ready for review by team) label Dec 1, 2020
@isVoid added the "non-breaking" (Non-breaking change) label Dec 1, 2020

@harrism (Member) commented Dec 2, 2020

I think this needs a "feature", "improvement", or "bug" label (and to pass CI).

@isVoid added the "feature request" (New feature or request) label Dec 3, 2020

@kkraus14 (Collaborator) commented Dec 3, 2020

@kkraus14 added the "0 - Waiting on Author" (Waiting for author to respond to review) label and removed the "5 - Ready to Merge" (Testing and reviews complete, ready to merge) and "6 - Okay to Auto-Merge" labels Dec 3, 2020
@kkraus14 added the "5 - Ready to Merge" (Testing and reviews complete, ready to merge) and "6 - Okay to Auto-Merge" labels and removed the "0 - Waiting on Author" (Waiting for author to respond to review) label Dec 3, 2020

@rgsl888prabhu (Contributor)

@skirui-source test cases are failing.

@rapids-bot (bot) merged commit 30bbb39 into rapidsai:branch-0.17 Dec 4, 2020
@skirui-source self-assigned this Mar 26, 2021
@skirui-source deleted the aggfordataframe branch May 6, 2021 17:45
Labels
5 - Ready to Merge: Testing and reviews complete, ready to merge
feature request: New feature or request
non-breaking: Non-breaking change
Python: Affects Python cuDF API.
Development

Successfully merging this pull request may close these issues.

[FEA] agg() functions for dataframes
8 participants