Use list of columns for methods in Groupby.pyx #10419

Merged


@isVoid isVoid commented Mar 11, 2022

Part of #10153

This PR changes the APIs in groupby.pyx to accept a list of columns as input instead of a Frame. The change affects both keys and values. The Groupby object now stores only a list of columns in its keys attribute, and the other APIs (groups, aggregate, shift, replace_nulls) now accept only a list of columns as their value columns. The aggregation communication protocol has changed from a dictionary mapping column names to lists of agg names to a list of lists of agg names. See the changes in _normalize_aggs for details.
This PR also tries to simplify post-processing of the result frame in the agg method, now that we have finer control in pure Python.
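
To make the protocol change concrete, here is a minimal pure-Python sketch of the two shapes. The helper name `dict_to_nested_aggs` and its arguments are hypothetical and are not taken from the PR; the real logic lives in `_normalize_aggs`, which may differ.

```python
def dict_to_nested_aggs(aggs_by_name, value_names):
    """Convert the old dict-based agg spec into a positional list of lists.

    Hypothetical helper for illustration only. `value_names` gives the
    order of the value columns; the i-th inner list holds the agg names
    requested for the i-th value column.
    """
    return [list(aggs_by_name.get(name, [])) for name in value_names]


# Old shape: column name -> list of agg names.
old_spec = {"a": ["sum", "min"], "b": ["max"]}

# New shape: one inner list per value column, aligned by position with
# the list of value columns that is passed alongside it.
new_spec = dict_to_nested_aggs(old_spec, ["a", "b"])
print(new_spec)
```

Because the new shape is positional, the column names no longer need to cross the Cython boundary; the caller is responsible for pairing results back with names.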

I attempted to rewrite aggregate_internal and scan_internal, but the attempt was futile: the unified aggregation object is a cdef type, which precludes moving the aggregation filtering step out of its current place. I also tried unifying aggregation and scan with a Cython fused type, but could not make it work due to limitations in using fused types with C++ templated types in Cython.

Overall, the performance of the agg call is on par with the main branch, with a performance difference of -3% to 13% depending on the agg type.
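
As a rough check on that range, the sketch below computes a relative difference from the Mean column of two rows in the raw benchmark. The formula (before / after - 1) and the choice of rows are assumptions for illustration; the PR does not state how its percentages were derived.

```python
def percent_slower(before_ms, after_ms):
    """Return how much slower `before` is than `after`, in percent.

    Positive means this PR ("after") is faster than main ("before").
    """
    return (before_ms / after_ms - 1.0) * 100.0


# Mean times in ms, copied from the raw benchmark output.
# False-False-agg2-100: before 2.8997, after 2.5591
gain = percent_slower(2.8997, 2.5591)
# True-True-agg1-1000000: before 36.4209, after 37.3622
loss = percent_slower(36.4209, 37.3622)

print(f"{gain:+.1f}%")
print(f"{loss:+.1f}%")
```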

Raw Benchmark
========================================================================== 36 passed in 33.48s ==========================================================================
(rapids) rapids@compose:~/scratch/cudf_benchmarks$ ./compare.sh bench_groupby.py 

--------------------------------------------------------------- benchmark 'False-False-agg1-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                               Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-False-agg1-100] (afte)     2.5090 (1.0)      2.8418 (1.0)      2.5280 (1.0)      0.0290 (2.40)     2.5229 (1.0)      0.0103 (1.05)        15;19     273
groupby_agg[False-False-agg1-100] (befo)     2.7681 (1.10)     2.8441 (1.00)     2.7877 (1.10)     0.0121 (1.0)      2.7849 (1.10)     0.0098 (1.0)         60;26     252
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-False-agg1-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                                 Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-False-agg1-10000] (afte)     2.7803 (1.0)      3.4156 (1.05)     2.8131 (1.0)      0.0548 (1.57)     2.8007 (1.0)      0.0253 (1.0)         10;12     252
groupby_agg[False-False-agg1-10000] (befo)     3.0402 (1.09)     3.2407 (1.0)      3.1571 (1.12)     0.0348 (1.0)      3.1535 (1.13)     0.0509 (2.01)         39;6     236
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'False-False-agg1-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                    Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-False-agg1-1000000] (afte)     13.2601 (1.0)      14.0128 (1.01)     13.4242 (1.0)      0.1056 (1.28)     13.4004 (1.0)      0.0286 (1.0)           5;8      68
groupby_agg[False-False-agg1-1000000] (befo)     13.5150 (1.02)     13.9165 (1.0)      13.6015 (1.01)     0.0826 (1.0)      13.5944 (1.01)     0.0696 (2.43)          8;5      66
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-False-agg2-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                               Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-False-agg2-100] (afte)     2.5342 (1.0)      2.8621 (1.0)      2.5591 (1.0)      0.0431 (3.18)     2.5509 (1.0)      0.0106 (1.01)        13;18     273
groupby_agg[False-False-agg2-100] (befo)     2.8797 (1.14)     2.9507 (1.03)     2.8997 (1.13)     0.0136 (1.0)      2.8965 (1.14)     0.0105 (1.0)         52;28     227
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-False-agg2-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                                 Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-False-agg2-10000] (afte)     2.7922 (1.0)      3.2884 (1.0)      2.8205 (1.0)      0.0473 (1.40)     2.8118 (1.0)      0.0096 (1.0)         10;18     251
groupby_agg[False-False-agg2-10000] (befo)     3.1491 (1.13)     3.4791 (1.06)     3.1752 (1.13)     0.0338 (1.0)      3.1687 (1.13)     0.0108 (1.12)         6;17     172
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'False-False-agg2-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                    Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-False-agg2-1000000] (afte)     13.4699 (1.0)      14.6287 (1.0)      13.6020 (1.0)      0.1359 (1.0)      13.5769 (1.0)      0.0270 (1.0)           3;8      69
groupby_agg[False-False-agg2-1000000] (befo)     13.6079 (1.01)     29.8318 (2.04)     14.0777 (1.03)     1.9806 (14.57)    13.7795 (1.01)     0.0567 (2.10)          2;6      68
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-False-sum-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                              Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-False-sum-100] (afte)     2.1667 (1.0)      2.2855 (1.0)      2.1831 (1.0)      0.0146 (1.49)     2.1802 (1.0)      0.0111 (1.14)        25;16     301
groupby_agg[False-False-sum-100] (befo)     2.4142 (1.11)     2.4782 (1.08)     2.4319 (1.11)     0.0098 (1.0)      2.4309 (1.11)     0.0097 (1.0)         65;15     278
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-False-sum-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                                Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-False-sum-10000] (afte)     2.4293 (1.0)      2.6593 (1.0)      2.4493 (1.0)      0.0206 (1.66)     2.4455 (1.0)      0.0115 (1.10)        17;19     278
groupby_agg[False-False-sum-10000] (befo)     2.6646 (1.10)     2.7706 (1.04)     2.6832 (1.10)     0.0124 (1.0)      2.6811 (1.10)     0.0105 (1.0)         49;14     257
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------- benchmark 'False-False-sum-1000000': 2 tests ---------------------------------------------------------------
Name (time in ms)                                  Min                Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-False-sum-1000000] (afte)     9.3678 (1.0)      21.0480 (2.07)     9.6817 (1.0)      1.2252 (16.49)    9.5286 (1.0)      0.0342 (1.28)          1;9      89
groupby_agg[False-False-sum-1000000] (befo)     9.6830 (1.03)     10.1832 (1.0)      9.7434 (1.01)     0.0743 (1.0)      9.7238 (1.02)     0.0266 (1.0)           6;6      86
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-True-agg1-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                              Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-True-agg1-100] (afte)     2.4392 (1.0)      2.7474 (1.06)     2.4598 (1.0)      0.0287 (2.07)     2.4545 (1.0)      0.0103 (1.0)         10;17     278
groupby_agg[False-True-agg1-100] (befo)     2.5183 (1.03)     2.6017 (1.0)      2.5354 (1.03)     0.0139 (1.0)      2.5332 (1.03)     0.0134 (1.30)        51;18     268
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-True-agg1-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                                Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-True-agg1-10000] (afte)     2.7196 (1.0)      3.2290 (1.06)     2.7446 (1.0)      0.0462 (2.17)     2.7359 (1.0)      0.0106 (1.00)        11;17     257
groupby_agg[False-True-agg1-10000] (befo)     2.7807 (1.02)     3.0590 (1.0)      2.8039 (1.02)     0.0213 (1.0)      2.8004 (1.02)     0.0106 (1.0)         16;18     251
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'False-True-agg1-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                   Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-True-agg1-1000000] (afte)     13.2259 (1.01)     13.7344 (1.0)      13.3449 (1.00)     0.0797 (1.0)      13.3288 (1.00)     0.0322 (1.41)          5;8      69
groupby_agg[False-True-agg1-1000000] (befo)     13.0875 (1.0)      14.1552 (1.03)     13.3135 (1.0)      0.1325 (1.66)     13.2901 (1.0)      0.0229 (1.0)           4;7      68
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-True-agg2-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                              Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-True-agg2-100] (afte)     2.4580 (1.0)      2.5791 (1.0)      2.4792 (1.0)      0.0174 (1.92)     2.4756 (1.0)      0.0121 (1.37)        21;14     277
groupby_agg[False-True-agg2-100] (befo)     2.6094 (1.06)     2.6686 (1.03)     2.6260 (1.06)     0.0091 (1.0)      2.6255 (1.06)     0.0088 (1.0)         66;21     264
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-True-agg2-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                                Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-True-agg2-10000] (afte)     2.7218 (1.0)      2.8843 (1.0)      2.7415 (1.0)      0.0180 (1.0)      2.7383 (1.0)      0.0116 (1.12)        21;16     257
groupby_agg[False-True-agg2-10000] (befo)     2.8771 (1.06)     3.1227 (1.08)     2.8956 (1.06)     0.0185 (1.03)     2.8922 (1.06)     0.0104 (1.0)         16;16     244
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'False-True-agg2-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                   Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-True-agg2-1000000] (afte)     13.4555 (1.01)     13.7924 (1.0)      13.5244 (1.00)     0.0601 (1.0)      13.5099 (1.00)     0.0362 (1.0)           7;6      70
groupby_agg[False-True-agg2-1000000] (befo)     13.3841 (1.0)      13.9437 (1.01)     13.4948 (1.0)      0.0773 (1.29)     13.4768 (1.0)      0.0443 (1.22)          5;5      68
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-True-sum-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                             Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-True-sum-100] (afte)     2.1270 (1.0)      2.2397 (1.0)      2.1435 (1.0)      0.0158 (1.01)     2.1407 (1.0)      0.0105 (1.0)         27;22     302
groupby_agg[False-True-sum-100] (befo)     2.1881 (1.03)     2.3309 (1.04)     2.2048 (1.03)     0.0156 (1.0)      2.2014 (1.03)     0.0111 (1.06)        35;30     297
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-True-sum-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                               Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-True-sum-10000] (afte)     2.4018 (1.0)      2.6107 (1.0)      2.4183 (1.0)      0.0198 (1.16)     2.4149 (1.0)      0.0108 (1.12)        14;14     277
groupby_agg[False-True-sum-10000] (befo)     2.4406 (1.02)     2.6840 (1.03)     2.4606 (1.02)     0.0170 (1.0)      2.4585 (1.02)     0.0097 (1.0)         15;14     274
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'False-True-sum-1000000': 2 tests ----------------------------------------------------------------
Name (time in ms)                                 Min                Max              Mean            StdDev            Median               IQR            Outliers  Rounds
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[False-True-sum-1000000] (afte)     9.4459 (1.01)     10.0397 (1.0)      9.4983 (1.0)      0.0706 (1.0)      9.4846 (1.0)      0.0216 (1.0)           4;6      89
groupby_agg[False-True-sum-1000000] (befo)     9.3064 (1.0)      10.2732 (1.02)     9.5150 (1.00)     0.1107 (1.57)     9.4933 (1.00)     0.0239 (1.10)         6;10      88
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------- benchmark 'True-False-agg1-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                              Min                Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-False-agg1-100] (afte)     4.3327 (1.0)       4.4800 (1.0)      4.3504 (1.0)      0.0202 (1.0)      4.3457 (1.0)      0.0103 (1.0)         10;16     181
groupby_agg[True-False-agg1-100] (befo)     4.6486 (1.07)     12.4651 (2.78)     4.8006 (1.10)     0.7100 (35.18)    4.6664 (1.07)     0.0191 (1.86)        10;19     170
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-False-agg1-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                                Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-False-agg1-10000] (afte)     4.9246 (1.0)      5.1165 (1.0)      4.9491 (1.0)      0.0269 (1.0)      4.9407 (1.0)      0.0133 (1.06)        16;19     164
groupby_agg[True-False-agg1-10000] (befo)     5.2464 (1.07)     5.6002 (1.09)     5.2700 (1.06)     0.0370 (1.38)     5.2623 (1.07)     0.0126 (1.0)         10;17     154
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'True-False-agg1-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                   Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-False-agg1-1000000] (afte)     36.5089 (1.00)     37.2874 (1.0)      36.8305 (1.0)      0.2321 (1.0)      36.7404 (1.0)      0.2208 (1.0)           7;5      28
groupby_agg[True-False-agg1-1000000] (befo)     36.3558 (1.0)      47.0329 (1.26)     37.7670 (1.03)     2.7313 (11.77)    36.8183 (1.00)     0.8527 (3.86)          2;3      26
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-False-agg2-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                              Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-False-agg2-100] (afte)     4.6287 (1.0)      5.2921 (1.02)     4.6918 (1.0)      0.1017 (4.64)     4.6526 (1.0)      0.0496 (3.27)        21;23     167
groupby_agg[True-False-agg2-100] (befo)     4.9776 (1.08)     5.1737 (1.0)      5.0060 (1.07)     0.0219 (1.0)      4.9995 (1.07)     0.0152 (1.0)         18;10     161
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-False-agg2-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                                Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-False-agg2-10000] (afte)     5.2022 (1.0)      6.7622 (1.16)     5.2405 (1.0)      0.1267 (2.98)     5.2219 (1.0)      0.0157 (1.0)          2;16     155
groupby_agg[True-False-agg2-10000] (befo)     5.5802 (1.07)     5.8531 (1.0)      5.6166 (1.07)     0.0424 (1.0)      5.6041 (1.07)     0.0206 (1.31)        11;14     147
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'True-False-agg2-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                   Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-False-agg2-1000000] (afte)     37.9639 (1.0)      38.7598 (1.0)      38.2381 (1.0)      0.1221 (1.0)      38.2346 (1.00)     0.0583 (1.0)           2;2      27
groupby_agg[True-False-agg2-1000000] (befo)     38.0569 (1.00)     41.5735 (1.07)     38.7983 (1.01)     1.1968 (9.80)     38.1696 (1.0)      0.6344 (10.88)         5;5      26
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-False-sum-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                             Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-False-sum-100] (afte)     3.6893 (1.0)      4.2792 (1.03)     3.7130 (1.0)      0.0580 (4.15)     3.7022 (1.0)      0.0079 (1.0)         10;16     206
groupby_agg[True-False-sum-100] (befo)     4.0016 (1.08)     4.1370 (1.0)      4.0218 (1.08)     0.0140 (1.0)      4.0180 (1.09)     0.0097 (1.23)        27;17     188
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-False-sum-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                               Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-False-sum-10000] (afte)     4.2660 (1.0)      4.6651 (1.0)      4.2913 (1.0)      0.0493 (2.97)     4.2799 (1.0)      0.0097 (1.0)         10;21     185
groupby_agg[True-False-sum-10000] (befo)     4.5702 (1.07)     4.7321 (1.01)     4.5904 (1.07)     0.0166 (1.0)      4.5858 (1.07)     0.0134 (1.37)         24;8     172
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'True-False-sum-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                  Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-False-sum-1000000] (afte)     30.5871 (1.00)     30.9527 (1.0)      30.6797 (1.00)     0.0628 (1.0)      30.6720 (1.00)     0.0421 (1.0)           4;3      32
groupby_agg[True-False-sum-1000000] (befo)     30.5386 (1.0)      31.8930 (1.03)     30.6654 (1.0)      0.2383 (3.80)     30.6013 (1.0)      0.0573 (1.36)          1;4      31
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-True-agg1-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                             Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-True-agg1-100] (afte)     4.2812 (1.0)      4.5815 (1.0)      4.3304 (1.0)      0.0495 (1.43)     4.3134 (1.0)      0.0647 (4.80)         22;4     173
groupby_agg[True-True-agg1-100] (befo)     4.4126 (1.03)     4.7356 (1.03)     4.4357 (1.02)     0.0348 (1.0)      4.4253 (1.03)     0.0135 (1.0)         14;18     158
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-True-agg1-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                               Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-True-agg1-10000] (afte)     4.8505 (1.0)      5.3411 (1.0)      4.8882 (1.0)      0.0596 (1.49)     4.8693 (1.0)      0.0240 (1.41)        12;15     166
groupby_agg[True-True-agg1-10000] (befo)     4.9857 (1.03)     5.3869 (1.01)     5.0191 (1.03)     0.0399 (1.0)      5.0089 (1.03)     0.0170 (1.0)          9;15     160
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'True-True-agg1-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                  Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-True-agg1-1000000] (afte)     36.5387 (1.01)     55.8017 (1.52)     37.3622 (1.03)     3.6965 (48.22)    36.5756 (1.00)     0.0882 (2.75)          1;3      27
groupby_agg[True-True-agg1-1000000] (befo)     36.3456 (1.0)      36.7584 (1.0)      36.4209 (1.0)      0.0767 (1.0)      36.4014 (1.0)      0.0320 (1.0)           1;4      27
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-True-agg2-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                             Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-True-agg2-100] (afte)     4.5713 (1.0)      5.1548 (1.06)     4.6064 (1.0)      0.0621 (4.49)     4.5886 (1.0)      0.0203 (1.51)        13;22     170
groupby_agg[True-True-agg2-100] (befo)     4.7628 (1.04)     4.8752 (1.0)      4.7832 (1.04)     0.0138 (1.0)      4.7795 (1.04)     0.0134 (1.0)          29;9     167
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-True-agg2-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                               Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-True-agg2-10000] (afte)     5.1343 (1.0)      5.4159 (1.0)      5.1769 (1.0)      0.0517 (1.36)     5.1590 (1.0)      0.0179 (1.21)        16;22     157
groupby_agg[True-True-agg2-10000] (befo)     5.3567 (1.04)     5.6432 (1.04)     5.3858 (1.04)     0.0379 (1.0)      5.3785 (1.04)     0.0147 (1.0)          7;12     152
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'True-True-agg2-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                  Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-True-agg2-1000000] (afte)     38.0357 (1.00)     38.2935 (1.00)     38.1159 (1.00)     0.0597 (1.0)      38.1014 (1.00)     0.0846 (1.0)           6;1      27
groupby_agg[True-True-agg2-1000000] (befo)     37.9134 (1.0)      38.2851 (1.0)      38.0201 (1.0)      0.0929 (1.55)     37.9944 (1.0)      0.1066 (1.26)          7;1      26
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------- benchmark 'True-True-sum-100': 2 tests ---------------------------------------------------------------
Name (time in ms)                            Min               Max              Mean            StdDev            Median               IQR            Outliers  Rounds
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-True-sum-100] (afte)     3.7452 (1.0)      4.0287 (1.0)      3.8009 (1.0)      0.0408 (1.0)      3.7968 (1.0)      0.0503 (1.0)          29;3     131
groupby_agg[True-True-sum-100] (befo)     3.8752 (1.03)     4.4384 (1.10)     3.9316 (1.03)     0.0608 (1.49)     3.9265 (1.03)     0.0504 (1.00)          4;3     148
----------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------- benchmark 'True-True-sum-10000': 2 tests ---------------------------------------------------------------
Name (time in ms)                              Min                Max              Mean            StdDev            Median               IQR            Outliers  Rounds
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-True-sum-10000] (afte)     4.4442 (1.0)      11.3511 (2.35)     4.5582 (1.0)      0.5829 (24.78)    4.4741 (1.0)      0.0323 (2.85)         3;19     171
groupby_agg[True-True-sum-10000] (befo)     4.5676 (1.03)      4.8264 (1.0)      4.5913 (1.01)     0.0235 (1.0)      4.5871 (1.03)     0.0114 (1.0)         15;16     168
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------- benchmark 'True-True-sum-1000000': 2 tests -----------------------------------------------------------------
Name (time in ms)                                 Min                Max               Mean            StdDev             Median               IQR            Outliers  Rounds
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groupby_agg[True-True-sum-1000000] (afte)     30.5326 (1.00)     33.6395 (1.02)     31.2355 (1.0)      0.9563 (1.20)     30.6933 (1.0)      0.9663 (1.0)           5;3      30
groupby_agg[True-True-sum-1000000] (befo)     30.4080 (1.0)      33.0341 (1.0)      31.2527 (1.00)     0.7946 (1.0)      30.9808 (1.01)     1.2781 (1.32)         11;0      30
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Benchmark code

@isVoid isVoid requested a review from a team as a code owner March 11, 2022 21:03
@isVoid isVoid requested review from bdice and rgsl888prabhu March 11, 2022 21:03
@github-actions github-actions bot added the "Python" (Affects Python cuDF API) label Mar 11, 2022
@isVoid isVoid requested review from vyasr and removed request for rgsl888prabhu March 11, 2022 21:03
@isVoid isVoid added the "improvement" (Improvement / enhancement to an existing function) and "non-breaking" (Non-breaking change) labels Mar 11, 2022
@isVoid isVoid self-assigned this Mar 11, 2022
@bdice bdice left a comment

I reviewed this PR last week and then forgot to click "Submit." 🤦‍♂️ Sorry about that @isVoid. Luckily, @vyasr covered nearly all my unsubmitted suggestions in his review. Ping me once you've applied those changes and I can take a second pass of review.

@vyasr vyasr requested review from bdice and vyasr March 16, 2022 22:29
@vyasr vyasr left a comment

Looks like we're not quite there yet, but so close!

@isVoid isVoid requested a review from vyasr March 17, 2022 17:12
@vyasr vyasr left a comment

Assuming you address my one outstanding comment, this LGTM!

@bdice bdice left a comment

One suggestion, and one question/suggestion-if-applicable. Overall this is ready to go, so I'll approve.

isVoid commented Mar 18, 2022

rerun tests

@bdice bdice added the "5 - Ready to Merge" (Testing and reviews complete, ready to merge) label Mar 18, 2022
bdice commented Mar 18, 2022

@isVoid The failed CI issue should be solved by rapidsai/rapids-cmake#168. I merged in branch-22.04, and this should be good to go once CI passes.

isVoid commented Mar 18, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 21ed251 into rapidsai:branch-22.04 Mar 18, 2022
Labels
5 - Ready to Merge (Testing and reviews complete, ready to merge)
improvement (Improvement / enhancement to an existing function)
non-breaking (Non-breaking change)
Python (Affects Python cuDF API)
3 participants