Use list of columns for methods in `Groupby.pyx` (#10419)
Part of #10153.

This PR changes the APIs in `groupby.pyx` to accept a list of columns as input instead of a Frame. The change affects both keys and values: the `Groupby` object now stores only a list of columns in its `keys` attribute, and the other APIs (`groups`, `aggregate`, `shift`, `replace_nulls`) now accept only a list of columns as their value columns. The `aggregation` communication protocol has changed from a dictionary mapping column names to lists of agg names to a list of lists of agg names. See the changes in `_normalize_aggs` for details.

This PR also tries to simplify the post-processing of the `result` frame in the `agg` method, now that we have finer control in pure Python.

I attempted to rewrite `aggregate_internal` and `scan_internal`, but the attempt proved futile: the unified aggregation object is a cdef type, which precludes moving the aggregation filtering step out of its current place. I also tried unifying aggregation and scan with a Cython fused type, but this did not work out because of limitations on using fused types with C++ templated types in Cython.

Overall, the performance of the `agg` call is on par with the main branch, with a performance difference of -3% to 13% depending on the agg types.
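The protocol change described above (from a dictionary keyed by column name to a positional list of lists) can be illustrated with a small pure-Python sketch. This is not the `_normalize_aggs` implementation from this PR; the function name `normalize_aggs_sketch` and its signature are hypothetical, and it only assumes that aggregations arrive either as a single name, a list of names, or a dictionary mapping column names to either form.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical type alias for illustration; not taken from cudf.
AggInput = Union[str, Sequence[str], Dict[str, Union[str, Sequence[str]]]]


def normalize_aggs_sketch(
    column_names: Sequence[str], aggs: AggInput
) -> Tuple[List[str], List[List[str]]]:
    """Return the selected column names and, aligned by position,
    one list of aggregation names per selected column."""
    if isinstance(aggs, dict):
        # Dictionary form: only columns named in the dict are aggregated.
        selected = [name for name in column_names if name in aggs]
        per_column = [aggs[name] for name in selected]
    else:
        # A single name or a list of names applies to every column.
        selected = list(column_names)
        per_column = [aggs] * len(selected)

    # Wrap bare strings so every entry is a list of agg names.
    normalized = [[a] if isinstance(a, str) else list(a) for a in per_column]
    return selected, normalized


names, agg_lists = normalize_aggs_sketch(
    ["a", "b"], {"a": "sum", "b": ["min", "max"]}
)
print(names)      # ['a', 'b']
print(agg_lists)  # [['sum'], ['min', 'max']]
```

With the positional form, the lower layer no longer needs column names at all: entry `i` of the list of lists applies to entry `i` of the list of value columns.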
<details>
<summary>Raw Benchmark</summary>

```
========== 36 passed in 33.48s ==========
(rapids) rapids@compose:~/scratch/cudf_benchmarks$ ./compare.sh bench_groupby.py

---- benchmark 'False-False-agg1-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-False-agg1-100] (afte)      2.5090 (1.0)    2.8418 (1.0)    2.5280 (1.0)    0.0290 (2.40)   2.5229 (1.0)    0.0103 (1.05)   15;19     273
groupby_agg[False-False-agg1-100] (befo)      2.7681 (1.10)   2.8441 (1.00)   2.7877 (1.10)   0.0121 (1.0)    2.7849 (1.10)   0.0098 (1.0)    60;26     252

---- benchmark 'False-False-agg1-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-False-agg1-10000] (afte)    2.7803 (1.0)    3.4156 (1.05)   2.8131 (1.0)    0.0548 (1.57)   2.8007 (1.0)    0.0253 (1.0)    10;12     252
groupby_agg[False-False-agg1-10000] (befo)    3.0402 (1.09)   3.2407 (1.0)    3.1571 (1.12)   0.0348 (1.0)    3.1535 (1.13)   0.0509 (2.01)   39;6      236

---- benchmark 'False-False-agg1-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-False-agg1-1000000] (afte)  13.2601 (1.0)   14.0128 (1.01)  13.4242 (1.0)   0.1056 (1.28)   13.4004 (1.0)   0.0286 (1.0)    5;8       68
groupby_agg[False-False-agg1-1000000] (befo)  13.5150 (1.02)  13.9165 (1.0)   13.6015 (1.01)  0.0826 (1.0)    13.5944 (1.01)  0.0696 (2.43)   8;5       66

---- benchmark 'False-False-agg2-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-False-agg2-100] (afte)      2.5342 (1.0)    2.8621 (1.0)    2.5591 (1.0)    0.0431 (3.18)   2.5509 (1.0)    0.0106 (1.01)   13;18     273
groupby_agg[False-False-agg2-100] (befo)      2.8797 (1.14)   2.9507 (1.03)   2.8997 (1.13)   0.0136 (1.0)    2.8965 (1.14)   0.0105 (1.0)    52;28     227

---- benchmark 'False-False-agg2-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-False-agg2-10000] (afte)    2.7922 (1.0)    3.2884 (1.0)    2.8205 (1.0)    0.0473 (1.40)   2.8118 (1.0)    0.0096 (1.0)    10;18     251
groupby_agg[False-False-agg2-10000] (befo)    3.1491 (1.13)   3.4791 (1.06)   3.1752 (1.13)   0.0338 (1.0)    3.1687 (1.13)   0.0108 (1.12)   6;17      172

---- benchmark 'False-False-agg2-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-False-agg2-1000000] (afte)  13.4699 (1.0)   14.6287 (1.0)   13.6020 (1.0)   0.1359 (1.0)    13.5769 (1.0)   0.0270 (1.0)    3;8       69
groupby_agg[False-False-agg2-1000000] (befo)  13.6079 (1.01)  29.8318 (2.04)  14.0777 (1.03)  1.9806 (14.57)  13.7795 (1.01)  0.0567 (2.10)   2;6       68

---- benchmark 'False-False-sum-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-False-sum-100] (afte)       2.1667 (1.0)    2.2855 (1.0)    2.1831 (1.0)    0.0146 (1.49)   2.1802 (1.0)    0.0111 (1.14)   25;16     301
groupby_agg[False-False-sum-100] (befo)       2.4142 (1.11)   2.4782 (1.08)   2.4319 (1.11)   0.0098 (1.0)    2.4309 (1.11)   0.0097 (1.0)    65;15     278

---- benchmark 'False-False-sum-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-False-sum-10000] (afte)     2.4293 (1.0)    2.6593 (1.0)    2.4493 (1.0)    0.0206 (1.66)   2.4455 (1.0)    0.0115 (1.10)   17;19     278
groupby_agg[False-False-sum-10000] (befo)     2.6646 (1.10)   2.7706 (1.04)   2.6832 (1.10)   0.0124 (1.0)    2.6811 (1.10)   0.0105 (1.0)    49;14     257

---- benchmark 'False-False-sum-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-False-sum-1000000] (afte)   9.3678 (1.0)    21.0480 (2.07)  9.6817 (1.0)    1.2252 (16.49)  9.5286 (1.0)    0.0342 (1.28)   1;9       89
groupby_agg[False-False-sum-1000000] (befo)   9.6830 (1.03)   10.1832 (1.0)   9.7434 (1.01)   0.0743 (1.0)    9.7238 (1.02)   0.0266 (1.0)    6;6       86

---- benchmark 'False-True-agg1-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-True-agg1-100] (afte)       2.4392 (1.0)    2.7474 (1.06)   2.4598 (1.0)    0.0287 (2.07)   2.4545 (1.0)    0.0103 (1.0)    10;17     278
groupby_agg[False-True-agg1-100] (befo)       2.5183 (1.03)   2.6017 (1.0)    2.5354 (1.03)   0.0139 (1.0)    2.5332 (1.03)   0.0134 (1.30)   51;18     268

---- benchmark 'False-True-agg1-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-True-agg1-10000] (afte)     2.7196 (1.0)    3.2290 (1.06)   2.7446 (1.0)    0.0462 (2.17)   2.7359 (1.0)    0.0106 (1.00)   11;17     257
groupby_agg[False-True-agg1-10000] (befo)     2.7807 (1.02)   3.0590 (1.0)    2.8039 (1.02)   0.0213 (1.0)    2.8004 (1.02)   0.0106 (1.0)    16;18     251

---- benchmark 'False-True-agg1-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-True-agg1-1000000] (afte)   13.2259 (1.01)  13.7344 (1.0)   13.3449 (1.00)  0.0797 (1.0)    13.3288 (1.00)  0.0322 (1.41)   5;8       69
groupby_agg[False-True-agg1-1000000] (befo)   13.0875 (1.0)   14.1552 (1.03)  13.3135 (1.0)   0.1325 (1.66)   13.2901 (1.0)   0.0229 (1.0)    4;7       68

---- benchmark 'False-True-agg2-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-True-agg2-100] (afte)       2.4580 (1.0)    2.5791 (1.0)    2.4792 (1.0)    0.0174 (1.92)   2.4756 (1.0)    0.0121 (1.37)   21;14     277
groupby_agg[False-True-agg2-100] (befo)       2.6094 (1.06)   2.6686 (1.03)   2.6260 (1.06)   0.0091 (1.0)    2.6255 (1.06)   0.0088 (1.0)    66;21     264

---- benchmark 'False-True-agg2-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-True-agg2-10000] (afte)     2.7218 (1.0)    2.8843 (1.0)    2.7415 (1.0)    0.0180 (1.0)    2.7383 (1.0)    0.0116 (1.12)   21;16     257
groupby_agg[False-True-agg2-10000] (befo)     2.8771 (1.06)   3.1227 (1.08)   2.8956 (1.06)   0.0185 (1.03)   2.8922 (1.06)   0.0104 (1.0)    16;16     244

---- benchmark 'False-True-agg2-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-True-agg2-1000000] (afte)   13.4555 (1.01)  13.7924 (1.0)   13.5244 (1.00)  0.0601 (1.0)    13.5099 (1.00)  0.0362 (1.0)    7;6       70
groupby_agg[False-True-agg2-1000000] (befo)   13.3841 (1.0)   13.9437 (1.01)  13.4948 (1.0)   0.0773 (1.29)   13.4768 (1.0)   0.0443 (1.22)   5;5       68

---- benchmark 'False-True-sum-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-True-sum-100] (afte)        2.1270 (1.0)    2.2397 (1.0)    2.1435 (1.0)    0.0158 (1.01)   2.1407 (1.0)    0.0105 (1.0)    27;22     302
groupby_agg[False-True-sum-100] (befo)        2.1881 (1.03)   2.3309 (1.04)   2.2048 (1.03)   0.0156 (1.0)    2.2014 (1.03)   0.0111 (1.06)   35;30     297

---- benchmark 'False-True-sum-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-True-sum-10000] (afte)      2.4018 (1.0)    2.6107 (1.0)    2.4183 (1.0)    0.0198 (1.16)   2.4149 (1.0)    0.0108 (1.12)   14;14     277
groupby_agg[False-True-sum-10000] (befo)      2.4406 (1.02)   2.6840 (1.03)   2.4606 (1.02)   0.0170 (1.0)    2.4585 (1.02)   0.0097 (1.0)    15;14     274

---- benchmark 'False-True-sum-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[False-True-sum-1000000] (afte)    9.4459 (1.01)   10.0397 (1.0)   9.4983 (1.0)    0.0706 (1.0)    9.4846 (1.0)    0.0216 (1.0)    4;6       89
groupby_agg[False-True-sum-1000000] (befo)    9.3064 (1.0)    10.2732 (1.02)  9.5150 (1.00)   0.1107 (1.57)   9.4933 (1.00)   0.0239 (1.10)   6;10      88

---- benchmark 'True-False-agg1-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-False-agg1-100] (afte)       4.3327 (1.0)    4.4800 (1.0)    4.3504 (1.0)    0.0202 (1.0)    4.3457 (1.0)    0.0103 (1.0)    10;16     181
groupby_agg[True-False-agg1-100] (befo)       4.6486 (1.07)   12.4651 (2.78)  4.8006 (1.10)   0.7100 (35.18)  4.6664 (1.07)   0.0191 (1.86)   10;19     170

---- benchmark 'True-False-agg1-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-False-agg1-10000] (afte)     4.9246 (1.0)    5.1165 (1.0)    4.9491 (1.0)    0.0269 (1.0)    4.9407 (1.0)    0.0133 (1.06)   16;19     164
groupby_agg[True-False-agg1-10000] (befo)     5.2464 (1.07)   5.6002 (1.09)   5.2700 (1.06)   0.0370 (1.38)   5.2623 (1.07)   0.0126 (1.0)    10;17     154

---- benchmark 'True-False-agg1-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-False-agg1-1000000] (afte)   36.5089 (1.00)  37.2874 (1.0)   36.8305 (1.0)   0.2321 (1.0)    36.7404 (1.0)   0.2208 (1.0)    7;5       28
groupby_agg[True-False-agg1-1000000] (befo)   36.3558 (1.0)   47.0329 (1.26)  37.7670 (1.03)  2.7313 (11.77)  36.8183 (1.00)  0.8527 (3.86)   2;3       26

---- benchmark 'True-False-agg2-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-False-agg2-100] (afte)       4.6287 (1.0)    5.2921 (1.02)   4.6918 (1.0)    0.1017 (4.64)   4.6526 (1.0)    0.0496 (3.27)   21;23     167
groupby_agg[True-False-agg2-100] (befo)       4.9776 (1.08)   5.1737 (1.0)    5.0060 (1.07)   0.0219 (1.0)    4.9995 (1.07)   0.0152 (1.0)    18;10     161

---- benchmark 'True-False-agg2-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-False-agg2-10000] (afte)     5.2022 (1.0)    6.7622 (1.16)   5.2405 (1.0)    0.1267 (2.98)   5.2219 (1.0)    0.0157 (1.0)    2;16      155
groupby_agg[True-False-agg2-10000] (befo)     5.5802 (1.07)   5.8531 (1.0)    5.6166 (1.07)   0.0424 (1.0)    5.6041 (1.07)   0.0206 (1.31)   11;14     147

---- benchmark 'True-False-agg2-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-False-agg2-1000000] (afte)   37.9639 (1.0)   38.7598 (1.0)   38.2381 (1.0)   0.1221 (1.0)    38.2346 (1.00)  0.0583 (1.0)    2;2       27
groupby_agg[True-False-agg2-1000000] (befo)   38.0569 (1.00)  41.5735 (1.07)  38.7983 (1.01)  1.1968 (9.80)   38.1696 (1.0)   0.6344 (10.88)  5;5       26

---- benchmark 'True-False-sum-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-False-sum-100] (afte)        3.6893 (1.0)    4.2792 (1.03)   3.7130 (1.0)    0.0580 (4.15)   3.7022 (1.0)    0.0079 (1.0)    10;16     206
groupby_agg[True-False-sum-100] (befo)        4.0016 (1.08)   4.1370 (1.0)    4.0218 (1.08)   0.0140 (1.0)    4.0180 (1.09)   0.0097 (1.23)   27;17     188

---- benchmark 'True-False-sum-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-False-sum-10000] (afte)      4.2660 (1.0)    4.6651 (1.0)    4.2913 (1.0)    0.0493 (2.97)   4.2799 (1.0)    0.0097 (1.0)    10;21     185
groupby_agg[True-False-sum-10000] (befo)      4.5702 (1.07)   4.7321 (1.01)   4.5904 (1.07)   0.0166 (1.0)    4.5858 (1.07)   0.0134 (1.37)   24;8      172

---- benchmark 'True-False-sum-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-False-sum-1000000] (afte)    30.5871 (1.00)  30.9527 (1.0)   30.6797 (1.00)  0.0628 (1.0)    30.6720 (1.00)  0.0421 (1.0)    4;3       32
groupby_agg[True-False-sum-1000000] (befo)    30.5386 (1.0)   31.8930 (1.03)  30.6654 (1.0)   0.2383 (3.80)   30.6013 (1.0)   0.0573 (1.36)   1;4       31

---- benchmark 'True-True-agg1-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-True-agg1-100] (afte)        4.2812 (1.0)    4.5815 (1.0)    4.3304 (1.0)    0.0495 (1.43)   4.3134 (1.0)    0.0647 (4.80)   22;4      173
groupby_agg[True-True-agg1-100] (befo)        4.4126 (1.03)   4.7356 (1.03)   4.4357 (1.02)   0.0348 (1.0)    4.4253 (1.03)   0.0135 (1.0)    14;18     158

---- benchmark 'True-True-agg1-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-True-agg1-10000] (afte)      4.8505 (1.0)    5.3411 (1.0)    4.8882 (1.0)    0.0596 (1.49)   4.8693 (1.0)    0.0240 (1.41)   12;15     166
groupby_agg[True-True-agg1-10000] (befo)      4.9857 (1.03)   5.3869 (1.01)   5.0191 (1.03)   0.0399 (1.0)    5.0089 (1.03)   0.0170 (1.0)    9;15      160

---- benchmark 'True-True-agg1-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-True-agg1-1000000] (afte)    36.5387 (1.01)  55.8017 (1.52)  37.3622 (1.03)  3.6965 (48.22)  36.5756 (1.00)  0.0882 (2.75)   1;3       27
groupby_agg[True-True-agg1-1000000] (befo)    36.3456 (1.0)   36.7584 (1.0)   36.4209 (1.0)   0.0767 (1.0)    36.4014 (1.0)   0.0320 (1.0)    1;4       27

---- benchmark 'True-True-agg2-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-True-agg2-100] (afte)        4.5713 (1.0)    5.1548 (1.06)   4.6064 (1.0)    0.0621 (4.49)   4.5886 (1.0)    0.0203 (1.51)   13;22     170
groupby_agg[True-True-agg2-100] (befo)        4.7628 (1.04)   4.8752 (1.0)    4.7832 (1.04)   0.0138 (1.0)    4.7795 (1.04)   0.0134 (1.0)    29;9      167

---- benchmark 'True-True-agg2-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-True-agg2-10000] (afte)      5.1343 (1.0)    5.4159 (1.0)    5.1769 (1.0)    0.0517 (1.36)   5.1590 (1.0)    0.0179 (1.21)   16;22     157
groupby_agg[True-True-agg2-10000] (befo)      5.3567 (1.04)   5.6432 (1.04)   5.3858 (1.04)   0.0379 (1.0)    5.3785 (1.04)   0.0147 (1.0)    7;12      152

---- benchmark 'True-True-agg2-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-True-agg2-1000000] (afte)    38.0357 (1.00)  38.2935 (1.00)  38.1159 (1.00)  0.0597 (1.0)    38.1014 (1.00)  0.0846 (1.0)    6;1       27
groupby_agg[True-True-agg2-1000000] (befo)    37.9134 (1.0)   38.2851 (1.0)   38.0201 (1.0)   0.0929 (1.55)   37.9944 (1.0)   0.1066 (1.26)   7;1       26

---- benchmark 'True-True-sum-100': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-True-sum-100] (afte)         3.7452 (1.0)    4.0287 (1.0)    3.8009 (1.0)    0.0408 (1.0)    3.7968 (1.0)    0.0503 (1.0)    29;3      131
groupby_agg[True-True-sum-100] (befo)         3.8752 (1.03)   4.4384 (1.10)   3.9316 (1.03)   0.0608 (1.49)   3.9265 (1.03)   0.0504 (1.00)   4;3       148

---- benchmark 'True-True-sum-10000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-True-sum-10000] (afte)       4.4442 (1.0)    11.3511 (2.35)  4.5582 (1.0)    0.5829 (24.78)  4.4741 (1.0)    0.0323 (2.85)   3;19      171
groupby_agg[True-True-sum-10000] (befo)       4.5676 (1.03)   4.8264 (1.0)    4.5913 (1.01)   0.0235 (1.0)    4.5871 (1.03)   0.0114 (1.0)    15;16     168

---- benchmark 'True-True-sum-1000000': 2 tests ----
Name (time in ms)                             Min             Max             Mean            StdDev          Median          IQR             Outliers  Rounds
groupby_agg[True-True-sum-1000000] (afte)     30.5326 (1.00)  33.6395 (1.02)  31.2355 (1.0)   0.9563 (1.20)   30.6933 (1.0)   0.9663 (1.0)    5;3       30
groupby_agg[True-True-sum-1000000] (befo)     30.4080 (1.0)   33.0341 (1.0)   31.2527 (1.00)  0.7946 (1.0)    30.9808 (1.01)  1.2781 (1.32)   11;0      30
```

</details>

[Benchmark code](https://github.com/isVoid/cudf_benchmarks/blob/9d9644eaa5301df7894c2fe4b1ba317396240518/bench_groupby.py#L23-L42)

Authors:
- Michael Wang (https://github.com/isVoid)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Bradley Dice (https://github.com/bdice)

URL: #10419