-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix fallback to sort aggregation for grouping only hash aggregate (21.12) #9898
Fix fallback to sort aggregation for grouping only hash aggregate (21.12) #9898
Conversation
…pidsai#9891) The following fixes what looks like an unintended fallback to sort aggregate introduced in rapidsai#9545 for a grouping only (no aggregation request) case. In the PR, the `std::all_of` function is used to determine whether the aggregation requests would be for struct types. That said, when there are no aggregation requests the `std::all_of` function will return true, causing a fallback to the sort aggregation (relevant code: https://github.com/rapidsai/cudf/pull/9545/files#diff-e409f72ddc11ad10fa0099e21b409b92f12bfac8ba1817266696c34a620aa081R645-R650). I added a benchmark `group_no_requests_benchmark.cu` by mostly copying `group_sum_benchmark.cu` but I changed one critical part. I am re-creating the `groupby` object for each `state`: ``` for (auto _ : state) { cuda_event_timer timer(state, true); cudf::groupby::groupby gb_obj(cudf::table_view({keys}));e auto result = gb_obj.aggregate(requests); } ``` This shows what would happen in the scenario where the `groupby` instance is created each time an aggregate is issued, which would re-create the `helper` each time for the sorted case. If the `groupby` object is not recreated each time, the difference in performance between the before/after cases is negligible. We never recycle a `groupby` instance when using the groupby API from Spark. Posting this as draft for feedback as I am not sure if I handled the benchmark part correctly. This was executed on a T4 GPU. Before the patch: ``` Groupby/BasicNoRequest/10000/manual_time 0.158 ms 0.184 ms 4420 Groupby/BasicNoRequest/1000000/manual_time 1.72 ms 1.74 ms 408 Groupby/BasicNoRequest/10000000/manual_time 18.9 ms 18.9 ms 37 Groupby/BasicNoRequest/100000000/manual_time 198 ms 198 ms 3 ``` <details> <summary>Full output</summary> <p> ``` 2021-12-12T13:41:08+00:00 Running ./GROUPBY_BENCH Run on (64 X 2801.89 MHz CPU s) CPU Caches: L1 Data 32 KiB (x32) L1 Instruction 32 KiB (x32) L2 Unified 1024 KiB (x32) L3 Unified 22528 KiB (x2) Load Average: 1.01, 0.78, 0.42 -------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------------------------------- Groupby/Basic/10000/manual_time 0.117 ms 0.143 ms 5906 Groupby/Basic/1000000/manual_time 0.524 ms 0.542 ms 1352 Groupby/Basic/10000000/manual_time 4.41 ms 4.43 ms 159 Groupby/Basic/100000000/manual_time 50.1 ms 50.1 ms 13 Groupby/PreSorted/1000000/manual_time 0.332 ms 0.350 ms 2118 Groupby/PreSorted/10000000/manual_time 2.22 ms 2.23 ms 315 Groupby/PreSorted/100000000/manual_time 22.2 ms 22.2 ms 30 Groupby/PreSortedNth/1000000/manual_time 0.160 ms 0.188 ms 4381 Groupby/PreSortedNth/10000000/manual_time 0.890 ms 0.917 ms 774 Groupby/PreSortedNth/100000000/manual_time 8.43 ms 8.46 ms 68 Groupby/Shift/1000000/manual_time 0.764 ms 0.785 ms 902 Groupby/Shift/10000000/manual_time 9.51 ms 9.53 ms 63 Groupby/Shift/100000000/manual_time 145 ms 145 ms 4 Groupby/Aggregation/10000/manual_time 1.56 ms 1.58 ms 442 Groupby/Aggregation/16384/manual_time 1.59 ms 1.62 ms 435 Groupby/Aggregation/65536/manual_time 1.73 ms 1.76 ms 404 Groupby/Aggregation/262144/manual_time 2.95 ms 2.98 ms 237 Groupby/Aggregation/1048576/manual_time 9.20 ms 9.23 ms 73 Groupby/Aggregation/4194304/manual_time 36.3 ms 36.3 ms 19 Groupby/Aggregation/10000000/manual_time 92.0 ms 92.1 ms 7 Groupby/Scan/10000/manual_time 1.56 ms 1.58 ms 447 Groupby/Scan/16384/manual_time 1.62 ms 1.65 ms 429 Groupby/Scan/65536/manual_time 1.85 ms 1.88 ms 378 Groupby/Scan/262144/manual_time 3.54 ms 3.56 ms 197 Groupby/Scan/1048576/manual_time 12.0 ms 12.0 ms 57 Groupby/Scan/4194304/manual_time 48.6 ms 48.6 ms 14 Groupby/Scan/10000000/manual_time 126 ms 126 ms 4 Groupby/BasicNoRequest/10000/manual_time 0.158 ms 0.184 ms 4420 Groupby/BasicNoRequest/1000000/manual_time 1.72 ms 1.74 ms 408 Groupby/BasicNoRequest/10000000/manual_time 18.9 ms 18.9 ms 37 Groupby/BasicNoRequest/100000000/manual_time 198 ms 198 ms 3 Groupby/PreSortedNoRequests/1000000/manual_time 0.194 ms 0.214 ms 3624 Groupby/PreSortedNoRequests/10000000/manual_time 1.25 ms 1.27 ms 571 Groupby/PreSortedNoRequests/100000000/manual_time 12.6 ms 12.7 ms 50 ``` </details> After the patch: ``` Groupby/BasicNoRequest/10000/manual_time 0.058 ms 0.085 ms 11991 Groupby/BasicNoRequest/1000000/manual_time 0.282 ms 0.301 ms 2478 Groupby/BasicNoRequest/10000000/manual_time 2.42 ms 2.44 ms 291 Groupby/BasicNoRequest/100000000/manual_time 29.2 ms 29.2 ms 21 ``` <details> <summary>Full output</summary> <p> ``` 2021-12-12T13:37:50+00:00 Running ./GROUPBY_BENCH Run on (64 X 2654.22 MHz CPU s) CPU Caches: L1 Data 32 KiB (x32) L1 Instruction 32 KiB (x32) L2 Unified 1024 KiB (x32) L3 Unified 22528 KiB (x2) Load Average: 0.64, 0.50, 0.26 -------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------------------------------- Groupby/Basic/10000/manual_time 0.116 ms 0.142 ms 5918 Groupby/Basic/1000000/manual_time 0.523 ms 0.542 ms 1374 Groupby/Basic/10000000/manual_time 4.37 ms 4.39 ms 162 Groupby/Basic/100000000/manual_time 51.4 ms 51.5 ms 10 Groupby/PreSorted/1000000/manual_time 0.331 ms 0.350 ms 2121 Groupby/PreSorted/10000000/manual_time 2.21 ms 2.23 ms 316 Groupby/PreSorted/100000000/manual_time 22.2 ms 22.2 ms 27 Groupby/PreSortedNth/1000000/manual_time 0.160 ms 0.188 ms 4384 Groupby/PreSortedNth/10000000/manual_time 0.888 ms 0.915 ms 775 Groupby/PreSortedNth/100000000/manual_time 8.36 ms 8.39 ms 70 Groupby/Shift/1000000/manual_time 0.764 ms 0.785 ms 904 Groupby/Shift/10000000/manual_time 9.50 ms 9.52 ms 63 Groupby/Shift/100000000/manual_time 146 ms 146 ms 4 Groupby/Aggregation/10000/manual_time 1.53 ms 1.55 ms 446 Groupby/Aggregation/16384/manual_time 1.58 ms 1.61 ms 437 Groupby/Aggregation/65536/manual_time 1.72 ms 1.75 ms 405 Groupby/Aggregation/262144/manual_time 2.93 ms 2.96 ms 236 Groupby/Aggregation/1048576/manual_time 9.18 ms 9.21 ms 74 Groupby/Aggregation/4194304/manual_time 36.2 ms 36.3 ms 19 Groupby/Aggregation/10000000/manual_time 91.5 ms 91.6 ms 7 Groupby/Scan/10000/manual_time 1.55 ms 1.57 ms 452 Groupby/Scan/16384/manual_time 1.60 ms 1.62 ms 434 Groupby/Scan/65536/manual_time 1.84 ms 1.87 ms 379 Groupby/Scan/262144/manual_time 3.54 ms 3.56 ms 198 Groupby/Scan/1048576/manual_time 12.0 ms 12.0 ms 57 Groupby/Scan/4194304/manual_time 48.4 ms 48.4 ms 14 Groupby/Scan/10000000/manual_time 125 ms 125 ms 4 Groupby/BasicNoRequest/10000/manual_time 0.058 ms 0.085 ms 11991 Groupby/BasicNoRequest/1000000/manual_time 0.282 ms 0.301 ms 2478 Groupby/BasicNoRequest/10000000/manual_time 2.42 ms 2.44 ms 291 Groupby/BasicNoRequest/100000000/manual_time 29.2 ms 29.2 ms 21 Groupby/PreSortedNoRequests/1000000/manual_time 0.195 ms 0.215 ms 3604 Groupby/PreSortedNoRequests/10000000/manual_time 1.25 ms 1.27 ms 575 Groupby/PreSortedNoRequests/100000000/manual_time 12.7 ms 12.8 ms 50 ``` </details> Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Jake Hemstad (https://github.com/jrhemstad) - Nghia Truong (https://github.com/ttnghia) - Conor Hoekstra (https://github.com/codereport) URL: rapidsai#9891
Given CI is blocked because of:
I am not sure someone needs to pull this patch manually, or if this PR is the right approach. |
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #9898 +/- ##
================================================
+ Coverage 10.58% 10.60% +0.02%
================================================
Files 118 118
Lines 19686 20082 +396
================================================
+ Hits 2083 2130 +47
- Misses 17603 17952 +349
Continue to review full report at Codecov.
|
One thing that this is not doing is changing the version of the cudf jar artifact. @ajschmidt8 should I change the version, or does that happen separately? There's the version in the jar itself (https://github.com/rapidsai/cudf/blob/branch-21.12/java/pom.xml#L24), and at least one from the cpp side (https://github.com/rapidsai/cudf/blob/branch-22.02/cpp/CMakeLists.txt#L28). |
No need to touch the version strings. We will update any necessary strings during the hotfix release. |
I've used a jar that I temporarily created as 21.12.1 and had it used with our plugin (21.12.0) as staged. I have run our hash aggregate integration tests that touch this code, and those passed. I'll also run all the tests and then run some benchmark code as well on our side to double check. |
We ran more tests, including our Spark rapids CI against the would be 21.12.1 jar, and did not find issues. Thanks for the reviews! |
This is a cherry-pick of 335862b for the 21.12 branch.
I'll run a few tests with this, so posting as is (a clean cherry-pick) just to get the ball rolling.