
[BUG] groupby aggregation falls back to sort aggregation for count on a string #10583

Closed
jbrennan333 opened this issue Apr 4, 2022 · 6 comments · Fixed by #10588
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue Spark Functionality that helps Spark RAPIDS

Comments

@jbrennan333
Contributor

Describe the bug
Regression testing of a customer query for spark-rapids found that the query is >20% slower with release 22.02 compared to release 21.08. Investigation found that the slowdown is coming from hash aggregations that are falling back to sort aggregations. With some debugging, I found that the reason we were falling back was we were failing the has_atomic_support() check in can_use_hash_groupby():

My debug prints in `can_use_hash_groupby()` while running the query:
JTB: no atomic support for type 23
JTB: has_struct=false, has_atomic_support=false, all_hash_aggregations=true
JTB: groupby: sorted = false, can_use_hash_groupby = false, _helper = false

Type 23 is STRING, so we were falling back for groupby operations where the value type is string. The only aggregations that reference strings are counts: partial_gpucount((IF((__auto_generated_subquery_name.date_diff <= 7), __auto_generated_subquery_name.sn_id, CAST(NULL AS STRING)))#1019)

Looking at the code for `can_use_hash_groupby()`, it looks like we added the `cudf::has_atomic_support(r.values.type())` check as part of #9483, which added support for decimal128.

But I am not sure I understand why this check is there.

For a hash aggregation, we call detail::groupby(), which calls create_hash_map() to create the hash map. It creates a concurrent_unordered_map with the Key and Element types both set to size_type. If I’m reading it right, that pair of types is what needs to support atomics, and it looks like it always will. I don’t see where the value types come into play, unless there is another atomic operation somewhere.

I think this has_atomic_support check is causing more aggregations to fall back to the sort aggregation method than intended.
I've verified that if I remove that check, this customer query goes from 70 seconds down to 43 seconds (testing in CUDF 22.02), where in 21.08 it was ~55 seconds.

Steps/Code to reproduce bug
I don't have a simple repro case yet.

Expected behavior
I expect that the performance of the customer's query should improve or stay the same.

Environment overview (please complete the following information)
I tested this on an internal cluster with 8 nodes, each with an A100 GPU.

Environment details
N/A

Additional context
N/A

@jbrennan333 jbrennan333 added bug Something isn't working Needs Triage Need team to review and classify labels Apr 4, 2022
@jbrennan333 jbrennan333 added libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue labels Apr 4, 2022
@sameerz sameerz added the Spark Functionality that helps Spark RAPIDS label Apr 4, 2022
@jbrennan333
Contributor Author

Based on my testing so far, we do need the has_atomic_support() check for decimal128 types. I am wondering if we should be checking the cudf::detail::target_type() for the aggregation, instead of the source?

@jlowe
Member

jlowe commented Apr 4, 2022

I am wondering if we should be checking the cudf::detail::target_type() for the aggregation, instead of the source?

cc: @bdice who was looking at this recently

@bdice
Contributor

bdice commented Apr 5, 2022

I noticed this with @isVoid last week. We discussed again today and agreed that using the "target type" might be the right thing to do here. Implementation reference:

using target_type_t = typename target_type_impl<Source, k>::type;

@kuhushukla
Contributor

@jbrennan333 amazing debugging analysis. Kudos.

@jbrennan333
Contributor Author

I put up PR #10588 as a possible fix for this.

@jbrennan333
Contributor Author

Thanks @kuhushukla! @jlowe deserves credit as well for pointing me in the right direction.

rapids-bot bot pushed a commit that referenced this issue Apr 5, 2022
Closes #10583.

Change the has_atomic_support check in can_use_hash_groupby() to check the target type for the aggregation instead of the source type.

See discussion in #10583.

I have verified that this fixes the performance regression in our customer queries, and all unit tests still pass.

Authors:
  - Jim Brennan (https://github.com/jbrennan333)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Yunsong Wang (https://github.com/PointKernel)
  - Nghia Truong (https://github.com/ttnghia)

URL: #10588
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024