Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor binaryop/compiled/util.cpp #10756

Merged

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Apr 29, 2022

This PR reduces the complexity of compile-time dispatches to resolve long compile times and massive memory usage in binaryop/compiled.util.cpp.

The file binaryop/compiled/util.cpp exposes two functions: is_supported_operation(out, lhs, rhs, op) and get_common_type(out, lhs, rhs). I refactored both of them, since they were both expensive to compile.

In is_supported_operation, I replaced a quadruple dispatch (!!!!) on (LHS type, RHS type, binary operation, output type) with a triple dispatch (LHS type, RHS type, BinaryOp) and some runtime single-dispatches to handle the output type.

In get_common_type, I replaced a triple type dispatch on (output type, LHS type, RHS type) with a few double type dispatches. I used the definition of std::common_type to simplify std::common_type_t<A, B, C> into std::common_type_t<std::common_type_t<A, B>, C>, which means we can double-dispatch twice and use runtime data_type values in between.

Impact: Peak memory usage (max resident set size) when compiling this file drops from 14.6 GB (280acdf) to 2.4 GB (ee2c26a), and the time to compile drops from 2:52.48 minutes (280acdf) to 57.91 seconds (ee2c26a).

@bdice bdice added libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 29, 2022
@bdice bdice self-assigned this Apr 29, 2022
@bdice bdice marked this pull request as ready for review April 29, 2022 01:18
@bdice bdice requested a review from a team as a code owner April 29, 2022 01:18
@codecov
Copy link

codecov bot commented Apr 29, 2022

Codecov Report

Merging #10756 (ecfadd1) into branch-22.06 (8d861ce) will increase coverage by 0.04%.
The diff coverage is 96.29%.

@@               Coverage Diff                @@
##           branch-22.06   #10756      +/-   ##
================================================
+ Coverage         86.40%   86.45%   +0.04%     
================================================
  Files               143      143              
  Lines             22448    22491      +43     
================================================
+ Hits              19396    19444      +48     
+ Misses             3052     3047       -5     
Impacted Files Coverage Δ
python/cudf/cudf/core/indexed_frame.py 91.70% <ø> (ø)
python/cudf/cudf/core/dataframe.py 93.77% <96.29%> (+0.08%) ⬆️
python/cudf/cudf/core/column/string.py 89.21% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.79% <0.00%> (+0.22%) ⬆️
python/cudf/cudf/core/column/numerical.py 96.17% <0.00%> (+0.29%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/column/lists.py 92.91% <0.00%> (+0.83%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ce7b65...ecfadd1. Read the comment docs.

* @brief Functor that returns optional common type of 2 or 3 given types.
*
*/

struct common_type_functor {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to the old code, except we remove a layer of dispatch (2 instead of 3 types). We make up for this in get_common_type by doing multiple double-dispatches with this functor in a way that is equivalent to the rules of std::common_type_t for multiple input types.

}
};

struct has_mutable_element_accessor_functor {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We introduce two new functors and methods to dispatch them (has_mutable_element_accessor, is_constructible) that we can use at runtime based on the value of data_type out. The method is_constructible is templated on ReturnType because it is known at compile time based on the invoked return type of the binary operator. However, the output type can be used as runtime information with a single dispatch.

@davidwendt
Copy link
Contributor

Any impact to performance -- the binaryop benchmarks?

Copy link
Contributor

@robertmaynard robertmaynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't mark them all up, but all the operator()() / call() / etc functions here can be marked const

cpp/src/binaryop/compiled/util.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/util.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/util.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/util.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/util.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/util.cpp Outdated Show resolved Hide resolved
cpp/src/binaryop/compiled/util.cpp Outdated Show resolved Hide resolved
bdice and others added 2 commits April 29, 2022 12:06
@bdice
Copy link
Contributor Author

bdice commented Apr 30, 2022

@davidwendt To your question about benchmarks: BINARYOP_BENCH shows no differences in performance. It's within +/- 1% noise of the baseline for most benchmarks, and -0.6% (faster but probably just noise) for the geometric mean across all of the binary operation benchmarks. The utility functions changed here are probably not called many times and are fairly simple host code (aside from the depth of compile-time dispatches), so their performance is probably not visible in the benchmarks.

@karthikeyann
Copy link
Contributor

This is great! This file is one of the files which could not be compiled in Windows as well, due to sheer template combinations limit and huge memory usage.

Can you write a test on all types to check if old code and new code gives same result for the modified code, or top level is_supported_operation (for all types combo and all ops) ?

Copy link
Contributor

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verify with a test (to ensure non-breaking). If it is taking too much time for this test to run, it need not be checked in.

@bdice
Copy link
Contributor Author

bdice commented May 10, 2022

@karthikeyann I wrote the test you requested. All data types and operations behave the same between the old and new code. You can see it here on a separate branch, but it's hacky and I don't plan to merge it because it requires compiling the old (huge) file: bdice@4a59d30

Would you (or a second reviewer) be able to approve this PR now?

@bdice bdice requested a review from karthikeyann May 10, 2022 21:40
Copy link
Contributor

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 🚀

@bdice
Copy link
Contributor Author

bdice commented May 11, 2022

@gpucibot merge 🚀 🚀 🚀

(edit: oops, the bot requires an exact match @gpucibot merge and no other content)

@bdice
Copy link
Contributor Author

bdice commented May 11, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 0cc29a0 into rapidsai:branch-22.06 May 11, 2022
@PointKernel
Copy link
Member

Looks like the CI bot doesn't get along with rockets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants