-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework some code logic to reduce iterator and comparator inlining to improve compile time #12900
Rework some code logic to reduce iterator and comparator inlining to improve compile time #12900
Conversation
Is it fair to say that we should avoid the below functions to reduce build time?
|
Not in general. The transform iterators are certainly not the issue. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Looking forward to seeing the build time guideline doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic. These optimizations look correct and seem sensible, when considering the kernel complexity that we are now able to split up. I'd like a few small comments like the ones I suggested here, to indicate that our choices of algorithms are informed by compile time. That will clear things up for the reader, and help prevent "clever" refactors down the line.
keep, | ||
stream); | ||
size_type const unique_size = [&] { | ||
if (cudf::detail::has_nested_columns(keys_view)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this compiles faster for nested types, does it also compile faster for non-nested types? If it's possible to unify these and have a single implementation of the algorithms, I would prefer that (rather than one transform + copy_if
for nested types and one unique_copy
for non-nested types).
If there are considerations like runtime, memory usage, etc. that warrant two separate implementations, then let's inform the reader with some comments explaining this decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does compile faster for non-nested types but the performance impact was too large (20-50% increase) for this path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the comments. Thanks for that -- it helps a lot for future readers, and makes us more aware of the process of making compile time improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this.
/merge |
1 similar comment
/merge |
Description
Disables inlining the device code logic for the row operators for nested column types did not work as hoped.
Some files took longer to compile and some functions ran 20% slower for large rows.
Reworking individual source files to break up the code logic into multiple kernels seems to work well for compile time while having a smaller effect on performance. The goal is to only rework the nested column code paths.
Here are some source files that have compile time issues and are improved in this PR.
Available benchmarks showed minimal impact to performance.
Checklist