Rework some code logic to reduce iterator and comparator inlining to improve compile time #12900

Merged: 40 commits into rapidsai:branch-23.04 from row-ops-no-inline on Mar 27, 2023

@davidwendt (Contributor) commented on Mar 7, 2023

Description

Disabling inlining of the device code logic for the row operators on nested column types did not work as hoped.
Some files took longer to compile, and some functions ran 20% slower for large rows.

Reworking individual source files to break up the code logic into multiple kernels seems to work well for compile time while having a smaller effect on performance. The goal is to only rework the nested column code paths.
Here are some source files that have compile time issues and are improved in this PR.

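| Source file                       | Compile time (current) | Compile time (this PR) |
|-----------------------------------|------------------------|------------------------|
| stream_compaction/unique_count.cu | 18 min                 | 13 min                 |
| groupby/sort/group_nunique.cu     | 16 min                 | 2 min                  |
| stream_compaction/unique.cu       | 16 min                 | 5 min                  |
| groupby/sort/sort_helper.cu       | 10 min                 | 6.5 min                |
| search/contains_scalar.cu         | 12 min                 | 4.7 min                |
| sort/is_sorted.cu                 | 9 min                  | 7 min                  |
| groupby/sort/group_std.cu         | 7 min                  | 1.2 min                |
| groupby/sort/group_m2.cu          | 6 min                  | 1.2 min                |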

Available benchmarks showed minimal impact to performance.
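
As a rough illustration of the "multiple kernels" rework described above, here is a host-only C++ sketch using standard algorithms rather than Thrust. It is not the code from this PR: `row_equal`, `unique_count_fused`, and `unique_count_split` are hypothetical names, and the integer comparison stands in for an expensive nested-row comparator. The fused form evaluates the comparator inside the counting algorithm; the split form evaluates it once per row into a flag buffer and then counts the flags with a trivial second pass.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an expensive nested-row equality comparator.
struct row_equal {
  std::vector<int> const* keys;
  bool operator()(std::size_t lhs, std::size_t rhs) const
  {
    return (*keys)[lhs] == (*keys)[rhs];
  }
};

// Fused form: the comparator is evaluated inside the counting algorithm.
std::size_t unique_count_fused(std::vector<int> const& keys)
{
  row_equal const eq{&keys};
  std::vector<std::size_t> idx(keys.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  auto const n = std::count_if(idx.begin(), idx.end(), [eq](std::size_t i) {
    return i == 0 || !eq(i, i - 1);
  });
  return static_cast<std::size_t>(n);
}

// Split form: pass 1 evaluates the comparator once per row and stores a flag;
// pass 2 counts the flags and never sees the comparator.
std::size_t unique_count_split(std::vector<int> const& keys)
{
  row_equal const eq{&keys};
  std::vector<char> is_first(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i) {
    is_first[i] = (i == 0 || !eq(i, i - 1)) ? 1 : 0;
  }
  auto const n = std::count(is_first.begin(), is_first.end(), char{1});
  return static_cast<std::size_t>(n);
}
```

Both functions return the same count. On the device, the split form costs an extra buffer and an extra pass, which matches the trade-off described above: better compile time with a smaller effect on performance.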

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the 2 - In Progress, libcudf, improvement, and non-breaking labels on Mar 7, 2023
@davidwendt self-assigned this on Mar 7, 2023
@davidwendt changed the title from "Disable inline of row operators for nested column types" to "Rework row operator usage for nested column types to improve compile time" on Mar 16, 2023

@PointKernel (Member) commented:

Is it fair to say that we should avoid the below functions to reduce build time?

  • cudf::detail::make_counting_transform_iterator
  • thrust::make_transform_iterator
  • thrust::unique_copy
  • thrust::count_if
  • thrust::is_sorted

@davidwendt (Contributor, Author) replied:

> Is it fair to say that we should avoid the below functions to reduce build time?
>
>   • cudf::detail::make_counting_transform_iterator
>   • thrust::make_transform_iterator
>   • thrust::unique_copy
>   • thrust::count_if
>   • thrust::is_sorted

Not in general. The transform iterators are certainly not the issue. copy_if and is_sorted both end up calling some version of a CUB reduction. If the iterators or comparators they are given contain more than a few lines of logic, the code bloat can be extreme because those inputs are inlined many times throughout the generated kernel. unique_copy has similar issues with inlining the comparator.
I will try to work on a more formal document outlining what to avoid and how to get around this problem.
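
One way to picture the workaround for is_sorted is a host-only C++ sketch with standard algorithms in place of Thrust. This is an assumption-laden illustration, not the PR's implementation: `row_less`, `is_sorted_fused`, and `is_sorted_split` are hypothetical names, and the integer comparison stands in for a large row comparator. The split version calls the comparator in exactly one place and hands the reduction step only a trivial predicate.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a large row comparator.
struct row_less {
  std::vector<int> const* keys;
  bool operator()(std::size_t lhs, std::size_t rhs) const
  {
    return (*keys)[lhs] < (*keys)[rhs];
  }
};

// Fused form: the comparator is passed straight into the algorithm.
bool is_sorted_fused(std::vector<int> const& keys)
{
  std::vector<std::size_t> idx(keys.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  return std::is_sorted(idx.begin(), idx.end(), row_less{&keys});
}

// Split form: pass 1 stores one flag per adjacent pair;
// pass 2 reduces the flags with a trivial predicate.
bool is_sorted_split(std::vector<int> const& keys)
{
  if (keys.size() < 2) { return true; }
  row_less const less{&keys};
  std::vector<char> in_order(keys.size() - 1);
  for (std::size_t i = 0; i + 1 < keys.size(); ++i) {
    in_order[i] = less(i + 1, i) ? 0 : 1;
  }
  return std::all_of(in_order.begin(), in_order.end(), [](char flag) { return flag != 0; });
}
```

The design point is that the reduction in the second pass only sees a one-line predicate, so there is little for the compiler to inline into it.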

@PointKernel (Member) left a review comment:

LGTM

Looking forward to seeing the build time guideline doc.

Review threads:

  • cpp/src/groupby/sort/sort_helper.cu (resolved)
  • cpp/src/groupby/sort/group_m2.cu (outdated, resolved)
  • cpp/src/sort/is_sorted.cu (outdated, resolved)

@bdice (Contributor) left a review comment:

This is fantastic. These optimizations look correct and seem sensible, when considering the kernel complexity that we are now able to split up. I'd like a few small comments like the ones I suggested here, to indicate that our choices of algorithms are informed by compile time. That will clear things up for the reader, and help prevent "clever" refactors down the line.

keep,
stream);
size_type const unique_size = [&] {
if (cudf::detail::has_nested_columns(keys_view)) {

Contributor (inline comment on the lines above):

If this compiles faster for nested types, does it also compile faster for non-nested types? If it's possible to unify these and have a single implementation of the algorithms, I would prefer that (rather than one transform + copy_if for nested types and one unique_copy for non-nested types).

If there are considerations like runtime, memory usage, etc. that warrant two separate implementations, then let's inform the reader with some comments explaining this decision.

Contributor, Author (reply):

It does compile faster for non-nested types, but the performance impact was too large (a 20-50% increase in runtime) for this path.
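
The two code paths discussed in this thread can be sketched on the host with standard C++ algorithms. This is a hedged illustration, not the PR's code: `key_equal`, `unique_fused`, `unique_split`, and `unique_rows` are hypothetical names, the `has_nested_columns` flag is passed in by the caller here, and the second loop in `unique_split` stands in for a copy_if driven by the flag buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the row equality comparator.
struct key_equal {
  bool operator()(int lhs, int rhs) const { return lhs == rhs; }
};

// Non-nested path: a single fused unique_copy call.
std::vector<int> unique_fused(std::vector<int> const& keys)
{
  std::vector<int> out;
  std::unique_copy(keys.begin(), keys.end(), std::back_inserter(out), key_equal{});
  return out;
}

// Nested path: a transform step writes keep flags, then a copy_if-style
// step copies the flagged rows without touching the comparator.
std::vector<int> unique_split(std::vector<int> const& keys)
{
  key_equal const eq{};
  std::vector<char> keep(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i) {
    keep[i] = (i == 0 || !eq(keys[i], keys[i - 1])) ? 1 : 0;
  }
  std::vector<int> out;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (keep[i] != 0) { out.push_back(keys[i]); }
  }
  return out;
}

// Compile time motivates the split path; runtime motivates keeping the fused one.
std::vector<int> unique_rows(std::vector<int> const& keys, bool has_nested_columns)
{
  return has_nested_columns ? unique_split(keys) : unique_fused(keys);
}
```

Both paths produce the same output, so the dispatch only selects which trade-off applies: the split path for nested types where compile time dominates, and the fused path for non-nested types where the runtime cost of the split was judged too high.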

Review threads:

  • cpp/src/stream_compaction/unique_count.cu (resolved)
  • cpp/src/sort/is_sorted.cu (resolved)

@github-actions bot added the CMake label on Mar 22, 2023
@davidwendt requested a review from bdice on March 23, 2023 11:30

@bdice (Contributor) left a review comment:

Love the comments. Thanks for that -- it helps a lot for future readers, and makes us more aware of the process of making compile time improvements.

@ttnghia (Contributor) left a review comment:

I love this.

@davidwendt (Contributor, Author) commented:

/merge

@rapids-bot bot merged commit 12dc130 into rapidsai:branch-23.04 on Mar 27, 2023
@davidwendt deleted the row-ops-no-inline branch on March 27, 2023 13:51