Reimplement cudf::merge for nested types without using comparators #14250

Merged (22 commits) on Oct 28, 2023

Conversation

Member

@divyegala divyegala commented Oct 4, 2023

Description

Part of #11844

This PR also uses new experimental comparators for non-nested types by introducing a new device constructor for cudf::experimental::row::lexicographic::device_row_comparator. In the case of non-nested types, preprocessing can be skipped so comparators can be created on the fly. This solution helps us avoid creating 3 comparator types because thrust::merge can call the operator with indices from either side of the table.

Furthermore, the PR reworks cudf/detail/merge.cuh by removing any CUDA headers/components to expose a true detail API of the form cudf/detail/merge.hpp.
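
To illustrate the point about `thrust::merge` calling the operator with indices from either side of the table, here is a minimal host-only sketch. It is not the code in this PR: it uses `std::merge` instead of `thrust::merge`, plain `std::vector<int>` columns instead of table views, and the names `side`, `tagged_index`, `tagged_less`, and `merge_indices` are made up for illustration. Both inputs are assumed to be sorted already.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical tag saying which input a row index refers to.
enum class side : bool { LEFT, RIGHT };
using tagged_index = std::pair<side, std::size_t>;

// A single comparator: each index is resolved against the input its tag names,
// so either argument may come from either side.
struct tagged_less {
  std::vector<int> const& left;
  std::vector<int> const& right;

  int value(tagged_index const& idx) const
  {
    return idx.first == side::LEFT ? left[idx.second] : right[idx.second];
  }

  bool operator()(tagged_index const& a, tagged_index const& b) const
  {
    return value(a) < value(b);
  }
};

// Merges two sorted inputs and returns the merged order as tagged indices.
std::vector<tagged_index> merge_indices(std::vector<int> const& left,
                                        std::vector<int> const& right)
{
  std::vector<tagged_index> l, r, out;
  for (std::size_t i = 0; i < left.size(); ++i) { l.emplace_back(side::LEFT, i); }
  for (std::size_t i = 0; i < right.size(); ++i) { r.emplace_back(side::RIGHT, i); }
  out.reserve(l.size() + r.size());
  std::merge(l.begin(), l.end(), r.begin(), r.end(),
             std::back_inserter(out), tagged_less{left, right});
  return out;
}
```

Because every index carries its own side tag, the one comparator type handles any pairing of sides that the merge algorithm happens to ask for.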

Benchmark comparison for non-nested types

Compilation time increases from ~6 mins to ~7 mins.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@divyegala divyegala added the improvement ("Improvement / enhancement to an existing function") and non-breaking ("Non-breaking change") labels Oct 4, 2023
@github-actions github-actions bot added the libcudf label ("Affects libcudf (C++/CUDA) code.") Oct 4, 2023
@divyegala
Member Author

divyegala commented Oct 4, 2023

Benchmark                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------------
Merge/pow2tables/2/manual_time                  +0.0034         +0.0051             0             0             0             0
Merge/pow2tables/4/manual_time                  +0.0150         +0.0136             1             1             1             1
Merge/pow2tables/8/manual_time                  +0.0045         +0.0042             2             2             2             2
Merge/pow2tables/16/manual_time                 +0.0108         +0.0108             4             4             4             4
Merge/pow2tables/32/manual_time                 +0.0128         +0.0127             9            10            10            10
Merge/pow2tables/64/manual_time                 +0.0123         +0.0123            22            23            22            23
Merge/pow2tables/128/manual_time                +0.0147         +0.0147            52            53            52            53
OVERALL_GEOMEAN                                 +0.0105         +0.0105             0             0             0             0

@copy-pr-bot

copy-pr-bot bot commented Oct 16, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@divyegala
Member Author

/ok to test

@divyegala divyegala changed the title from "Reimplement cudf::merge without using comparators" to "Reimplement cudf::merge for nested types without using comparators" Oct 16, 2023
@divyegala
Member Author

/ok to test

Contributor

@bdice bdice left a comment

This is great. I have a couple early suggestions on the draft.

cpp/src/merge/merge.cu (resolved)
cpp/src/merge/merge.cu (outdated, resolved)
@divyegala
Member Author

/ok to test

@divyegala
Member Author

/ok to test

@divyegala
Member Author

/ok to test

@github-actions github-actions bot added the CMake ("CMake build issue") label Oct 20, 2023
@divyegala
Member Author

/ok to test

@divyegala divyegala marked this pull request as ready for review October 20, 2023 17:34
@divyegala divyegala requested a review from a team as a code owner October 20, 2023 17:34
@divyegala divyegala requested a review from PointKernel October 20, 2023 17:34
@divyegala divyegala self-assigned this Oct 20, 2023
Contributor

@bdice bdice left a comment

Awesome work. I have a few comments.

cpp/benchmarks/sort/rank_lists.cpp (resolved)
cpp/benchmarks/sort/rank_structs.cpp (resolved)
cpp/include/cudf/merge.hpp (outdated, resolved)
cpp/src/merge/merge.cu (outdated, resolved)
cpp/src/merge/merge.cu (outdated, resolved)
@divyegala divyegala requested a review from bdice October 26, 2023 00:18
@divyegala
Member Author

/ok to test

Contributor

@bdice bdice left a comment

Approving. I had one comment about whether a comment is really a TODO or not, but otherwise LGTM.

cpp/src/merge/merge.cu (outdated, resolved)
@divyegala
Member Author

/ok to test

@divyegala
Member Author

/ok to test

Contributor

@mythrocks mythrocks left a comment

This is looking very good. I have a couple of trivial nitpicks.

I'm not done. I've just hit the meat of the change. I'll resume the review shortly.

cpp/include/cudf/merge.hpp (resolved)
* `null_order::BEFORE` for all columns.
* @param comparator Physical element relational comparison functor.
*/
template <bool nested_disable = not has_nested_columns, CUDF_ENABLE_IF(nested_disable)>
Contributor

Interesting phrasing. I'll remember this for when I need it.

Is nested_disable used elsewhere?
If not, would CUDF_ENABLE_IF(not has_nested_columns) be less readable?

Member Author

@divyegala divyegala Oct 27, 2023

I can change the variable name if you'd like, but I cannot use the class template parameter has_nested_columns directly here.

Let me explain why, and my apologies if you already knew this but I misunderstood what you are suggesting.

If I did:

template <CUDF_ENABLE_IF(not has_nested_columns)>

then we would not be able to trigger SFINAE, because has_nested_columns is a template parameter of the class rather than of the constructor. When you write device_comparator<true> d_comp, the class is instantiated with a fixed value, so the enable-if on the constructor no longer depends on anything being substituted: enable_if::type is simply ill-formed, which is a hard error instead of a substitution failure.

As a result, we defer the SFINAE to the member-function level, through a defaulted template parameter on the constructor, so that it behaves the way it is intended to: as an overload.
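
A minimal standalone sketch of the deferred-SFINAE pattern described above. It assumes nothing about the real comparator: `device_comparator_sketch`, its `int` arguments, and the `preprocessed` flag are hypothetical stand-ins, and `std::enable_if_t` is written out directly rather than through `CUDF_ENABLE_IF`.

```cpp
#include <type_traits>

template <bool has_nested_columns>
struct device_comparator_sketch {
  // Available only when the class is instantiated with has_nested_columns == false.
  // The condition is copied into the constructor's own template parameter, so it
  // is checked during overload resolution instead of at class instantiation.
  template <bool nested_disable = !has_nested_columns,
            std::enable_if_t<nested_disable>* = nullptr>
  explicit device_comparator_sketch(int /*table*/) : preprocessed{false}
  {
  }

  // Available only when has_nested_columns == true.
  template <bool nested_enable = has_nested_columns,
            std::enable_if_t<nested_enable>* = nullptr>
  device_comparator_sketch(int /*table*/, int /*preprocessed_data*/) : preprocessed{true}
  {
  }

  bool preprocessed;
};
```

With this shape, `device_comparator_sketch<true>` still instantiates cleanly; the single-argument constructor is just removed from its overload set.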

cpp/src/merge/merge.cu (outdated, resolved)
cpp/src/merge/merge.cu (outdated, resolved)
{
size_type const left_size = left_table.num_rows();
size_type const right_size = right_table.num_rows();
size_type const total_size = left_size + right_size;
Contributor

Another impertinent question: Do we need to protect against the case where the total_size overflows?

Member Author

Contributor

@mythrocks mythrocks left a comment

LGTM. A couple of minor questions.
Edit: Thanks, that was an interesting read. I've learnt a thing or two from this change.

@divyegala
Member Author

/ok to test

@divyegala
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 2bc454a into rapidsai:branch-23.12 Oct 28, 2023
54 checks passed
Comment on lines +641 to +647
CUDF_EXPECTS(std::accumulate(tables_to_merge.cbegin(),
                             tables_to_merge.cend(),
                             cudf::size_type{0},
                             [](auto const& running_sum, auto const& tbl) {
                               return running_sum + tbl.num_rows();
                             }) <= std::numeric_limits<cudf::size_type>::max(),
             "Total number of merged rows exceeds row limit");
Contributor

@bdice bdice Oct 30, 2023

@divyegala (cc: @mythrocks) I don't think this actually protects against overflow. By the definition of overflow (if it were even defined behavior for signed types, which it's not...), the sum will always be less than the max limit of size_type when it rolls over. You need to use a larger type and accumulate in std::size_t precision. Can you file a follow-up PR with this change?

Suggested change
- CUDF_EXPECTS(std::accumulate(tables_to_merge.cbegin(),
-                              tables_to_merge.cend(),
-                              cudf::size_type{0},
-                              [](auto const& running_sum, auto const& tbl) {
-                                return running_sum + tbl.num_rows();
-                              }) <= std::numeric_limits<cudf::size_type>::max(),
-              "Total number of merged rows exceeds row limit");
+ CUDF_EXPECTS(std::accumulate(tables_to_merge.cbegin(),
+                              tables_to_merge.cend(),
+                              size_t{0},
+                              [](auto const& running_sum, auto const& tbl) {
+                                return running_sum + static_cast<size_t>(tbl.num_rows());
+                              }) <= static_cast<size_t>(std::numeric_limits<cudf::size_type>::max()),
+              "Total number of merged rows exceeds row limit");
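
A small host-only sketch of the same idea, for readers who want to try it outside libcudf. It assumes `cudf::size_type` is a 32-bit signed integer (modelled here by a local `size_type` alias) and that `std::size_t` is 64 bits wide; `total_rows_fit` is a hypothetical helper, not a cudf function.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// Stand-in for cudf::size_type (assumed to be a 32-bit signed integer).
using size_type = std::int32_t;

// Returns true when the summed row counts fit in size_type.
// The sum is accumulated in std::size_t, so it cannot wrap around before the
// comparison against the size_type limit is made.
bool total_rows_fit(std::vector<size_type> const& row_counts)
{
  auto const total = std::accumulate(
    row_counts.cbegin(),
    row_counts.cend(),
    std::size_t{0},
    [](std::size_t running_sum, size_type rows) {
      return running_sum + static_cast<std::size_t>(rows);
    });
  return total <= static_cast<std::size_t>(std::numeric_limits<size_type>::max());
}
```

Two tables of 2,000,000,000 rows each sum to 4,000,000,000 in `std::size_t`, which is above the 32-bit signed limit of 2,147,483,647, so the check fails as intended. Accumulating in `size_type` instead would overflow before any comparison could run.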

Examples of this pattern:

Member Author

Thanks for pointing that out @bdice! I'll raise a PR

rapids-bot bot pushed a commit that referenced this pull request Oct 31, 2023
#14250 added a check to ensure `cudf::merge` throws when the total number of merged rows exceeds the `cudf::size_type` limit. However, @bdice pointed out that the check was not correct because the accumulation was still occurring in `cudf::size_type`. This PR computes the accumulation in `std::size_t`.

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - MithunR (https://github.com/mythrocks)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14345