Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return weak orderings from device_row_comparator. #10793

Merged
merged 8 commits into from
May 11, 2022

Conversation

rwlee
Copy link
Contributor

@rwlee rwlee commented May 4, 2022

This PR changes the experimental device_row_comparator to return weak_ordering instead of bool.

Originally part of PR #9452. Aids PR #10730, which builds strongly-typed two table comparators and should return a weak_ordering.

@rwlee rwlee added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 4, 2022
@rwlee rwlee requested a review from bdice May 4, 2022 20:10
@rwlee rwlee requested review from a team as code owners May 4, 2022 20:10
@rwlee rwlee requested a review from jrhemstad May 4, 2022 20:10
@github-actions github-actions bot added the CMake CMake build issue label May 4, 2022
@github-actions github-actions bot removed the CMake CMake build issue label May 4, 2022
@codecov
Copy link

codecov bot commented May 4, 2022

Codecov Report

Merging #10793 (08092fe) into branch-22.06 (8d861ce) will decrease coverage by 0.10%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.06   #10793      +/-   ##
================================================
- Coverage         86.40%   86.29%   -0.11%     
================================================
  Files               143      143              
  Lines             22448    22597     +149     
================================================
+ Hits              19396    19501     +105     
- Misses             3052     3096      +44     
Impacted Files Coverage Δ
...thon/dask_cudf/dask_cudf/tests/test_distributed.py 18.86% <0.00%> (-67.93%) ⬇️
python/cudf/cudf/io/parquet.py 90.47% <0.00%> (-2.23%) ⬇️
python/cudf/cudf/core/index.py 92.06% <0.00%> (-0.25%) ⬇️
python/cudf/cudf/core/_compat.py 100.00% <0.00%> (ø)
python/cudf/cudf/core/indexed_frame.py 91.70% <0.00%> (ø)
python/cudf/cudf/core/series.py 95.17% <0.00%> (+<0.01%) ⬆️
python/cudf/cudf/testing/_utils.py 94.05% <0.00%> (+0.06%) ⬆️
python/cudf/cudf/core/dataframe.py 93.77% <0.00%> (+0.08%) ⬆️
python/cudf/cudf/core/column/string.py 89.21% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.79% <0.00%> (+0.22%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d3a39b3...08092fe. Read the comment docs.

Copy link
Contributor

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where or when this precisely needs to be done, but I'd like some words written about the equality comparator vs this comparator with weak_ordering::equivalent. The former should always be preferred in operations where you're only concerned with equality (like groupby/join/etc). This is because it will be more efficient than using this comparator with the weak_ordering::equivalent which would require doing two operator< vs a single operator==, which is important for something like strings.

@jrhemstad
Copy link
Contributor

I don't know where or when this precisely needs to be done, but I'd like some words written about the equality comparator vs this comparator with weak_ordering::equivalent. The former should always be preferred in operations where you're only concerned with equality (like groupby/join/etc). This is because it will be more efficient than using this comparator with the weak_ordering::equivalent which would require doing two operator< vs a single operator==, which is important for something like strings.

In other words, I don't want us to ever define a:

using equivalent_comparator = weak_ordering_comparator_impl<Comparator, weak_ordering::equivalent>;

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprisingly clean with the use of a parameter pack fold expression. Awesome suggestion @jrhemstad, and nice work implementing it @rwlee.

@rwlee showed me benchmarks offline (which I think he is going to paste here soon) and these changes didn't seem to have any negative performance impact.

With the documentation suggestion below, I approve.

@rwlee
Copy link
Contributor Author

rwlee commented May 10, 2022

@rwlee showed me benchmarks offline (which I think he is going to paste here soon) and these changes didn't seem to have any negative performance impact.

Benchmarks were posted above #10793 (comment) -- linking down here so they can be found after the review comment it is resolved.

@bdice bdice requested a review from jrhemstad May 10, 2022 15:11
@bdice bdice changed the title Split off weak ordering row operator changes Use weak orderings in device_row_comparator. May 10, 2022
@bdice
Copy link
Contributor

bdice commented May 10, 2022

@rwlee I edited the PR title and description to make it clearer what this is doing -- see these previous comments:

@bdice bdice changed the title Use weak orderings in device_row_comparator. Return weak orderings from device_row_comparator. May 10, 2022
using less_equivalent_comparator = weak_ordering_comparator_impl<device_row_comparator<Nullate>,
weak_ordering::LESS,
weak_ordering::EQUIVALENT>;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to have aliases for the remaining comparator too (i.e., > and >=).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then how about less_equivalent_comparator? We are not using it now, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#9452 uses less_equivalent_comparator.

Comment on lines 47 to 49
namespace cudf {

namespace experimental {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
namespace cudf {
namespace experimental {
namespace cudf::experimental {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has several layers of nested namespaces. For consistency, I would recommend leaving this unchanged.

@bdice
Copy link
Contributor

bdice commented May 11, 2022

I discussed this PR and #9452 with @rwlee offline. I'm merging this to unblock the next steps for #10730 and #9452.

@bdice
Copy link
Contributor

bdice commented May 11, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 325fa77 into rapidsai:branch-22.06 May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants