Remove floating point types from radix sort fast-path #7215

Merged

Conversation

@davidwendt davidwendt commented Jan 26, 2021

Closes #7212

Reference #7167 (comment)

Using radix sort for all fixed-width types causes an error in Spark when floating-point columns contain NaN elements.

This PR removes floating-point column types from the radix sort fast-path. Floating-point columns are once again sorted with the original `relational_compare` row operator, since they may contain NaN elements.
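As a rough illustration of the eligibility rule described above, the following sketch expresses the decision as a compile-time predicate. The name `use_radix_fast_path` is hypothetical and is not part of libcudf; the sketch assumes the fast-path is limited to integral types in columns without nulls, and that everything else falls back to the comparator-based sort.

```cpp
#include <type_traits>

// Hypothetical predicate (an assumption for illustration, not libcudf code):
// the radix fast-path is taken only for integral element types in columns
// that contain no nulls. Floating-point types are excluded because they may
// hold NaN elements, which need the row comparator.
template <typename T>
constexpr bool use_radix_fast_path(bool has_nulls)
{
  return std::is_integral<T>::value && !has_nulls;
}
```

Under these assumptions, `use_radix_fast_path<int>(false)` is true, while `use_radix_fast_path<float>(false)` and `use_radix_fast_path<int>(true)` are both false.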

The existing NANSorting gtest includes null elements, so it never exercised the fast-path (which applies only to columns without nulls) and did not catch the output discrepancy. This PR adds a NANSortingNonNull gtest to check for the desired NaN sorting behavior.
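The kind of check a non-null NaN test performs can be sketched with the standard library alone, without libcudf or gtest. The helpers `nan_aware_less` and `sort_nan_last` are hypothetical names, and placing NaN after every other value is an assumption about the desired ordering rather than something stated in this PR.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical comparator (not libcudf's relational_compare): treats NaN as
// greater than every other value, so it forms a valid strict weak ordering
// even when the input contains NaN elements.
inline bool nan_aware_less(double lhs, double rhs)
{
  if (std::isnan(lhs)) { return false; }  // NaN is never less than anything
  if (std::isnan(rhs)) { return true; }   // any non-NaN value precedes NaN
  return lhs < rhs;
}

// Returns a sorted copy of the input with all NaN elements at the end.
inline std::vector<double> sort_nan_last(std::vector<double> values)
{
  std::sort(values.begin(), values.end(), nan_aware_less);
  return values;
}
```

A plain `operator<` comparison returns false for every comparison involving NaN, so it does not define a consistent order for such inputs; that is why a NaN-aware comparator is used in this sketch.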

@davidwendt davidwendt added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Jan 26, 2021
@davidwendt davidwendt requested a review from a team as a code owner January 26, 2021 18:55
@davidwendt davidwendt self-assigned this Jan 26, 2021
@davidwendt davidwendt changed the title from "Remove float point types from radix sort fast-path" to "Remove floating point types from radix sort fast-path" Jan 26, 2021

@codereport codereport left a comment

Looks good. Might suggest rewording the comment.

Comment on lines +52 to +54
// A non-stable sort on a column of arithmetic type with no nulls will use a radix sort
// if specifying only the `thrust::less` or `thrust::greater` comparators.
// But this also requires making a copy of the input data.

This reads a bit odd.

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge 0 - Waiting on Author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team 5 - Ready to Merge Testing and reviews complete, ready to merge labels Jan 26, 2021
@revans2

revans2 commented Jan 26, 2021

I reran the tests that we saw failing and this has fixed them. Thanks.

@codecov

codecov bot commented Jan 26, 2021

Codecov Report

Merging #7215 (339c3f1) into branch-0.18 (8860baf) will increase coverage by 0.11%.
The diff coverage is 92.85%.

@@               Coverage Diff               @@
##           branch-0.18    #7215      +/-   ##
===============================================
+ Coverage        82.09%   82.20%   +0.11%     
===============================================
  Files               97       98       +1     
  Lines            16474    16692     +218     
===============================================
+ Hits             13524    13722     +198     
- Misses            2950     2970      +20     
| Impacted Files | Coverage Δ |
|---|---|
| `python/cudf/cudf/_lib/__init__.py` | 100.00% <ø> (ø) |
| `python/cudf/cudf/core/column/lists.py` | 91.66% <ø> (-0.09%) ⬇️ |
| `python/cudf/cudf/core/column/numerical.py` | 94.13% <ø> (-0.29%) ⬇️ |
| `python/cudf/cudf/core/column/string.py` | 86.65% <ø> (ø) |
| `python/cudf/cudf/core/dataframe.py` | 90.49% <ø> (-0.22%) ⬇️ |
| `python/cudf/cudf/core/dtypes.py` | 89.94% <ø> (-0.45%) ⬇️ |
| `python/cudf/cudf/core/frame.py` | 90.10% <ø> (+0.12%) ⬆️ |
| `python/cudf/cudf/core/groupby/groupby.py` | 93.59% <ø> (ø) |
| `python/cudf/cudf/core/multiindex.py` | 82.19% <ø> (+0.05%) ⬆️ |
| `python/cudf/cudf/core/reshape.py` | 91.19% <ø> (+0.16%) ⬆️ |
| ... and 38 more | |

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a4c760...339c3f1. Read the comment docs.

@davidwendt davidwendt added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 0 - Waiting on Author Waiting for author to respond to review labels Jan 27, 2021
@davidwendt

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d19cb40 into rapidsai:branch-0.18 Jan 27, 2021
@davidwendt davidwendt deleted the remove-fastpath-float-sort branch January 27, 2021 00:08
rapids-bot bot pushed a commit that referenced this pull request Feb 1, 2021
PR #7215 removed single floating-point columns from the radix sort fast-path but did not disable the fast-path for floating-point columns in `cudf::sort()`.

This PR fixes `cudf::sort` and adds a new test to the existing `RowOperatorTestForNAN.NANSortingNonNull` gtest.

Authors:
  - David (@davidwendt)

Approvers:
  - Ram (Ramakrishna Prabhu) (@rgsl888prabhu)
  - Conor Hoekstra (@codereport)

URL: #7250
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Development

Successfully merging this pull request may close these issues.

[BUG] NaN values no longer sort correctly after optimization
5 participants