Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes up the overflowed fixed-point round on nullable column #10316

Merged
merged 4 commits into from
Feb 22, 2022

Conversation

sperlingxx
Copy link
Contributor

@sperlingxx sperlingxx commented Feb 17, 2022

Signed-off-by: sperlingxx [email protected]

Fixes up the overflow round on nullable fixed-point columns. In previous implementation, we built the zero replacement column via cudf::detail::fill_in_place with a zero scalar. However, this API overwrites the null mask of original column with the validity of the zero scalar, which is unexpected.

@sperlingxx sperlingxx added bug Something isn't working 3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond non-breaking Non-breaking change labels Feb 17, 2022
@sperlingxx sperlingxx requested a review from a team as a code owner February 17, 2022 04:10
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Feb 17, 2022
@codecov
Copy link

codecov bot commented Feb 17, 2022

Codecov Report

Merging #10316 (40a0763) into branch-22.04 (8b0737d) will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.04   #10316      +/-   ##
================================================
- Coverage         10.67%   10.63%   -0.05%     
================================================
  Files               122      122              
  Lines             20874    20953      +79     
================================================
  Hits               2228     2228              
- Misses            18646    18725      +79     
Impacted Files Coverage Δ
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/series.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/utils.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/dataframe.py 0.00% <0.00%> (ø)
python/cudf/cudf/testing/_utils.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/_base_index.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/column.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/string.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/indexed_frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/groupby/groupby.py 0.00% <0.00%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b0737d...40a0763. Read the comment docs.

stream);
} else {
detail::copy_range(thrust::make_constant_iterator(static_cast<Type>(0)),
thrust::make_constant_iterator(false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, if the input does not have null, the output will be a column of all nulls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you are right. It is unnecessary to set the null mask if the input doesn't have one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, then we can just use thrust::uninitialize_fill to fill the zero value for the output.

Copy link
Contributor Author

@sperlingxx sperlingxx Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. While, I am quite curious why we use unititialize_fill instead of thrust::flll here. Is it because thrust::fill will have extra cost on initializing the data before assigning the value? @ttnghia

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very interesting. I believe that for plain types they are the same. Otherwise, unititialize_fill calls copy constructor while thrust::fill call assignment operator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, it would be nice if you can add a unit test for this case 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting!
The unit test for validity was added into one of the existed test blocks.

@sperlingxx sperlingxx requested a review from ttnghia February 21, 2022 01:47
@sperlingxx
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7a17f28 into rapidsai:branch-22.04 Feb 22, 2022
@sperlingxx sperlingxx deleted the fix_fp_overflow_round branch February 22, 2022 02:29
sperlingxx added a commit to NVIDIA/spark-rapids that referenced this pull request Feb 23, 2022
Signed-off-by: sperlingxx <[email protected]>

Closes #3793

Pushes cuDF-related decimal utilities down to cuDF. This PR is relied on cuDF changes: rapidsai/cudf#9809, rapidsai/cudf#9907 and rapidsai/cudf#10316.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants