Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for integer overflow in contiguous-split #10437

Merged

Conversation

jbrennan333
Copy link
Contributor

This is a fix for an integer overflow in contiguous_split.cu::copy_data.

We encountered this problem when running a large query with the spark-rapids plug-in. The plug-in OOM callback was reporting negative numbers for some failed allocations. It treats the allocation size as a long. The allocations were coming from contiguous_split with a large number of rows (399,000,000) of type UINT64.

With help from @nvdbaranec, I was able to isolate the problem to this line:

size_t const bytes = buf.num_elements * buf.element_size;

The two operands are both INT32. So the result of the multiply is being sign-extended when assigning to bytes.

The fix is to cast these to size_t before the multiply.

@jbrennan333 jbrennan333 self-assigned this Mar 15, 2022
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 15, 2022
@jbrennan333 jbrennan333 added bug Something isn't working 4 - Needs Review Waiting for reviewer to review or respond and removed libcudf Affects libcudf (C++/CUDA) code. labels Mar 15, 2022
@jbrennan333 jbrennan333 added libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Mar 15, 2022
@codecov
Copy link

codecov bot commented Mar 15, 2022

Codecov Report

Merging #10437 (a5acc9f) into branch-22.04 (4596244) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@               Coverage Diff                @@
##           branch-22.04   #10437      +/-   ##
================================================
+ Coverage         86.13%   86.18%   +0.04%     
================================================
  Files               139      139              
  Lines             22438    22468      +30     
================================================
+ Hits              19328    19363      +35     
+ Misses             3110     3105       -5     
Impacted Files Coverage Δ
python/cudf/cudf/core/tools/numeric.py 89.24% <100.00%> (+0.11%) ⬆️
python/dask_cudf/dask_cudf/backends.py 86.44% <100.00%> (+1.47%) ⬆️
...ython/dask_cudf/dask_cudf/io/tests/test_parquet.py 100.00% <100.00%> (ø)
python/cudf/cudf/core/column/string.py 88.39% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.57% <0.00%> (+0.22%) ⬆️
python/cudf/cudf/core/column/numerical.py 95.28% <0.00%> (+0.29%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/column/lists.py 90.56% <0.00%> (+0.47%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4ce5d5...a5acc9f. Read the comment docs.

@jbrennan333 jbrennan333 marked this pull request as ready for review March 15, 2022 16:43
@jbrennan333 jbrennan333 requested a review from a team as a code owner March 15, 2022 16:43
cpp/src/copying/contiguous_split.cu Outdated Show resolved Hide resolved
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for helping clean up the code!

@PointKernel
Copy link
Member

rerun tests

@jbrennan333
Copy link
Contributor Author

They python build failure looks like build system issue:

[](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-python-build/7219/console#L223)[](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-python-build/7219/console#L224)
15:23:50 ‘EC2 (aws-b) - runner-m5d2xl (i-0fcffb338c35da323)’ doesn’t have label ‘cpu4’
15:23:50 ‘Jenkins’ doesn’t have label ‘cpu4’
15:49:46 [rapidsai » gpuci » cudf » prb » cudf-cpu-python-build » 11.5,3.9](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-python-build/CUDA=11.5,PYTHON=3.9/)[](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-python-build/7219/console#L225) completed with result FAILURE
15:49:46 [rapidsai » gpuci » cudf » prb » cudf-cpu-python-build » 11.5,3.8](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-python-build/CUDA=11.5,PYTHON=3.8/) completed with result FAILURE

@jbrennan333
Copy link
Contributor Author

Thanks for the review @ttnghia!

The build failures appear to be unrelated - do we need to wait until they are resolved before we can merge this?

@PointKernel
Copy link
Member

The CI failure fix has been merged (see #10442). @jbrennan333 Could you merge with the latest branch-22.04?

@PointKernel
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 5d3f7dc into rapidsai:branch-22.04 Mar 16, 2022
@jbrennan333
Copy link
Contributor Author

Thanks for the reviews @PointKernel, @nvdbaranec, and @ttnghia! And thanks for testing the patch @nvdbaranec!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Needs Review Waiting for reviewer to review or respond bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants