Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect map size in scatter_to_gather corrupts struct columns #8507

Merged
merged 3 commits into from
Jun 14, 2021

Conversation

gerashegalov
Copy link
Contributor

@gerashegalov gerashegalov commented Jun 12, 2021

Fixes the scatter map size to the target column size.
Simplifies copy_if_else by skipping gather on lhs, and using lhs directly as the rhs scatter destinations for 0-rows in select col.

Closes #8356

Co-authored-by: @mythrocks

@gerashegalov gerashegalov added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Jun 12, 2021
@gerashegalov gerashegalov self-assigned this Jun 12, 2021
@gerashegalov gerashegalov requested a review from a team as a code owner June 12, 2021 15:40
Simplifies copy_if_else by skipping gather on lhs, and using lhs directly
as the rhs scatter destinations for 0-rows in select col.

Closes rapidsai#8356

Co-authored-by: MithunR <[email protected]>
@gerashegalov gerashegalov added the bug Something isn't working label Jun 12, 2021
@gerashegalov gerashegalov changed the title Incorrect scatter map size corrupts struct columns Incorrect map size in scatter_to_gather corrupts struct columns Jun 12, 2021
@codecov
Copy link

codecov bot commented Jun 12, 2021

Codecov Report

Merging #8507 (a27bfde) into branch-21.08 (709adb1) will increase coverage by 0.38%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.08    #8507      +/-   ##
================================================
+ Coverage         82.53%   82.91%   +0.38%     
================================================
  Files               110      110              
  Lines             17739    18094     +355     
================================================
+ Hits              14640    15002     +362     
+ Misses             3099     3092       -7     
Impacted Files Coverage Δ
python/cudf/cudf/io/feather.py 100.00% <0.00%> (ø)
python/cudf/cudf/comm/serialize.py 0.00% <0.00%> (ø)
python/cudf/cudf/_fuzz_testing/io.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/applyutils.py 100.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/_version.py 0.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_csv.py 100.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_orc.py 100.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/io/tests/test_json.py 100.00% <0.00%> (ø)
...ython/dask_cudf/dask_cudf/io/tests/test_parquet.py 100.00% <0.00%> (ø)
python/cudf/cudf/core/column/numerical.py 94.09% <0.00%> (+0.01%) ⬆️
... and 41 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 709adb1...a27bfde. Read the comment docs.

cpp/include/cudf/detail/scatter.cuh Show resolved Hide resolved
cpp/tests/copying/scatter_struct_tests.cpp Outdated Show resolved Hide resolved
@gerashegalov gerashegalov requested a review from ttnghia June 14, 2021 18:04
Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if my approval is strictly kosher here, but 👍.

We can add a test for LEAD()/LAG() at a later date. I'm certain that this fixes that as well.

@mythrocks
Copy link
Contributor

@wbo4958, this PR should resolve #8281. :]

@harrism
Copy link
Member

harrism commented Jun 14, 2021

@mythrocks I discussed with @sameerz and @revans2 and they agreed this is not needed in 21.06 because Spark has disabled this functionality in that release due to the issue fixed here.

@harrism
Copy link
Member

harrism commented Jun 14, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 26f5671 into rapidsai:branch-21.08 Jun 14, 2021
@gerashegalov gerashegalov deleted the copy_if_else_8356 branch June 14, 2021 22:42
rapids-bot bot pushed a commit that referenced this pull request Jun 18, 2021
This PR is for #8281, since the bug has been fixed by #8507. So just enable the test.

Authors:
  - Bobby Wang (https://github.com/wbo4958)

Approvers:
  - Liangcai Li (https://github.com/firestarman)

URL: #8548
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] copy_if_else producing incorrect answers for structs
4 participants