Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unsanitized output of scan with strings #13455

Merged

Conversation

davidwendt
Copy link
Contributor

Description

Fixes output from cudf::scan to return sanitized null entries. The null rows are expected to contain no characters if the input null rows are also empty.
The code logic is changed to compute the output validity mask first and use the mask to help build the gather map used for creating the output strings column. This rework also cleans up an unused parameter from internal functions detail::scan_inclusive and detail::scan_exclusive.
An additional check is added to the scan_test utility to verify the results contain empty null entries.

Closes #13449

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added bug Something isn't working 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels May 25, 2023
@davidwendt davidwendt self-assigned this May 25, 2023
@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels May 25, 2023
@davidwendt davidwendt marked this pull request as ready for review May 25, 2023 22:57
@davidwendt davidwendt requested a review from a team as a code owner May 25, 2023 22:57
@davidwendt davidwendt requested review from vyasr and PointKernel May 25, 2023 22:57
@GregoryKimball
Copy link
Contributor

Thank you @davidwendt!! That was fast!

@davidwendt
Copy link
Contributor Author

@razajafri Do you want to test this fixes your test from #13449 before I merge it?

@razajafri
Copy link
Contributor

@razajafri Do you want to test this fixes your test from #13449 before I merge it?

Ideally, I would like to do that but I won't be able to get to it until EOD today.

@revans2
Copy link
Contributor

revans2 commented May 31, 2023

@razajafri Do you want to test this fixes your test from #13449 before I merge it?

Ideally, I would like to do that but I won't be able to get to it until EOD today.

I did a quick run and it is looking good. I would like to run all of our tests, but the ones that were failing before are passing with this change now.

@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 858ddb6 into rapidsai:branch-23.08 Jun 2, 2023
@davidwendt davidwendt deleted the scan-strings-sanitize-output branch June 2, 2023 13:09
@razajafri
Copy link
Contributor

@davidwendt Thank you for the quick resolution. I just tested this with the plugin and it works

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] cudf::scan on a String column returns a Column with non-empty nulls
6 participants