Rework cudf::find_and_replace_all to use gather-based make_strings_column #15305

Merged: 6 commits, Mar 22, 2024

davidwendt
Contributor

Description

Reworks the strings path of cudf::find_and_replace_all so that it works with long strings and can later support large strings.
The custom kernels are replaced with the gather-based make_strings_column, which is already optimized for both long and short strings.
Large strings will be supported automatically once make_strings_column handles them in a future PR.
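
As a rough illustration of the gather-based idea described above, the following host-only C++ sketch picks, for each row, either the original string or its replacement as a (pointer, size) view, and then gathers the bytes into one contiguous buffer with row offsets. This is not the libcudf implementation: StringsColumn, make_strings_from_views, and find_and_replace_all_sketch are hypothetical names invented for this example, nulls are ignored, and the real code runs on the GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical host-side stand-in for a strings column: one contiguous
// chars buffer plus row offsets (offsets.size() == number of rows + 1).
struct StringsColumn {
  std::vector<std::int64_t> offsets;
  std::string chars;
};

// Gather step: copy the bytes each view refers to into a single buffer.
// Sizes are accumulated first so the chars buffer is allocated once.
StringsColumn make_strings_from_views(std::vector<std::string_view> const& views)
{
  StringsColumn out;
  out.offsets.reserve(views.size() + 1);
  out.offsets.push_back(0);
  std::int64_t total = 0;
  for (auto const v : views) {
    total += static_cast<std::int64_t>(v.size());
    out.offsets.push_back(total);
  }
  out.chars.reserve(static_cast<std::size_t>(total));
  for (auto const v : views) { out.chars.append(v); }
  return out;
}

// Selection step: for each input row, use the matching replacement if the
// whole row equals one of values_to_replace; otherwise keep the row as-is.
// Assumes values_to_replace and replacement_values have the same length.
StringsColumn find_and_replace_all_sketch(
  std::vector<std::string_view> const& input,
  std::vector<std::string_view> const& values_to_replace,
  std::vector<std::string_view> const& replacement_values)
{
  std::vector<std::string_view> picked;
  picked.reserve(input.size());
  for (auto const row : input) {
    std::string_view result = row;
    for (std::size_t i = 0; i < values_to_replace.size(); ++i) {
      if (row == values_to_replace[i]) {
        result = replacement_values[i];
        break;
      }
    }
    picked.push_back(result);
  }
  return make_strings_from_views(picked);
}
```

Splitting the work into a selection step and a gather step means the output-building code has no knowledge of the replacement logic, which is why improvements to the gather routine can carry over without changes here.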

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the 2 - In Progress, libcudf, improvement, and non-breaking labels on Mar 14, 2024
@davidwendt self-assigned this on Mar 14, 2024
@github-actions bot added the CMake label on Mar 14, 2024
@davidwendt changed the base branch from branch-24.04 to branch-24.06 on March 18, 2024 at 13:45
@davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label on Mar 20, 2024
@davidwendt marked this pull request as ready for review on March 20, 2024 at 18:28
@davidwendt requested review from a team as code owners on March 20, 2024 at 18:28
  rmm::cuda_stream_view stream,
  rmm::mr::device_memory_resource* mr)
{
  if (input.is_empty()) { return cudf::make_empty_column(type_id::STRING); }
Member

Should we check if (input.is_empty() or values_to_replace.empty() or replacement_values.empty()) instead?

Contributor Author

This is already handled by the caller:
https://github.com/rapidsai/cudf/pull/15305/files#diff-8c11e29e23c203bd58fd1ee8b6aca9d9b49e06edbaced8e68bbf4728f9a1ebc5R310-R312
So this check is probably not necessary. I can remove it.

Member

@mhaseeb123 left a comment

Thank you for the contribution. This definitely makes things neater and clearer. Looks good to me!

Contributor

@bdice left a comment

CMake approval.

@davidwendt
Contributor Author

/merge

@rapids-bot bot merged commit b29fc1d into rapidsai:branch-24.06 on Mar 22, 2024
68 checks passed
@davidwendt deleted the find-replace-ls branch on March 22, 2024 at 13:55
Labels: 3 - Ready for Review, CMake, improvement, libcudf, non-breaking
5 participants