Use offsetalator in cudf::strings::findall #14745
cpp/src/strings/search/findall.cu
Outdated
// Create indices vector with the total number of groups that will be extracted
auto const total_matches =
  cudf::detail::get_value<size_type>(offsets->view(), strings_count, stream);
auto const sizes = count_matches(*d_strings, *d_prog, strings_count, stream, mr);  //+1
What does the `//+1` mean here? I see previously the code passed `strings_count + 1`. Did the behavior change?
Ah, good catch. That was a leftover note-to-self. The +1 is not needed since the number of counts equals `strings_count`. Previously we were storing the counts in an offsets column temporarily.
Removed the comment here: ba0d432
/merge
…15043)
Fixes `cudf::strings::extract_all()` to use `cudf::detail::make_offsets_child_column` so it properly computes the output-size and checks for overflow when building offsets for a lists column. Also undo some changes from #14745 that incorrectly called `cudf::strings::detail::make_offsets_child_column` to create offsets for a lists column.
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: #15043
Description
Use `make_offsets_child_column` and `offsetalator_iterator` to build/access offsets instead of hardcoded types. This cleans up the code nicely by automatically handling offset overflow and computing the total number of matches.
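For readers unfamiliar with the pattern, here is a rough host-side sketch of the idea: turn `n` per-row match counts into `n + 1` offsets, get the total as a by-product, and fail loudly on overflow. It is not the libcudf implementation. `sizes_to_offsets` is a hypothetical name, the scan runs on the CPU, and the 32-bit limit is an assumption standing in for the size type of the output column.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only; not a libcudf function.
// Converts n per-row counts into n + 1 offsets and returns the total count.
std::pair<std::vector<std::int64_t>, std::int64_t> sizes_to_offsets(
  std::vector<std::int32_t> const& sizes)
{
  // One more offset than there are rows; offsets[0] is always 0.
  std::vector<std::int64_t> offsets(sizes.size() + 1, 0);

  // Accumulate in 64 bits so the running sum itself cannot wrap.
  std::int64_t total = 0;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    total += sizes[i];
    offsets[i + 1] = total;
  }

  // Assumed limit: the output is indexed with a signed 32-bit size type.
  if (total > std::numeric_limits<std::int32_t>::max()) {
    throw std::overflow_error("total number of matches exceeds the 32-bit size limit");
  }
  return {std::move(offsets), total};
}
```

With counts `{2, 0, 3}` this yields offsets `{0, 2, 2, 5}` and a total of 5, which is why the count step only needs `strings_count` entries while the offsets need one more.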