Add cudf::strings::findall_record API #9911
Codecov Report
@@ Coverage Diff @@
## branch-22.04 #9911 +/- ##
================================================
+ Coverage 10.37% 10.43% +0.05%
================================================
Files 119 119
Lines 20149 20590 +441
================================================
+ Hits 2091 2148 +57
- Misses 18058 18442 +384
Continue to review full report at Codecov.
CMake changes LGTM
@davidwendt this appears to have fallen through the cracks on my part, apologies. Is this a must for 22.02?
I don't think so. I'll move it.
I think we need to mark this as breaking since it's changing the name of a public API (`findall_re` -> `findall`).
Also, I tried to fix as many copyright years as I could but I probably missed a few (especially those that weren't part of the diff like the copyright line in contains.cpp).
@gpucibot merge
Reference #9856, specifically #9856 (comment)

Adds `cudf::strings::findall_record`, which was initially implemented in nvstrings but not ported over since LIST column types did not exist at the time and returning a vector of small columns was very inefficient. This API should also allow using the current Python function `cudf.str.findall()` with the `expand=False` parameter more effectively. A follow-on PR will address these Python changes.

This PR reorganizes the libcudf strings find source files into the `cpp/src/strings/search` subdirectory as well. Also, `findall()` has only a regex version, so the `_re` suffix is dropped from the name in the libcudf implementation. The Python changes in this PR address only the name change and the addition of the new API in the Cython interface.

Depends on #9909 -- shares the `cudf::strings::detail::count_matches()` utility function.
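
For readers unfamiliar with the difference between the two result shapes, here is a rough host-side sketch in plain Python. It is not the libcudf implementation and does not call cudf. The helper names (`count_matches`, `findall_record`, `findall_table`) are hypothetical stand-ins chosen to mirror the names in this PR, and the offsets-plus-child layout is an assumption based on how list columns are generally stored. Null rows are ignored for simplicity.

```python
import re
from itertools import accumulate


def count_matches(strings, pattern):
    """Hypothetical stand-in: number of regex matches in each row."""
    prog = re.compile(pattern)
    return [sum(1 for _ in prog.finditer(s)) for s in strings]


def findall_record(strings, pattern):
    """Return matches as one list-like column: (offsets, child).

    Row i owns child[offsets[i]:offsets[i + 1]], so rows with different
    match counts need no padding.
    """
    prog = re.compile(pattern)
    offsets = [0, *accumulate(count_matches(strings, pattern))]
    child = [m.group(0) for s in strings for m in prog.finditer(s)]
    return offsets, child


def findall_table(strings, pattern):
    """Return matches as several columns, padded with None.

    Column k holds the k-th match of every row; short rows are padded.
    """
    offsets, child = findall_record(strings, pattern)
    rows = [child[a:b] for a, b in zip(offsets, offsets[1:])]
    width = max((len(r) for r in rows), default=0)
    return [[r[k] if k < len(r) else None for r in rows] for k in range(width)]


strings = ["bunny", "rabbit", "hare", "dog"]
offsets, child = findall_record(strings, "[ab]")
rows = [child[a:b] for a, b in zip(offsets, offsets[1:])]
columns = findall_table(strings, "[ab]")
```

In this sketch the record form stores exactly five matches plus five offsets, while the table form allocates three columns of four entries each, most of them padding. That padding overhead is the kind of inefficiency the description above attributes to returning a vector of small columns.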