Add gbenchmark for strings contains_re/count_re functions #7366
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7366 +/- ##
==============================================
Coverage ? 82.21%
==============================================
Files ? 100
Lines ? 16971
Branches ? 0
==============================================
Hits ? 13953
Misses ? 3018
Partials ? 0
CMake lgtm
@gpucibot merge
Reference #5698

This creates a gbenchmark for `cudf::strings::contains_re` and `cudf::strings::count_re`. The device logic is mostly the same for `cudf::strings::contains_re` and `cudf::strings::matches_re`, so `matches_re` will be covered as well.

Also included here is a small change to the regex code where it fast-forwards the current character position iterator in certain cases. Using an increment approach instead of recreating the iterator at the new position improved performance by ~20% for strings that contain non-ASCII UTF-8 characters.
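To illustrate why incrementing can beat recreating, here is a minimal host-side sketch. It is not the libcudf regex code: `utf8_iterator`, `at`, `advance_to`, and `bytes_in_char` are hypothetical names invented for this example. The assumption is that locating character index `pos` in a UTF-8 string requires walking bytes from some known position, so recreating an iterator rescans from the start of the string while incrementing only walks the characters in between.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 character whose lead byte is `b`.
inline std::size_t bytes_in_char(unsigned char b) {
  if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 1;  // invalid lead byte: step one byte (sketch only, no validation)
}

// Hypothetical iterator that tracks a character index and its byte offset.
struct utf8_iterator {
  const std::string* str;
  std::size_t char_pos;
  std::size_t byte_pos;

  // "Recreate" approach: build an iterator at character index `pos`
  // by scanning from the beginning of the string, which costs O(pos).
  static utf8_iterator at(const std::string& s, std::size_t pos) {
    utf8_iterator it{&s, 0, 0};
    while (it.char_pos < pos && it.byte_pos < s.size()) ++it;
    return it;
  }

  // Step forward by exactly one character.
  utf8_iterator& operator++() {
    assert(byte_pos < str->size());
    byte_pos += bytes_in_char(static_cast<unsigned char>((*str)[byte_pos]));
    ++char_pos;
    return *this;
  }
};

// "Increment" approach: fast-forward an existing iterator to `pos`,
// walking only the characters between the current position and `pos`.
inline void advance_to(utf8_iterator& it, std::size_t pos) {
  while (it.char_pos < pos && it.byte_pos < it.str->size()) ++it;
}
```

Both approaches land on the same byte offset; the difference is how many bytes are revisited to get there. In this sketch the saving grows with the distance from the start of the string to the current position.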