Add gbenchmark for strings contains_re/count_re functions #7366

Merged

davidwendt
Contributor

Reference #5698
This adds a gbenchmark for cudf::strings::contains_re and cudf::strings::count_re. The device logic is mostly the same for cudf::strings::contains_re and cudf::strings::matches_re, so matches_re is effectively covered as well.
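
For readers unfamiliar with the three functions, here is a rough per-string sketch of what each one reports. It uses `std::regex` as a stand-in, not the libcudf regex engine, and the `*_sketch` names are hypothetical. libcudf applies these operations to every row of a strings column on the GPU; this only illustrates the result for a single row, assuming matches_re requires the match to start at the beginning of the string.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Stand-in for contains_re: true if the pattern matches anywhere in the string.
bool contains_re_sketch(std::string const& s, std::regex const& re)
{
  return std::regex_search(s, re);
}

// Stand-in for matches_re: true only if a match starts at the first character.
bool matches_re_sketch(std::string const& s, std::regex const& re)
{
  return std::regex_search(s, re, std::regex_constants::match_continuous);
}

// Stand-in for count_re: number of non-overlapping matches in the string.
std::size_t count_re_sketch(std::string const& s, std::regex const& re)
{
  auto const n =
    std::distance(std::sregex_iterator(s.begin(), s.end(), re), std::sregex_iterator());
  assert(n >= 0);
  return static_cast<std::size_t>(n);
}
```

With the pattern `[0-9]+`, the string `abc 123 de 45` is reported as containing a match, not matching at the start, and having a count of 2. The contains and matches variants differ only in where a match is allowed to begin, which is consistent with one benchmark exercising both code paths.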

Also included here is a small change to the regex code where it fast-forwards the current character-position iterator in certain cases. Incrementing the existing iterator, rather than recreating it at the new position, improved performance by ~20% for strings that contain non-ASCII UTF-8 characters.
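
To illustrate why incrementing helps with multi-byte characters, here is a minimal host-side sketch. It is not the libcudf device code: `char_cursor`, `at`, and `advance_to` are hypothetical names. In UTF-8 a character index does not map directly to a byte offset, so building a cursor at character `pos` means scanning from the start of the string, while stepping an existing cursor forward only scans the characters in between.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Hypothetical cursor tracking a character index and its byte offset.
struct char_cursor {
  std::string const* str;
  std::size_t char_pos;
  std::size_t byte_pos;

  explicit char_cursor(std::string const& s) : str(&s), char_pos(0), byte_pos(0) {}

  // Increment strategy: step forward from the current position.
  // Cost is proportional to (target - char_pos).
  void advance_to(std::size_t target)
  {
    assert(target >= char_pos);  // this sketch only moves forward
    while (char_pos < target && byte_pos < str->size()) {
      ++byte_pos;  // skip the lead byte
      while (byte_pos < str->size() &&
             is_continuation(static_cast<unsigned char>((*str)[byte_pos]))) {
        ++byte_pos;  // skip continuation bytes of the same character
      }
      ++char_pos;
    }
  }

  // Recreate strategy: build a new cursor and rescan from the start.
  // Cost is proportional to `pos`, regardless of where the old cursor was.
  static char_cursor at(std::string const& s, std::size_t pos)
  {
    char_cursor c(s);
    c.advance_to(pos);
    return c;
  }
};
```

Both strategies land on the same byte offset. The difference is the work done: for a cursor already at character 1000 that needs to reach character 1002, `advance_to` scans two characters while `at` rescans 1002. For pure ASCII input a real implementation can presumably compute the offset directly, which would fit the gain showing up for non-ASCII strings.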

@davidwendt added the labels 3 - Ready for Review, libcudf, strings, improvement, non-breaking on Feb 10, 2021
@davidwendt self-assigned this Feb 10, 2021
@davidwendt requested review from a team as code owners February 10, 2021 19:18
@github-actions bot added the CMake label Feb 10, 2021
@codecov

codecov bot commented Feb 10, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@a08ec0e).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-0.19    #7366   +/-   ##
==============================================
  Coverage               ?   82.21%           
==============================================
  Files                  ?      100           
  Lines                  ?    16971           
  Branches               ?        0           
==============================================
  Hits                   ?    13953           
  Misses                 ?     3018           
  Partials               ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a08ec0e...914ba32.

Collaborator

@kkraus14 kkraus14 left a comment

CMake lgtm

@davidwendt
Contributor Author

@gpucibot merge

@rapids-bot bot merged commit af2fc1b into rapidsai:branch-0.19 Feb 18, 2021
@davidwendt deleted the benchmark-strings-contains branch February 18, 2021 18:45