Add gbenchmark for strings contains_re/count_re functions #7366

davidwendt · 2021-02-10T19:18:53Z

Reference #5698
This creates a gbenchmark for cudf::strings::contains_re and cudf::strings::count_re. The device logic is mostly the same for cudf::strings::contains_re and cudf::strings::matches_re, so matches_re is covered as well.

Also included here is a small change to the regex code where it fast-forwards the current character position iterator in certain cases. Incrementing the existing iterator instead of recreating it at the new position improved performance by ~20% for strings that contain non-ASCII UTF-8 characters.
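
To illustrate why incrementing can be cheaper than recreating, here is a minimal host-side sketch. It is not the libcudf regex code: `utf8_iterator`, `make_iterator_at`, and `advance_to` are hypothetical names, and the sketch assumes that building an iterator at a character position has to rescan from the start of the string, because UTF-8 characters vary in width. With ASCII-only data the character position equals the byte offset, which is consistent with the gain showing up for non-ASCII strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 character whose first byte is `lead`.
inline std::size_t bytes_in_utf8_char(unsigned char lead) {
  if (lead < 0x80) return 1;            // ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 1;  // continuation or invalid byte: treated as one byte in this sketch
}

// Hypothetical stand-in for a character-position iterator over a UTF-8 string.
struct utf8_iterator {
  std::string const* str;
  std::size_t byte_pos;  // byte offset of the current character
  std::size_t char_pos;  // index of the current character

  // Step forward by exactly one character.
  utf8_iterator& operator++() {
    byte_pos += bytes_in_utf8_char(static_cast<unsigned char>((*str)[byte_pos]));
    ++char_pos;
    return *this;
  }
};

// Approach A: recreate the iterator at `target`. The scan restarts at byte 0,
// so the cost grows with `target`.
inline utf8_iterator make_iterator_at(std::string const& s, std::size_t target) {
  utf8_iterator it{&s, 0, 0};
  while (it.char_pos < target && it.byte_pos < s.size()) { ++it; }
  return it;
}

// Approach B: advance the existing iterator. The cost grows only with the
// distance between the current position and `target`.
inline void advance_to(utf8_iterator& it, std::size_t target) {
  while (it.char_pos < target && it.byte_pos < it.str->size()) { ++it; }
}
```

Both approaches land on the same byte offset. The difference is that approach B skips the characters already consumed, which matters when the position is fast-forwarded many times over a long string.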

codecov · 2021-02-10T23:37:35Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@a08ec0e).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-0.19    #7366   +/-   ##
==============================================
  Coverage               ?   82.21%           
==============================================
  Files                  ?      100           
  Lines                  ?    16971           
  Branches               ?        0           
==============================================
  Hits                   ?    13953           
  Misses                 ?     3018           
  Partials               ?        0

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a08ec0e...914ba32.

kkraus14

CMake lgtm

davidwendt · 2021-02-18T16:35:23Z

@gpucibot merge

davidwendt added 2 commits February 10, 2021 14:08

Add gbenchmark for strings contains_re/count_re functions

6456f5f

increment update dstr iter instead of creating a new one

0bec861

davidwendt added the 3 - Ready for Review, libcudf, strings, improvement, and non-breaking labels Feb 10, 2021

davidwendt self-assigned this Feb 10, 2021

davidwendt requested review from a team as code owners February 10, 2021 19:18

davidwendt requested review from devavret and rgsl888prabhu February 10, 2021 19:18

github-actions bot added the CMake label Feb 10, 2021

add findall_re to benchmark as well

289af40

rgsl888prabhu approved these changes Feb 12, 2021

fix merge conflicts

914ba32

devavret approved these changes Feb 18, 2021

kkraus14 approved these changes Feb 18, 2021

rapids-bot bot merged commit af2fc1b into rapidsai:branch-0.19 Feb 18, 2021

davidwendt deleted the benchmark-strings-contains branch February 18, 2021 18:45
