Add string scalar replace benchmark #7369

Merged on Feb 16, 2021 (5 commits)


@jlowe jlowe commented Feb 11, 2021

Reference #5698
This creates a gbenchmark for the scalar form of cudf::strings::replace.

@jlowe jlowe added the labels 3 - Ready for Review, libcudf, CMake, strings, improvement, and non-breaking on Feb 11, 2021
@jlowe jlowe requested a review from davidwendt February 11, 2021 16:38
@jlowe jlowe self-assigned this Feb 11, 2021
@jlowe jlowe requested review from a team as code owners February 11, 2021 16:38
@jlowe jlowe requested a review from nvdbaranec February 11, 2021 16:38

jlowe commented Feb 11, 2021

These are the results on a V100. I filed #7370 to improve the performance on long strings.

---------------------------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------
StringReplaceScalar/replace_scalar/4096/32/manual_time          0.292 ms        0.309 ms         2391 bytes_per_second=237.272M/s
StringReplaceScalar/replace_scalar/32768/32/manual_time         0.310 ms        0.328 ms         2247 bytes_per_second=1.74071G/s
StringReplaceScalar/replace_scalar/262144/32/manual_time        0.430 ms        0.450 ms         1621 bytes_per_second=10.0248G/s
StringReplaceScalar/replace_scalar/4096/64/manual_time          0.472 ms        0.489 ms         1480 bytes_per_second=289.85M/s
StringReplaceScalar/replace_scalar/32768/64/manual_time         0.537 ms        0.556 ms         1273 bytes_per_second=1.99466G/s
StringReplaceScalar/replace_scalar/262144/64/manual_time        0.870 ms        0.890 ms          789 bytes_per_second=9.82507G/s
StringReplaceScalar/replace_scalar/4096/512/manual_time          6.69 ms         6.71 ms          104 bytes_per_second=163.756M/s
StringReplaceScalar/replace_scalar/32768/512/manual_time         6.85 ms         6.87 ms          102 bytes_per_second=1.24332G/s
StringReplaceScalar/replace_scalar/262144/512/manual_time        17.2 ms         17.2 ms           41 bytes_per_second=3.97397G/s
StringReplaceScalar/replace_scalar/4096/4096/manual_time          276 ms          276 ms            3 bytes_per_second=31.6596M/s
StringReplaceScalar/replace_scalar/32768/4096/manual_time         306 ms          306 ms            2 bytes_per_second=228.699M/s
StringReplaceScalar/replace_scalar/262144/4096/manual_time        783 ms          783 ms            1 bytes_per_second=713.596M/s
StringReplaceScalar/replace_scalar/4096/8192/manual_time         1102 ms         1102 ms            1 bytes_per_second=15.761M/s
StringReplaceScalar/replace_scalar/32768/8192/manual_time        1376 ms         1376 ms            1 bytes_per_second=101.351M/s
StringReplaceScalar/replace_scalar/262144/8192/manual_time       3174 ms         3174 ms            1 bytes_per_second=351.438M/s
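
The long-string slowdown can be read out of the `bytes_per_second` column by normalizing the units. Below is a minimal Python sketch using the 4096-row results from the table above. It assumes the `M`/`G` suffixes are powers of 1024; the helper name `to_bytes_per_second` is illustrative and not part of the benchmark.

```python
# Assumption: suffixes on bytes_per_second are binary (powers of 1024).
UNITS = {"k": 1024, "M": 1024**2, "G": 1024**3}

def to_bytes_per_second(counter):
    """Convert a counter such as '237.272M/s' to bytes per second."""
    value = counter[:-2] if counter.endswith("/s") else counter
    scale = UNITS.get(value[-1], 1)
    digits = value[:-1] if value[-1] in UNITS else value
    return float(digits) * scale

# (max string width, reported throughput) for the 4096-row runs above.
runs_4096 = [
    (32, "237.272M/s"),
    (64, "289.85M/s"),
    (512, "163.756M/s"),
    (4096, "31.6596M/s"),
    (8192, "15.761M/s"),
]

rates = {width: to_bytes_per_second(text) for width, text in runs_4096}
peak = max(rates.values())
for width, rate in rates.items():
    print(f"width {width:>5}: {peak / rate:5.1f}x below the best 4096-row run")
```

With these numbers the 8192-wide run is roughly 18x below the 64-wide run in throughput, which is the kind of gap #7370 is meant to address.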

jlowe commented Feb 11, 2021

Here are the updated benchmarks, using the custom arg generator, showing the combinations that are run:

---------------------------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------
StringReplaceScalar/replace_scalar/4096/32/manual_time          0.293 ms        0.310 ms         2377 bytes_per_second=236.511M/s
StringReplaceScalar/replace_scalar/4096/128/manual_time         0.948 ms        0.966 ms          731 bytes_per_second=286.424M/s
StringReplaceScalar/replace_scalar/4096/512/manual_time          6.69 ms         6.71 ms          104 bytes_per_second=163.745M/s
StringReplaceScalar/replace_scalar/4096/2048/manual_time         66.6 ms         66.7 ms           10 bytes_per_second=65.5526M/s
StringReplaceScalar/replace_scalar/4096/8192/manual_time         1102 ms         1102 ms            1 bytes_per_second=15.7623M/s
StringReplaceScalar/replace_scalar/32768/32/manual_time         0.314 ms        0.332 ms         2247 bytes_per_second=1.71925G/s
StringReplaceScalar/replace_scalar/32768/128/manual_time         1.08 ms         1.10 ms          642 bytes_per_second=1.97272G/s
StringReplaceScalar/replace_scalar/32768/512/manual_time         6.85 ms         6.87 ms          102 bytes_per_second=1.24247G/s
StringReplaceScalar/replace_scalar/32768/2048/manual_time        72.6 ms         72.7 ms           10 bytes_per_second=481.185M/s
StringReplaceScalar/replace_scalar/32768/8192/manual_time        1379 ms         1379 ms            1 bytes_per_second=101.085M/s
StringReplaceScalar/replace_scalar/262144/32/manual_time        0.431 ms        0.451 ms         1618 bytes_per_second=10.0098G/s
StringReplaceScalar/replace_scalar/262144/128/manual_time        1.90 ms         1.92 ms          365 bytes_per_second=8.96633G/s
StringReplaceScalar/replace_scalar/262144/512/manual_time        17.1 ms         17.1 ms           41 bytes_per_second=3.99232G/s
StringReplaceScalar/replace_scalar/262144/2048/manual_time        210 ms          210 ms            3 bytes_per_second=1.29973G/s
StringReplaceScalar/replace_scalar/2097152/32/manual_time        2.45 ms         2.47 ms          284 bytes_per_second=14.0643G/s
StringReplaceScalar/replace_scalar/2097152/128/manual_time       11.8 ms         11.9 ms           59 bytes_per_second=11.5248G/s
StringReplaceScalar/replace_scalar/2097152/512/manual_time       97.1 ms         97.0 ms            7 bytes_per_second=5.62153G/s
StringReplaceScalar/replace_scalar/16777216/32/manual_time       21.9 ms         21.9 ms           32 bytes_per_second=12.6234G/s
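
The 18 combinations listed above are consistent with a generator that multiplies the row count by 8 and the maximum string width by 4, and skips any pair whose product would not fit in a signed 32-bit integer. The following is a minimal Python sketch of that inferred rule. The function name, the parameter names, and the exact bound are assumptions drawn from the results, not taken from the PR's source.

```python
# Assumption: the skip rule is "rows * width must stay below INT32_MAX",
# inferred from which combinations appear in the results.
INT32_MAX = 2**31 - 1

def generate_args(min_rows=4096, max_rows=16_777_216, row_mult=8,
                  min_width=32, max_width=8192, width_mult=4):
    """Return the (row_count, max_width) pairs a run would cover."""
    args = []
    rows = min_rows
    while rows <= max_rows:
        width = min_width
        while width <= max_width:
            # Skip combinations whose total character count is too large.
            if rows * width < INT32_MAX:
                args.append((rows, width))
            width *= width_mult
        rows *= row_mult
    return args

for rows, width in generate_args():
    print(f"replace_scalar/{rows}/{width}")
```

Under this rule `262144/8192`, `2097152/2048`, and `16777216/128` are dropped, which matches the rows missing from the table.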

codecov bot commented Feb 11, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@083eb2a).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-0.19    #7369   +/-   ##
==============================================
  Coverage               ?   81.79%           
==============================================
  Files                  ?      100           
  Lines                  ?    16610           
  Branches               ?        0           
==============================================
  Hits                   ?    13586           
  Misses                 ?     3024           
  Partials               ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 083eb2a...e1fba03.

@kkraus14

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b090d96 into rapidsai:branch-0.19 Feb 16, 2021
rapids-bot bot pushed a commit that referenced this pull request Feb 17, 2021
#7384)

Reference #7370 

This PR simplifies the current `cudf::strings::replace` (non-regex) functions by refactoring them to use the more efficient `make_strings_children` utility. The refactoring improves performance on these APIs by about 2x, as measured by the gbenchmark added in PR #7369.

<details>
  <summary>Baseline gbenchmark for replace-scalar</summary>

```
---------------------------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------
StringReplaceScalar/replace_scalar/4096/32/manual_time          0.308 ms        0.316 ms         2345 bytes_per_second=224.631M/s
StringReplaceScalar/replace_scalar/4096/128/manual_time          1.01 ms         1.03 ms          684 bytes_per_second=269.171M/s
StringReplaceScalar/replace_scalar/4096/512/manual_time          7.35 ms         7.38 ms           95 bytes_per_second=149.028M/s
StringReplaceScalar/replace_scalar/4096/2048/manual_time         74.1 ms         74.2 ms            9 bytes_per_second=58.9153M/s
StringReplaceScalar/replace_scalar/4096/8192/manual_time         1170 ms         1170 ms            1 bytes_per_second=14.8457M/s
StringReplaceScalar/replace_scalar/32768/32/manual_time         0.314 ms        0.333 ms         2225 bytes_per_second=1.7147G/s
StringReplaceScalar/replace_scalar/32768/128/manual_time         1.16 ms         1.18 ms          604 bytes_per_second=1.83688G/s
StringReplaceScalar/replace_scalar/32768/512/manual_time         7.56 ms         7.58 ms           92 bytes_per_second=1.12604G/s
StringReplaceScalar/replace_scalar/32768/2048/manual_time        80.8 ms         80.9 ms            9 bytes_per_second=432.314M/s
StringReplaceScalar/replace_scalar/32768/8192/manual_time        1526 ms         1521 ms            1 bytes_per_second=91.3563M/s
StringReplaceScalar/replace_scalar/262144/32/manual_time        0.430 ms        0.449 ms         1622 bytes_per_second=10.0357G/s
StringReplaceScalar/replace_scalar/262144/128/manual_time        1.94 ms         1.96 ms          361 bytes_per_second=8.80298G/s
StringReplaceScalar/replace_scalar/262144/512/manual_time        18.1 ms         18.0 ms           39 bytes_per_second=3.77253G/s
StringReplaceScalar/replace_scalar/262144/2048/manual_time        227 ms          227 ms            3 bytes_per_second=1.20334G/s
StringReplaceScalar/replace_scalar/2097152/32/manual_time        2.48 ms         2.50 ms          282 bytes_per_second=13.9373G/s
StringReplaceScalar/replace_scalar/2097152/128/manual_time       11.8 ms         11.9 ms           59 bytes_per_second=11.5245G/s
StringReplaceScalar/replace_scalar/2097152/512/manual_time        101 ms          101 ms            7 bytes_per_second=5.42976G/s
StringReplaceScalar/replace_scalar/16777216/32/manual_time       22.2 ms         22.2 ms           31 bytes_per_second=12.4258G/s
```

</details>

<details>
  <summary>gbenchmark results for refactored replace-scalar</summary>

```
---------------------------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------
StringReplaceScalar/replace_scalar/4096/32/manual_time          0.144 ms        0.162 ms         4871 bytes_per_second=481.559M/s
StringReplaceScalar/replace_scalar/4096/128/manual_time         0.428 ms        0.446 ms         1633 bytes_per_second=634.055M/s
StringReplaceScalar/replace_scalar/4096/512/manual_time          2.65 ms         2.67 ms          263 bytes_per_second=413.561M/s
StringReplaceScalar/replace_scalar/4096/2048/manual_time         28.8 ms         28.8 ms           24 bytes_per_second=151.733M/s
StringReplaceScalar/replace_scalar/4096/8192/manual_time          479 ms          479 ms            2 bytes_per_second=36.2387M/s
StringReplaceScalar/replace_scalar/32768/32/manual_time         0.161 ms        0.178 ms         4347 bytes_per_second=3.35237G/s
StringReplaceScalar/replace_scalar/32768/128/manual_time        0.466 ms        0.484 ms         1502 bytes_per_second=4.57268G/s
StringReplaceScalar/replace_scalar/32768/512/manual_time         2.94 ms         2.96 ms          238 bytes_per_second=2.89405G/s
StringReplaceScalar/replace_scalar/32768/2048/manual_time        37.4 ms         37.4 ms           19 bytes_per_second=933.899M/s
StringReplaceScalar/replace_scalar/32768/8192/manual_time         567 ms          565 ms            1 bytes_per_second=245.929M/s
StringReplaceScalar/replace_scalar/262144/32/manual_time        0.316 ms        0.334 ms         2198 bytes_per_second=13.6601G/s
StringReplaceScalar/replace_scalar/262144/128/manual_time        1.39 ms         1.41 ms          498 bytes_per_second=12.237G/s
StringReplaceScalar/replace_scalar/262144/512/manual_time        12.8 ms         12.9 ms           54 bytes_per_second=5.30963G/s
StringReplaceScalar/replace_scalar/262144/2048/manual_time        157 ms          157 ms            4 bytes_per_second=1.73861G/s
StringReplaceScalar/replace_scalar/2097152/32/manual_time        1.84 ms         1.86 ms          379 bytes_per_second=18.7409G/s
StringReplaceScalar/replace_scalar/2097152/128/manual_time       9.50 ms         9.52 ms           74 bytes_per_second=14.3717G/s
StringReplaceScalar/replace_scalar/2097152/512/manual_time       84.7 ms         84.7 ms            8 bytes_per_second=6.44185G/s
StringReplaceScalar/replace_scalar/16777216/32/manual_time       14.0 ms         14.0 ms           50 bytes_per_second=19.6828G/s
```

</details>
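
The ratio between the two tables can be checked row by row. Here is a small Python sketch, using three rows copied from each table above. The helper names `parse_times` and `speedups` are illustrative only.

```python
import re

# Matches one benchmark output row and captures row count, max width,
# and the manual time in milliseconds.
ROW = re.compile(r"replace_scalar/(\d+)/(\d+)/manual_time\s+([\d.]+) ms")

def parse_times(text):
    """Map (row_count, max_width) to manual time in milliseconds."""
    return {(int(r), int(w)): float(t) for r, w, t in ROW.findall(text)}

def speedups(baseline_text, refactored_text):
    """Return baseline time divided by refactored time per configuration."""
    base = parse_times(baseline_text)
    new = parse_times(refactored_text)
    return {key: base[key] / new[key] for key in base if key in new}

baseline = """
StringReplaceScalar/replace_scalar/4096/32/manual_time          0.308 ms        0.316 ms         2345 bytes_per_second=224.631M/s
StringReplaceScalar/replace_scalar/4096/8192/manual_time         1170 ms         1170 ms            1 bytes_per_second=14.8457M/s
StringReplaceScalar/replace_scalar/16777216/32/manual_time       22.2 ms         22.2 ms           31 bytes_per_second=12.4258G/s
"""
refactored = """
StringReplaceScalar/replace_scalar/4096/32/manual_time          0.144 ms        0.162 ms         4871 bytes_per_second=481.559M/s
StringReplaceScalar/replace_scalar/4096/8192/manual_time          479 ms          479 ms            2 bytes_per_second=36.2387M/s
StringReplaceScalar/replace_scalar/16777216/32/manual_time       14.0 ms         14.0 ms           50 bytes_per_second=19.6828G/s
"""

for key, ratio in sorted(speedups(baseline, refactored).items()):
    print(key, f"{ratio:.2f}x")
```

On these three rows the ratio comes out at roughly 2.1x, 2.4x, and 1.6x, so the gain varies with row count and string width.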

Improvements for #7370 should base off of these changes.

Authors:
  - David (@davidwendt)

Approvers:
  - Jason Lowe (@jlowe)
  - @nvdbaranec
  - Mark Harris (@harrism)

URL: #7384
rapids-bot bot pushed a commit that referenced this pull request Feb 19, 2021
… gbenchmark (#7403)

Reference #5698

This builds off of PR #7369 to add `cudf::strings::replace_slice` and the multi-column version of `cudf::strings::replace` to the current gbenchmark that only measures scalar strings replace.

The current `replace_scalar_benchmark.cpp` is also renamed to `replace_benchmark.cpp` since it now handles more than the scalar replace.
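
For readers unfamiliar with the three operations now covered by the benchmark, here is a plain-Python model of what each one does to a column of strings. This is a sketch of the general semantics only, not libcudf code: the function names are hypothetical, `None` stands in for a null row, and `replace_slice` here assumes non-negative `start <= stop` positions.

```python
def replace_scalar(strings, target, repl):
    """Replace every occurrence of one target string in each row."""
    return [None if s is None else s.replace(target, repl) for s in strings]

def replace_slice(strings, repl, start, stop):
    """Replace the character range [start, stop) of each row with repl."""
    return [None if s is None else s[:start] + repl + s[stop:] for s in strings]

def replace_multi(strings, targets, repls):
    """Replace any of several targets with its paired replacement."""
    out = []
    for s in strings:
        if s is None:
            out.append(None)
            continue
        pieces, i = [], 0
        while i < len(s):
            # At each position, the first matching target wins.
            for t, r in zip(targets, repls):
                if t and s.startswith(t, i):
                    pieces.append(r)
                    i += len(t)
                    break
            else:
                pieces.append(s[i])
                i += 1
        out.append("".join(pieces))
    return out

print(replace_scalar(["hello world", None], "o", "0"))
print(replace_slice(["abcdef"], "XY", 1, 3))
print(replace_multi(["the quick fox"], ["the", "fox"], ["a", "dog"]))
```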

Authors:
  - David (@davidwendt)

Approvers:
  - Jason Lowe (@jlowe)
  - Keith Kraus (@kkraus14)
  - @nvdbaranec
  - Karthikeyan (@karthikeyann)

URL: #7403
@jlowe jlowe deleted the replace_benchmark branch September 10, 2021 15:43