Performance improvement for libcudf upper/lower conversion for long strings #13142

Merged: 31 commits, May 9, 2023

@davidwendt davidwendt commented Apr 14, 2023

Description

Improves performance for longer strings with the cudf::strings::to_lower(), cudf::strings::to_upper(), and cudf::strings::swapcase() APIs.

The current implementation works well with smallish strings, so this new implementation switches to a long-string algorithm when the average number of bytes per string is 64 or greater. The long-string algorithm is similar but computes the output size of each string with a warp-per-string function. In addition, long strings are checked to see whether all bytes are ASCII; if so, a faster kernel is run for that case.
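As a rough host-side illustration of the dispatch described above, the following C++ sketch picks one of three paths from the average string size and an all-ASCII check. It is not the libcudf code: the names `long_string_threshold`, `case_path`, `is_all_ascii`, `choose_path`, and `ascii_to_upper` are hypothetical, the real work happens in CUDA kernels, and only the 64-byte figure comes from the description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed from the description: switch algorithms at an average of 64 bytes per string.
constexpr std::size_t long_string_threshold = 64;

// Hypothetical labels for the three code paths.
enum class case_path { short_strings, long_ascii, long_utf8 };

// Any byte with the high bit set belongs to a multi-byte UTF-8 sequence.
bool is_all_ascii(const std::vector<std::string>& strings)
{
  for (const auto& s : strings) {
    for (unsigned char c : s) {
      if (c & 0x80u) { return false; }
    }
  }
  return true;
}

case_path choose_path(const std::vector<std::string>& strings)
{
  std::size_t total_bytes = 0;
  for (const auto& s : strings) { total_bytes += s.size(); }
  // Same test as (total_bytes / count) >= threshold, written without the division.
  if (strings.empty() || total_bytes < long_string_threshold * strings.size()) {
    return case_path::short_strings;
  }
  return is_all_ascii(strings) ? case_path::long_ascii : case_path::long_utf8;
}

// ASCII-only upper-casing: each output byte depends on exactly one input byte,
// so the output size always equals the input size.
std::string ascii_to_upper(std::string s)
{
  for (char& c : s) {
    if (c >= 'a' && c <= 'z') { c = static_cast<char>(c - 'a' + 'A'); }
  }
  return s;
}
```

The ASCII helper hints at why that case can be faster: no output-size pass is needed because sizes never change, whereas UTF-8 case conversion may change the byte width of a character.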

Reference #13048

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the 2 - In Progress, libcudf, strings, improvement, and non-breaking labels Apr 14, 2023
@davidwendt davidwendt self-assigned this Apr 14, 2023
@github-actions github-actions bot added the CMake CMake build issue label Apr 19, 2023
davidwendt commented Apr 19, 2023

Some benchmarks used to help find the proper threshold:

UTF-8 encoded

|  width  |  num_rows  |   Ref Time |   Cmp Time |          Diff |   %Diff |
|---------|------------|------------|------------|---------------|---------|
|   32    |    4096    | 139.464 us | 140.954 us |      1.490 us |   1.07% |
|   48    |    4096    | 158.875 us | 158.903 us |      0.028 us |   0.02% |
|   64    |    4096    | 176.689 us | 179.506 us |      2.817 us |   1.59% |
|   80    |    4096    | 199.294 us | 200.059 us |      0.765 us |   0.38% |
|   96    |    4096    | 218.589 us | 222.201 us |      3.612 us |   1.65% |
|   112   |    4096    | 240.699 us | 241.845 us |      1.146 us |   0.48% |
|   128   |    4096    | 261.917 us | 225.478 us |    -36.439 us | -13.91% |
|   144   |    4096    | 285.389 us | 236.887 us |    -48.501 us | -16.99% |
|   160   |    4096    | 305.034 us | 250.683 us |    -54.351 us | -17.82% |
|   176   |    4096    | 334.522 us | 265.868 us |    -68.654 us | -20.52% |
|   192   |    4096    | 357.492 us | 282.691 us |    -74.800 us | -20.92% |
|   208   |    4096    | 373.461 us | 288.581 us |    -84.880 us | -22.73% |
|   224   |    4096    | 400.474 us | 307.051 us |    -93.423 us | -23.33% |
|   240   |    4096    | 426.321 us | 320.548 us |   -105.772 us | -24.81% |
|   256   |    4096    | 443.517 us | 331.745 us |   -111.773 us | -25.20% |
|   272   |    4096    | 471.975 us | 345.760 us |   -126.216 us | -26.74% |
|   284   |    4096    | 490.323 us | 359.025 us |   -131.298 us | -26.78% |
|   300   |    4096    | 506.685 us | 366.929 us |   -139.756 us | -27.58% |
|   32    |   32768    | 143.739 us | 144.959 us |      1.221 us |   0.85% |
|   48    |   32768    | 164.004 us | 163.662 us |     -0.341 us |  -0.21% |
|   64    |   32768    | 334.301 us | 320.952 us |    -13.350 us |  -3.99% |
|   80    |   32768    | 360.091 us | 346.512 us |    -13.579 us |  -3.77% |
|   96    |   32768    | 387.552 us | 371.259 us |    -16.293 us |  -4.20% |
|   112   |   32768    | 390.625 us | 391.448 us |      0.823 us |   0.21% |
|   128   |   32768    | 413.678 us | 416.810 us |      3.131 us |   0.76% |
|   144   |   32768    | 441.454 us | 415.916 us |    -25.538 us |  -5.79% |
|   160   |   32768    | 464.721 us | 430.787 us |    -33.934 us |  -7.30% |
|   176   |   32768    | 493.800 us | 449.489 us |    -44.310 us |  -8.97% |
|   192   |   32768    | 520.755 us | 465.078 us |    -55.677 us | -10.69% |
|   208   |   32768    | 541.959 us | 479.342 us |    -62.617 us | -11.55% |
|   224   |   32768    | 570.302 us | 497.108 us |    -73.194 us | -12.83% |
|   240   |   32768    | 602.929 us | 516.911 us |    -86.018 us | -14.27% |
|   256   |   32768    | 615.334 us | 529.682 us |    -85.651 us | -13.92% |
|   272   |   32768    | 647.557 us | 552.137 us |    -95.421 us | -14.74% |
|   284   |   32768    | 670.339 us | 565.372 us |   -104.967 us | -15.66% |
|   300   |   32768    | 691.753 us | 579.555 us |   -112.198 us | -16.22% |
|   32    |   262144   | 368.272 us | 365.435 us |     -2.837 us |  -0.77% |
|   48    |   262144   | 452.968 us | 455.215 us |      2.247 us |   0.50% |
|   64    |   262144   | 568.695 us | 573.062 us |      4.366 us |   0.77% |
|   80    |   262144   | 821.041 us | 800.553 us |    -20.489 us |  -2.50% |
|   96    |   262144   |   1.203 ms |   1.201 ms |     -1.299 us |  -0.11% |
|   112   |   262144   |   1.712 ms |   1.707 ms |     -4.889 us |  -0.29% |
|   128   |   262144   |   2.213 ms |   2.296 ms |     82.105 us |   3.71% |
|   144   |   262144   |   2.808 ms |   2.876 ms |     68.463 us |   2.44% |
|   160   |   262144   |   3.319 ms |   3.366 ms |     47.901 us |   1.44% |
|   176   |   262144   |   4.018 ms |   3.957 ms |    -61.271 us |  -1.52% |
|   192   |   262144   |   4.815 ms |   4.594 ms |   -220.492 us |  -4.58% |
|   208   |   262144   |   5.850 ms |   5.299 ms |   -550.946 us |  -9.42% |
|   224   |   262144   |   6.618 ms |   5.920 ms |   -697.494 us | -10.54% |
|   240   |   262144   |   7.591 ms |   6.596 ms |   -995.731 us | -13.12% |
|   256   |   262144   |   8.731 ms |   7.331 ms |  -1399.589 us | -16.03% |
|   272   |   262144   |   9.352 ms |   8.039 ms |  -1312.864 us | -14.04% |
|   284   |   262144   |  10.186 ms |   8.674 ms |  -1511.643 us | -14.84% |
|   300   |   262144   |  10.985 ms |   9.464 ms |  -1520.345 us | -13.84% |
|   32    |  2097152   |   2.123 ms |   2.114 ms |     -9.057 us |  -0.43% |
|   48    |  2097152   |   2.964 ms |   2.966 ms |      1.893 us |   0.06% |
|   64    |  2097152   |   3.635 ms |   3.638 ms |      2.645 us |   0.07% |
|   80    |  2097152   |   4.801 ms |   4.888 ms |     86.769 us |   1.81% |
|   96    |  2097152   |   7.525 ms |   7.547 ms |     22.433 us |   0.30% |
|   112   |  2097152   |  11.871 ms |  11.857 ms |    -13.990 us |  -0.12% |
|   128   |  2097152   |  16.668 ms |  16.709 ms |     40.598 us |   0.24% |
|   144   |  2097152   |  20.803 ms |  21.075 ms |    271.984 us |   1.31% |
|   160   |  2097152   |  25.761 ms |  25.832 ms |     71.071 us |   0.28% |
|   176   |  2097152   |  31.161 ms |  31.113 ms |    -47.389 us |  -0.15% |
|   192   |  2097152   |  36.495 ms |  36.719 ms |    223.964 us |   0.61% |
|   208   |  2097152   |  42.361 ms |  42.323 ms |    -38.315 us |  -0.09% |
|   224   |  2097152   |  48.315 ms |  47.921 ms |   -394.147 us |  -0.82% |
|   240   |  2097152   |  54.518 ms |  53.334 ms |  -1183.929 us |  -2.17% |
|   256   |  2097152   |  61.370 ms |  58.930 ms |  -2440.378 us |  -3.98% |
|   272   |  2097152   |  67.845 ms |  63.620 ms |  -4224.839 us |  -6.23% |
|   284   |  2097152   |  72.688 ms |  67.802 ms |  -4885.784 us |  -6.72% |
|   300   |  2097152   |  78.890 ms |  73.347 ms |  -5542.819 us |  -7.03% |

Also some ASCII-only benchmark results for fun.

ASCII encoded

|  width  |  num_rows  |   Ref Time |   Cmp Time |          Diff |   %Diff |
|---------|------------|------------|------------|---------------|---------|
|   32    |    4096    | 139.478 us | 142.197 us |      2.719 us |   1.95% |
|   48    |    4096    | 161.565 us | 163.331 us |      1.767 us |   1.09% |
|   64    |    4096    | 181.692 us | 183.175 us |      1.483 us |   0.82% |
|   80    |    4096    | 204.902 us | 206.413 us |      1.511 us |   0.74% |
|   96    |    4096    | 228.466 us | 230.612 us |      2.145 us |   0.94% |
|   112   |    4096    | 253.778 us | 254.149 us |      0.371 us |   0.15% |
|   128   |    4096    | 275.308 us | 112.322 us |   -162.985 us | -59.20% |
|   144   |    4096    | 304.149 us | 111.397 us |   -192.751 us | -63.37% |
|   160   |    4096    | 327.739 us | 111.962 us |   -215.777 us | -65.84% |
|   176   |    4096    | 355.701 us | 113.248 us |   -242.453 us | -68.16% |
|   192   |    4096    | 386.597 us | 115.851 us |   -270.746 us | -70.03% |
|   208   |    4096    | 406.453 us | 114.336 us |   -292.117 us | -71.87% |
|   224   |    4096    | 432.067 us | 115.240 us |   -316.827 us | -73.33% |
|   240   |    4096    | 460.123 us | 114.714 us |   -345.409 us | -75.07% |
|   256   |    4096    | 484.789 us | 115.387 us |   -369.402 us | -76.20% |
|   272   |    4096    | 511.011 us | 115.030 us |   -395.981 us | -77.49% |
|   284   |    4096    | 529.764 us | 115.128 us |   -414.636 us | -78.27% |
|   300   |    4096    | 709.508 us | 254.920 us |   -454.588 us | -64.07% |
|   32    |   32768    | 144.803 us | 146.937 us |      2.134 us |   1.47% |
|   48    |   32768    | 323.522 us | 305.653 us |    -17.869 us |  -5.52% |
|   64    |   32768    | 187.701 us | 188.913 us |      1.213 us |   0.65% |
|   80    |   32768    | 353.027 us | 351.905 us |     -1.122 us |  -0.32% |
|   96    |   32768    | 378.826 us | 379.770 us |      0.944 us |   0.25% |
|   112   |   32768    | 406.792 us | 408.756 us |      1.963 us |   0.48% |
|   128   |   32768    | 430.999 us | 433.291 us |      2.293 us |   0.53% |
|   144   |   32768    | 454.199 us | 263.664 us |   -190.534 us | -41.95% |
|   160   |   32768    | 481.146 us | 266.852 us |   -214.294 us | -44.54% |
|   176   |   32768    | 510.508 us | 267.896 us |   -242.612 us | -47.52% |
|   192   |   32768    | 540.338 us | 267.730 us |   -272.608 us | -50.45% |
|   208   |   32768    | 567.150 us | 269.128 us |   -298.023 us | -52.55% |
|   224   |   32768    | 593.716 us | 270.425 us |   -323.291 us | -54.45% |
|   240   |   32768    | 621.767 us | 271.241 us |   -350.526 us | -56.38% |
|   256   |   32768    | 645.887 us | 272.587 us |   -373.300 us | -57.80% |
|   272   |   32768    | 681.750 us | 283.170 us |   -398.580 us | -58.46% |
|   284   |   32768    | 701.165 us | 284.398 us |   -416.767 us | -59.44% |
|   300   |   32768    | 730.564 us | 287.202 us |   -443.362 us | -60.69% |
|   32    |   262144   | 374.509 us | 374.713 us |      0.204 us |   0.05% |
|   48    |   262144   | 466.607 us | 467.701 us |      1.094 us |   0.23% |
|   64    |   262144   | 591.481 us | 611.986 us |     20.505 us |   3.47% |
|   80    |   262144   | 827.996 us | 832.720 us |      4.724 us |   0.57% |
|   96    |   262144   |   1.237 ms |   1.253 ms |     15.853 us |   1.28% |
|   112   |   262144   |   1.801 ms |   1.785 ms |    -16.051 us |  -0.89% |
|   128   |   262144   |   2.388 ms | 595.482 us |  -1792.254 us | -75.06% |
|   144   |   262144   |   2.972 ms | 663.376 us |  -2308.238 us | -77.68% |
|   160   |   262144   |   3.607 ms | 730.779 us |  -2875.985 us | -79.74% |
|   176   |   262144   |   4.345 ms | 793.734 us |  -3551.176 us | -81.73% |
|   192   |   262144   |   5.127 ms | 858.045 us |  -4268.581 us | -83.26% |
|   208   |   262144   |   6.367 ms | 923.390 us |  -5443.978 us | -85.50% |
|   224   |   262144   |   7.250 ms | 988.587 us |  -6261.502 us | -86.36% |
|   240   |   262144   |   8.354 ms |   1.052 ms |  -7301.612 us | -87.40% |
|   256   |   262144   |   9.291 ms |   1.115 ms |  -8175.324 us | -88.00% |
|   272   |   262144   |  10.203 ms |   1.181 ms |  -9021.873 us | -88.43% |
|   284   |   262144   |  10.893 ms |   1.242 ms |  -9650.689 us | -88.60% |
|   300   |   262144   |  11.821 ms |   1.306 ms | -10515.608 us | -88.95% |
|   32    |  2097152   |   2.145 ms |   2.136 ms |     -8.898 us |  -0.41% |
|   48    |  2097152   |   3.021 ms |   3.004 ms |    -17.131 us |  -0.57% |
|   64    |  2097152   |   3.716 ms |   3.668 ms |    -47.992 us |  -1.29% |
|   80    |  2097152   |   4.933 ms |   4.976 ms |     43.159 us |   0.87% |
|   96    |  2097152   |   7.588 ms |   7.620 ms |     32.803 us |   0.43% |
|   112   |  2097152   |  12.018 ms |  12.187 ms |    168.769 us |   1.40% |
|   128   |  2097152   |  17.037 ms |   4.520 ms | -12517.386 us | -73.47% |
|   144   |  2097152   |  21.979 ms |   3.874 ms | -18105.243 us | -82.37% |
|   160   |  2097152   |  27.158 ms |   4.271 ms | -22887.534 us | -84.27% |
|   176   |  2097152   |  32.809 ms |   4.668 ms | -28141.898 us | -85.77% |
|   192   |  2097152   |  38.495 ms |   5.024 ms | -33471.669 us | -86.95% |
|   208   |  2097152   |  44.596 ms |   5.411 ms | -39184.793 us | -87.87% |
|   224   |  2097152   |  50.967 ms |   5.246 ms | -45721.448 us | -89.71% |
|   240   |  2097152   |  58.064 ms |   5.246 ms | -52818.466 us | -90.97% |
|   256   |  2097152   |  65.177 ms |   5.980 ms | -59197.256 us | -90.82% |
|   272   |  2097152   |  72.123 ms |   8.408 ms | -63715.670 us | -88.34% |
|   284   |  2097152   |  77.502 ms |   7.803 ms | -69699.548 us | -89.93% |
|   300   |  2097152   |  85.013 ms |   7.721 ms | -77292.000 us | -90.92% |

@davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label Apr 21, 2023
@davidwendt davidwendt marked this pull request as ready for review April 24, 2023 21:43
@davidwendt davidwendt requested a review from a team as a code owner April 24, 2023 21:43
@davidwendt davidwendt requested a review from harrism April 24, 2023 21:43
@davidwendt davidwendt requested a review from mythrocks April 24, 2023 21:43
cpp/tests/strings/case_tests.cpp (outdated, resolved)
cpp/benchmarks/string/case.cpp (resolved)
@karthikeyann karthikeyann left a comment

LGTM 👍

@davidwendt
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 7246a9e into rapidsai:branch-23.06 May 9, 2023
@davidwendt davidwendt deleted the strings-case-perf branch May 9, 2023 20:59
Labels: 3 - Ready for Review, CMake, improvement, libcudf, non-breaking, strings
3 participants