Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of copy_if_else for long strings #15017

Merged
merged 4 commits into from
Feb 14, 2024

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented Feb 9, 2024

Description

Reworks the cudf::strings::detail::copy_if_else() to improve performance for long strings. The rework builds a vector of rows to pass to the make_strings_column factory that uses the optimized gather_chars function.
Also includes a benchmark for copy_if_else specifically for strings columns.

Closes #15014

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 9, 2024
@davidwendt davidwendt self-assigned this Feb 9, 2024
@github-actions github-actions bot added the CMake CMake build issue label Feb 9, 2024
@davidwendt
Copy link
Contributor Author

Benchmark results show significant improvement for long strings.

# copy_if_else

## [0] Quadro GV100

|  row_width  |  num_rows  |   Ref Time |   Cmp Time |           Diff |   %Diff |
|-------------|------------|------------|------------|----------------|---------|
|     32      |    4096    |  81.672 us |  92.342 us |      10.670 us |  13.06% |
|     64      |    4096    |  83.276 us |  96.514 us |      13.239 us |  15.90% |
|     128     |    4096    | 108.554 us | 123.205 us |      14.650 us |  13.50% |
|     256     |    4096    | 176.912 us |  74.858 us |    -102.054 us | -57.69% |
|     512     |    4096    | 329.840 us |  76.111 us |    -253.728 us | -76.92% |
|    1024     |    4096    | 621.637 us |  80.308 us |    -541.328 us | -87.08% |
|    2048     |    4096    |   1.194 ms |  87.369 us |   -1107.081 us | -92.69% |
|    4096     |    4096    |   2.349 ms | 102.037 us |   -2246.858 us | -95.66% |
|     32      |   32768    |  78.397 us |  93.497 us |      15.100 us |  19.26% |
|     64      |   32768    | 100.368 us | 114.416 us |      14.047 us |  14.00% |
|     128     |   32768    | 149.653 us | 164.220 us |      14.567 us |   9.73% |
|     256     |   32768    | 268.554 us |  99.805 us |    -168.749 us | -62.84% |
|     512     |   32768    | 487.175 us | 109.880 us |    -377.295 us | -77.45% |
|    1024     |   32768    | 928.489 us | 134.407 us |    -794.082 us | -85.52% |
|    2048     |   32768    |   1.803 ms | 191.322 us |   -1611.378 us | -89.39% |
|    4096     |   32768    |   3.517 ms | 306.826 us |   -3210.370 us | -91.28% |
|     32      |   262144   | 147.473 us | 181.905 us |      34.432 us |  23.35% |
|     64      |   262144   | 299.250 us | 340.482 us |      41.232 us |  13.78% |
|     128     |   262144   |   1.736 ms |   1.882 ms |     145.620 us |   8.39% |
|     256     |   262144   |   8.997 ms | 315.059 us |   -8681.495 us | -96.50% |
|     512     |   262144   |  25.837 ms | 390.189 us |  -25446.713 us | -98.49% |
|    1024     |   262144   |  57.680 ms | 583.066 us |  -57097.086 us | -98.99% |
|    2048     |   262144   | 127.125 ms |   1.025 ms | -126099.652 us | -99.19% |
|    4096     |   262144   | 259.757 ms |   1.956 ms | -257800.617 us | -99.25% |
|     32      |  2097152   | 672.168 us | 841.930 us |     169.763 us |  25.26% |
|     64      |  2097152   |   1.826 ms |   2.008 ms |     182.080 us |   9.97% |
|     128     |  2097152   |  16.107 ms |  18.494 ms |       2.387 ms |  14.82% |
|     256     |  2097152   |  79.610 ms |   1.784 ms |  -77825.162 us | -97.76% |
|     512     |  2097152   | 222.464 ms |   2.418 ms | -220045.145 us | -98.91% |
|     32      |  16777216  |   4.860 ms |   5.986 ms |       1.126 ms |  23.17% |
|     64      |  16777216  |  13.876 ms |  15.054 ms |       1.178 ms |   8.49% |

Small strings suffer a bit but most are within 1-2 ms of the baseline.

@revans2
Copy link
Contributor

revans2 commented Feb 9, 2024

This looks great thanks so much!

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Feb 12, 2024
@davidwendt davidwendt marked this pull request as ready for review February 12, 2024 22:04
@davidwendt davidwendt requested a review from a team as a code owner February 12, 2024 22:04
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice! It also makes the code simpler.

cpp/benchmarks/string/copy_if_else.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change: higher throughput, less code!
Some minor suggestions.

cpp/benchmarks/string/copy_if_else.cpp Outdated Show resolved Hide resolved
cpp/include/cudf/strings/detail/copy_if_else.cuh Outdated Show resolved Hide resolved
@davidwendt davidwendt requested a review from vuule February 14, 2024 15:06
@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit f43f7c5 into rapidsai:branch-24.04 Feb 14, 2024
69 checks passed
@davidwendt davidwendt deleted the perf-copy-if-else branch February 14, 2024 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Improve performance of copy_if_else for long strings
4 participants