Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use cooperative-groups instead of cub warp-reduce for strings contains #17540

Merged
merged 1 commit into from
Dec 9, 2024

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented Dec 6, 2024

Description

Replaces the cub::WarpReduce usage in cudf::strings::contains with cooperative-groups any().
The change is only for the contains_warp_parallel kernel which is used for wider strings.
Using cooperative-groups generates more efficient code for the same results and gives an additional 11-14% performance improvement.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 6, 2024
@davidwendt davidwendt self-assigned this Dec 6, 2024
@davidwendt davidwendt requested a review from a team as a code owner December 6, 2024 15:05
@davidwendt davidwendt requested review from shrshi and lamarrr December 6, 2024 15:05
@davidwendt
Copy link
Contributor Author

davidwendt commented Dec 6, 2024

Benchmark results show no change for smaller strings as expected but an 11-14% improvement for wider strings:


## [0] NVIDIA RTX 5880 Ada Generation

| max_width | num_rows | hit_rate |   api    |  target  |   Ref Time |   Cmp Time |        Diff |   %Diff |
|-----------|----------|----------|----------|----------|------------|------------|-------------|---------|
|    32     |  32768   |    20    | contains |  scalar  |  38.918 us |  38.654 us |   -0.264 us |  -0.68% |
|    64     |  32768   |    20    | contains |  scalar  |  64.614 us |  64.554 us |   -0.059 us |  -0.09% |
|    128    |  32768   |    20    | contains |  scalar  |  53.720 us |  49.504 us |   -4.217 us |  -7.85% |
|    256    |  32768   |    20    | contains |  scalar  |  68.812 us |  63.345 us |   -5.467 us |  -7.95% |
|    32     |  262144  |    20    | contains |  scalar  |  50.448 us |  50.143 us |   -0.304 us |  -0.60% |
|    64     |  262144  |    20    | contains |  scalar  |  91.182 us |  90.712 us |   -0.470 us |  -0.52% |
|    128    |  262144  |    20    | contains |  scalar  | 247.065 us | 214.438 us |  -32.627 us | -13.21% |
|    256    |  262144  |    20    | contains |  scalar  | 355.056 us | 313.134 us |  -41.922 us | -11.81% |
|    32     | 2097152  |    20    | contains |  scalar  | 175.870 us | 174.549 us |   -1.321 us |  -0.75% |
|    64     | 2097152  |    20    | contains |  scalar  | 427.138 us | 426.376 us |   -0.763 us |  -0.18% |
|    128    | 2097152  |    20    | contains |  scalar  |   1.791 ms |   1.539 ms | -252.440 us | -14.09% |
|    256    | 2097152  |    20    | contains |  scalar  |   2.638 ms |   2.308 ms | -330.054 us | -12.51% |
|    32     |  32768   |    80    | contains |  scalar  |  37.129 us |  37.174 us |    0.045 us |   0.12% |
|    64     |  32768   |    80    | contains |  scalar  |  63.504 us |  63.180 us |   -0.324 us |  -0.51% |
|    128    |  32768   |    80    | contains |  scalar  |  57.137 us |  52.307 us |   -4.831 us |  -8.45% |
|    256    |  32768   |    80    | contains |  scalar  |  72.910 us |  66.125 us |   -6.786 us |  -9.31% |
|    32     |  262144  |    80    | contains |  scalar  |  53.414 us |  53.214 us |   -0.200 us |  -0.37% |
|    64     |  262144  |    80    | contains |  scalar  |  98.814 us |  98.445 us |   -0.370 us |  -0.37% |
|    128    |  262144  |    80    | contains |  scalar  | 277.457 us | 239.903 us |  -37.554 us | -13.54% |
|    256    |  262144  |    80    | contains |  scalar  | 385.274 us | 334.618 us |  -50.656 us | -13.15% |
|    32     | 2097152  |    80    | contains |  scalar  | 210.464 us | 208.620 us |   -1.845 us |  -0.88% |
|    64     | 2097152  |    80    | contains |  scalar  | 412.716 us | 411.962 us |   -0.754 us |  -0.18% |
|    128    | 2097152  |    80    | contains |  scalar  |   2.032 ms |   1.743 ms | -289.033 us | -14.22% |
|    256    | 2097152  |    80    | contains |  scalar  |   2.875 ms |   2.477 ms | -397.693 us | -13.83% |


Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loos great! 🔥

@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit a0fc6a8 into rapidsai:branch-25.02 Dec 9, 2024
114 of 115 checks passed
@davidwendt davidwendt deleted the cg-contains-warp branch December 9, 2024 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants