Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework cudf::replace_nulls to use strings::detail::copy_if_else #15286

Merged
merged 6 commits into from
Apr 2, 2024

Conversation

davidwendt
Copy link
Contributor

Description

Removes the specialized kernels for strings in cudf::replace_nulls and replaces them with a call to cudf::strings::detail::copy_if_else which is already enabled with offsetalator support and optimized for long strings.
This will also allow cudf::replace_nulls to use large strings with no further changes.
Also includes a replace_nulls benchmark for strings.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 13, 2024
@davidwendt davidwendt self-assigned this Mar 13, 2024
@github-actions github-actions bot added the CMake CMake build issue label Mar 13, 2024
@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Mar 13, 2024
@davidwendt davidwendt changed the base branch from branch-24.04 to branch-24.06 March 18, 2024 19:26
@davidwendt davidwendt marked this pull request as ready for review March 19, 2024 14:56
@davidwendt davidwendt requested a review from a team as a code owner March 19, 2024 14:56
@ttnghia
Copy link
Contributor

ttnghia commented Mar 20, 2024

Can you post some benchmark number comparing before vs after this change please?

@davidwendt
Copy link
Contributor Author

Performance benchmark for this change.

NVIDIA RTX A6000

| row_width |  num_rows |   Ref Time |   Cmp Time |           Diff |   %Diff |
|-----------|-----------|------------|------------|----------------|---------|
|    32     |    32768  |  72.199 us |  94.023 us |      21.825 us |  30.23% |
|    64     |    32768  |  86.534 us | 111.229 us |      24.695 us |  28.54% |
|    128    |    32768  | 127.262 us | 149.321 us |      22.059 us |  17.33% |
|    256    |    32768  | 226.015 us |  90.125 us |    -135.890 us | -60.12% |
|    512    |    32768  | 489.888 us | 101.352 us |    -388.537 us | -79.31% |
|   1024    |    32768  |   1.103 ms | 126.929 us |    -976.487 us | -88.50% |
|   2048    |    32768  |   2.360 ms | 176.211 us |   -2183.316 us | -92.53% |
|    32     |   262144  | 112.676 us | 164.440 us |      51.765 us |  45.94% |
|    64     |   262144  | 221.317 us | 304.757 us |      83.440 us |  37.70% |
|    128    |   262144  |   1.706 ms |   1.907 ms |     201.129 us |  11.79% |
|    256    |   262144  |   5.409 ms | 220.857 us |   -5188.351 us | -95.92% |
|    512    |   262144  |  14.826 ms | 303.993 us |  -14521.897 us | -97.95% |
|   1024    |   262144  |  35.427 ms | 499.784 us |  -34927.009 us | -98.59% |
|   2048    |   262144  |  80.428 ms | 891.051 us |  -79536.536 us | -98.89% |
|    32     |  2097152  | 461.374 us | 638.544 us |     177.169 us |  38.40% |
|    64     |  2097152  |   1.243 ms |   1.463 ms |     220.090 us |  17.70% |
|    128    |  2097152  |  10.630 ms |  13.658 ms |       3.028 ms |  28.49% |
|    256    |  2097152  |  44.534 ms |   1.178 ms |  -43356.375 us | -97.36% |
|    512    |  2097152  | 121.787 ms |   1.894 ms | -119892.600 us | -98.44% |
|    32     | 16777216  |   3.295 ms |   4.492 ms |       1.198 ms |  36.35% |
|    64     | 16777216  |   9.511 ms |  11.068 ms |       1.556 ms |  16.37% |

This result is mainly from the gather logic now.
Mostly improvements on long strings -- some close to 90x.
Shorter strings are a bit slower by a few milliseconds.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. Very nice performance improvements, and the kernels are simpler.

@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 13a5c7b into rapidsai:branch-24.06 Apr 2, 2024
75 checks passed
@davidwendt davidwendt deleted the replace-nulls-ls branch April 2, 2024 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants