Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of nvtext::edit_distance #13912

Merged
merged 16 commits into from
Aug 30, 2023

Conversation

davidwendt
Copy link
Contributor

Description

Improves performance of nvtext::edit_distance by reworking the algorithm with shorter working buffer and simpler logic.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 17, 2023
@davidwendt davidwendt self-assigned this Aug 17, 2023
@github-actions github-actions bot added the CMake CMake build issue label Aug 17, 2023
@davidwendt
Copy link
Contributor Author

Benchmark shows improvement of about 2x overall except for longer strings.

## [0] NVIDIA RTX A6000

|  num_rows  |  row_width  |   Ref Time |   Cmp Time |           Diff |   %Diff |
|------------|-------------|------------|------------|----------------|---------|
|    1024    |     32      | 737.645 us | 345.926 us |    -391.719 us | -53.10% |
|    4096    |     32      | 719.351 us | 340.844 us |    -378.507 us | -52.62% |
|    8192    |     32      | 835.842 us | 370.403 us |    -465.439 us | -55.69% |
|   16364    |     32      | 839.622 us | 364.336 us |    -475.286 us | -56.61% |
|   32768    |     32      | 843.410 us | 406.371 us |    -437.039 us | -51.82% |
|   262144   |     32      |   6.593 ms |   9.956 ms |       3.363 ms |  51.00% |
|    1024    |     64      |   2.715 ms |   1.065 ms |   -1650.365 us | -60.78% |
|    4096    |     64      |   2.660 ms |   1.029 ms |   -1630.859 us | -61.31% |
|    8192    |     64      |   3.062 ms |   1.181 ms |   -1880.347 us | -61.41% |
|   16364    |     64      |   3.066 ms |   1.206 ms |   -1859.765 us | -60.66% |
|   32768    |     64      |   3.095 ms |   1.386 ms |   -1709.301 us | -55.23% |
|   262144   |     64      |  80.145 ms |  69.720 ms |  -10425.716 us | -13.01% |
|    1024    |     128     |  11.869 ms |   5.162 ms |   -6706.545 us | -56.51% |
|    4096    |     128     |  10.938 ms |   4.742 ms |   -6196.654 us | -56.65% |
|    8192    |     128     |  12.902 ms |   5.677 ms |   -7225.385 us | -56.00% |
|   16364    |     128     |  12.494 ms |   5.377 ms |   -7117.810 us | -56.97% |
|   32768    |     128     |  12.487 ms |   6.433 ms |   -6053.744 us | -48.48% |
|   262144   |     128     | 502.236 ms | 359.770 ms | -142465.579 us | -28.37% |
|    1024    |     256     |  55.791 ms |  23.461 ms |  -32330.426 us | -57.95% |
|    4096    |     256     |  52.921 ms |  22.539 ms |  -30382.290 us | -57.41% |
|    8192    |     256     |  63.441 ms |  26.256 ms |  -37185.029 us | -58.61% |
|   16364    |     256     |  64.918 ms |  29.500 ms |  -35418.490 us | -54.56% |
|   32768    |     256     |  76.942 ms |  34.649 ms |  -42293.043 us | -54.97% |
|   262144   |     256     |    2.759 s |    1.891 s | -868409.805 us | -31.47% |

@davidwendt davidwendt changed the title Improve performance of nvtext::edit_distance Improve performance of nvtext::edit_distance Aug 21, 2023
@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Aug 21, 2023
@davidwendt davidwendt marked this pull request as ready for review August 21, 2023 23:12
@davidwendt davidwendt requested a review from a team as a code owner August 21, 2023 23:12
cpp/benchmarks/text/edit_distance.cpp Show resolved Hide resolved
cpp/benchmarks/text/edit_distance.cpp Outdated Show resolved Hide resolved
cpp/src/text/edit_distance.cu Outdated Show resolved Hide resolved
cpp/src/text/edit_distance.cu Outdated Show resolved Hide resolved
cpp/src/text/edit_distance.cu Outdated Show resolved Hide resolved
cpp/src/text/edit_distance.cu Outdated Show resolved Hide resolved
cpp/src/text/edit_distance.cu Outdated Show resolved Hide resolved
@davidwendt davidwendt requested a review from bdice August 23, 2023 13:05
@davidwendt davidwendt changed the title Improve performance of nvtext::edit_distance Improve performance of nvtext::edit_distance Aug 24, 2023
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docs, otherwise looks fine. I'll do a final review pass once docs are added.

cpp/src/text/edit_distance.cu Show resolved Hide resolved
cpp/src/text/edit_distance.cu Outdated Show resolved Hide resolved
cpp/src/text/edit_distance.cu Show resolved Hide resolved
Copy link
Contributor

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 2b5e0fb into rapidsai:branch-23.10 Aug 30, 2023
@davidwendt davidwendt deleted the edit-distance-perf branch August 30, 2023 16:51
rapids-bot bot pushed a commit that referenced this pull request Oct 16, 2023
Fixes a bug in `nvtext::edit_distance_matrix` where the internal offsets vector is initialized to 0. 
This error was introduced in #13912 
The bug was found while working on a different PR which re-ordered the nvtext gtests execution causing device memory to be reused from the rmm pool in a different way.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Mark Harris (https://github.com/harrism)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14283
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants