Rewrite CD solver using more BLAS #4446

Merged

achirkin
Contributor

@achirkin achirkin commented Dec 13, 2021

Reduce the frequency of device-host data transfers and replace some operations with BLAS axpy/gemv routines.

This yields an approximately 1.2x-3x speedup over the previous version (with larger gains for smaller problem sizes).
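
For readers unfamiliar with how a coordinate descent (CD) update maps onto BLAS calls, here is a minimal host-only sketch. It is not the code from this PR: the helpers `axpy` and `gemv_t` are hypothetical stand-ins for the corresponding BLAS routines, `cd_epoch` is an illustrative name, and the objective is assumed to be a plain lasso, 0.5 * ||y - Xw||^2 + l1 * ||w||_1, with column-major `X`. The actual solver runs on the device through RAFT/cuBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host stand-in for BLAS axpy: y += alpha * x.
void axpy(std::size_t n, double alpha, const double* x, double* y) {
  for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// Hypothetical host stand-in for transposed BLAS gemv on a column-major
// matrix A (n_rows x n_cols): y = alpha * A^T x + beta * y.
void gemv_t(std::size_t n_rows, std::size_t n_cols, double alpha,
            const double* A, const double* x, double beta, double* y) {
  for (std::size_t j = 0; j < n_cols; ++j) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n_rows; ++i) acc += A[j * n_rows + i] * x[i];
    y[j] = alpha * acc + beta * y[j];
  }
}

// Soft-thresholding operator used by the L1 penalty.
double soft_threshold(double v, double t) {
  if (v > t) return v - t;
  if (v < -t) return v + t;
  return 0.0;
}

// One cyclic pass of coordinate descent for the assumed lasso objective.
// `residual` must hold y - X w on entry and holds it again on exit.
// `sq_norms[j]` is the squared norm of column j of X.
void cd_epoch(std::size_t n_rows, std::size_t n_cols,
              const std::vector<double>& X,
              const std::vector<double>& sq_norms, double l1,
              std::vector<double>& w, std::vector<double>& residual) {
  assert(X.size() == n_rows * n_cols);
  assert(sq_norms.size() == n_cols && w.size() == n_cols);
  assert(residual.size() == n_rows);
  for (std::size_t j = 0; j < n_cols; ++j) {
    if (sq_norms[j] == 0.0) continue;  // skip empty columns
    const double* xj = X.data() + j * n_rows;
    // rho = x_j^T residual + ||x_j||^2 * w_j (gemv on a single column).
    double rho = 0.0;
    gemv_t(n_rows, 1, 1.0, xj, residual.data(), 0.0, &rho);
    rho += sq_norms[j] * w[j];
    const double w_new = soft_threshold(rho, l1) / sq_norms[j];
    // residual -= (w_new - w_old) * x_j, expressed as a single axpy.
    axpy(n_rows, -(w_new - w[j]), xj, residual.data());
    w[j] = w_new;
  }
}
```

In this form each coordinate update needs only one reduction and one vector update, so the residual can stay in place between updates instead of being recomputed.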

@achirkin achirkin requested a review from a team as a code owner December 13, 2021 12:39
@achirkin achirkin added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed CUDA/C++ labels Dec 13, 2021
Member

@cjnolet cjnolet left a comment

Nice to see the BLAS both simplifying the logic here and making it faster. Mostly trivial things but it can use a little additional clarity in places where operations have been consolidated.

cpp/src/solver/cd.cuh (9 resolved review threads, 8 of them outdated)
@cjnolet
Member

cjnolet commented Dec 13, 2021

rerun tests

@achirkin achirkin added 2 - In Progress Currently a work in progress and removed 3 - Ready for Review Ready for review by team labels Dec 14, 2021
@achirkin achirkin added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 20, 2021
@achirkin achirkin requested a review from a team as a code owner January 20, 2022 08:28
@github-actions github-actions bot added the CMake label Jan 20, 2022
@achirkin
Contributor Author

rerun tests

cpp/cmake/thirdparty/get_raft.cmake (1 resolved review thread, outdated)
@achirkin achirkin changed the base branch from branch-22.02 to branch-22.04 January 25, 2022 06:57
rapids-bot bot pushed a commit to rapidsai/raft that referenced this pull request Feb 4, 2022
Add a few overloads for raft-CUBLAS `gemv`, `gemm`, `axpy` functions to support switching between host and device pointer mode. This allows passing some of the parameters (constants `alpha`, `beta`) as device pointers, which sometimes improves performance.

By default, the CUBLAS context is created in host pointer mode. To preserve this assumption, device pointer mode is enabled only for the duration of a particular CUBLAS call.

This feature is required for rapidsai/cuml#4446.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #453
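
The "enable device pointer mode only for one call" idea in the commit message above can be sketched as a save-and-restore guard. This is an illustrative host-only mock, not the RAFT implementation: `FakeBlasHandle`, `PointerMode`, `PointerModeGuard`, and `axpy_with_device_alpha` are all hypothetical names. With real cuBLAS, the guard would presumably wrap `cublasGetPointerMode` and `cublasSetPointerMode` on a `cublasHandle_t`, and `alpha` would point to device memory.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cuBLAS pointer mode and handle.
enum class PointerMode { Host, Device };
struct FakeBlasHandle {
  PointerMode mode = PointerMode::Host;  // host mode is the default
};

// RAII guard: switches the pointer mode on construction and restores the
// previous mode on destruction, so callers keep seeing the default.
class PointerModeGuard {
 public:
  PointerModeGuard(FakeBlasHandle& handle, PointerMode mode)
      : handle_(handle), saved_(handle.mode) {
    handle_.mode = mode;
  }
  ~PointerModeGuard() { handle_.mode = saved_; }
  PointerModeGuard(const PointerModeGuard&) = delete;
  PointerModeGuard& operator=(const PointerModeGuard&) = delete;

 private:
  FakeBlasHandle& handle_;
  PointerMode saved_;
};

// Hypothetical axpy overload taking alpha by pointer. Returns the mode that
// was active during the call so the behaviour can be checked.
PointerMode axpy_with_device_alpha(FakeBlasHandle& handle, int n,
                                   const float* alpha, const float* x,
                                   float* y) {
  PointerModeGuard guard(handle, PointerMode::Device);
  for (int i = 0; i < n; ++i) y[i] += (*alpha) * x[i];  // mock of the BLAS call
  return handle.mode;
}
```

Because the mode is restored in the destructor, other code sharing the handle can keep assuming host pointer mode, which matches the intent stated in the commit message.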
@github-actions github-actions bot removed the CMake label Feb 5, 2022
Contributor

@tfeher tfeher left a comment

Thanks @achirkin for the PR. While we are here, maybe we can add a few lines of comment that would help reading the code? Otherwise it looks good to me, pre-approving.

cpp/src/solver/cd.cuh (5 resolved review threads)
@achirkin
Contributor Author

achirkin commented Feb 8, 2022

rerun tests

@github-actions github-actions bot added the CMake label Feb 8, 2022
@github-actions github-actions bot removed the CMake label Feb 8, 2022
@achirkin
Contributor Author

achirkin commented Feb 9, 2022

rerun tests

@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.04@9921c61).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.04    #4446   +/-   ##
===============================================
  Coverage                ?   85.73%           
===============================================
  Files                   ?      239           
  Lines                   ?    19585           
  Branches                ?        0           
===============================================
  Hits                    ?    16792           
  Misses                  ?     2793           
  Partials                ?        0           
Flag Coverage Δ
dask 46.18% <0.00%> (?)
non-dask 78.73% <0.00%> (?)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9921c61...35eaef4.

Member

@cjnolet cjnolet left a comment

LGTM

@cjnolet
Member

cjnolet commented Feb 9, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f3c1544 into rapidsai:branch-22.04 Feb 9, 2022
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Reduce the frequency of device-host data transfers and replace some operations with BLAS axpy/gemv routines.

This brings approximately 1.2x-3x speedup against the previous version (more speedup for smaller problem sizes).

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4446
Labels: 3 - Ready for Review, CUDA/C++, improvement, non-breaking
5 participants