Rewrite CD solver using more BLAS #4446
Nice to see BLAS both simplifying the logic here and making it faster. My comments are mostly trivial, but the code could use a little additional clarity in places where operations have been consolidated.
Add a few overloads for raft-CUBLAS `gemv`, `gemm`, `axpy` functions to support switching between host and device pointer mode. This allows passing some of the parameters (constants `alpha`, `beta`) as device pointers, which sometimes improves performance.

By default, CUBLAS context is created in the host pointer mode. To keep this presumption, the device pointer mode is enabled only for the time of a particular CUBLAS call.

This feature is required for rapidsai/cuml#4446.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #453
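The "enable device pointer mode only for the duration of one call" idea can be sketched as a scope guard. The following is a minimal host-only illustration, not the raft implementation: `MockHandle`, `PointerMode`, `ScopedPointerMode` and `axpy_with_device_scalar` are hypothetical names standing in for a cuBLAS handle and a wrapper overload, so that the sketch compiles without CUDA. In real code the guard would presumably save and restore the mode through `cublasGetPointerMode` and `cublasSetPointerMode`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the cuBLAS pointer-mode enum and handle.
enum class PointerMode { Host, Device };

struct MockHandle {
  PointerMode mode = PointerMode::Host;  // mirrors the default host pointer mode
};

// Scope guard: switches the mode on construction and restores the
// previously active mode on destruction, even on early return.
class ScopedPointerMode {
 public:
  ScopedPointerMode(MockHandle& handle, PointerMode mode)
    : handle_(handle), saved_(handle.mode)
  {
    handle_.mode = mode;
  }
  ~ScopedPointerMode() { handle_.mode = saved_; }
  ScopedPointerMode(const ScopedPointerMode&)            = delete;
  ScopedPointerMode& operator=(const ScopedPointerMode&) = delete;

 private:
  MockHandle& handle_;
  PointerMode saved_;
};

// Hypothetical wrapper: computes y += alpha * x, where alpha is passed by
// pointer (as a device scalar would be). Returns the mode that was active
// during the call so the behaviour can be inspected.
PointerMode axpy_with_device_scalar(MockHandle& handle,
                                    const float* alpha,
                                    const std::vector<float>& x,
                                    std::vector<float>& y)
{
  assert(x.size() == y.size());
  ScopedPointerMode guard(handle, PointerMode::Device);
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] += (*alpha) * x[i];
  }
  return handle.mode;  // evaluated before the guard restores the old mode
}
```

After the wrapper returns, the handle is back in whatever mode it had before, so callers that assume host pointer mode are unaffected.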
Thanks @achirkin for the PR. While we are here, maybe we can add a few lines of comment that would help reading the code? Otherwise it looks good to me, pre-approving.
Codecov Report
@@ Coverage Diff @@
## branch-22.04 #4446 +/- ##
===============================================
Coverage ? 85.73%
===============================================
Files ? 239
Lines ? 19585
Branches ? 0
===============================================
Hits ? 16792
Misses ? 2793
Partials ? 0
LGTM
@gpucibot merge
Reduce the frequency of device-host data transfers and replace some operations with BLAS axpy/gemv routines. This brings approximately 1.2x-3x speedup against the previous version (more speedup for smaller problem sizes).

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Dante Gama Dessavre (https://github.com/dantegd)
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4446
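To illustrate how a coordinate descent sweep can be phrased in terms of BLAS-style primitives, here is a minimal host-only sketch for the Lasso objective (1/2)·||y − Xw||² + λ·||w||₁. It is not the cuML solver: `lasso_cd`, `axpy`, `dot` and `soft_threshold` are hypothetical helpers written for this example, the matrix is assumed to be column-major, and there is no intercept, normalization or convergence check. The point is that the residual is kept up to date with one axpy per coordinate instead of being recomputed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// y += alpha * x (the BLAS level-1 axpy operation), on the host.
void axpy(double alpha, const double* x, double* y, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i) { y[i] += alpha * x[i]; }
}

// Dot product of two length-n vectors.
double dot(const double* x, const double* y, std::size_t n)
{
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i) { acc += x[i] * y[i]; }
  return acc;
}

// Soft-thresholding operator used by the L1 penalty.
double soft_threshold(double v, double t)
{
  if (v > t) { return v - t; }
  if (v < -t) { return v + t; }
  return 0.0;
}

// Cyclic coordinate descent for Lasso. X is column-major with n rows and
// p columns; returns the coefficient vector after a fixed number of sweeps.
std::vector<double> lasso_cd(const std::vector<double>& X,
                             const std::vector<double>& y,
                             std::size_t n,
                             std::size_t p,
                             double lambda,
                             int sweeps)
{
  assert(X.size() == n * p && y.size() == n);
  std::vector<double> w(p, 0.0);
  std::vector<double> r(y);  // residual r = y - X w, and w starts at zero
  for (int s = 0; s < sweeps; ++s) {
    for (std::size_t j = 0; j < p; ++j) {
      const double* xj = X.data() + j * n;
      const double sq  = dot(xj, xj, n);
      if (sq == 0.0) { continue; }  // skip empty columns
      // Correlation of column j with the residual that excludes w[j].
      const double rho   = dot(xj, r.data(), n) + w[j] * sq;
      const double w_new = soft_threshold(rho, lambda) / sq;
      // Keep the residual consistent with a single axpy: r += (w_old - w_new) * x_j.
      axpy(w[j] - w_new, xj, r.data(), n);
      w[j] = w_new;
    }
  }
  return w;
}
```

On a GPU, the dot and axpy steps would map to BLAS calls, and keeping scalars such as `w[j] - w_new` on the device is one way to avoid a device-host transfer per coordinate.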