Optimize LU factorization without pivoting #680

Open
wants to merge 75 commits into
base: develop

EdDAzevedo
Contributor

Optimize getrf_npvt (LU factorization without pivoting) with a blocked algorithm similar to the one used for Cholesky factorization.

The diagonal block is factored by a specialized kernel that loads the block into the 64 KB of LDS (shared memory).
rocBLAS TRSM is then used to compute the column panel of "L" and the row panel of "U".
Finally, rocBLAS GEMM is used to update the trailing (lower-right) unfactored sub-matrix.
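
For readers who want to see the three steps in one place, here is a minimal CPU reference sketch of a blocked LU factorization without pivoting. It is not the code in this PR: the names `getf2_nopiv_ref` and `getrf_nopiv_ref` are hypothetical, the plain loops only stand in for the LDS kernel and the rocBLAS TRSM/GEMM calls, and the matrix is assumed to be square, column-major, and to have nonzero pivots (for example, diagonally dominant).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>; // column-major: A(i,j) = a[i + j*lda]

// Unblocked LU without pivoting on the jb x jb diagonal block at (j0, j0).
// Hypothetical stand-in for the specialized diagonal-block kernel.
void getf2_nopiv_ref(Matrix& a, std::size_t lda, std::size_t j0, std::size_t jb)
{
    const std::size_t j1 = j0 + jb;
    for (std::size_t k = j0; k < j1; ++k) {
        const double pivot = a[k + k * lda]; // assumed nonzero: no pivoting
        for (std::size_t i = k + 1; i < j1; ++i)
            a[i + k * lda] /= pivot;         // column of L (unit diagonal implied)
        for (std::size_t j = k + 1; j < j1; ++j)
            for (std::size_t i = k + 1; i < j1; ++i)
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda];
    }
}

// Blocked LU without pivoting of an n x n matrix, block size nb.
// On exit, a holds L (strictly lower part, unit diagonal) and U (upper part).
void getrf_nopiv_ref(Matrix& a, std::size_t n, std::size_t nb)
{
    assert(a.size() == n * n && nb > 0);
    const std::size_t lda = n;
    for (std::size_t j0 = 0; j0 < n; j0 += nb) {
        const std::size_t jb = std::min(nb, n - j0);
        const std::size_t j1 = j0 + jb;

        // Step 1: factor the diagonal block A11 = L11 * U11.
        getf2_nopiv_ref(a, lda, j0, jb);

        // Step 2a (TRSM-like): column panel, solve L21 * U11 = A21 for L21.
        for (std::size_t k = j0; k < j1; ++k)
            for (std::size_t i = j1; i < n; ++i) {
                double s = a[i + k * lda];
                for (std::size_t p = j0; p < k; ++p)
                    s -= a[i + p * lda] * a[p + k * lda];
                a[i + k * lda] = s / a[k + k * lda];
            }

        // Step 2b (TRSM-like): row panel, solve L11 * U12 = A12 for U12.
        for (std::size_t j = j1; j < n; ++j)
            for (std::size_t k = j0; k < j1; ++k) {
                double s = a[k + j * lda];
                for (std::size_t p = j0; p < k; ++p)
                    s -= a[k + p * lda] * a[p + j * lda];
                a[k + j * lda] = s;          // L11 has a unit diagonal
            }

        // Step 3 (GEMM-like): trailing update A22 -= L21 * U12.
        for (std::size_t j = j1; j < n; ++j)
            for (std::size_t i = j1; i < n; ++i) {
                double s = 0.0;
                for (std::size_t p = j0; p < j1; ++p)
                    s += a[i + p * lda] * a[p + j * lda];
                a[i + j * lda] -= s;
            }
    }
}
```

Calling `getrf_nopiv_ref(a, n, nb)` with `nb == n` reduces to the unblocked factorization, which gives a simple cross-check for the blocked path. On a GPU, most of the work lands in step 3, which is why expressing it as a single GEMM is attractive.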

The new routines are named "getf2_nopiv" and "getrf_nopiv" so that the existing code needs only minimal changes.

3 participants