Flexible block size for SYEVJ/HEEVJ #859

Open: wants to merge 51 commits into base branch develop.
Commits:
9c359d4 use Amat, Jmat (EdDAzevedo, Oct 30, 2024)
2fa6a51 change order of Amat, Jmat (EdDAzevedo, Oct 30, 2024)
7df5605 snapshot before changing args to offd_rotate (EdDAzevedo, Oct 30, 2024)
5a565f4 use Jsh shared memory (EdDAzevedo, Oct 30, 2024)
39d9de0 development snapshot (EdDAzevedo, Nov 2, 2024)
6592be0 debug snapshot (EdDAzevedo, Nov 3, 2024)
fddb33c option to use Jsh in offd_rotate (EdDAzevedo, Nov 4, 2024)
4010bf9 use lmemsizeOR (EdDAzevedo, Nov 4, 2024)
961aed2 change launch config for offd_rotate (EdDAzevedo, Nov 4, 2024)
cee54d6 snapshot use Jsh for offd_kernel (EdDAzevedo, Nov 4, 2024)
68b6a56 generalized offd_kernel (EdDAzevedo, Nov 4, 2024)
1c24e71 update diag_rotate (EdDAzevedo, Nov 5, 2024)
e524af1 minor update to comments (EdDAzevedo, Nov 5, 2024)
39f508e debug snapshot (EdDAzevedo, Nov 6, 2024)
5518be2 update gridOR, threadsOR for offd_rotate (EdDAzevedo, Nov 6, 2024)
01a11a5 Merge branch 'develop_opt_syevj' into shmem_develop_opt_syevj (EdDAzevedo, Nov 6, 2024)
1db4e2f Merge remote-tracking branch 'rocm/develop' into develop_opt_syevj (EdDAzevedo, Nov 6, 2024)
10ec074 add option to use original offd_rotate() (EdDAzevedo, Nov 6, 2024)
7b50310 Merge branch 'develop_opt_syevj' into shmem_develop_opt_syevj (EdDAzevedo, Nov 6, 2024)
55f3759 debug snapshot (EdDAzevedo, Nov 8, 2024)
84d4a1e use original algorithm for small problems (EdDAzevedo, Nov 8, 2024)
3055cc3 Merge branch 'develop_opt_syevj' into shmem_develop_opt_syevj (EdDAzevedo, Nov 8, 2024)
10fe93f debug snapshot (EdDAzevedo, Nov 9, 2024)
8902cf7 debug snapshot (EdDAzevedo, Nov 10, 2024)
1aa380a debug snapshot (EdDAzevedo, Nov 12, 2024)
838058a debug snapshot (EdDAzevedo, Nov 18, 2024)
10db100 update idx2D for int32_t, add commonly used ceil function (EdDAzevedo, Nov 18, 2024)
f8ac8f7 use std::min, std::max, remove lambda idx2D (EdDAzevedo, Nov 18, 2024)
b86cea9 Merge branch 'rocm_develop' into develop_opt_syevj (EdDAzevedo, Nov 18, 2024)
5e3d7a7 remove host idx2D (EdDAzevedo, Nov 18, 2024)
dca8ec6 Merge branch 'develop_opt_syevj' into shmem_develop_opt_syevj (EdDAzevedo, Nov 20, 2024)
42d1783 minor update for offd_kernel (EdDAzevedo, Nov 20, 2024)
e4831c8 update with improvement to syevj (EdDAzevedo, Nov 20, 2024)
3eca403 Merge branch 'develop_opt_syevj' into shmem_develop_opt_syevj (EdDAzevedo, Nov 20, 2024)
499f8b8 Merge branch 'develop_opt_syevj' into smallcase_develop_opt_syevj (EdDAzevedo, Nov 20, 2024)
e0743bc snapshot update offd_kernel (EdDAzevedo, Nov 20, 2024)
753b54e development snapshot (EdDAzevedo, Nov 21, 2024)
02af9f8 remove lambda for ceil() (EdDAzevedo, Nov 21, 2024)
ca9e29d add __device__ for ceil() (EdDAzevedo, Nov 21, 2024)
71e7fc3 Merge branch 'develop_opt_syevj' into smallcase_develop_opt_syevj (EdDAzevedo, Nov 21, 2024)
ad69158 debug snapshot (EdDAzevedo, Nov 22, 2024)
6c44765 adjust lmem size for offd_kernel (EdDAzevedo, Nov 23, 2024)
d278b17 simplified code (EdDAzevedo, Nov 23, 2024)
cb59565 minor update to access A directly instead of using Amat (EdDAzevedo, Nov 24, 2024)
f5c5bbf rearrange loops to reduce LDS bank conflict (EdDAzevedo, Nov 25, 2024)
4cac896 Merge branch 'develop_opt_syevj' into shmem_develop_opt_syevj (EdDAzevedo, Nov 25, 2024)
1c67d3e correct bug in diag_kernel (EdDAzevedo, Nov 25, 2024)
f13a6af option to use LDS in diag_kernel (EdDAzevedo, Nov 25, 2024)
8fb27f7 Merge branch 'develop_opt_syevj' into flex_nb_syevj (EdDAzevedo, Nov 26, 2024)
1046509 flexible nb_max (EdDAzevedo, Nov 26, 2024)
f03dec9 clang format (EdDAzevedo, Nov 26, 2024)
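Commit 10db100 mentions an idx2D indexing helper updated for int32_t and a "commonly used ceil function" (later marked __device__ in ca9e29d). The code for those helpers is not shown on this page, so the following is only a minimal host-side C++ sketch of what such helpers commonly look like, assuming column-major storage with a leading dimension ld. The names ceil_div and idx2d_colmajor and their signatures are hypothetical and are not taken from rocSOLVER.

```cpp
#include <cstdint>

// Hypothetical host-side sketch. rocSOLVER's actual helpers (idx2D and ceil,
// per commit 10db100) may have different names, signatures, and qualifiers.

// Integer ceiling of m / n for m >= 0 and n > 0, e.g. the number of
// nb-sized blocks needed to cover m rows or columns.
template <typename I>
constexpr I ceil_div(I m, I n)
{
    return (m + n - 1) / n;
}

// Linear offset of entry (i, j) in a column-major matrix with leading
// dimension ld. The indices are 32-bit, but the product j * ld is formed
// in 64-bit arithmetic so that large matrices do not overflow int32_t.
constexpr int64_t idx2d_colmajor(int32_t i, int32_t j, int32_t ld)
{
    return static_cast<int64_t>(i) + static_cast<int64_t>(j) * ld;
}
```

For example, ceil_div(10, 4) evaluates to 3, and idx2d_colmajor(1, 2, 5) evaluates to 11 (row 1 of column 2 when each column holds 5 entries).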
CHANGELOG.md: 4 changes (4 additions, 0 deletions)
@@ -18,6 +18,10 @@ Full documentation for rocSOLVER is available at the [rocSOLVER documentation](h
* Improved the performance of SYEVJ
* Improved the performance of GEQRF

### Resolved issues
### Known issues
### Upcoming changes


## rocSOLVER 3.27.0 for ROCm 6.3.0
