Our CUDA implementation of the band matrix solve currently reads from and writes to global memory throughout the forward and backward solves, which is slow. We should at least stage the working data in some form of local memory (MArray/CuStaticSharedMemory) to improve memory access speed.
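
As a rough illustration of the `MArray` option, here is a minimal sketch, not the existing implementation. It assumes one independent banded system per thread, an already-computed LU factorization with unit-diagonal `L`, and a hypothetical storage layout (`L[i, k, s]` holds the k-th subdiagonal entry of row `i` for system `s`, and `U[i, k + 1, s]` holds the k-th superdiagonal entry, with the diagonal at `k = 0`). The names `band_solve_kernel!` and `band_solve!` are made up for this example. Only the right-hand side is staged in thread-private storage; the factors are still read from global memory.

```julia
using CUDA, StaticArrays

# Hypothetical kernel: thread s solves L * U * x = b[:, s] in place.
# N = system size, P = lower bandwidth, Q = upper bandwidth (compile-time constants).
function band_solve_kernel!(b, L, U, ::Val{N}, ::Val{P}, ::Val{Q}) where {N, P, Q}
    s = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    s > size(b, 2) && return nothing

    T = eltype(b)
    x = MVector{N, T}(undef)  # thread-private scratch instead of global memory

    @inbounds begin
        # One pass over global memory to load the right-hand side.
        for i in 1:N
            x[i] = b[i, s]
        end

        # Forward solve L * y = b (unit diagonal assumed), overwriting x with y.
        for i in 1:N
            acc = x[i]
            for k in 1:min(P, i - 1)
                acc -= L[i, k, s] * x[i - k]
            end
            x[i] = acc
        end

        # Backward solve U * x = y, overwriting x with the solution.
        for i in N:-1:1
            acc = x[i]
            for k in 1:min(Q, N - i)
                acc -= U[i, k + 1, s] * x[i + k]
            end
            x[i] = acc / U[i, 1, s]
        end

        # One pass over global memory to store the result.
        for i in 1:N
            b[i, s] = x[i]
        end
    end
    return nothing
end

# Hypothetical host wrapper: one thread per system (column of b).
function band_solve!(b, L, U, n::Val, p::Val, q::Val; threads = 128)
    blocks = cld(size(b, 2), threads)
    @cuda threads=threads blocks=blocks band_solve_kernel!(b, L, U, n, p, q)
    return b
end
```

With this arrangement the repeated reads and writes of the solution vector inside the two substitution loops hit the `MVector` rather than global memory. Whether that scratch ends up in registers or spills to CUDA local memory depends on `N` and the compiler, so the benefit would need to be measured. A shared-memory variant would follow the same pattern, with the per-thread scratch replaced by a block-wide buffer.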