Tpetra::CrsGraph::findLocalIndex: Port to use Kokkos #205
Comments
I imagine the function looking like this:
We need
Sufficient C++ cleverness could let us write one function that works for either a rank-1 Kokkos::View or a raw array. For example,
It would also make sense for the function to accept SparseRowView or SparseRowViewConst (see tpetra/kernels/src/Kokkos_Sparse_CrsMatrix.hpp). Those are little structs that also have operator() and behave a bit like an unmanaged rank-1 Kokkos::View. Their main advantage is that they work for either CSR or ELLPACK storage.
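One way to get a single function over several input types is to template on the container and rely only on `operator[]`. The following is a minimal host-only sketch under that assumption, not the implementation from the patches: the name `findOffsetGeneric`, its parameter list, and the choice of `numEnt` as the "not found" value are all hypothetical. It has only been exercised with a raw array and a `std::vector`, not with a Kokkos::View or SparseRowView.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for illustration; not part of Kokkos or Tpetra.
// IndicesType can be anything indexable with operator[] (raw array,
// pointer, std::vector). A device version would additionally need a
// Kokkos function marker such as KOKKOS_INLINE_FUNCTION.
template <class IndicesType, class Ordinal, class Index>
Ordinal findOffsetGeneric(const IndicesType& inds,
                          const Ordinal numEnt,
                          const Index& indToFind)
{
  assert(numEnt >= 0);
  // Plain linear search over the first numEnt entries.
  for (Ordinal k = 0; k < numEnt; ++k) {
    if (inds[k] == indToFind) {
      return k;  // relative offset of the match
    }
  }
  // Assumption of this sketch: numEnt signals "not found".
  return numEnt;
}
```

Because the function never asks the container for its size, the caller passes `numEnt` explicitly, which is what lets a bare pointer into a row work as well as a container.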
Here is a first-pass implementation that passes TpetraCore's tests. It only does linear search for now. Formatting is a bit messed up but you'll get the idea.
I don't like returning
Ok, changing it to return
Hm, this thing actually might belong in KokkosContainers, since Kokkos::StaticCrsGraph does. @crtrott , how do you feel about that?
Here are two patches to try. Please read commit messages. 0001-Kokkos-StaticCrsGraph-Add-findRelOffset-see-Trilinos.txt
The following four patches supersede the patches I posted above. They include a test for the new function (which builds and passes, though I haven't tested CUDA yet) and an optimization for sorted arrays (it uses binary search in that case). The latter means that performance should be comparable to that of Tpetra::CrsGraph::findLocalIndex or Epetra_CrsGraph::FindMyIndexLoc. 0001-Kokkos-StaticCrsGraph-Add-findRelOffset-see-Trilinos.txt
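For readers who want to see the shape of the sorted-array optimization, here is a rough sketch of a search that falls back to linear scan for unsorted or short rows and uses binary search otherwise. This is not the code in the patches: the name `findOffsetSorted`, the `isSorted` flag, the threshold of 16, and the use of `numEnt` as the "not found" value are assumptions made for this example.

```cpp
#include <cassert>

// Hypothetical helper for illustration; not the patched findRelOffset.
template <class Ordinal, class Index>
Ordinal findOffsetSorted(const Index* inds,
                         const Ordinal numEnt,
                         const Index indToFind,
                         const bool isSorted)
{
  assert(numEnt >= 0);
  // Arbitrary cutoff chosen for this sketch: below it, linear search
  // is used even when the row is sorted.
  const Ordinal linearThreshold = 16;

  if (!isSorted || numEnt <= linearThreshold) {
    for (Ordinal k = 0; k < numEnt; ++k) {
      if (inds[k] == indToFind) {
        return k;
      }
    }
    return numEnt;  // not found
  }

  // Binary search for the first position whose entry is >= indToFind.
  Ordinal lo = 0;
  Ordinal hi = numEnt;
  while (lo < hi) {
    const Ordinal mid = lo + (hi - lo) / 2;
    if (inds[mid] < indToFind) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return (lo < numEnt && inds[lo] == indToFind) ? lo : numEnt;
}
```

A test for a function like this needs rows longer than the cutoff; otherwise the binary search branch is never exercised.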
In case it's not clear: We really only need to dispatch this kernel to the (CUDA) device if the matrix is fillComplete. If it's not, you can use the existing host execution space code path. UVM will take care of the StaticProfile-but-not-fillComplete case. Thus, the method needs a branch: If fillComplete, get out the Kokkos widgets and play with them on device; else, use the existing host execution space code. For extra credit, it's still possible to dispatch to the device if the matrix is StaticProfile. DynamicProfile won't get far; column indices are stored as
@crtrott -- the reason I think this might belong with What do you think?
Note: This is a lower priority than #41, for two-argument getLocalDiagCopy. |
@trilinos/tpetra Add a test for the new findRelOffset function (see exercise short arrays; this new test will exercise longer arrays. (findRelOffset may optimize for short arrays by using linear search; the new test should bypass that case.)
@trilinos/tpetra Change findRelOffset test so it only uses 1 MPI process in an MPI build. It doesn't need more than 1 MPI process.
Build/Test Cases Summary
Enabled Packages: TpetraCore
0) MPI_DEBUG => passed: passed=85,notpassed=0 (8.40 min)
1) SERIAL_RELEASE => passed: passed=66,notpassed=0 (10.47 min)
Other local commits for this build/test group: fae30fa, 8568f55, ec4a3fa
My recent push includes the above patch.
@trilinos/tpetra @amklinv
This blocks #41.
It may also help #118.
Tpetra::CrsGraph::findLocalIndex takes the column indices in a row of a CrsGraph / CrsMatrix, and a column index. If the column index exists in the row, it returns the relative offset of that column index. Otherwise, it returns a flag value. There are other details (e.g., the search hint) but that's the essence.
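To make that contract concrete, here is a minimal host-only sketch of such a search, including one plausible treatment of the hint (check the hinted position first, then fall back to a scan). The name `findOffsetLinear`, the signature, the hint semantics, and the choice of `numEnt` as the flag value are assumptions for illustration; they are not taken from Tpetra's actual code.

```cpp
#include <cassert>

// Hypothetical free function for illustration; not Tpetra's findLocalIndex.
// inds      : column indices of one row
// numEnt    : number of entries in that row
// indToFind : column index to look for
// hint      : guess at the relative offset (may be wrong or out of range)
// Returns the relative offset of indToFind, or numEnt if it is absent
// (the flag value is an assumption of this sketch).
template <class Ordinal, class Index>
Ordinal findOffsetLinear(const Index* inds,
                         const Ordinal numEnt,
                         const Index indToFind,
                         const Ordinal hint)
{
  assert(numEnt >= 0);
  // Cheap first try: is the entry exactly where the caller guessed?
  if (hint >= 0 && hint < numEnt && inds[hint] == indToFind) {
    return hint;
  }
  // Otherwise scan the whole row.
  for (Ordinal k = 0; k < numEnt; ++k) {
    if (inds[k] == indToFind) {
      return k;
    }
  }
  return numEnt;
}
```

Note that nothing in the sketch refers to a graph or matrix; it only needs the row's indices and their count.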
We want to make this fully Kokkos-ized -- we want this functionality, marked as a Kokkos device function, so that we can use it in Kokkos::parallel_*.
There is nothing in this method that requires knowing anything about a sparse graph or matrix! It's just search in an array. Thus, if we want to Kokkos-ize this fully, it shouldn't even be an instance method of KokkosSparse::CrsMatrix, nor does it need to take the matrix as an input argument. I don't want to add more instance methods to classes, especially not public ones.
Doing this will get us most of the way to finishing #41. It should also start to address #118, in that it will help us change Tpetra so that dispatch from replace / sumInto to the various storage options will happen at the top level, making the lower-level search code less general and therefore possibly faster.