-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tpetra::CrsGraph::getLocalDiagOffsets: Kokkos-ize #212
Comments
Attached here is a patch that gets you part of the way to a fix for this issue. Please apply after the patches for #205, as it depends on those (in particular, it uses findRelOffset). All TpetraCore tests pass with this patch, though ImportExport2_UnitTests takes a really long time (not sure if related at the moment, need to go back and check). 0001-TMP-Tpetra-CrsGraph-getLocalDiagOffsets-Partway-to-2.txt By "part of the way," I mean that every function called in the fillComplete branch is a Kokkos device function, and every Kokkos::View used there is a View of device memory. Thus, the only thing left to do is turn it into a functor and drop it into a parallel_for. The elaborate debug-build error checking could be left out for now. (For extra credit, in a debug build, do some error checking and use a parallel_reduce to find any errors. Omit the std::vector::push_back tracking.) |
@amklinv -- I assigned you provisionally, but please note that this is lower priority than the two-argument version of getLocalDiagCopy. I just want to upload my patches today while I still can. |
The patches in this comment supersede my patch above. The first four patches come from my comments to #205. The last patch implements a Kokkos functor that purports to fix this issue. It builds and passes Kokkos::Serial tests. I have not yet tested it with CUDA. Really, I just want to upload the patches so that they don't get lost over the next few days. 0001-Kokkos-StaticCrsGraph-Add-findRelOffset-see-Trilinos.txt Please note that the patch omits the nice error checking that the old getLocalDiagOffsets implementation did. Most of that could come back by using a |
OK, final set of patches for now. These three patches supersede all of the above patches. I think only the first two are necessary, but the third won't hurt. 0001-Tpetra-Add-findRelOffset-to-fix-205.txt |
My recent push includes all of the above patches, except for 0003-TMP-Tpetra-CrsGraph-findLocalIndex-Use-findRelOffset.txt. That patch wasn't actually a fix; it was an experiment to try replacing the implementation of Tpetra::CrsGraph::findLocalIndex with findRelOffset. The experiment succeeded, but the patch wasn't quite ready for full time yet, and I wanted to get this fix in. |
This blocks #41, but is only a serious blocker if the matrix's graph structure changes frequently.
This depends on #205.
It's OK for
Tpetra::CrsGraph::getLocalDiagOffsets
only to have a full Kokkos parallelization (e.g., on CUDA device) if the graph is fillComplete. If not, either the existing sequential code or a host execution space parallelization suffice. Thus, the code in CrsGraph should branch between these two cases.Ifpack2 should only exercise the fillComplete case, otherwise it's a performance bug for Ifpack2 and possibly a correctness bug for the application. (Giving a non-fillComplete matrix to Ifpack2 is wrong, because one should always fillComplete the matrix before handing it off to a solver. If the matrix has never been fillComplete before, calling fillComplete could change the offsets of the diagonal entries, which is what getLocalDiagOffsets computes.)
Take care to make the whole Kokkos::StaticCrsGraph memory-unmanaged before entering the Kokkos kernel. Otherwise, it will be slow with non-Cuda execution spaces, due to memory management overhead.
The text was updated successfully, but these errors were encountered: