Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra::CrsGraph::getLocalDiagOffsets: Kokkos-ize #212

Closed
mhoemmen opened this issue Mar 12, 2016 · 5 comments
Closed

Tpetra::CrsGraph::getLocalDiagOffsets: Kokkos-ize #212

mhoemmen opened this issue Mar 12, 2016 · 5 comments

Comments

@mhoemmen
Copy link
Contributor

This blocks #41, but is only a serious blocker if the matrix's graph structure changes frequently.
This depends on #205.

It's OK for Tpetra::CrsGraph::getLocalDiagOffsets only to have a full Kokkos parallelization (e.g., on CUDA device) if the graph is fillComplete. If not, either the existing sequential code or a host execution space parallelization suffice. Thus, the code in CrsGraph should branch between these two cases.

Ifpack2 should only exercise the fillComplete case, otherwise it's a performance bug for Ifpack2 and possibly a correctness bug for the application. (Giving a non-fillComplete matrix to Ifpack2 is wrong, because one should always fillComplete the matrix before handing it off to a solver. If the matrix has never been fillComplete before, calling fillComplete could change the offsets of the diagonal entries, which is what getLocalDiagOffsets computes.)

Take care to make the whole Kokkos::StaticCrsGraph memory-unmanaged before entering the Kokkos kernel. Otherwise, it will be slow with non-Cuda execution spaces, due to memory management overhead.

@mhoemmen mhoemmen added this to the Tpetra BCRS Kokkos-ization milestone Mar 12, 2016
@mhoemmen
Copy link
Contributor Author

Attached here is a patch that gets you part of the way to a fix for this issue. Please apply after the patches for #205, as it depends on those (in particular, it uses findRelOffset). All TpetraCore tests pass with this patch, though ImportExport2_UnitTests takes a really long time (not sure if related at the moment, need to go back and check).

0001-TMP-Tpetra-CrsGraph-getLocalDiagOffsets-Partway-to-2.txt

By "part of the way," I mean that every function called in the fillComplete branch is a Kokkos device function, and every Kokkos::View used there is a View of device memory. Thus, the only thing left to do is turn it into a functor and drop it into a parallel_for. The elaborate debug-build error checking could be left out for now. (For extra credit, in a debug build, do some error checking and use a parallel_reduce to find any errors. Omit the std::vector::push_back tracking.)

@mhoemmen mhoemmen assigned mhoemmen and amklinv and unassigned mhoemmen Mar 12, 2016
@mhoemmen
Copy link
Contributor Author

@amklinv -- I assigned you provisionally, but please note that this is lower priority than the two-argument version of getLocalDiagCopy. I just want to upload my patches today while I still can.

@mhoemmen
Copy link
Contributor Author

The patches in this comment supersede my patch above. The first four patches come from my comments to #205. The last patch implements a Kokkos functor that purports to fix this issue. It builds and passes Kokkos::Serial tests. I have not yet tested it with CUDA. Really, I just want to upload the patches so that they don't get lost over the next few days.

0001-Kokkos-StaticCrsGraph-Add-findRelOffset-see-Trilinos.txt
0002-Tpetra-CrsGraph-Add-test-for-new-findRelOffset-see-2.txt
0003-TMP-Tpetra-CrsGraph-findLocalIndex-Use-findRelOffset.txt
0004-Kokkos-StaticCrsGraph-Add-binary-search-to-findRelOf.txt
0005-Tpetra-CrsGraph-getLocalDiagOffsets-Fix-212.txt

Please note that the patch omits the nice error checking that the old getLocalDiagOffsets implementation did. Most of that could come back by using a Kokkos::parallel_reduce over rows (to find that something bad happened somewhere, e.g., that some row lacked a diagonal entry), instead of a Kokkos::parallel_for.

@mhoemmen mhoemmen added the stage: in progress Work on the issue has started label Mar 12, 2016
@mhoemmen
Copy link
Contributor Author

OK, final set of patches for now. These three patches supersede all of the above patches. I think only the first two are necessary, but the third won't hurt.

0001-Tpetra-Add-findRelOffset-to-fix-205.txt
0002-Tpetra-CrsGraph-getLocalDiagOffsets-Fix-212.txt
0003-TMP-Tpetra-CrsGraph-findLocalIndex-Use-findRelOffset.txt

@mhoemmen
Copy link
Contributor Author

My recent push includes all of the above patches, except for 0003-TMP-Tpetra-CrsGraph-findLocalIndex-Use-findRelOffset.txt. That patch wasn't actually a fix; it was an experiment to try replacing the implementation of Tpetra::CrsGraph::findLocalIndex with findRelOffset. The experiment succeeded, but the patch wasn't quite ready for full time yet, and I wanted to get this fix in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants