Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra::CrsGraph::findLocalIndex: Port to use Kokkos #205

Closed
mhoemmen opened this issue Mar 11, 2016 · 14 comments
Closed

Tpetra::CrsGraph::findLocalIndex: Port to use Kokkos #205

mhoemmen opened this issue Mar 11, 2016 · 14 comments

Comments

@mhoemmen
Copy link
Contributor

@trilinos/tpetra @amklinv

This blocks #41.
It may also help #118.

Tpetra::CrsGraph::findLocalIndex takes the column indices in a row of a CrsGraph / CrsMatrix, and a column index. If the column index exists in the row, it returns the relative offset of that column index. Otherwise, it returns a flag value. There are other details (e.g., the search hint) but that's the essence.

We want to make this fully Kokkos-ized -- we want this functionality, marked as a Kokkos device function, so that we can use it in Kokkos::parallel_*.

There is nothing in this method that requires knowing anything about a sparse graph or matrix! It's just search in an array. Thus, if we want to Kokkos-ize this fully, it shouldn't even be an instance method of KokkosSparse::CrsMatrix, nor does it need to take the matrix as an input argument. I don't want to add more instance methods to classes, especially not public ones.

Doing this will get us most of the way to finishing #41. It should also start to address #118, in that it will help us change Tpetra so that dispatch from replace / sumInto to the various storage options will happen at the top level, making the lower-level search code less general and therefore possibly faster.

@mhoemmen
Copy link
Contributor Author

I imagine the function looking like this:

template<class OffsetType, class IndexViewType>
KOKKOS_INLINE_FUNCTION OffsetType
findRelOffset (const OffsetType numEnt, typename IndexViewType::const_value_type indToFind, const IndexViewType& indsToSearch, const OffsetType hint, const bool isSorted)
{
  static_assert (Kokkos::is_view<IndexViewType>::value, "IndexViewType must be a Kokkos::View");
  static_assert (static_cast<int> (IndexViewType::rank) == 1, "IndexViewType must be a rank-1 Kokkos::View");
  static_assert (std::is_integral<OffsetType>::value, "OffsetType must be an integer.");

  // ... implementation follows ...
}

OffsetType corresponds to size_t in the current Tpetra::CrsGraph::findLocalIndex. However, we could use a type the same size as LocalOrdinal there, because offsets are relative to the row (different than the row offsets type in the ptr array of CSR, since those are absolute offsets). We can safely assume that even if there are duplicates, there won't be enough of them to require a larger relative offset type.

numEnt (number of entries in the input array to search) avoids the (small) overhead of taking subviews.

We need isSorted so that the method can dispatch to binary or linear search as appropriate. Tpetra::CrsGraph::findLocalIndex can call CrsGraph::isSorted; this function is not an instance method so it can't.

@mhoemmen
Copy link
Contributor Author

Sufficient C++ cleverness could let us write one function that works for either a rank-1 Kokkos::View or a raw array. For example, const typename std::decay<indsToSearch[0]>::type should have the same type as typename IndexViewType::const_value_type, but works even if indsToSearch is a raw array instead of a Kokkos::View. However, don't worry about this for now.

@mhoemmen mhoemmen added this to the Tpetra BCRS Kokkos-ization milestone Mar 11, 2016
@mhoemmen
Copy link
Contributor Author

It would also make sense for the function to accept SparseRowView or SparseRowViewConst (see tpetra/kernels/src/Kokkos_Sparse_CrsMatrix.hpp). Those are little structs that also have operator() and behave a bit like an unmanaged rank-1 Kokkos::View. Their main advantage is that they work for either CSR or ELLPACK storage.

@mhoemmen
Copy link
Contributor Author

Here is a first-pass implementation that passes TpetraCore's tests. It only does linear search for now. Formatting is a bit messed up but you'll get the idea.

    template<class OffsetType, class IndexViewType>
    KOKKOS_INLINE_FUNCTION OffsetType
    findRelOffset (const OffsetType numEnt, 
           typename IndexViewType::const_value_type indToFind, 
           const IndexViewType& indsToSearch, 
           const OffsetType hint,
           const bool /* isSorted */)
    {
      static_assert (Kokkos::is_view<IndexViewType>::value, 
             "IndexViewType must be a Kokkos::View");
      static_assert (static_cast<int> (IndexViewType::rank) == 1, 
             "IndexViewType must be a rank-1 Kokkos::View");
      static_assert (std::is_integral<OffsetType>::value, 
             "OffsetType must be an integer.");

      if (hint < numEnt && indsToSearch[hint] == indToFind) {
    return hint; // hint was correct
      }

      // Start with simple linear search for now.
      for (OffsetType k = 0; k < numEnt; ++k) {
    if (indsToSearch[k] == indToFind) {
      return k;
    }
      }
      return Tpetra::Details::OrdinalTraits<OffsetType>::invalid ();
    }

@mhoemmen
Copy link
Contributor Author

I don't like returning Tpetra::Details::OrdinalTraits<OffsetType>::invalid() -- yet another dependency for users. C++ Standard Library convention is for search functions (e.g., std::find) to return an iterator to the "end of the sequence." In this case, that would mean returning numEnt, since indsToSearch.ptr_on_device() + numEnt is one past the last valid entry to search.

@mhoemmen
Copy link
Contributor Author

Ok, changing it to return numEnt if not found also works, given appropriate changes to Tpetra::CrsGraph::findLocalIndex. (I'm making findLocalIndex call this function, mainly to test it informally.)

@mhoemmen
Copy link
Contributor Author

Hm, this thing actually might belong in KokkosContainers, since Kokkos::StaticCrsGraph does. @crtrott , how do you feel about that?

@mhoemmen
Copy link
Contributor Author

@mhoemmen
Copy link
Contributor Author

The following four patches supersede the patches I posted above. They include a test for the new function (which builds and passes, though I haven't tested CUDA yet) and an optimization for sorted arrays (it uses binary search in that case). The latter means that performance should be comparable to that of Tpetra::CrsGraph::findLocalIndex or Epetra_CrsGraph::FindMyIndexLoc.

0001-Kokkos-StaticCrsGraph-Add-findRelOffset-see-Trilinos.txt
0002-Tpetra-CrsGraph-Add-test-for-new-findRelOffset-see-2.txt
0003-TMP-Tpetra-CrsGraph-findLocalIndex-Use-findRelOffset.txt
0004-Kokkos-StaticCrsGraph-Add-binary-search-to-findRelOf.txt

@mhoemmen
Copy link
Contributor Author

In case it's not clear: We really only need to dispatch this kernel to the (CUDA) device if the matrix is fillComplete. If it's not, you can use the existing host execution space code path. UVM will take care of the StaticProfile-but-not-fillComplete case. Thus, the method needs a branch: If fillComplete, get out the Kokkos widgets and play with them on device; else, use the existing host execution space code.

For extra credit, it's still possible to dispatch to the device if the matrix is StaticProfile. DynamicProfile won't get far; column indices are stored as ArrayRCP<Array<LocalOrdinal> >, for example.

@mhoemmen
Copy link
Contributor Author

@crtrott -- the reason I think this might belong with Kokkos::StaticCrsGraph in KokkosContainers, is because Tpetra::CrsGraph needs it for getLocalDiagOffsets. Ifpack2::Relaxation first calls Tpetra::CrsGraph::getLocalDiagOffsets to get the offsets of the diagonal entries. It then stores them. If the structure of the matrix doesn't change, it can reuse them, for example in the two-argument version of Tpetra::CrsMatrix::getLocalDiagCopy.

What do you think?

@mhoemmen
Copy link
Contributor Author

Note: This is a lower priority than #41, for two-argument getLocalDiagCopy.

@mhoemmen
Copy link
Contributor Author

Last patch for tonight. This supersedes all of my above patches. I use this patch in #212. @crtrott -- I decided not to fiddle with KokkosContainers for now, and moved findRelOffset into TpetraCore.

0001-Tpetra-Add-findRelOffset-to-fix-205.txt

mhoemmen pushed a commit that referenced this issue Mar 17, 2016
@trilinos/tpetra Add a test for the new findRelOffset function (see
exercise short arrays; this new test will exercise longer arrays.
(findRelOffset may optimize for short arrays by using linear search; the
new test should bypass that case.)
mhoemmen pushed a commit that referenced this issue Mar 17, 2016
@trilinos/tpetra Change findRelOffset test so it only uses 1 MPI
process in an MPI build.  It doesn't need more than 1 MPI process.

Build/Test Cases Summary
Enabled Packages: TpetraCore
0) MPI_DEBUG => passed: passed=85,notpassed=0 (8.40 min)
1) SERIAL_RELEASE => passed: passed=66,notpassed=0 (10.47 min)
Other local commits for this build/test group: fae30fa, 8568f55, ec4a3fa
@mhoemmen mhoemmen removed the stage: in progress Work on the issue has started label Mar 17, 2016
@mhoemmen mhoemmen assigned mhoemmen and unassigned amklinv Mar 17, 2016
@mhoemmen
Copy link
Contributor Author

My recent push includes the above patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants