Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra: create CrsGraph transferAndFillComplete #2267

Closed
tjfulle opened this issue Feb 20, 2018 · 24 comments
Closed

Tpetra: create CrsGraph transferAndFillComplete #2267

tjfulle opened this issue Feb 20, 2018 · 24 comments

Comments

@tjfulle
Copy link
Contributor

tjfulle commented Feb 20, 2018

CrsGraph version of transferAndFillComplete is required to complete Type 2 finite element assembly.

@trilinos/tpetra

@mhoemmen
Copy link
Contributor

Thanks @tjfulle ! Clarification: "Type 2" means "Export of CrsGraph from shared to owned," where elements (in the sense of the finite-element method) are uniquely owned but degrees of freedom are not. Currently, if a CrsGraph has StaticProfile, and if the graph is an Export or Import target, the graph forbids exceeding allocated storage. This is bad because users may not know on the receiving process, at target CrsGraph creation time, how many edges each row needs to receive from the source CrsGraph.

There are two ways to fix this:

  1. Relax the above constraint.
  2. Extend transferAndFillComplete to work with CrsGraph.

The second approach has the disadvantage of requiring duplicated memory for owned rows, since the target object must be separate from the source object in a transferAndFillComplete operation. However, it may be easier to adapt that code path. Also, there is an outstanding issue requesting this transferAndFillComplete feature for CrsGraph: #79

@tjfulle
Copy link
Contributor Author

tjfulle commented Feb 27, 2018

@mhoemmen and I talked about this issue yesterday over the phone and I'll try to summarize the conversation. @mhoemmen, please feel free to edit to fix any mistakes or to further clarify.

For CrsGraph we would like to implement a hybrid of the two approaches Mark outlined above. The canonical graph construction would follow as:

  1. Create and fill a graph with owned indices
  2. Create and fill a graph with shared indices
  3. Export from shared graph to owned graph. The owned graph will be fill complete at this stage and contain owned, shared, and reachable edge information. If the owned graph does not have enough storage for new edges communicated in this stage, storage will be expanded (transparent to user).

The interface for the exportAndFillComplete function required for step 3:

void
exportAndFillCompleteCrsGraph(Teuchos::RCP<graph_type>& targetGraph,
    const graph_type& sourceGraph,
    const export_type& export,
    const map_type& domainMap,
    const map_type& rangeMap);
  • If targetGraph is null on input, a new CrsGraph which is the result of the Export will be created. Otherwise, targetGraph must not be fill complete as it will be modified in place. targetGraph will be returned fill complete.

  • domainMap is an optional input and defaults to the targetGraph’s current domain map.

  • rangeMap is also an optional input optional and defaults to targetGraph’s current range map.

Tasks to be completed

  • Modify current CrsGraph packAndPrepare and unpackAndCombine to take optional PID information.
  • Modify the several Tpetra::Import_Util::sort* procedures for CrsGraph data structures
  • Write and test the several [import,export]AndFillCompleteCrsGraph non-member functions and associated member functions.

@mhoemmen
Copy link
Contributor

Hi @tjfulle ! Thanks for writing all this down! I just have a couple notes:

The interface should take the input targetGraph by RCP, and return it that way too. (C++ ampersand references are never allowed to be null.) It should also take the domain and range Map by RCP, in order to match fillComplete's interface.

(We should talk at some point about replacing RCP with std::shared_ptr, but we should do that in a principled way and all at once.)

Prerequisite: Target graph, if nonnull, must NOT be fill complete on input. This may imply a new CrsGraph constructor that takes Kokkos::StaticCrsGraph as input but does not fill-complete the global graph.

"Transparent to user" in the above comment means "invisible to user," as in, "users don't see it or need to know about it."

More soon! Thanks @tjfulle !

@tjfulle
Copy link
Contributor Author

tjfulle commented Feb 27, 2018

Thanks @mhoemmen! I’ll edit my last comment to include your corrections so that it serves as a (accurate) roadmap

@mhoemmen
Copy link
Contributor

Just to document the use case in more detail: The point is to use the Export to discover edges that belong in the owned graph on the calling process, but that the calling MPI process doesn't know. Each MPI process learns about those graph edges from their neighboring processes in the discretization mesh. "Owned" here means uniquely owned, the solver's distribution of degrees of freedom. "Shared" means that another process owns it, but my process contributes to it.

This pattern shows up in finite-element or similar discretizations, when finite elements are uniquely owned by processes ("No Aura"), but multiple processes may contribute to degrees of freedom (that live on nodes, edges, or faces). The typical pattern is as Tim explained:

  1. Each MPI process locally iterates over its finite elements. It adds "owned edges" to the "owned graph," and "shared edges" to the "shared graph."
  2. An Export from the shared graph to the owned graph communicates shared edges.
  3. Users then fillComplete the owned graph and use it in solvers.

An exportAndFillComplete function on CrsGraph would combine steps 2 and 3. This matches the usual application pattern of "local assembly first, then boundary exchange / global assembly."

Note the following:

  • Only the owned graph ever needs to be fillComplete, and only after the Export.
  • fillComplete does global things (e.g., all-reduces to count global number of whatevers) as well as local things.
  • Step 1 makes users distinguish between owned and shared things when iterating over the mesh, but so does the current fastest pattern for global assembly on a CrsMatrix. Any "FECrsGraph" wrapper for this task would have to make the same distinction, and would inevitably be slower unless it had a complicated initialization interface and unless users had to use local indices. If such a wrapper existed, I would prefer that it be a separate façade interface, not even a subclass.

@mhoemmen
Copy link
Contributor

We discussed #119 today at the Tpetra meeting, as a hindrance (not quite blocker) to this issue.

tjfulle added a commit to tjfulle/Trilinos that referenced this issue Mar 8, 2018
Specialization of sortCrsEntries and sortAndMergeCrsEntries that don't
sort/merge CRS values.  These procedures will be used by Tpetra::CrsGraph.

@trilinos/tpetra

Part of: trilinos#2267
tjfulle added a commit to tjfulle/Trilinos that referenced this issue Mar 8, 2018
Specialization of sortCrsEntries and sortAndMergeCrsEntries that don't
sort/merge CRS values.  These procedures will be used by Tpetra::CrsGraph.

@trilinos/tpetra

Part of: trilinos#2267
tjfulle added a commit that referenced this issue Mar 10, 2018
* Tpetra:

Specialization of sortCrsEntries and sortAndMergeCrsEntries that don't
sort/merge CRS values.  These procedures will be used by Tpetra::CrsGraph.

@trilinos/tpetra

Part of: #2267

* Remove unused variable

* Update to address @mhoemmen feedback

* Tpetra: separating sortCrsEntries tests to their own executable.  Standalone test of Kokkos version

* Fixes to compile/run/pass tests with CUDA

* Fix comparison in inner loop
@tjfulle
Copy link
Contributor Author

tjfulle commented Mar 10, 2018

#2354 is a step toward implementing TAFC for CrsGraph (allowing sorting/merging of CRS entries w/o values)

@mhoemmen
Copy link
Contributor

Thanks @tjfulle ! :D

@tjfulle
Copy link
Contributor Author

tjfulle commented Mar 26, 2018

@csiefer2 @mhoemmen @william76 - I have a graph version of TAFC "done". Meaning, it compiles and unit tests of some implementation functions pass. I need to write tests of the interface (exportAndFillComplete, etc.). Are we to the point where this is needed to finish the FE assembly examples (that need TAFC)?

@mhoemmen
Copy link
Contributor

@tjfulle Awesome! :D Would rather have unit tests first ;-) .

As mentioned before, the use case we really want is Export from a shared CrsGraph to an owned CrsGraph that already exists and is partly filled. However, we can get by with using TAFC from an "owned + shared CrsGraph" to an owned CrsGraph. We would just need to modify the "local element loop" fill example to fill both shared and owned rows into the source CrsGraph. It would be interesting to compare that to the "Export between shared and owned CrsGraph" case, once that works, so I think it's worth having the TAFC option in the assembly example.

@tjfulle
Copy link
Contributor Author

tjfulle commented Mar 26, 2018

Why does TAFC require "owned + shared CrsGraph" to owned CrsGraph? i.e., why does it not support shared CrsGraph to owned? Or, in other words, how must TAFC be modified to support the desired use case?

@mhoemmen
Copy link
Contributor

TAFC takes an existing CrsGraph (the source), creates a new CrsGraph (the target), and Exports/Imports from source to target. If the source CrsGraph is just the shared CrsGraph, then you're missing the owned information.

Since TAFC creates the target CrsGraph, you can't preload the target with owned edges.

Since TAFC fillCompletes the target CrsGraph, you can't "postload" the target with owned edges.

@tjfulle
Copy link
Contributor Author

tjfulle commented Mar 27, 2018

I had thought that modifying TAFC to take a partially filled graph (of owned IDs) would be a quick and easy fix. Looking more closely, it looks to be a lot more effort than it seems on the surface.

@mhoemmen
Copy link
Contributor

@tjfulle The key feature we need is the ability to expand local storage after receiving data. That might actually be easier to do in CrsGraph::unpackAndPrepareNew than it is in TAFC. We don't really need the MPI communication optimizations that TAFC has for this case.

@tjfulle
Copy link
Contributor Author

tjfulle commented Mar 27, 2018

@mhoemmen perhaps? Do you mean unpackAndCombineNew? I have implementations of packAndPrepareNew and unpackAndCombineNew. The problem with unpackAndCombine is (the matrix version) unpacks in to the local matrix. The local StaticCsrGraph does not support inserting/replacing, so a different strategy from the matrix version will need to be devised

@mhoemmen
Copy link
Contributor

@tjfulle wrote:

Do you mean unpackAndCombineNew?

Oops :D Yup, that's right.

The local StaticCrsGraph does not support inserting/replacing, so a different strategy from the matrix version will need to be devised

Right -- we will need to clobber the local graph and make a new one. We'll also have to be careful to patch up all those extra Kokkos::View / arrays / whatever that hang around for no obvious reason but wreak havoc whenever I try to get rid of them.

@csiefer2
Copy link
Member

You could just leave the target graph empty...

@mhoemmen
Copy link
Contributor

@csiefer2 Does that case work? That would be the "don't fill-complete the target graph" case, right?

@csiefer2
Copy link
Member

@mhoemmen :More than that. Allocate the target graph and then don't do any inserts. The TAFC it all over from the source. This isn't normally the way Type 2 assembly is done, but I can't think of a particularly good reason why it has to be done the way it is once you have a working TAFC....

@tjfulle
Copy link
Contributor Author

tjfulle commented Mar 27, 2018

@mhoemmen , if the target graph is null, it’s created. I agree with @csiefer2 , it seems very convenient to fill up a single source graph and let TAFC do the rest (create and fill the target). The CrsGraph TAFC should work in this case. I’ll put in some tests and open a PR soon.

@mhoemmen
Copy link
Contributor

@csiefer2 @tjfulle OK cool; I didn't realize it could work that way. Thanks!

@mhoemmen
Copy link
Contributor

mhoemmen commented Mar 27, 2018

Procedure discussed today at Tpetra meeting:

  1. Construct (owned + shared) graph.
  2. Fill only the (owned + shared) graph.
  3. TAFC to make owned-only graph.

This means we would either have to change how global assembly works on the matrix, or we would have to create a shared-only graph.

@mhoemmen
Copy link
Contributor

"Efficient type 1" means:

  1. Build row Map
  2. Give list of columns to routine that reorders and makes a column Map
  3. Create the matrix with a row Map and a column Map, with strict upper bounds. (The user is supposed to figure that out. They could, for example, do a round of communication.)
  4. Insert into rows. The only off-process rows into which you're allowed to insert are rows that correspond to the column Map (and not in the row Map) (for a square matrix).

tjfulle added a commit to tjfulle/Trilinos that referenced this issue Apr 12, 2018
…nds)

CrsGraph::transferAndFillComplete is new, independent code.  However, there were
some function name clashes in packCrsMatrix and packCrsGraph.  So, I moved implementation
details of each to a new namespace so that they could have consistent naming.  The
same goes for unpackAndCombineCrsMatrix and unpackAndCombineCrsGraph.

Addresses: trilinos#2267
tjfulle added a commit that referenced this issue Apr 17, 2018
* Tpetra: Implementation of CrsGraph::transferAndFillComplete (and friends)

CrsGraph::transferAndFillComplete is new, independent code.  However, there were
some function name clashes in packCrsMatrix and packCrsGraph.  So, I moved implementation
details of each to a new namespace so that they could have consistent naming.  The
same goes for unpackAndCombineCrsMatrix and unpackAndCombineCrsGraph.

Addresses: #2267

* Fix error message in test

* Add explicit template parameters to see if gcc 4.8 build error goes away
@mhoemmen
Copy link
Contributor

mhoemmen commented May 8, 2018

@tjfulle It looks like PR #2556 fixes this, so I'm closing the issue. Please feel free to reopen if that's not accurate. Thanks! :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants