Tpetra: create CrsGraph transferAndFillComplete #2267
Thanks @tjfulle ! Clarification: "Type 2" means "Export of CrsGraph from shared to owned," where elements (in the sense of the finite-element method) are uniquely owned but degrees of freedom are not. Currently, if a CrsGraph has StaticProfile, and if the graph is an Export or Import target, the graph forbids exceeding allocated storage. This is bad because users may not know on the receiving process, at target CrsGraph creation time, how many edges each row needs to receive from the source CrsGraph. There are two ways to fix this:
The second approach has the disadvantage of requiring duplicated memory for owned rows, since the target object must be separate from the source object in a transferAndFillComplete operation. However, it may be easier to adapt that code path. Also, there is an outstanding issue requesting this transferAndFillComplete feature for CrsGraph: #79 |
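The storage problem described above can be illustrated with a small toy model. This is only a sketch in plain C++, not Tpetra code: `FixedCapacityGraph`, `Edge`, `receiveEdges`, and `countPerRow` are hypothetical names invented for illustration. It assumes a graph whose per-row allocation is fixed at construction, and shows that a guessed allocation can reject received edges, while an allocation sized after the incoming edges are known cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy stand-in for a graph with fixed per-row allocation.
// NOT Tpetra::CrsGraph; every name here is hypothetical.
class FixedCapacityGraph {
public:
  explicit FixedCapacityGraph(const std::vector<std::size_t>& rowCapacity)
    : capacity_(rowCapacity), rows_(rowCapacity.size()) {}

  // Insert one edge (row, col); refuses to grow past the row's allocation.
  void insert(std::size_t row, int col) {
    assert(row < rows_.size());
    if (rows_[row].size() >= capacity_[row]) {
      throw std::length_error("row allocation exceeded");
    }
    rows_[row].push_back(col);
  }

  std::size_t numEntries(std::size_t row) const { return rows_.at(row).size(); }

private:
  std::vector<std::size_t> capacity_;
  std::vector<std::vector<int>> rows_;
};

struct Edge {
  std::size_t row;
  int col;
};

// Returns true if every received edge fit, false if any row overflowed.
bool receiveEdges(FixedCapacityGraph& target, const std::vector<Edge>& received) {
  try {
    for (const Edge& e : received) target.insert(e.row, e.col);
  } catch (const std::length_error&) {
    return false;
  }
  return true;
}

// Count received edges per row, so allocation can be sized after the receive.
std::vector<std::size_t> countPerRow(std::size_t numRows,
                                     const std::vector<Edge>& received) {
  std::vector<std::size_t> counts(numRows, 0);
  for (const Edge& e : received) ++counts.at(e.row);
  return counts;
}
```

In this toy, a target built from a guess such as one entry per row fails as soon as a row receives two edges, whereas a target built from `countPerRow` always has room. Creating the target only after the incoming data is known is what makes that sizing possible.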
@mhoemmen and I talked about this issue yesterday over the phone and I'll try to summarize the conversation. @mhoemmen, please feel free to edit to fix any mistakes or to further clarify. For
The interface for the
void
exportAndFillCompleteCrsGraph(Teuchos::RCP<graph_type>& targetGraph,  // if nonnull, must not be fill complete on input
                              const graph_type& sourceGraph,
                              const export_type& exporter,
                              const map_type& domainMap,
                              const map_type& rangeMap);
Tasks to be completed
Hi @tjfulle ! Thanks for writing all this down! I just have a couple notes:
The interface should take the input
(We should talk at some point about replacing
Prerequisite: Target graph, if nonnull, must NOT be fill complete on input. This may imply a new CrsGraph constructor that takes
"Transparent to user" in the above comment means "invisible to user," as in, "users don't see it or need to know about it."
More soon! Thanks @tjfulle ! |
Thanks @mhoemmen! I’ll edit my last comment to include your corrections so that it serves as an (accurate) roadmap |
Just to document the use case in more detail: The point is to use the Export to discover edges that belong in the owned graph on the calling process, but that the calling MPI process doesn't know. Each MPI process learns about those graph edges from its neighboring processes in the discretization mesh. "Owned" here means uniquely owned, the solver's distribution of degrees of freedom. "Shared" means that another process owns it, but my process contributes to it. This pattern shows up in finite-element or similar discretizations, when finite elements are uniquely owned by processes ("No Aura"), but multiple processes may contribute to degrees of freedom (that live on nodes, edges, or faces). The typical pattern is as Tim explained:
An exportAndFillComplete function on CrsGraph would combine steps 2 and 3. This matches the usual application pattern of "local assembly first, then boundary exchange / global assembly." Note the following:
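The "local assembly first, then boundary exchange / global assembly" pattern can be sketched without MPI by simulating each process as one element of a vector. This is a hedged illustration only: `LocalAssembly`, `RowEdges`, `exportAndFinalize`, and the `ownerOf` map are hypothetical, and the real Export involves communication that this toy replaces with a loop. Each simulated process holds edges for rows it owns and edges it contributes to rows owned elsewhere. The function sends every shared contribution to the owning process and merges it with the owner's edges in one call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using GlobalRow = int;
// Global row index -> set of global column indices (sorted, no duplicates).
using RowEdges = std::map<GlobalRow, std::set<int>>;

// State of one simulated process after its local element loop.
// Hypothetical type, not part of Tpetra.
struct LocalAssembly {
  RowEdges owned;   // rows this process uniquely owns
  RowEdges shared;  // rows another process owns, but this process contributes to
};

// Simulated export plus finalization: each shared contribution is delivered to
// the owner of its row (ownerOf[row] is the owning rank) and merged there.
std::vector<RowEdges> exportAndFinalize(
    const std::vector<LocalAssembly>& procs,
    const std::map<GlobalRow, std::size_t>& ownerOf) {
  std::vector<RowEdges> result(procs.size());
  for (std::size_t rank = 0; rank < procs.size(); ++rank) {
    result[rank] = procs[rank].owned;  // start from locally known owned edges
  }
  for (const LocalAssembly& p : procs) {
    for (const auto& kv : p.shared) {
      const std::size_t owner = ownerOf.at(kv.first);
      assert(owner < result.size());
      result[owner][kv.first].insert(kv.second.begin(), kv.second.end());
    }
  }
  return result;
}
```

After the call, each owner's rows contain the edges it discovered itself plus the edges it could only learn from neighboring processes, which is the outcome the use case above asks for.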
We discussed #119 today at the Tpetra meeting, as a hindrance (not quite blocker) to this issue. |
Specialization of sortCrsEntries and sortAndMergeCrsEntries that don't sort/merge CRS values. These procedures will be used by Tpetra::CrsGraph. @trilinos/tpetra Part of: trilinos#2267
* Tpetra: Specialization of sortCrsEntries and sortAndMergeCrsEntries that don't sort/merge CRS values. These procedures will be used by Tpetra::CrsGraph. @trilinos/tpetra Part of: #2267
* Remove unused variable
* Update to address @mhoemmen feedback
* Tpetra: separating sortCrsEntries tests to their own executable. Standalone test of Kokkos version
* Fixes to compile/run/pass tests with CUDA
* Fix comparison in inner loop
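For readers unfamiliar with what a values-free sort and merge does, here is a minimal sketch in plain C++. It is not the Tpetra or Kokkos implementation, and `sortAndMergeGraphRows` is a hypothetical name. It assumes the usual CRS layout, with row offsets in `ptr` (starting at 0) and column indices in `ind`. Each row's indices are sorted, duplicates are dropped, and both arrays are compacted in place. No values array is touched, which is the case a graph needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sort the column indices within each CRS row, remove duplicate indices, and
// compact ind in place while rewriting the row offsets in ptr.
// Hypothetical helper; assumes ptr[0] == 0 and ptr.back() == ind.size().
void sortAndMergeGraphRows(std::vector<std::size_t>& ptr, std::vector<int>& ind) {
  assert(!ptr.empty() && ptr.front() == 0 && ptr.back() == ind.size());
  const std::size_t numRows = ptr.size() - 1;
  std::size_t out = 0;       // next write position in the compacted array
  std::size_t rowBegin = 0;  // start of the current row in the old layout
  for (std::size_t row = 0; row < numRows; ++row) {
    const std::size_t rowEnd = ptr[row + 1];  // read before it is overwritten
    auto first = ind.begin() + static_cast<std::ptrdiff_t>(rowBegin);
    auto last = ind.begin() + static_cast<std::ptrdiff_t>(rowEnd);
    std::sort(first, last);
    last = std::unique(first, last);  // merge step: keep one copy of each index
    ptr[row] = out;
    for (auto it = first; it != last; ++it) ind[out++] = *it;
    rowBegin = rowEnd;
  }
  ptr[numRows] = out;
  ind.resize(out);
}
```

The write position never runs ahead of the read position, so compacting in place is safe. A matrix version would additionally have to permute values during the sort and combine them during the merge.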
#2354 is a step toward implementing TAFC for |
Thanks @tjfulle ! :D |
@csiefer2 @mhoemmen @william76 - I have a graph version of TAFC "done". Meaning, it compiles and unit tests of some implementation functions pass. I need to write tests of the interface ( |
@tjfulle Awesome! :D Would rather have unit tests first ;-) . As mentioned before, the use case we really want is Export from a shared CrsGraph to an owned CrsGraph that already exists and is partly filled. However, we can get by with using TAFC from an "owned + shared CrsGraph" to an owned CrsGraph. We would just need to modify the "local element loop" fill example to fill both shared and owned rows into the source CrsGraph. It would be interesting to compare that to the "Export between shared and owned CrsGraph" case, once that works, so I think it's worth having the TAFC option in the assembly example. |
Why does TAFC require "owned + shared CrsGraph" to owned CrsGraph? i.e., why does it not support shared CrsGraph to owned? Or, in other words, how must TAFC be modified to support the desired use case? |
TAFC takes an existing CrsGraph (the source), creates a new CrsGraph (the target), and Exports/Imports from source to target. If the source CrsGraph is just the shared CrsGraph, then you're missing the owned information. Since TAFC creates the target CrsGraph, you can't preload the target with owned edges. Since TAFC fillCompletes the target CrsGraph, you can't "postload" the target with owned edges. |
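The constraint described here can be made concrete with a toy model. This is a sketch under stated assumptions, not Tpetra's API: `ToyGraph` and `toyTransferAndFillComplete` are hypothetical, the model is single-process, and the Export/Import communication is reduced to a copy. What it preserves is the shape of the operation: the target is created inside the function from the source alone, and it is fill complete on return.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>

// Toy graph: global row -> set of columns, plus a "fill complete" flag.
// Hypothetical class used only to illustrate the TAFC constraint.
class ToyGraph {
public:
  // Inserting is only allowed before fillComplete().
  void insert(int row, int col) {
    if (fillComplete_) throw std::logic_error("graph is fill complete");
    rows_[row].insert(col);
  }
  void fillComplete() { fillComplete_ = true; }
  bool isFillComplete() const { return fillComplete_; }
  bool hasEdge(int row, int col) const {
    auto it = rows_.find(row);
    return it != rows_.end() && it->second.count(col) > 0;
  }
  const std::map<int, std::set<int>>& rows() const { return rows_; }

private:
  std::map<int, std::set<int>> rows_;
  bool fillComplete_ = false;
};

// Toy TAFC: builds a brand-new target from the source alone, then
// fill-completes it. The caller never sees the target in a fillable state.
ToyGraph toyTransferAndFillComplete(const ToyGraph& source) {
  ToyGraph target;
  for (const auto& kv : source.rows()) {
    for (int col : kv.second) target.insert(kv.first, col);
  }
  target.fillComplete();
  assert(target.isFillComplete());
  return target;
}
```

If the source holds only shared contributions, the result lacks every owned edge, and a later `insert` throws because the target is already fill complete. Putting both owned and shared edges into the source is the only way, in this toy, to get them all into the target.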
I had thought that modifying TAFC to take a partially filled graph (of owned IDs) would be a quick and easy fix. On closer inspection, it is a lot more effort than it seems on the surface. |
@tjfulle The key feature we need is the ability to expand local storage after receiving data. That might actually be easier to do in |
@mhoemmen perhaps? Do you mean |
@tjfulle wrote:
Oops :D Yup, that's right.
Right -- we will need to clobber the local graph and make a new one. We'll also have to be careful to patch up all those extra Kokkos::View / arrays / whatever that hang around for no obvious reason but wreak havoc whenever I try to get rid of them. |
You could just leave the target graph empty... |
@csiefer2 Does that case work? That would be the "don't fill-complete the target graph" case, right? |
@mhoemmen: More than that. Allocate the target graph and then don't do any inserts. Then TAFC it all over from the source. This isn't normally the way Type 2 assembly is done, but I can't think of a particularly good reason why it has to be done the way it is once you have a working TAFC.... |
Procedure discussed today at Tpetra meeting:
This means we would either have to change how global assembly works on the matrix, or we would have to create a shared-only graph. |
"Efficient type 1" means:
* Tpetra: Implementation of CrsGraph::transferAndFillComplete (and friends)
  CrsGraph::transferAndFillComplete is new, independent code. However, there were some function name clashes in packCrsMatrix and packCrsGraph. So, I moved implementation details of each to a new namespace so that they could have consistent naming. The same goes for unpackAndCombineCrsMatrix and unpackAndCombineCrsGraph. Addresses: #2267
* Fix error message in test
* Add explicit template parameters to see if gcc 4.8 build error goes away
CrsGraph version of transferAndFillComplete is required to complete Type 2 finite element assembly. @trilinos/tpetra