-
Notifications
You must be signed in to change notification settings - Fork 576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Amesos2: Handle x and b copying more efficiently. #7678
Amesos2: Handle x and b copying more efficiently. #7678
Conversation
Tracks whether things need to be initialized and whether the adapter was able to do directy assignment. Removes some spurious copies based on that information.
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MicheldeMessieres functional changes look good, most comments are requests in the documentation to hopefully help improve clarity and make it easier in the future when revisiting this code to understand the changes without having to hop between files. Thanks!
@@ -183,29 +184,29 @@ Basker<Matrix,Vector>::solve_impl( | |||
if ( single_proc_optimization() && nrhs == 1 ) { | |||
// no msp creation | |||
Util::get_1d_copy_helper_kokkos_view<MultiVecAdapter<Vector>, | |||
host_solve_array_t>::do_get(B, bValues_, as<size_t>(ld_rhs)); | |||
host_solve_array_t>::do_get(true, B, bValues_, as<size_t>(ld_rhs)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the modified do_get
calls, can the hard-coded true/false be pulled out, and replaced by a const bool value with descriptive name (or comment) indicating what this value means to the routine? I think it will help for future revisits and make it easier to navigate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -230,11 +231,10 @@ Basker<Matrix,Vector>::solve_impl( | |||
std::runtime_error, | |||
"Could not alloc needed working memory for solve" ); | |||
|
|||
{ | |||
if(!bDidAssignX) { // if bDidAssignX, then we solved straight to the adapter's X memory space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would help to append to the comment something like "... without requiring additional memory allocation", unless I'm misunderstanding the comment's meaning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -238,30 +239,29 @@ Cholmod<Matrix,Vector>::solve_impl(const Teuchos::Ptr<MultiVecAdapter<Vector> > | |||
if(use_triangular_solves_) { // to device | |||
#if defined(KOKKOSKERNELS_ENABLE_SUPERNODAL_SPTRSV) && defined(KOKKOSKERNELS_ENABLE_TPL_CHOLMOD) | |||
Util::get_1d_copy_helper_kokkos_view<MultiVecAdapter<Vector>, | |||
device_solve_array_t>::do_get(B, device_bValues_, | |||
device_solve_array_t>::do_get(true, B, device_bValues_, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment about the true/false replacement in the do_get
routines here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -292,7 +292,7 @@ Cholmod<Matrix,Vector>::solve_impl(const Teuchos::Ptr<MultiVecAdapter<Vector> > | |||
TEUCHOS_TEST_FOR_EXCEPTION(ierr == -2, std::runtime_error, "Ran out of memory" ); | |||
|
|||
/* Update X's global values */ | |||
{ | |||
if(!bDidAssignX) { // if bDidAssignX, then we solved straight to the adapter's X memory space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment about appending to the comment indicating no memory allocation was performed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
{ | ||
#ifdef HAVE_AMESOS2_TIMERS | ||
Teuchos::TimeMonitor mvConvTimer(this->timers_.vecConvTime_); | ||
Teuchos::TimeMonitor redistTimer( this->timers_.vecRedistTime_ ); | ||
#endif | ||
if ( single_proc_optimization() && nrhs == 1 ) { | ||
// no msp creation | ||
Util::get_1d_copy_helper_kokkos_view<MultiVecAdapter<Vector>, | ||
host_solve_array_t>::do_get(B, bValues_, as<size_t>(ld_rhs)); | ||
bDidAssignB = Util::get_1d_copy_helper_kokkos_view<MultiVecAdapter<Vector>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment about the true/false replacement in the do_get
routines here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// However if b was actually a copy (bDidAssignB = false) then we can avoid | ||
// this deep_copy and just assign xValues_ = bValues_. | ||
if(bDidAssignB) { | ||
Kokkos::deep_copy(xValues_, bValues_); // need deep_copy or aolve will change adapter's b memory which should never happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kokkos::deep_copy(xValues_, bValues_); // need deep_copy or aolve will change adapter's b memory which should never happen | |
Kokkos::deep_copy(xValues_, bValues_); // need deep_copy or solve will change adapter's b memory which should never happen |
minor typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -380,7 +384,7 @@ KLU2<Matrix,Vector>::solve_impl( | |||
} // end root_ | |||
} //end else | |||
|
|||
{ | |||
if(!bDidAssignX) { // if bDidAssignX, then we solved straight to the adapter's X memory space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment about appending to the comment for clarity regarding memory allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -77,36 +77,53 @@ update_dst_size(dst_t & dst, const src_t & src) { // templated just for 2d | |||
template<class dst_t, class src_t> // version for same memory spaces | |||
typename std::enable_if<std::is_same<typename dst_t::value_type, | |||
typename src_t::value_type>::value>::type | |||
implement_copy_or_assign_same_mem_check_types(dst_t & dst, const src_t & src) { | |||
implement_copy_or_assign_same_mem_check_types(bool bInitialize, dst_t & dst, const src_t & src, bool & bAssigned) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment prior indicating what bInitialize
and bAssigned
represent? I know it's indicated in other usage areas as well, but I think it will help to have explanation here as well to avoid need to search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// changed and the next solve cycle will fail. However if the method returns | ||
// false then the adapter already had to deep_copy b so those solvers don't | ||
// need to take any extra steps to protect the b memory space. | ||
// Note, searching for bDidAssignX and bDidAssignB will find all places |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation is helpful and important, but because of the length it may take a couple passes to find specific info, plus it requires hopping between solvers to get some clarification on usage and meaning. Can some of the comments here be modified, or copied and formatted differently after the explanation, to separate out key words/variables with summary of meaning of the value, something like:
do_get
Return type (bool)
bDidAssignX, bDidAssignB (bool): assignment state for solver
true means ...
false means ...
Input
bInitialize (bool): indicates in adapter if deep_copy occurred (it's not clear if a false means the deep_copy hasn't occurred and is still needed, or if false means a deep_copy is not necessary, can you clarify?)
true means ...
false means ...
My formatting isn't great but hopefully indicates what I think may be helpful for separating out the variables from the explanation with concise summaries that are easy to find (something similar to Tpetra's style for documenting functions and in/out variables may be better pattern to follow, doxygen is not necessary here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rewritten this to be more palatable. I left out the specific stuff on Klu2 and SuperLU. It's probably better just to leave those parts explained in the specific solvers which have their own comments.. Also removed the parts about the internal temporary copies since from the caller's point of view that doesn't matter for anything.
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
This is a cosmetic update to improve clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I think the changes help with readability and require less file hopping, thanks @MicheldeMessieres ! Let's get this testing
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ ndellingwood ]! |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
Using Repos:
Pull Request Author: MicheldeMessieres |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_pullrequest_gcc_4.8.4
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_intel_17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_4.9.3_SERIAL
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_7.2.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_gcc_8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_cuda_9.2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_clang_10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_2
Jenkins Parameters
Build InformationTest Name: Trilinos_pullrequest_python_3
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ ndellingwood ]! |
Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged |
Merge on Pull Request# 7678: IS A SUCCESS - Pull Request successfully merged |
@trilinos/amesos2
Motivation
The solver gets the x and b data from the adapter and it may be directly assigned (no deep copy) if memory and types match. This PR adds a flag bInitialize so the 'get x' call can inform the adapter that initializing the data is not necessary. This means that when the adapter does not match and has to allocate the data for x, it won't deep copy its x values into the x space. That was a wasted copy before since the solve was going to overwrite x anyways.
Also the get x or get b calls now receive a bDidAssign flag to tell the solver if the data is pointing directly to the adapter memory space. If x is directly assigned, there is no need to 'put x' after the solver and this can now be skipped.
Also SuperLU and Klu2 may modify b during the solve under some conditions. If the adapter directly assigned b, it's necessary to deep_copy b first to avoid overwriting the adapters b data. Previously, this deep_copy would always happen but now only when necessary.
This resolves #7158 which mentions some of the above issues.
Testing
Amesos2 tests on white Cuda and Mac parallel/serial builds.