Trilinos Master Merge PR Generator: Auto PR created to promote from master_merge_20211030_000551 branch to master #9887

Closed
131 commits
d57f6ab
Replace recursive by iterative implementation of heapify
masterleinad Jun 10, 2021
e2359ae
Tpetra: Enable more SYCL tests
masterleinad Jun 10, 2021
76e1cd9
tpetra: move sort_crs_matrix out of Impl namespace
ndellingwood Jun 10, 2021
da10abe
amesos2: move sort_crs_matrix out of Impl namespace
ndellingwood Jul 8, 2021
f8c911d
zoltan2: modify "Vector" alias in Test_Sphynx
ndellingwood Jul 12, 2021
f310e12
Improve comments
masterleinad Jul 22, 2021
544eeda
sacado, stokhos: replace KOKKOS_IMPL_CUDA_* macros with Cuda functions
ndellingwood Jul 22, 2021
3b4f116
Mention KokkosKernel issue
masterleinad Jul 22, 2021
60e27ce
tpetra: move sort_crs_matrix out of Impl namespace
ndellingwood Jun 10, 2021
c0e1471
amesos2: move sort_crs_matrix out of Impl namespace
ndellingwood Jul 8, 2021
a891ab2
zoltan2: modify "Vector" alias in Test_Sphynx
ndellingwood Jul 12, 2021
f86c91a
sacado, stokhos: replace KOKKOS_IMPL_CUDA_* macros with Cuda functions
ndellingwood Jul 22, 2021
d72eb85
atdm/contributed/weaver: update modules
ndellingwood Oct 1, 2021
e6aee98
intrepid2: workaround intel internal compiler error in Intrepid2_Data
ndellingwood Oct 1, 2021
08f1f07
Merge branch 'kokkos-promotion' of https://github.com/trilinos/Trilin…
ndellingwood Oct 1, 2021
b539d1a
Panzer: fix 1D line mesh issue on cuda
rppawlo Oct 6, 2021
54a5a70
MiniTensor: fix or at least work around some __host__ __device__ marking
japlews Oct 12, 2021
0c76505
Merge pull request #9806 from japlews/japlews/minitensor-host-device-…
jhux2 Oct 13, 2021
536a384
tpetra: move sort_crs_* out of Impl namespace
ndellingwood Oct 14, 2021
2cf5e81
adding "start owned" parameter to FECrsMatrix to start
tjfulle Oct 6, 2021
96820bd
STK: Snapshot 10-14-21 12:18
tasmith4 Oct 14, 2021
bed59f2
Merge branch 'develop' into kokkos-promotion
ndellingwood Oct 14, 2021
fa115c1
ifpack2,sacado: rename CUDA_SAFE_CALL -> KOKKOS_IMPL_CUDA_SAFE_CALL
ndellingwood Oct 14, 2021
189fe36
TrilinosCouplings: Adding 2D anisotropic diffusion
csiefer2 Oct 15, 2021
7b63df4
Update packages to move Kokkos::Timer out of impl namespace
ndellingwood Oct 15, 2021
0d9ab22
tpetra: fixed shadow warnings, including one true bug where loop index
kddevin Oct 15, 2021
d7b6eac
Set Cuda host pinned memory as default
vqd8a Oct 16, 2021
b9a2b4d
Enable using Cuda host pinned mem
vqd8a Oct 17, 2021
67fe3bf
Check requested number of GPUs against available number of GPUs
vqd8a Oct 18, 2021
452e80d
Merge Pull Request #9783 from rppawlo/Trilinos/panzer-fix1d-line-mesh
trilinos-autotester Oct 18, 2021
fd27eb3
MueLu: Adding signed SA style dropping option
csiefer2 Oct 19, 2021
13007a8
MueLu: Adding signed SA style dropping option
csiefer2 Oct 19, 2021
2d80622
TrilinosCouplings: Output mods
csiefer2 Oct 19, 2021
f7b64da
Tpetra: Added outline of a test for async MultiVector transfers
tasmith4 Oct 5, 2021
bfbeb3d
Tpetra: update async MultiVector transfer test to be independent of t…
tasmith4 Oct 8, 2021
31d9c29
Tpetra: update async MultiVector transfer test to be independent of r…
tasmith4 Oct 8, 2021
7258c08
Tpetra: Added outline of tests for async CrsMatrix transfers
tasmith4 Oct 11, 2021
f6c4d69
Tpetra: separate file for async transfer tests
tasmith4 Oct 18, 2021
4d0359e
Tpetra: decouple import method from tests
tasmith4 Oct 19, 2021
1fd4add
Tpetra: cleanup MultiVectorTransferFixture
tasmith4 Oct 19, 2021
b84b02a
Tpetra: cleanup DiagonalCrsMatrixTransferFixture
tasmith4 Oct 19, 2021
262bb81
Tpetra: cleanup LowerTriangularCrsMatrixTransferFixture
tasmith4 Oct 19, 2021
37f0f36
Tpetra: change ForwardImport implementation to be truly asynchronous
tasmith4 Oct 19, 2021
8df9981
Tpetra: rearrange test names
tasmith4 Oct 19, 2021
132c004
Tpetra: add ReverseImport tests
tasmith4 Oct 19, 2021
0985c3a
Tpetra: add ForwardExport tests
tasmith4 Oct 19, 2021
d5b43ee
Tpetra: add ReverseExport tests
tasmith4 Oct 19, 2021
01ddff6
Add MPI_Finalize before return
vqd8a Oct 19, 2021
03f4904
Amesos2::SuperLU_dist: add option to apply Equil
iyamazaki Oct 19, 2021
a72563a
update config/config.* to build on summit
cwsmith Oct 19, 2021
49eda37
MueLu: Removing printf
csiefer2 Oct 19, 2021
d91d3fc
new version of prolongator constraint satisfication capability
rstumin Oct 19, 2021
0b13a91
Merge Pull Request #9822 from trilinos/Trilinos/csiefer-13007a8
trilinos-autotester Oct 19, 2021
31ff38e
Percept changes needed for stk entity-rank changes.
alanw0 Oct 19, 2021
48770fe
Changes to fix errors in LCM
lxmota Oct 20, 2021
0c4bac0
Merge Pull Request #9831 from trilinos/Trilinos/minitensor-lcm
trilinos-autotester Oct 20, 2021
186a40b
First krino commit (#9825)
drnobleabq Oct 20, 2021
7723a6a
fixed some scalar traits type compile errors
rstumin Oct 20, 2021
0187150
packages/framework: Point to top-level ini files
e10harvey Oct 20, 2021
b6b9d46
Merge Pull Request #9266 from masterleinad/Trilinos/enable_more_sycl_…
trilinos-autotester Oct 21, 2021
6870a3e
Tpetra: Fixing stride bug
csiefer2 Oct 21, 2021
cf3ae06
Merge pull request #9834 from e10harvey/TRILFRAME-129
e10harvey Oct 21, 2021
ab51356
Merge Pull Request #9837 from trilinos/Trilinos/csiefer-6870a3e
trilinos-autotester Oct 21, 2021
3d18a53
fixed one more comparison that needed a teuchos scalar traits cast to…
rstumin Oct 21, 2021
c96aeed
STK: Updated snapshot 10-21-21 11:21
tasmith4 Oct 21, 2021
b690724
introduced new variable with LocalOrdinal type to remove a casting er…
rstumin Oct 21, 2021
aacae8b
MueLu: Add test driver for hierarchical matrices
cgcgcg Oct 21, 2021
efad248
Merge Pull Request #9826 from cwsmith/Trilinos/cws/zoltanConfigUpdate
trilinos-autotester Oct 22, 2021
77a45dd
one more casting fix
rstumin Oct 22, 2021
511cf50
ML: Fix socket references for Windows builds
sskutnik Oct 22, 2021
71a9276
Fix TeuchosParameterlist Windows build configuration
sskutnik Oct 22, 2021
d3a9411
Fix invalid enum reference for Thyra SpmdVectorDefaultBase
sskutnik Oct 22, 2021
3b37d42
Merge Pull Request #9829 from trilinos/Trilinos/rstumin-d91d3fc
trilinos-autotester Oct 22, 2021
d02666c
shylubasker: remove deprecated code
ndellingwood Oct 22, 2021
9c42161
Merge Pull Request #9849 from sskutnik/Trilinos/fix_ml_socket_windows
trilinos-autotester Oct 23, 2021
f89d494
Merge pull request #9848 from sskutnik/thyra-fix-invalid-enum
jhux2 Oct 23, 2021
113793c
Merge Pull Request #9845 from sskutnik/Trilinos/fix_teuchos_parameter…
trilinos-autotester Oct 23, 2021
06d8083
Belos: Kokkos solvers adapter (#9827)
jennloe Oct 25, 2021
3bae707
Merge pull request #9853 from ndellingwood/shylubasker-rm-deps
hkthorn Oct 25, 2021
6e0ee32
Tpetra: clean up includes/typedefs/setup functions for AsyncTransfer …
tasmith4 Oct 25, 2021
1da9e14
Tpetra: fix template parameter ordering in AsyncTransfer for consistency
tasmith4 Oct 25, 2021
ac8f397
Tpetra: AsyncTransfer test cleanup: remove shouldSkipTest, fix maps, …
tasmith4 Oct 25, 2021
d84e340
MueLu: remove dependence on Tpetra deprecated
brian-kelley Oct 25, 2021
fd4b703
Sacado: Make Sacado_FadKokkosTests_Cuda UVM free
JacobDomagala Oct 20, 2021
14c1e52
Sacado: Re-enable Sacado unit tests for UVM-Off PR tester
JacobDomagala Oct 25, 2021
4d9a4f0
MueLu: remove Tpetra deprecated from Region
brian-kelley Oct 25, 2021
dddef5c
Sacado: Fad_KokkosAtomicTests_Cuda_Hierarchical UVM free
JacobDomagala Oct 25, 2021
029e783
Sacado: Run DFad tests for remaining Sacado tests
JacobDomagala Oct 25, 2021
448daeb
intrepid2: remove deprecation warnings
ndellingwood Oct 22, 2021
eb03da3
amesos2: resolve unused warning in superlu interface
ndellingwood Oct 22, 2021
98bc0c8
Amesos2::SuperLU_dist: shift the local row and col scaling by offset
iyamazaki Oct 26, 2021
0e6d971
Disable override of Sacado_NEW_FAD_DESIGN_IS_DEFAULT with PR tester
ndellingwood Oct 26, 2021
0c84fa1
Merge pull request #9824 from iyamazaki/amesos2-superlu-dist
ndellingwood Oct 26, 2021
15c606f
MueLu: Simplify passing precomputed objects
cgcgcg Oct 19, 2021
78fac34
MueLu: Rebase gold files
cgcgcg Oct 20, 2021
b155965
MueLu: fix ReitzingerPFactory for non-UVM
brian-kelley Oct 26, 2021
a01a8dd
Merge Pull Request #9811 from trilinos/Trilinos/fix-9638-take-2
trilinos-autotester Oct 26, 2021
66f7efc
Merge Pull Request #9863 from trilinos/Trilinos/tpetra_shadows
trilinos-autotester Oct 27, 2021
bc4e9f6
Merge Pull Request #9864 from brian-kelley/Trilinos/FixRitzinger
trilinos-autotester Oct 27, 2021
6940295
Sacado: Use Kokkos::View<FadType> instead of Kokkos::View<FadType*> f…
JacobDomagala Oct 27, 2021
fe90c94
MueLu: Construct Ifpack2Smoother using Operator instead of Matrix
cgcgcg Oct 26, 2021
e35f02f
MueLu: Derive HierarchicalOperator from RowMatrix
cgcgcg Oct 26, 2021
9972d1b
Merge Pull Request #9828 from cgcgcg/Trilinos/ifpackHypreDepr
trilinos-autotester Oct 27, 2021
b34b94d
cmake/std/atdm: Fix typo
kokkos-devops-admin Oct 27, 2021
8606fb3
packages/framework: Fix typo
e10harvey Oct 27, 2021
46303e5
Tpetra: deprecate and remove use of StaticProfile (#9865)
kddevin Oct 27, 2021
2ac5617
stokhos: resolve -Werror
ndellingwood Oct 22, 2021
1d24174
Merge pull request #9858 from ndellingwood/enable-sacado-new-design
ndellingwood Oct 27, 2021
1efe0d2
Merge Pull Request #9868 from kokkos-devops-admin/Trilinos/patch-2
trilinos-autotester Oct 27, 2021
ed3b034
Merge Pull Request #9823 from trilinos/Trilinos/tasmit/distobject-asy…
trilinos-autotester Oct 27, 2021
0e6ccca
Merge Pull Request #9869 from trilinos/Trilinos/e10harvey-patch-1
trilinos-autotester Oct 27, 2021
8bf37ee
MueLu: Adding support for driving Galeri options via ParameterList in…
csiefer2 Oct 27, 2021
c47b1dc
MueLu: Adding support for driving Galeri options via ParameterList in…
csiefer2 Oct 27, 2021
e442df7
MueLu: Adding support for driving Galeri options via ParameterList in…
csiefer2 Oct 27, 2021
8d4f015
Merge Pull Request #9813 from trilinos/Trilinos/stk-snapshot
trilinos-autotester Oct 27, 2021
b607078
Merge pull request #9861 from cgcgcg/hierarchical
cgcgcg Oct 27, 2021
1d02e11
Merge pull request #9862 from brian-kelley/MueLuDeprecated
jhux2 Oct 27, 2021
9ef3d88
Merge Pull Request #9871 from trilinos/Trilinos/csiefer-c47b1dc
trilinos-autotester Oct 28, 2021
3ea77b0
MueLu: Removing superfluous cout
csiefer2 Oct 28, 2021
0e7c14a
Merge Pull Request #9875 from trilinos/Trilinos/csiefer-3ea77b0
trilinos-autotester Oct 28, 2021
75ca407
STK: Snapshot 10-28-21 09:02
tasmith4 Oct 28, 2021
44fdc45
tpetra: fixed indexing error in describe() #9870 (#9874)
kddevin Oct 28, 2021
4275f6b
Snapshot of kokkos.git from commit 8dc4a906d43ae8eacc951cc5d7e95ad2df…
ndellingwood Oct 28, 2021
cd7a9c7
Snapshot of kokkos-kernels.git from commit 14d29f0a04f9fc959c7c96d98e…
ndellingwood Oct 28, 2021
6909d93
Intrepid2 - deep copy range match
kyungjoo-kim Oct 28, 2021
4d84b07
Merge Pull Request #9877 from vqd8a/Trilinos/adelus-defaulthostpinned
trilinos-autotester Oct 28, 2021
d63e635
intrepid2: resolve signed-unsigned warning
ndellingwood Oct 29, 2021
78ff466
Merge Pull Request #9872 from trilinos/Trilinos/stk-snapshot
trilinos-autotester Oct 29, 2021
f268d09
Update Krino for fixing parsing support when compiled with yaml-cpp (…
drnobleabq Oct 29, 2021
91e55fa
Merge pull request #9836 from trilinos/kokkos-promotion
crtrott Oct 29, 2021
b414ad3
Merge pull request #9857 from NexGenAnalytics/Sacado-UVM-off
etphipp Oct 29, 2021
1 change: 1 addition & 0 deletions PackagesList.cmake
@@ -112,6 +112,7 @@ TRIBITS_REPOSITORY_DEFINE_PACKAGES(
 Compadre packages/compadre ST
 STK packages/stk PT # Depends on boost
 Percept packages/percept PT # Depends on boost
+Krino packages/krino PT # Depends on boost
 SCORECapf_zoltan SCOREC/zoltan ST
 SCORECapf_stk SCOREC/stk ST
 SCORECma SCOREC/ma ST
4 changes: 3 additions & 1 deletion cmake/std/PullRequestLinuxCuda10.1.105TestingSettings.cmake
@@ -78,7 +78,9 @@ set (TPL_DLlib_LIBRARIES "-ldl" CACHE FILEPATH "Set by default for CUDA PR testi
 # The compile times for two Panzer files went up to over 6 hours. This
 # turns off one feature that allows these in about 24 minutes. Please remove
 # when issue #7532 is resolved.
-set (Sacado_NEW_FAD_DESIGN_IS_DEFAULT OFF CACHE BOOL "Temporary fix for issue #7532" )
+# Compile time issues addressed by #8377. Commenting out the override of
+# Sacado_NEW_FAD_DESIGN_IS_DEFAULT to return to default settings.
+#set (Sacado_NEW_FAD_DESIGN_IS_DEFAULT OFF CACHE BOOL "Temporary fix for issue #7532" )

 # Disable some packages that can't be tested with this PR build
 set (Trilinos_ENABLE_ShyLU_NodeTacho OFF CACHE BOOL
@@ -77,7 +77,9 @@ set (TPL_DLlib_LIBRARIES "-ldl" CACHE FILEPATH "Set by default for CUDA PR testi
 # The compile times for two Panzer files went up to over 6 hours. This
 # turns off one feature that allows these in about 24 minutes. Please remove
 # when issue #7532 is resolved.
-set (Sacado_NEW_FAD_DESIGN_IS_DEFAULT OFF CACHE BOOL "Temporary fix for issue #7532" )
+# Compile time issues addressed by #8377. Commenting out the override of
+# Sacado_NEW_FAD_DESIGN_IS_DEFAULT to return to default settings.
+#set (Sacado_NEW_FAD_DESIGN_IS_DEFAULT OFF CACHE BOOL "Temporary fix for issue #7532" )

 # Disable some packages that can't be tested with this PR build
 set (Trilinos_ENABLE_ShyLU_NodeTacho OFF CACHE BOOL
@@ -146,7 +148,6 @@ set (Domi_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
 set (Kokkos_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
 set (KokkosKernels_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
 set (ROL_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
-set (Sacado_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
 set (SEACAS_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
 set (ShyLU_DD_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
 set (STK_ENABLE_TESTS OFF CACHE BOOL "Turn off tests for non-UVM build")
4 changes: 3 additions & 1 deletion cmake/std/PullRequestLinuxCuda10.1.243TestingSettings.cmake
@@ -84,7 +84,9 @@ set (TPL_DLlib_LIBRARIES "-ldl" CACHE FILEPATH "Set by default for CUDA PR testi
 # The compile times for two Panzer files went up to over 6 hours. This
 # turns off one feature that allows these in about 24 minutes. Please remove
 # when issue #7532 is resolved.
-set (Sacado_NEW_FAD_DESIGN_IS_DEFAULT OFF CACHE BOOL "Temporary fix for issue #7532" )
+# Compile time issues addressed by #8377. Commenting out the override of
+# Sacado_NEW_FAD_DESIGN_IS_DEFAULT to return to default settings.
+#set (Sacado_NEW_FAD_DESIGN_IS_DEFAULT OFF CACHE BOOL "Temporary fix for issue #7532" )

 # Disable some packages that can't be tested with this PR build
 set (Trilinos_ENABLE_ShyLU_NodeTacho OFF CACHE BOOL
2 changes: 1 addition & 1 deletion cmake/std/atdm/README.md
@@ -1572,7 +1572,7 @@ each system</a>.
 These contributed configurations are used just like any other custom
 configuration as described in <a
 href="#custom-systems-and-configurations">Custom systems and
-configurations</a>.. For example, to load the contributed custom 'weaver'
+configurations</a>. For example, to load the contributed custom 'weaver'
 configuration to do a CUDA optimized build, do:

 ```
2 changes: 1 addition & 1 deletion cmake/std/atdm/contributed/weaver/environment.sh
@@ -177,7 +177,7 @@ elif [[ "$ATDM_CONFIG_COMPILER" == "CUDA"* ]] ; then
 fi

 # Ninja
-module load ninja/1.7.2
+#module load ninja/1.7.2

 # CMake
 #module swap cmake/3.6.2 cmake/3.12.3
2 changes: 1 addition & 1 deletion packages/adelus/CMakeLists.txt
@@ -55,7 +55,7 @@ TRIBITS_ADD_OPTION_AND_DEFINE(${PACKAGE_NAME}_ENABLE_TIMING
 TRIBITS_ADD_OPTION_AND_DEFINE(${PACKAGE_NAME}_ENABLE_CUDAHOSTPINNED
 CUDA_HOST_PINNED_MPI
 "Use Cuda Host Pinned memory for MPI."
-OFF )
+ON )

 TRIBITS_ADD_OPTION_AND_DEFINE(${PACKAGE_NAME}_ENABLE_USEDEEPCOPY
 USE_DEEPCOPY
34 changes: 31 additions & 3 deletions packages/adelus/src/Adelus_perm1.hpp
@@ -300,6 +300,9 @@ namespace Adelus {
 int ptr1_idx, myfirstrow;

 #ifdef GET_TIMING
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+double t1, copyhostpinnedtime;
+#endif
 double t2;
 double totalpermtime;
 #endif
@@ -316,10 +319,18 @@ namespace Adelus {
 #endif
 typedef typename ZDView::device_type::memory_space memory_space;
 typedef Kokkos::View<value_type**, Kokkos::LayoutLeft, memory_space> ViewMatrixType;
-
+
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+typedef Kokkos::View<value_type**, Kokkos::LayoutLeft, Kokkos::CudaHostPinnedSpace> View2DHostPinnType;//CudaHostPinnedSpace
+#endif
+
 if (my_rhs_ > 0) {

-ViewMatrixType rhs_temp ( "rhs_temp", nrows_matrix, my_rhs_ );//allocate full-size RHS vectors
+ViewMatrixType rhs_temp ( "rhs_temp", nrows_matrix, my_rhs_ );//allocate full-size RHS vectors
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+View2DHostPinnType h_rhs_temp( "h_rhs_temp", nrows_matrix, my_rhs_ );
+#endif
+
 Kokkos::deep_copy(rhs_temp, 0);//initialize with 0s

 ncols_proc1 = ncols_matrix/nprocs_row;
@@ -357,18 +368,35 @@ namespace Adelus {

 ptr1_idx++;
 }
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+#ifdef GET_TIMING
+t1 = MPI_Wtime();
+#endif
+Kokkos::deep_copy(h_rhs_temp,rhs_temp);
+#ifdef GET_TIMING
+copyhostpinnedtime = (MPI_Wtime()-t1);
+#endif
+
+MPI_Allreduce( MPI_IN_PLACE, h_rhs_temp.data(), nrows_matrix*my_rhs_, ADELUS_MPI_DATA_TYPE, MPI_SUM, col_comm);
+
+Kokkos::deep_copy( subview(ZV, Kokkos::ALL(), Kokkos::make_pair(0, my_rhs_)),
+subview(h_rhs_temp, Kokkos::make_pair(myfirstrow-1, myfirstrow-1+my_rows), Kokkos::ALL()) );
+#else //CUDA-aware MPI
 MPI_Allreduce( MPI_IN_PLACE, rhs_temp.data(), nrows_matrix*my_rhs_, ADELUS_MPI_DATA_TYPE, MPI_SUM, col_comm);

 Kokkos::deep_copy( subview(ZV, Kokkos::ALL(), Kokkos::make_pair(0, my_rhs_)),
 subview(rhs_temp, Kokkos::make_pair(myfirstrow-1, myfirstrow-1+my_rows), Kokkos::ALL()) );
-
+#endif
 }

 #ifdef GET_TIMING
 totalpermtime = MPI_Wtime() - t2;
 #endif

 #ifdef GET_TIMING
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+showtime("Time to copy dev mem --> host pinned mem",&copyhostpinnedtime);
+#endif
 showtime("Total time in perm",&totalpermtime);
 #endif
 }
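Reviewer note: the `Adelus_perm1.hpp` hunks above stage the device buffer through a host-pinned view, run the in-place reduction on the host copy, and then copy only the locally owned row range back. The sketch below illustrates that staging pattern without Kokkos or MPI. It is not Adelus code: `std::vector` stands in for the views, `fakeAllreduceSum` stands in for `MPI_Allreduce` with `MPI_IN_PLACE`, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an in-place sum reduction: it adds a fixed
// "remote" contribution so the example runs without MPI.
inline void fakeAllreduceSum(std::vector<double>& buf,
                             const std::vector<double>& remote) {
  for (std::size_t i = 0; i < buf.size(); ++i) buf[i] += remote[i];
}

// Column-major (LayoutLeft-style) staging, assuming an nrows x ncols matrix:
//   1. copy the "device" buffer into a host staging buffer,
//   2. run the collective on the staging buffer,
//   3. copy rows [firstRow, firstRow + numRows) of every column to the output.
std::vector<double> stagedReduceAndSlice(
    const std::vector<double>& deviceBuf, std::size_t nrows, std::size_t ncols,
    std::size_t firstRow, std::size_t numRows,
    const std::function<void(std::vector<double>&)>& collective) {
  std::vector<double> staging(deviceBuf);  // step 1: device -> host staging
  collective(staging);                     // step 2: reduce on the host copy
  std::vector<double> out(numRows * ncols);
  for (std::size_t c = 0; c < ncols; ++c) {
    for (std::size_t r = 0; r < numRows; ++r) {
      out[r + c * numRows] = staging[(firstRow + r) + c * nrows];  // step 3
    }
  }
  return out;
}
```

The extra copy is the cost of not handing device pointers to MPI; the `copyhostpinnedtime` timer added in the diff measures that cost in the real code.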
62 changes: 31 additions & 31 deletions packages/adelus/src/Adelus_solve.hpp
@@ -149,9 +149,9 @@ void back_solve6(ZDView& ZV)
 double t1,t2;
 double allocviewtime,eliminaterhstime,bcastrowtime,updrhstime,xchgrhstime;
 double totalsolvetime;
-//#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
-// double copyhostpinnedtime;
-//#endif
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+double copyhostpinnedtime;
+#endif
 #endif

 MPI_Request msgrequest;
@@ -181,9 +181,9 @@ void back_solve6(ZDView& ZV)

 #ifdef GET_TIMING
 allocviewtime=eliminaterhstime=bcastrowtime=updrhstime=xchgrhstime=0.0;
-//#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
-// copyhostpinnedtime=0.0;
-//#endif
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+copyhostpinnedtime=0.0;
+#endif

 t1 = MPI_Wtime();
 #endif
@@ -251,15 +251,15 @@ void back_solve6(ZDView& ZV)
 #endif
 }

-//#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
-//#ifdef GET_TIMING
-// t1 = MPI_Wtime();
-//#endif
-// Kokkos::deep_copy(h_row1,row1);
-//#ifdef GET_TIMING
-// copyhostpinnedtime += (MPI_Wtime()-t1);
-//#endif
-//#endif
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+#ifdef GET_TIMING
+t1 = MPI_Wtime();
+#endif
+Kokkos::deep_copy(h_row1,row1);
+#ifdef GET_TIMING
+copyhostpinnedtime += (MPI_Wtime()-t1);
+#endif
+#endif

 #ifdef GET_TIMING
 t1 = MPI_Wtime();
@@ -268,27 +268,27 @@ void back_solve6(ZDView& ZV)
 type[0] = SOCOLTYPE+j;

 //MPI_Bcast((char *) row1, bytes[0], MPI_CHAR, mesh_row(root), col_comm);
-//#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
-// MPI_Bcast(reinterpret_cast<char *>(h_row1.data()), bytes[0], MPI_CHAR, mesh_row(root), col_comm);
-//#else //CUDA-aware MPI -- Note: Looks like MPI_Bcast is still working well with device (cuda) pointers (and faster than using cuda host pinned memory)
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+MPI_Bcast(reinterpret_cast<char *>(h_row1.data()), bytes[0], MPI_CHAR, mesh_row(root), col_comm);
+#else //CUDA-aware MPI
 MPI_Bcast(reinterpret_cast<char *>(row1.data()), bytes[0], MPI_CHAR, mesh_row(root), col_comm);
-//#endif
+#endif
 // added this barrier for CPLANT operation

 MPI_Barrier(col_comm);
 #ifdef GET_TIMING
 bcastrowtime += (MPI_Wtime()-t1);
 #endif

-//#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
-//#ifdef GET_TIMING
-// t1 = MPI_Wtime();
-//#endif
-// Kokkos::deep_copy(row1,h_row1);
-//#ifdef GET_TIMING
-// copyhostpinnedtime += (MPI_Wtime()-t1);
-//#endif
-//#endif
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+#ifdef GET_TIMING
+t1 = MPI_Wtime();
+#endif
+Kokkos::deep_copy(row1,h_row1);
+#ifdef GET_TIMING
+copyhostpinnedtime += (MPI_Wtime()-t1);
+#endif
+#endif

 #ifdef GET_TIMING
 t1 = MPI_Wtime();
@@ -376,9 +376,9 @@ void back_solve6(ZDView& ZV)
 showtime("Time to eliminate rhs",&eliminaterhstime);
 showtime("Time to bcast temp row",&bcastrowtime);
 showtime("Time to update rhs",&updrhstime);
-//#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
-// showtime("Time to copy host pinned mem <--> dev mem",&copyhostpinnedtime);
-//#endif
+#if defined(CUDA_HOST_PINNED_MPI) && defined(KOKKOS_ENABLE_CUDA)
+showtime("Time to copy host pinned mem <--> dev mem",&copyhostpinnedtime);
+#endif
 showtime("Time to xchg rhs",&xchgrhstime);
 showtime("Total time in solve",&totalsolvetime);
 #endif
11 changes: 11 additions & 0 deletions packages/adelus/test/vector_random/cxx_main.cpp
@@ -210,6 +210,17 @@ int main(int argc, char *argv[])
 #ifdef KOKKOS_ENABLE_CUDA
 int gpu_count;
 cudaGetDeviceCount ( &gpu_count );
+if (nptile > gpu_count) {
+if( rank == 0 ) {
+std::cout << "Request more GPUs than the number of GPUs available "
+<< "to MPI processes (requested: " << nptile
+<< " vs. available: " << gpu_count
+<< "). Exit without test." << std::endl;
+}
+MPI_Finalize() ;
+return 0;
+}
+
 Kokkos::InitArguments args;
 args.num_threads = 0;
 args.num_numa = 0;
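Reviewer note: the guard added to `cxx_main.cpp` skips the test when more GPUs are requested than `cudaGetDeviceCount` reports. The decision itself can be isolated from CUDA and MPI, as in the sketch below; `gpuRequestSkipMessage` is a hypothetical helper, not part of the test driver, and its message only paraphrases the one in the diff.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns an empty string when the request can be
// satisfied, otherwise the message a caller would print before exiting.
std::string gpuRequestSkipMessage(int requested, int available) {
  if (requested <= available) return {};
  std::ostringstream msg;
  msg << "Requested more GPUs than are available to MPI processes "
      << "(requested: " << requested << " vs. available: " << available
      << "). Exit without test.";
  return msg.str();
}
```

A caller following the diff would print a non-empty message on rank 0 only, then call `MPI_Finalize()` on every rank and return 0, so the skip is reported as a pass and not as a failure.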
3 changes: 2 additions & 1 deletion packages/amesos2/src/Amesos2_MatrixAdapter_def.hpp
@@ -52,6 +52,7 @@
 #define TESTING_AMESOS2_WITH_TPETRA_REMOVE_UVM
 #if defined(TESTING_AMESOS2_WITH_TPETRA_REMOVE_UVM)
 #include "KokkosKernels_SparseUtils.hpp"
+#include "KokkosKernels_Sorting.hpp"
 #endif

 namespace Amesos2 {
@@ -608,7 +609,7 @@ namespace Amesos2 {
 // sort
 if( ordering == SORTED_INDICES ) {
 using execution_space = typename KV_GS::execution_space;
-KokkosKernels::Impl::sort_crs_matrix <execution_space, KV_GS, KV_GO, KV_S>
+KokkosKernels::sort_crs_matrix <execution_space, KV_GS, KV_GO, KV_S>
 (rowptr, colind, nzval);
 }
 #endif
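Reviewer note: the call above passes `rowptr`, `colind`, and `nzval` to `sort_crs_matrix` when `SORTED_INDICES` is requested. For readers unfamiliar with the operation, here is a minimal serial sketch of what sorting a CRS matrix means: within each row, entries are reordered by column index and each value moves with its column. This is an illustration on `std::vector`, not the KokkosKernels implementation, and `sortCrsRows` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sort the entries of each CRS row by column index, keeping every value
// paired with its column. rowptr has (number of rows + 1) entries.
void sortCrsRows(const std::vector<std::size_t>& rowptr,
                 std::vector<int>& colind,
                 std::vector<double>& nzval) {
  for (std::size_t row = 0; row + 1 < rowptr.size(); ++row) {
    const std::size_t begin = rowptr[row];
    const std::size_t end = rowptr[row + 1];

    // Permutation of this row's entry positions, ordered by column index.
    std::vector<std::size_t> perm(end - begin);
    std::iota(perm.begin(), perm.end(), begin);
    std::sort(perm.begin(), perm.end(),
              [&colind](std::size_t a, std::size_t b) {
                return colind[a] < colind[b];
              });

    // Gather into temporaries first so the reads are not overwritten.
    std::vector<int> cols;
    std::vector<double> vals;
    for (std::size_t p : perm) {
      cols.push_back(colind[p]);
      vals.push_back(nzval[p]);
    }
    for (std::size_t k = 0; k < perm.size(); ++k) {
      colind[begin + k] = cols[k];
      nzval[begin + k] = vals[k];
    }
  }
}
```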
2 changes: 1 addition & 1 deletion packages/amesos2/src/Amesos2_Superlu_def.hpp
@@ -1060,7 +1060,6 @@ Superlu<Matrix,Vector>::triangular_solve_factor()
 if (data_.options.ConditionNumber == SLU::YES) {
 using STM = Teuchos::ScalarTraits<magnitude_type>;
 const magnitude_type eps = STM::eps ();
-int n = data_.perm_r.extent(0);

 SCformat *Lstore = (SCformat*)(data_.L.Store);
 int nsuper = 1 + Lstore->nsuper;
@@ -1077,6 +1076,7 @@ Superlu<Matrix,Vector>::triangular_solve_factor()
 condition_flag = (((double)max_cols * nsuper) * eps * multiply_fact >= data_.rcond);

 #ifdef HAVE_AMESOS2_VERBOSE_DEBUG
+int n = data_.perm_r.extent(0);
 std::cout << this->getComm()->getRank()
 << " : anorm = " << data_.anorm << ", rcond = " << data_.rcond << ", n = " << n
 << ", num super cols = " << nsuper << ", max super cols = " << max_cols
4 changes: 2 additions & 2 deletions packages/amesos2/src/Amesos2_Superludist_FunctionMap.hpp
@@ -532,14 +532,14 @@ namespace Amesos2 {
 }

 static void gsequ_loc(SLUD::SuperMatrix* A, double* r, double* c,
-double* rowcnd, double* colcnd, double* amax, int* info,
+double* rowcnd, double* colcnd, double* amax, SLUD::int_t* info,
 SLUD::gridinfo_t* grid)
 {
 SLUD::Z::pzgsequ(A, r, c, rowcnd, colcnd, amax, info, grid);
 }

 static void gsequ(SLUD::SuperMatrix* A, double* r, double* c,
-double* rowcnd, double* colcnd, double* amax, int* info)
+double* rowcnd, double* colcnd, double* amax, SLUD::int_t* info)
 {
 SLUD::Z::zgsequ_dist(A, r, c, rowcnd, colcnd, amax, info);
 }
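Reviewer note: the hunk above changes the `info` argument of the `gsequ` wrappers from `int*` to `SLUD::int_t*`, so the wrapper's pointer type follows whatever the library was configured with. The sketch below shows why that matters, under the assumption of a 64-bit index build: an `int*` does not convert implicitly to a pointer to a wider integer type, so a wrapper hard-coded to `int*` would fail to compile in that configuration. All names in `namespace sketch` are hypothetical and the routine is a stand-in, not SuperLU_DIST code.

```cpp
#include <cstdint>

namespace sketch {

// Assumption: a 64-bit index build. The real alias is chosen by the
// library's own configuration and may simply be int.
using int_t = std::int64_t;

// Stand-in for a library routine that reports its status through int_t*.
inline void equilibrate(int_t* info) { *info = 0; }

// The wrapper mirrors the library's pointer type. Declaring the parameter
// as int* instead would not compile here, because int* cannot be passed
// where std::int64_t* is expected.
inline void gsequ(int_t* info) { equilibrate(info); }

}  // namespace sketch
```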