-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get pre-installed Kokkos and Trilinos working with native Kokkos non-TriBITS CMake build system (#11545) #11863
Conversation
demo-external-kokkos testing for clang-11.0.1I tested the repo setup shown above with non-CUDA This posted to CDash at: showing results: NOTE: The Kokkos build does not show up because it is not a TriBITS build anymore in this simulation. The details for this are given below. demo-external-kokkos details for clang-11.0.1For the Trilinos repos state:
Running the 'demo-external-kokkos' simulation using the Trilinos PR GenConfig 'clang-11.0.1' build:
That submitted results to CDash at: Now the configure output for the TriBITS Build of KokkosKernels here shows:
and:
The above shows that TriBITS allows for the non-fully TriBITS-compliant external package Trilinos configure output is shown here showing:
and
So we see the same type of behavior with |
demo-external-kokkos testing for cudaI tested the repo setup shown above with non-CUDA This posted to CDash at: showing results: Most of the 10 failing tests shown above were timeouts. When I ran the failing test against with
Those tests fail because they call
So when using pre-installed Kokkos with CUDA, you can't get the compilers from Trilinos by calling The full details for this are given below. demo-external-kokkos details for cuda-11.4.2-uvm-offNow to do the CUDA build on 'ascicgpu17'. The machine is totally empty right now. The Trilinos repo state and versions are:
Running the 'demo-external-kokkos' simulation using the Trilinos PR GenConfig 'cuda-11.4.2-uvm-off' build:
That submitted to CDash: The failing Kokkos tests were:
The test
There is no clue why that test failed. If the timing was off, what was the timing tolerance? The test
Looking at the source code for the tests, both of these tests appear to be timing releated tests. The Trilinos results with failures showed:
So most of those failures were timeouts. Rerunning just the failured tests one test at a time:
So why is find_package() failing? That is not good. The find_package() command failes with the error:
The line of code that failed was:
The problem is that the find_package() test does not set a compiler at all so The test
So the problem in that case is that you have to define the compilers before you call So we need to decide if we will give up on showing users how to pull in compilers from |
Talking with @jwillenbring yesterday, the correct path forward is to update the |
6833394
to
0d364f8
Compare
Origin repo remote tracking branch: 'github/master' Origin repo remote repo URL: 'github = [email protected]:TriBITSPub/TriBITS.git' Git describe: vera-release-3.5-start-1789-gb669da78 At commit: commit 9e6f4b999cfc38fcc40475491dc962ae514c08f2 Author: Roscoe A. Bartlett <[email protected]> Date: Thu May 11 13:36:23 2023 -0600 Summary: Fix some old misspellings caught by codespell tests (#63)
demo-external-kokkos testing for clang-11.0.1 (2023-05-13)After snaphshotting updated TriBITS into this Trilinos branch gitlab-ex.sandia.gov/rabartl/demo-external-kokkos/-/tree/11545-kokkos-use-native-cmake (see exact details below). This submitted to CDash at: showing: NOTE: The Kokkos build does not show up because it is not a TriBITS build anymore in this simulation. The details for this are given below. demo-external-kokkos details for clang-11.0.1 (click to expand)For the Trilinos repos where:
Running the 'clang-11.0.1' (re)build on 'crf450':
This produced:
and was submited to the CDash site at: |
demo-external-kokkos testing for cuda-11.4.2-uvm-off (2023-05-13)After snaphshotting updated TriBITS into this Trilinos branch (see exact details below). This submitted to CDash at: showing: NOTE: The Kokkos build does not show up because it is not a TriBITS build anymore in this simulation. NOTE: The one failing test was:
When I ran that test again alone, it passed. The details for this are given below. demo-external-kokkos details for cuda-11.4.2-uvm-off (click to expand)The Trilinos repos were:
Running the 'cuda-11.4.2-uvm-off' build simulation on 'ascicgpu17':
That produced:
submitting to CDash at: The one failing test was:
I reran just that one test:
So running with |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-11.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-2021.3
Jenkins Parameters
Using Repos:
Pull Request Author: bartlettroscoe |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-11.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-2021.3
Jenkins Parameters
|
FYI: Note that only a small number of The Kokkos developers will work to get those missing vars added to the native |
…ilinos (#6157) * Add Kokkos::all_libs alias target for compatibility with Trilinos * Add comments and fixup for cmake version less than 3.18
…11545) This makes the CUDA build work since KokkosConfig.cmake checks that the nvcc compiler is being used when it is included. This does not mean that customers still can't the compilers from find_package(Trilinos), it is just that it will not work with CUDA. This also means we are no longer testing getting the compilers from Trilinos. But given that customers that use CUDA can't do that anyway, we should consider removing the compilers from the TriBITS-generated <Package>Config.cmake and <Project>Config.cmake files.
This commit does a few things: * A removeInstall test is split off from the doInstall test * A new simpleBuildAgainstTrilinos/CMakeLists.by_package.cmake file is created to show how to find the packages you want from Trilinos and not actually find Trilinos. * New test simpleBuildAgainstTrilinos_by_package_build_tree_name is added which depends on removeInstall and points to the build tree. This demonstrates and protects using Trilinos packages from the build tree and how to find Trilinos by packages instead of as a big thing.
@@ -6,7 +6,12 @@ | |||
# upper-case version for use within | |||
|
|||
set(Kokkos_OPTIONS_NOT_TO_EXPORT | |||
Kokkos_ENABLE_TESTS Kokkos_ENABLE_EXAMPLES) | |||
Kokkos_ENABLE_BENCHMARKS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L9-14 changes correspond to kokkos/kokkos#6142 (merged)
@@ -31,7 +31,7 @@ FUNCTION(KOKKOS_TPL_OPTION PKG DEFAULT) | |||
ENDIF() | |||
ENDFUNCTION() | |||
|
|||
KOKKOS_TPL_OPTION(HWLOC Off) | |||
KOKKOS_TPL_OPTION(HWLOC Off TRIBITS HWLOC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L34 changes correspond to kokkos/kokkos#6176 (merged)
@@ -1,7 +1,5 @@ | |||
# Adding source directory to the build | |||
LIST(APPEND KK_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/common/src) | |||
LIST(APPEND KK_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/common/src/impl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L3-4 changes correspond to kokkos/kokkos-kernels#1844 (merged)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: This change to KokkosKernels is only needed to get the new test TrilinosInstallTests_simpleBuildAgainstTrilinos_by_package_build_tree
to pass. (And see my comment above about that.)
@bartlettroscoe I went through the kokkos and kokkos-kernels changes in this PR to explicitly cross-reference with the PRs to their respective directories. All lines here are accounted for in merged PRs, but the changes in kokkos/kokkos#6138 are not present here, can you clarify if those changes should be present here? |
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
@ndellingwood, the changes in kokkos/kokkos#6138 were pulled off of this Trilinos PR branch as described above and replaced with the newer merged Kokkos PRs. Therefore, the Kokkos patches in this Trilinos branch should be reflected with the merged Kokkos PRs (as described above). Please let me know if you have any other questions. Otherwise, I think this PR should be ready to merge (and the next snapshot of Kokkkos taken off of the Kokkos 'develop' branch will not overwrite these changes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work and effort on this @bartlettroscoe !
We typically do not support making changes directly to kokkos in Trilinos, preferring all changes to occur via release snapshot, but make an exception for this PR to allow these updates to be further tested and validated before the upcoming 4.1.0 release (which will clobber the kokkos and kokkos-kernels changes with identical changes already approved and merged to their respective repos)
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ ndellingwood ]! |
Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged |
Merge on Pull Request# 11863: IS A SUCCESS - Pull Request successfully merged |
Certainly you want to avoid making changes on the Trilinos side when using the snapshotting approach. But there will be times when changes on the Trilinos side will need to be made for various reasons. Because Kokkos 'develop' does not maintain backward compatibility with the last release, there really is not much of an option. (But I guess I could have created a branch in the Kokkos repo and used that for this work instead of the version of Kokkos in Trilinos.) |
The new test `TrilinosInstallTests_simpleBuildAgainstTrilinos_by_package_build_tree` merged from PR #11863 fails because the subdirs ${CMAKE_CURRENT_BINARY_DIR}/common and ${CMAKE_CURRENT_SOURCE_DIR}/common because this CMakeLists.txt file already sits in the kokkos-kernels/common/ subdir. I don't know why this error did not happen with PR testing for PR #11863 but this is clearly the right thing to do.
…s:develop' (ab899a0). * trilinos-develop: (22 commits) Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863) Teuchos: Fixing cmake logic Teuchos: Fixing catch() issues with C++ language drift TrilinosSS: include <omp.h> (Fix trilinos#11867) MueLu hierarchical: Fix build error Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host Stokhos: Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545) Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938) Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808) KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545) Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545) Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545) Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157) Export Kokkos_ENABLE_<OPTION> that are relevant Do not append to Kokkos_OPTIONS variables those in the do not export list Expand list of kokkos options not to export with cmake Tpetra: Don't use std::binary_function Tpetra: Fixing missing HIP tesT ...
…s:develop' (ab899a0). * trilinos-develop: (22 commits) Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863) Teuchos: Fixing cmake logic Teuchos: Fixing catch() issues with C++ language drift TrilinosSS: include <omp.h> (Fix trilinos#11867) MueLu hierarchical: Fix build error Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host Stokhos: Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545) Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938) Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808) KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545) Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545) Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545) Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157) Export Kokkos_ENABLE_<OPTION> that are relevant Do not append to Kokkos_OPTIONS variables those in the do not export list Expand list of kokkos options not to export with cmake Tpetra: Don't use std::binary_function Tpetra: Fixing missing HIP tesT ...
…s:develop' (ab899a0). * trilinos-develop: (22 commits) Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863) Teuchos: Fixing cmake logic Teuchos: Fixing catch() issues with C++ language drift TrilinosSS: include <omp.h> (Fix trilinos#11867) MueLu hierarchical: Fix build error Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host Stokhos: Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545) Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938) Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808) KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545) Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545) Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545) Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157) Export Kokkos_ENABLE_<OPTION> that are relevant Do not append to Kokkos_OPTIONS variables those in the do not export list Expand list of kokkos options not to export with cmake Tpetra: Don't use std::binary_function Tpetra: Fixing missing HIP tesT ...
Agreed 👍 |
…s:develop' (ab899a0). * trilinos-develop: (23 commits) Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863) Teuchos: Fixing cmake logic Teuchos: Fixing catch() issues with C++ language drift fastilu: Fix memory leak. TrilinosSS: include <omp.h> (Fix trilinos#11867) MueLu hierarchical: Fix build error Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host Stokhos: Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545) Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938) Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808) KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545) Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545) Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545) Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157) Export Kokkos_ENABLE_<OPTION> that are relevant Do not append to Kokkos_OPTIONS variables those in the do not export list Expand list of kokkos options not to export with cmake Tpetra: Don't use std::binary_function ...
…s:develop' (ab899a0). * trilinos-develop: (23 commits) Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863) Teuchos: Fixing cmake logic Teuchos: Fixing catch() issues with C++ language drift fastilu: Fix memory leak. TrilinosSS: include <omp.h> (Fix trilinos#11867) MueLu hierarchical: Fix build error Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host Stokhos: Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545) Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938) Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808) KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545) Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545) Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545) Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157) Export Kokkos_ENABLE_<OPTION> that are relevant Do not append to Kokkos_OPTIONS variables those in the do not export list Expand list of kokkos options not to export with cmake Tpetra: Don't use std::binary_function ...
Kokkos code owners needs to get added back so that a Kokkos developer must approve any PR that changes Kokkos (likewise for any of the other snpashotted packages like KokkosKernels, STK, SEACAS, etc.). @sebrowne, can we find a way to get code owners working again for these snapshotted packages? |
@trilinos/framework, @trilinos/kokkos, @trilinos/kokkos-kernels
Description
This is part of:
This PR branches off of and continues the changes from the
11545-kokkos-no-subpkgs
branch from PR:This PR branch
11545-kokkos-use-native-cmake
contains minimal additions to Kokkos to allow it to be built and installed using its native non-TriBITS CMake build system and have it work with building downstream Trilinos packages. The main additions to the native Kokkos non-TriBITS CMake build system are:Kokkos::all_libs
target to the installedKokkosTargets.cmake
file. (That is needed for package-based auto linking in downstream TriBITS packages.)set()
statements for a bunch of Kokkos optionsKokkos_<INFO>
to the installedKokkosConfig.cmake
file. (Many of those vars are needed by downstream Trilinos packages like Tpetra.)This Trilinos PR branch includes an updated snapshot of TriBITS 'master' that includes the TriBITS PR:
This TriBITS update allows Trilinos to build against pre-installed Kokkos with this updated native CMake build system's installed
KokkosConfig.cmake
file (which lacks<UpstreamPkg>::all_libs
targets and upstream<UpstreamPkg>Config.cmake
files).Local testing
Testing this with Trilinos can be done using the Trilinos PR GenConfig configurations along with the simulation driver scripts in:
on the branch:
This was tested for the Trilinos repository state:
And the build directory is set up as:
This is the basic setup use for driving the build/test/install simulations.
See the actually builds and results in comments below (links are in the Remaining Task).
Remaining Tasks:
demo-external-kokkos
driver script:TrilinosInstallTests
tests with CUDA configuration ... See belowTrilinosInstallTests
tests to define C++ compiler first then callfind_package(Trilinos)
to make them pass with CUDA configurations ... See commit 2e6a11bdemo-external-kokkos
driver script against using just updated Trilinos branch '11545-kokkos-use-native-cmake':Kokkos::all_libs
alias target for compatibility with TriBITS/Trilinos kokkos/kokkos#6157 (see below).demo-external-kokkos
driver script against using just updated Trilinos branch '11545-kokkos-use-native-cmake':TrilinosInstallTests_simpleBuildAgainstTrilinos_by_package_build_tree
(see below)defaultdevicetype.shared_space
in testKokkos_CoreUnitTest_Default_MPI_1
(see Kokkos: Timing-based test Kokkos_CoreUnitTest_Default_MPI_1 randomly failing in CUDA PR builds #11940) ... See commit 59c2e4d