Add install hooks for ATDM Trilinos configuration and ctest -S driver #2689

Closed
bartlettroscoe opened this issue May 8, 2018 · 15 comments
Assignees
Labels
  • ATDM Config (Issues that are specific to the ATDM configuration settings)
  • ATDM DevOps (Issues that will be worked by the Coordinated ATDM DevOps teams)
  • client: ATDM (Any issue primarily impacting the ATDM project)
  • client: EMPIRE (All issues that most directly target the ATDM EMPIRE code)
  • client: SPARC (Issues related to or needed more specifically by the ATDM SPARC code)
  • stage: in progress (Work on the issue has started)
  • type: enhancement (Issue is an enhancement, not a bug)

Comments

@bartlettroscoe
Member

bartlettroscoe commented May 8, 2018

CC: @fryeguy52

Description

In order for the ATDM projects to adopt the new ATDM Trilinos Configuration and Jenkins Drivers (that submit to CDash as a byproduct), these scripts must support the installation of Trilinos and using Trilinos from that install.

For this to occur:

  • There must be a single script sitting in the installation directory that can be sourced to load the correct env for using that installed version of Trilinos. [Done]
  • Trilinos must be installable using the ATDM Trilinos configuration from a Jenkins job.

Possible solutions

First, a new env var ATDM_CONFIG_TRIL_CMAKE_INSTALL_PREFIX could be added that would be read in the ATDMDevEnvSettings.cmake file and would set the CMake var CMAKE_INSTALL_PREFIX. That would give the correct configuration of Trilinos, and the build system would then know where to install.
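
As a rough illustration of that mapping, the shell sketch below turns the env var into the corresponding CMake argument. The helper name atdm_install_prefix_args is hypothetical, and the proposal itself places this logic in ATDMDevEnvSettings.cmake rather than in a shell script, so this only shows the intended relationship between the two variables.

```shell
# Hypothetical helper (not part of Trilinos): print the CMake argument implied
# by ATDM_CONFIG_TRIL_CMAKE_INSTALL_PREFIX, or print nothing if it is unset/empty.
atdm_install_prefix_args() {
  if [ -n "${ATDM_CONFIG_TRIL_CMAKE_INSTALL_PREFIX:-}" ]; then
    printf '%s\n' "-DCMAKE_INSTALL_PREFIX=${ATDM_CONFIG_TRIL_CMAKE_INSTALL_PREFIX}"
  fi
}
```

A configure script could then pass the output of atdm_install_prefix_args on the cmake command line; leaving the env var unset keeps CMake's default install prefix.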

As for doing the installation itself, one simple solution would be to provide an atdm/install-prebuilt-trilinos.sh script that will run make install on the built Trilinos version. This script would need to be explicitly called by some Jenkins driver script. But any failures would not be posted to CDash.
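
A minimal sketch of what such a script could contain is shown below. It assumes a Makefile-based build directory passed as the first argument; the function name and the MAKE_CMD override are hypothetical and exist only for illustration.

```shell
# Sketch of the core of a possible atdm/install-prebuilt-trilinos.sh.
# Assumptions: $1 is the Trilinos build directory and it uses Makefiles.
# MAKE_CMD is a hypothetical override for the build tool.
install_prebuilt_trilinos() {
  build_dir="$1"
  if [ -z "$build_dir" ] || [ ! -d "$build_dir" ]; then
    echo "Error: build directory '$build_dir' does not exist" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is left untouched.
  # The function's return code is the return code of the install command.
  ( cd "$build_dir" && ${MAKE_CMD:-make} install )
}
```

The calling Jenkins driver would have to check the return code itself, since nothing in this approach submits the install result to CDash.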

If we wanted to allow for installation errors to get reported up to CDash, then a CTEST_DO_INSTALL option could be added to the TRIBITS_CTEST_DRIVER() function and would do a cmake_build( ... ) command that would run the install target (and in parallel) after the project build was complete. That would also allow logic for not installing if there was a build failure and, again, would allow for any install failures to get reported to CDash.

Also, to help make installation testing for downstream customer codes stronger, the smart-jenkins-driver.sh script could be updated to move the source and build directories out of the way to catch mistakes where the installed configuration files point into the source or build trees. This could be accomplished by adding the steps:

  • Run the smart-jenkins-driver.sh script under $WORKSPACE/Trilinos
  • If $WORKSPACE/moved/SRC_AND_BUILD/ already exists, then move it back to $WORKSPACE/SRC_AND_BUILD/
  • After the build, installation and testing are completed, move $WORKSPACE/SRC_AND_BUILD/ to $WORKSPACE/moved/SRC_AND_BUILD/.

NOTE: The reason to move $WORKSPACE/SRC_AND_BUILD/ to $WORKSPACE/moved/SRC_AND_BUILD/ instead of say $WORKSPACE/SRC_AND_BUILD.moved/ is to catch errors where the installation under $WORKSPACE/local_install/ might have relative paths to $WORKSPACE/SRC_AND_BUILD/BUILD/ which would also work for $WORKSPACE/SRC_AND_BUILD.moved/BUILD/ but would not work for $WORKSPACE/moved/SRC_AND_BUILD/BUILD/. Also, the reason to move $WORKSPACE/SRC_AND_BUILD/ instead of deleting it is to avoid having to re-clone the Trilinos git repo again from scratch every time the job runs and to allow looking through the build artifacts on Jenkins after the job is complete.
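
The two move steps could be sketched as the pair of shell functions below. The function names are hypothetical; only the directory layout ($WORKSPACE/SRC_AND_BUILD/ and $WORKSPACE/moved/SRC_AND_BUILD/) comes from the steps above.

```shell
# Sketch of the restore-then-move-aside steps. $1 is the Jenkins workspace.
# Function names are illustrative only.

# Before the build: bring back a tree that a previous run moved aside.
restore_src_and_build() {
  ws="$1"
  if [ -d "$ws/moved/SRC_AND_BUILD" ]; then
    if [ -e "$ws/SRC_AND_BUILD" ]; then
      echo "Error: both SRC_AND_BUILD and moved/SRC_AND_BUILD exist" >&2
      return 1
    fi
    mv "$ws/moved/SRC_AND_BUILD" "$ws/SRC_AND_BUILD"
  fi
}

# After build, install, and test: move the tree one directory level deeper so
# that relative paths from the install tree no longer resolve.
move_src_and_build_aside() {
  ws="$1"
  mkdir -p "$ws/moved" && mv "$ws/SRC_AND_BUILD" "$ws/moved/SRC_AND_BUILD"
}
```

Because the tree is moved rather than deleted, the existing git clone and the build artifacts remain available for the next run and for inspection.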

Lastly, in order to support loading the correct env from a Trilinos install, the relevant files from Trilinos/cmake/std/atdm/ need to be installed that match the configuration of Trilinos being installed. At a minimum, this must include, for example:

  • Trilinos/cmake/std/atdm/load-env.sh
  • Trilinos/cmake/std/atdm/utils/
  • Trilinos/cmake/std/atdm/<system-name>/environment.sh

But that would allow loading any env, including those that don't match the install. So the Trilinos install hooks should also be updated to install a script <install-prefix>/load_matching_env.sh that takes no arguments and will source the installed atdm/load-env.sh script with the right job name. For example, this installed script load_matching_env.sh might look like:

source <install-prefix>/share/atdm-trilinos/load-env.sh Trilinos-cuda-9.2-opt

Then a client like EMPIRE or SPARC would just source:

source <trilinos-install-prefix>/load_matching_env.sh

and then inspect the exported ATDM_CONFIG_* vars and know that the right compilers, MPI, TPL locations, etc. were loaded to correctly use that installation of Trilinos.
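
One way the install step could generate such a script is sketched below. The generator function write_load_matching_env is hypothetical, and the share/atdm-trilinos/ location is taken from the example line above rather than from any confirmed install layout.

```shell
# Hypothetical generator for <install-prefix>/load_matching_env.sh.
# $1 = install prefix, $2 = job name of the build being installed.
write_load_matching_env() {
  prefix="$1"
  job_name="$2"
  # The here-document expands $prefix and $job_name at generation time, so the
  # installed script is hard-wired to this one install and this one job name.
  cat > "$prefix/load_matching_env.sh" <<EOF
# Generated at install time: loads the env that matches this Trilinos install.
source "$prefix/share/atdm-trilinos/load-env.sh" "$job_name"
EOF
}
```

After sourcing the generated file from bash, a client could list what was loaded with something like env | grep '^ATDM_CONFIG_'.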

@bartlettroscoe bartlettroscoe added type: enhancement Issue is an enhancement, not a bug client: ATDM Any issue primarily impacting the ATDM project labels May 8, 2018
@bartlettroscoe bartlettroscoe added ATDM Config Issues that are specific to the ATDM configuration settings client: EMPIRE All issues that most directly target the ATDM EMPIRE code labels May 15, 2018
@bartlettroscoe
Member Author

CC: @dridzal, @trilinos/framework

NOTE: Adding a CTEST_DO_INSTALL option as described above and running that as part of PR testing would catch issues like #2785 before they got into the 'develop' branch. Doing the install is pretty cheap compared to building everything and testing. At the very least, the Trilinos "Clean" nightly builds should turn this option on. But it would be better to do the install pre-merge to 'develop'.

Therefore, adding a CTEST_DO_INSTALL option to TRIBITS_CTEST_DRIVER() will kill two birds with one stone so we need to do this very soon.

@dridzal
Contributor

dridzal commented May 24, 2018

Agreed. I would like ROL's nightly testing to catch this as well (and not just PR testing). Are these options added at the Trilinos level or do I need to modify my scripts?

@bartlettroscoe
Member Author

Are these options added at the Trilinos level or do I need to modify my scripts?

@dridzal, yes, once I add that option CTEST_DO_INSTALL to TRIBITS_CTEST_DRIVER() and update the snapshot of TriBITS in Trilinos, then you can just set the install prefix (to a subdir under the build tree) and then set CTEST_DO_INSTALL=TRUE and that will be it. Once we add this for ATDM testing I will tell you how to do this for your own ROL testing.

@bartlettroscoe
Member Author

CC: @bathmatt, @jmgate, @trilinos/framework, @fryeguy52

Note that the install hook defined here and the testing it would represent would not have caught the Kokkos installation problem reported in #2883. In order to catch that, you would need to delete (or move) the Trilinos source directories and binary directories after the install of Trilinos, and then have a separate process try to build against these. That type of process should be used for testing installation of Trilinos as part of upgrades of Trilinos for ATDM APPs but that goes beyond the scope of this story.

However, this story should consider implementing the minimum features needed for that process which are to:

  • Run the smart-jenkins-driver.sh script under $WORKSPACE/Trilinos
  • If $WORKSPACE/SRC_AND_BUILD.moved/ already exists, then move it back to $WORKSPACE/SRC_AND_BUILD/
  • After the build, installation and testing are completed, move $WORKSPACE/SRC_AND_BUILD/ to $WORKSPACE/SRC_AND_BUILD.moved/.

Then when the follow-on Jenkins job tries to build against this installation of Trilinos, if anything is pointing into the source or build trees, the build will fail. If this Jenkins driver had been in place for the EMPIRE Jenkins Pipeline process, then EMPIRE would never have accepted this version of Trilinos and a lot of wasted time would have been avoided. I will add these moves to the scope of this issue. That is not a lot of work.
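
As a cheap complement to moving the trees (not a replacement for it), a job could also scan the install tree for literal references to the source or build tree. The sketch below is only an illustration: the function name is hypothetical, and a plain text search only finds absolute paths written verbatim in installed files, so relative paths and other indirect references would still only be caught by the downstream build.

```shell
# Sketch: list installed files that mention a given source/build tree path.
# $1 = install prefix, $2 = absolute path of the tree that must not be referenced.
# Prints the offending file names and returns 1 if any are found, else returns 0.
check_install_is_relocatable() {
  install_prefix="$1"
  tree_path="$2"
  # -r: recurse, -l: print file names only, -F: treat the path as a fixed string.
  matches=$(grep -rlF -- "$tree_path" "$install_prefix" 2>/dev/null)
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}
```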

@bartlettroscoe
Member Author

CC: @fryeguy52

@bathmatt and @jmgate,

FYI: I updated the above "Possible solutions" section to describe how to provide a <trilinos-install-prefix>/load_matching_env.sh script that would allow any client to load the env needed to successfully build against that installed version of Trilinos. This should not be too hard to do. The only problem is how to add automated tests to make sure this works (other than just building EMPIRE against it, but that might be okay).

@bartlettroscoe
Member Author

@bathmatt, is the name of the script that gets installed with Trilinos that loads the env negotiable? Currently it looks like the EM-Plasma/BuildScripts Trilinos install scripts install <trilinos-install-prefix>/configure.sh, but the name configure.sh is not very accurate. When I see the name configure.sh, I expect that it is running CMake to create Makefiles or build.ninja, but that is not what that script is doing. It is only loading an env. Instead, can we call this <trilinos-install-prefix>/load_matching_env.sh? Or, we could provide a CMake cache var called something like ATDM_INSTALLED_ENV_LOAD_SCRIPT_NAME and then EMPIRE could configure Trilinos with:

  -D ATDM_INSTALLED_ENV_LOAD_SCRIPT_NAME=configure.sh

?

If I don't hear back from you, I think I will just implement this with the CMake cache var ATDM_INSTALLED_ENV_LOAD_SCRIPT_NAME which defaults to load_matching_env.sh.

@bartlettroscoe bartlettroscoe changed the title Add install hooks to ATDM Trilinos configuration and ctest -S driver Add install hooks for ATDM Trilinos configuration and ctest -S driver Sep 26, 2018
@bartlettroscoe bartlettroscoe added the stage: in progress Work on the issue has started label Sep 26, 2018
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Sep 27, 2018
…AME (trilinos#2689)

See the README.md file for details.

I also renamed the env var set in atdm/load-env.sh from JOB_NAME to
ATDM_CONFIG_JOB_NAME.  This is to avoid confusion and other problems with the
unnamespaced var JOB_NAME which is set by Jenkins.  This does not change the
behavior of the CTest -S jenkins drivers that directly read from JOB_NAME.
@bartlettroscoe
Member Author

@bathmatt and @jmgate,

FYI: PR #3521 implements the installation of the env loading scripts and allows you to configure with:

  -D CMAKE_INSTALL_PREFIX=<install-prefix> \
  -D ATDM_INSTALLED_ENV_LOAD_SCRIPT_NAME=configure.sh \

and then after installation run:

source <install-prefix>/configure.sh

and you should be off to the races.

Please review the updated documentation at:

and comment in PR #3521.

trilinos-autotester added a commit that referenced this issue Sep 27, 2018
…g-install

Automatically Merged using Trilinos Pull Request AutoTester
PR Title: Add install of ATDM env scripts, rename JOB_NAME to ATDM_CONFIG_JOB_NAME (#2689)
PR Author: bartlettroscoe
@bartlettroscoe
Member Author

FYI: PR #3521 was just merged that installs a <install-prefix>/load_matching_env.sh file that you just source.

Now we just need the install hooks for our CTest -S driver.

@bartlettroscoe bartlettroscoe removed the stage: in progress Work on the issue has started label Sep 27, 2018
@bartlettroscoe
Member Author

Actually, EMPIRE is going to use their own Jenkins drivers so they don't need the ATDM Trilinos CTest -S driver to do installs at this time. Therefore, the needs of EMPIRE for this story are satisfied, so I will remove the "EMPIRE" label and add the "SPARC" label since we do plan to install Trilinos as a byproduct of the ATDM Trilinos builds for SPARC (or run another set of ATDM Trilinos builds for the merge of SPARC Trilinos 'master' and Trilinos GitHub 'develop').

@bartlettroscoe bartlettroscoe added client: SPARC Issues related to or needed more specifically by the ATDM SPARC code and removed client: EMPIRE All issues that most directly target the ATDM EMPIRE code labels Oct 10, 2018
@bartlettroscoe bartlettroscoe added the ATDM DevOps Issues that will be worked by the Coordinated ATDM DevOps teams label Oct 26, 2018
tjfulle pushed a commit to tjfulle/Trilinos that referenced this issue Dec 6, 2018
…AME (trilinos#2689)

See the README.md file for details.

I also renamed the env var set in atdm/load-env.sh from JOB_NAME to
ATDM_CONFIG_JOB_NAME.  This is to avoid confusion and other problems with the
unnamespaced var JOB_NAME which is set by Jenkins.  This does not change the
behavior of the CTest -S jenkins drivers that directly read from JOB_NAME.
@bartlettroscoe bartlettroscoe added client: EMPIRE All issues that most directly target the ATDM EMPIRE code stage: in progress Work on the issue has started labels Jan 4, 2019
bartlettroscoe added a commit to TriBITSPub/TriBITS that referenced this issue Jan 12, 2019
I added tests for passing and failing installs.  I only implemented this for
the all-at-once approach since that is all we need right now for ATDM
Trilinos.  But we could implement it for the package-by-package mode if needed
without much trouble.

Note that as part of this I fixed an oversight where build failures that did
not cause test failures would be ignored in that all-at-once approach.  That
was wrong.  Now if there are any build or install failures, it will assume
that any (read that 'all') of the tested packages may have failed.  See the
long "NOTE" comment about this.  Long-story-short, this will only really
impact CI builds where there are build failures.  If there are just test
failures (the more common case), then only the packages with failing tests are
flagged as failed and will be enabled on the next CI iteration.
bartlettroscoe added a commit to TriBITSPub/TriBITS that referenced this issue Apr 23, 2019
My last commit broke the use case where an optional component (TriBITS
package) is missing (i.e. was not enabled).  This broke the SPARC use case
where it looks for TriKota but TriKota can be missing.

I added tests for that use case and strengthened the existing tests.  I broke
the failing test into two ctest tests since you have to reconfigure from
scratch anyway so there is no benefit to keeping these together.

Build/Test Cases Summary
Enabled Packages:
Enabled all Packages
0) MPI_DEBUG => passed: passed=355,notpassed=0 (1.16 min)
1) SERIAL_RELEASE => passed: passed=355,notpassed=0 (1.22 min)
bartlettroscoe added a commit that referenced this issue Apr 23, 2019
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = [email protected]:TriBITSPub/TriBITS.git'

At commit:

commit 1d1334bbe67bb82184c64445a6d946d4a5ad35b7
Author:  Roscoe A. Bartlett <[email protected]>
Date:    Tue Apr 23 10:17:36 2019 -0600
Summary: Fix case for optional missing component (#2689)
bartlettroscoe added a commit that referenced this issue Apr 23, 2019
…dm-nightly (#2689, #4993)

Merging PR #4993 branch directly to 'atdm-nightly' to ensure it gets into ATDM
builds tomorrow.  This should fix SPARC build problems described in #4993
related to install changes related to #4993.

NOTE: Since the Trilinos PR tester does not currently test the installed
TrilinosConfig.cmake file it provides no value in testing a change like this.

Build/Test Cases Summary
Enabled Packages: Kokkos, Teuchos
Disabled Packages: PyTrilinos,Claps,TriKota
0) MPI_RELEASE_DEBUG_SHARED_PT_OPENMP => passed: passed=170,notpassed=0 (3.01 min)
Other local commits for this build/test group: f1b5c79
trilinos-autotester added a commit that referenced this issue Apr 23, 2019
Automatically Merged using Trilinos Pull Request AutoTester
PR Title: Fix case for optional missing component (#2689)
PR Author: bartlettroscoe
bartlettroscoe added a commit that referenced this issue Apr 26, 2019
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = [email protected]:TriBITSPub/TriBITS.git'

At commit:

commit c6b9a85872eacce51dfaadbb87e61c9ed10cedf6
Author:  Roscoe A. Bartlett <[email protected]>
Date:    Thu Apr 25 12:47:14 2019 -0600
Summary: Add tribits/core install_package_by_package tests and <Project>_INSTALL_PBP_RUNNER option (#2689, ATDV-156)
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 27, 2019
…velop (trilinos#2689, ATDV-156)

Allow doing install using 'run-as-atdm-devops-admin' (see ATDV-156).  This is
part of the general install story trilinos#2689.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 27, 2019
…s#2689, ATDV-151, ATDV-156)

This is mainly to allow the usage of installation as the 'atdm-devops-admin'
user account using the 'jenkins' entity account on various systems and
therefore protect the installs of Trilinos from bad 'jenkins' jobs.  There are
a lot of little things you need to do to get this to work:

* ATDM_CONFIG_WORKSPACE_BASE[_DEFAULT]: Use a different workspace for
  SRC_AND_BUILD (which will allow for reading by the 'atdm-devops-admin'
  account).

* ATDM_CONFIG_INSTALL_PBP_RUNNER[_DEFAULT]: Allow inserting the
  'run-as-atdm-devops-admin' setuid program to run the install command as the
  'atdm-devops-admin' user.

* ATDM_CONFIG_TRIL_CMAKE_INSTALL_PREFIX_DATE_BASE[_DEFAULT]: Set a default
  base directory for a given system

* ATDM_CONFIG_USE_JENKINS_INSTALL_DEFAULTS: Set to '1' to use the defaults for
  the above three vars.

See updated documentation for more details.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 29, 2019
…s#2689, ATDV-151, ATDV-156)

This is mainly to allow the usage of installation as the 'atdm-devops-admin'
user account using the 'jenkins' entity account on various systems and
therefore protect the installs of Trilinos from bad 'jenkins' jobs.  There are
a lot of little things you need to do to get this to work:

* ATDM_CONFIG_WORKSPACE_BASE[_DEFAULT]: Use a different workspace for
  SRC_AND_BUILD (which will allow for reading by the 'atdm-devops-admin'
  account).  (And you must 'cd' into that workspace.)

* ATDM_CONFIG_INSTALL_PBP_RUNNER[_DEFAULT]: Allow inserting the
  'run-as-atdm-devops-admin' setuid program to run the install command as the
  'atdm-devops-admin' user.

* ATDM_CONFIG_TRIL_CMAKE_INSTALL_PREFIX_DATE_BASE[_DEFAULT]: Set a default
  base directory for a given system.

* ATDM_CONFIG_USE_JENKINS_INSTALL_DEFAULTS: Set to '1' to use the defaults for
  the above three vars.

See updated documentation for more details.
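
The [_DEFAULT] convention in the commit message above can be read in more than one way. The sketch below shows one plausible reading only: when ATDM_CONFIG_USE_JENKINS_INSTALL_DEFAULTS is '1' and a var is unset or empty, it takes the value of its _DEFAULT counterpart. The helper name atdm_apply_default is hypothetical, and the actual logic in the ATDM scripts may differ.

```shell
# One plausible reading of the VAR / VAR_DEFAULT convention (an assumption,
# not the confirmed implementation). $1 is the name of the variable to resolve.
atdm_apply_default() {
  var_name="$1"
  # Indirectly read VAR and VAR_DEFAULT by name.
  eval "current=\${$var_name:-}"
  eval "default=\${${var_name}_DEFAULT:-}"
  if [ -z "$current" ] && [ "${ATDM_CONFIG_USE_JENKINS_INSTALL_DEFAULTS:-0}" = "1" ]; then
    # Only fill in the default; an explicitly set value always wins.
    eval "export $var_name=\"\$default\""
  fi
}
```

Under this reading, a Jenkins job would export ATDM_CONFIG_USE_JENKINS_INSTALL_DEFAULTS=1 and override individual vars only where a system's defaults are not suitable.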
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue May 2, 2019
…s:develop' (bad6c8f).

* trilinos-develop:
  Add ctest -S driver for waterman_cuda-9.2_shared_opt for SPARC (ATDV-151)
  Add install stuff for 'waterman' (ATDV-156, ATDV-151)
  Set up for alternate workspace, install as atdm-devops, etc. (trilinos#2689, ATDV-151, ATDV-156)
  Automatic snapshot commit from tribits at c6b9a85
bartlettroscoe added a commit to TriBITSPub/TriBITS that referenced this issue May 21, 2019
@bartlettroscoe
Member Author

Technically made more progress on improving installing in PRs #7285 and #7297.

The last bit of scope described above that is stopping this story from being closed would be to move the build and source trees out of the way after the install. That would make the install testing a little stronger, but I fear that it may cause some confusion and may not interact well with other scripts (like the various ATDM Trilinos driver scripts).

I think I should just close this issue and leave that last bit of scope undone.


2 participants