UFS-WM testing w/ spack-stack #1651

Closed
ulmononian opened this issue Mar 10, 2023 · 18 comments
Labels
enhancement New feature or request

Comments

@ulmononian
Collaborator

ulmononian commented Mar 10, 2023

Description

As the transition from hpc-stack to spack-stack is ongoing (e.g., #1448, #1621, Acorn spack testing, spack-stack #454, spack-stack #478) a new spack-based Unified Environment (UE) has been developed to help facilitate the switch. This environment contains a "unified" set of compiler+MPI (Intel & GNU), libraries/packages, and modules to support the UFS-WM and various related apps (e.g., global-workflow, SRW, JEDI Skylab, and GSI).

A preliminary (beta) installation by @climbfuji is available on Orion at /work2/noaa/da/role-da/spack-stack-feature-r2d2-mysql/envs/unified-4.0.0-rc1/install and can be loaded via:

module use /work2/noaa/da/role-da/spack-stack-feature-r2d2-mysql/envs/unified-4.0.0-rc1/install/modulefiles/Core
module av
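
For scripted use, the two commands above can be wrapped so that a missing install fails early. This is a minimal sketch, assuming an Lmod or Environment Modules `module` command is defined in the shell; the function name `load_ue` is hypothetical, and no compiler or MPI module names are assumed because the issue does not list them.

```shell
# Hypothetical helper: add a spack-stack environment's Core modulefiles
# to MODULEPATH, failing early if the directory or `module` is missing.
load_ue() {
  local core_dir="$1/modulefiles/Core"
  if [ ! -d "$core_dir" ]; then
    echo "UE modulefiles not found: $core_dir" >&2
    return 1
  fi
  if ! command -v module >/dev/null 2>&1; then
    echo "'module' command not available in this shell" >&2
    return 2
  fi
  module use "$core_dir"
}
```

Usage on Orion would then look like `load_ue /work2/noaa/da/role-da/spack-stack-feature-r2d2-mysql/envs/unified-4.0.0-rc1/install && module av`.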

An initial testing round of the UFS-WM (as well as the global-workflow, SRW, SkyLab, and GSI) using the UE has been completed on Orion (@mark-a-potts successfully completed the full rt.sh suite w/ a new baseline). For a recent sample compile/run of cpld_control_p8, see: /work/noaa/stmp/cbook/stmp/cbook/FV3_RT/rt_198650. Some additional UFS-WM RTs have been performed with the UE on Parallel Works - AWS (e.g., cpld_control_c48); however, this testing is ongoing in collaboration with @yichengt90 / @clouden90.

Solution

Upon release of [email protected], the Unified Environment will be installed in official NOAA-EPIC & JCSDA locations on these spack-stack pre-configured sites. Given that, testing of the UFS-WM with the spack-stack UE will need to be expanded significantly. Ideally, the full set of RTs should be run on each machine; new baselines will more than likely be required.

Module files will need to be updated concomitantly with this testing (e.g.: https://github.com/ulmononian/ufs-weather-model/blob/test_spack/modulefiles/ufs_orion.intel.lua). For running on the cloud, various modifications also need to be made to the RT scripts and configuration files (i.e.: #1650; see https://github.com/ulmononian/ufs-weather-model/tree/feature/noaacloud_rt).

Further, ESMF library naming and linking need to be addressed (see #1498); for now this is handled in spack via NOAA-EMC/spack #238. Note that recently merged PR #1645 removed the static parallelio requirement, which is pertinent to implementing spack-stack because the UE uses shared parallelio (with an exception for operational machines).

This issue can be used to track some of the testing (successes & failures!) and hopefully facilitate some discussion about the transition.

Related to

may help address #1147, #1448
pertains to #1621

Butterfly test results look good: cpld_control_p8. Comparison of 500 mb temperature impact between this PR and develop branch is here:
[Figure: butterfly test, 500 mb temperature comparison between this PR and develop]

Originally posted by @jkbk2004 in #1707 (comment)

@ulmononian ulmononian added the enhancement New feature or request label Mar 10, 2023
@ulmononian ulmononian changed the title UFS-WM RTs w/ spack-stack UFS-WM testing w/ spack-stack Mar 10, 2023
@jkbk2004
Collaborator

@ulmononian Do you have any update regarding the spack-stack docker container? The library update to hdf-1.14.0/netcdf-4.9.1/esmf-8.4.1/mapl-2.35.2 is an ongoing priority. Following the update, EPIC needs to maintain the container used for the Jenkins-CI pipeline in real time. Please let me know if we need a quick tag-up for this.

@ulmononian
Collaborator Author

the ufs-wm container based on the spack-stack unified environment package versions/variants will be delivered shortly after the release of [email protected]. i anticipate that they will be available by late next week or early the following week; once ready, i will let you know. one caveat to the ufs-wm spack-stack container is that we will not provide debug versions of MAPL or ESMF within the containers, as the current paradigm allows only one package version within the container.

for the interim, if you are interested, please have a look at the JEDI Skylab containers, which utilize spack-stack. they are available with (i) clang/mpich, and (ii) gnu/openmpi. note that Skylab has a different version of crtm than used by the ufs-wm, so these cannot be used for building/running the wm.

@jkbk2004
Collaborator

What about spack-stack itself? Is it going to have debug versions of MAPL or ESMF?

@climbfuji
Collaborator

climbfuji commented Mar 10, 2023 via email

@ulmononian
Collaborator Author

if you look at the beta version of the unified environment i shared in the issue description, you will see mapl/2.22.0-debug-esmf-8.3.0b09-debug and esmf/8.3.0b09-debug are both available.
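
One way to confirm which debug builds a stack exposes is to filter the module listing. This is a sketch only: `list_debug_modules` is a hypothetical helper that filters text on stdin, and it assumes one module name per line, as produced by terse listings such as `module -t avail` on Lmod (which writes the listing to stderr).

```shell
# Hypothetical helper: print module names from stdin that carry a
# "-debug" marker, e.g. esmf/8.3.0b09-debug.
list_debug_modules() {
  grep -- '-debug' | sort -u
}
```

With the environment loaded, `module -t avail 2>&1 | list_debug_modules` would list entries such as the two debug modules named above, if they are installed.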

@DusanJovic-NOAA
Collaborator

We should stop building debug versions of esmf (and mapl).

@climbfuji
Collaborator

climbfuji commented Mar 10, 2023 via email

@jkbk2004
Collaborator

@rhaesung @ulmononian @yichengt90 can we make a quick build test with land DA and noah-mp as well?

@ulmononian
Collaborator Author

do you mean ensure the land DA / noah-mp system builds & runs using the unified environment? or to add a specific land DA env into the unified environment (as is done for global workflow, srw, ufs-wm, etc.)?

@jkbk2004
Collaborator

land DA/noah-mp build cases need features of both the jedi and ufs-wm environments. For now, just a build test with the land da release branch. But I hope to follow up on the noah-mp component build along with that. Let me know if we need a quick tag-up.

@ulmononian
Collaborator Author

ulmononian commented Mar 15, 2023

the unified environment contains all necessary modules for building any of the jedi bundles. for example, in the case of land DA which currently uses the fv3-bundle, one can simply "load" the unified environment (i.e. module use <path/to/ue/core>, load appropriate compiler/mpi, load pertinent modules) and build (in this case, with ecbuild, also included with the unified environment). for example, i built the fv3-bundle using only modules from the beta unified environment on orion here: /work2/noaa/epic-ps/cbook/fv3-bundle.
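
The steps described above (module use, load compiler/MPI and other modules, then build with ecbuild) can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used on orion: `run` and `build_bundle` are hypothetical helpers, the module names are supplied by the caller because the thread does not list them, and `make -j4` is an arbitrary choice of parallelism. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Hypothetical wrapper: echo the command in dry-run mode, else run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical sketch: configure and build a JEDI bundle using only
# modules from the unified environment.
# Args: <UE Core modulefiles dir> <bundle source dir> <build dir> [modules...]
build_bundle() {
  local ue_core="$1" bundle_src="$2" build_dir="$3" mod
  shift 3
  run module use "$ue_core" || return 1
  for mod in "$@"; do            # compiler, MPI, and other modules (caller-supplied)
    run module load "$mod" || return 1
  done
  run mkdir -p "$build_dir" || return 1
  run cd "$build_dir" || return 1
  run ecbuild "$bundle_src" || return 1
  run make -j4
}
```

A dry run such as `DRY_RUN=1 build_bundle <path/to/ue/core> <path/to/fv3-bundle> <build-dir> <compiler-module> <mpi-module>` shows the sequence before anything is executed.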

the land DA system (cloned/built from https://github.com/NOAA-EPIC/land-offline_workflow/tree/release/public-v1.0.0) was run for 2016 case using this UE-built fv3-bundle and a modified landda_orion.intel.lua modulefile (points to unified environment stack & modules; see https://github.com/ulmononian/land-offline_workflow/blob/release/public-v1.0.0/modulefiles/landda_orion.intel.lua) here: [src] /work2/noaa/epic-ps/cbook/landDA/ue_test/spack_fork; [workdir] /work2/noaa/epic-ps/cbook/landDA/ue_test/workdir; [expts] /work2/noaa/epic-ps/cbook/landDA/ue_test/landda_expts.

@jkbk2004
Collaborator

@ulmononian there is a re-syncing issue on land da side (NOAA-PSL/land-offline_workflow#29). Is it possible to install a similar version of this spack stack on hera?

@jkbk2004
Collaborator

@rhaesung FYI

@ulmononian
Collaborator Author

a beta installation of the UE on hera is underway. i will share the path and updated landda_hera.intel.lua file when it is ready for use.

@ulmononian
Collaborator Author

@jkbk2004 @rhaesung i installed a beta UE on hera here: /scratch1/NCEPDEV/stmp4/Cameron.Book/sw/spack-stack-1.2.0/envs/unified-env. the land DA system was built against this stack here: /scratch1/NCEPDEV/stmp4/Cameron.Book/landDA_work/land-offline_workflow/build. 2020 and 2016 land DA experiments were run successfully w/ this stack here: /scratch1/NCEPDEV/stmp4/Cameron.Book/landDA_work (see 2020 and 2016 landda_expts and workdirs therein). for consistency, the fv3-bundle was rebuilt with the UE stack here: /scratch1/NCEPDEV/stmp4/Cameron.Book/landDA_work/fv3-bundle.

an updated modulefile for land DA on hera can be found here: https://github.com/ulmononian/land-offline_workflow/blob/release/public-v1.0.0/modulefiles/landda_hera.intel.lua.
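
A quick sanity check before using a modified modulefile is to confirm that it references the intended stack install. This is a sketch: `modulefile_uses_stack` is a hypothetical helper that does a plain fixed-string search and does not parse Lua, so it only shows that the path appears somewhere in the file.

```shell
# Hypothetical helper: succeed if the modulefile mentions the given
# stack root path anywhere in its text.
modulefile_uses_stack() {
  local modulefile="$1" stack_root="$2"
  grep -qF -- "$stack_root" "$modulefile"
}
```

For example, `modulefile_uses_stack modulefiles/landda_hera.intel.lua /scratch1/NCEPDEV/stmp4/Cameron.Book/sw/spack-stack-1.2.0/envs/unified-env && echo ok` would print `ok` only if the file points at the beta UE path given above.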

@DeniseWorthen
Collaborator

Can we close this issue?

@DeniseWorthen
Collaborator

Why is this issue still open?

@ulmononian
Collaborator Author

this can be closed!
