[release/public-v2.2.0] spack-stack 1.4.1 integration for Gaea C5 platform #941
Comprehensive tests look good. The ones that died were expected to fail (no HPSS). Full summary log attached.
Compiled, ran fundamental tests, all OK.
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 21.05
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 26.24
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 COMPLETE 13.93
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 28.79
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 35.64
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0 COMPLETE 33.54
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 49.83
----------------------------------------------------------------------------------------------------
Total COMPLETE 209.02
@natalie-perlin - I was able to successfully compile your branch on Gaea C5 using spack-stack. The testing is underway, but since both you and @RatkoVasic-NOAA have already run tests, I will go ahead and give my approval and launch the Jenkins tests.
As expected, the tests successfully passed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 19.89
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 27.23
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 COMPLETE 13.77
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 28.28
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 34.46
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0 COMPLETE 34.06
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 48.77
----------------------------------------------------------------------------------------------------
Total COMPLETE 206.46
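For anyone who wants to double-check a summary like the one above, here is a minimal sketch that parses the table rows, confirms every experiment is COMPLETE, and re-adds the core hours against the reported total. The row layout (name, status, core hours separated by whitespace) is assumed from the text pasted in this comment; `parse_summary` is a hypothetical helper, not part of the SRW App tooling.

```python
import re

# Rows copied from the summary above; the layout is assumed, not taken from the WE2E scripts.
SUMMARY = """\
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 19.89
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 27.23
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 COMPLETE 13.77
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 28.28
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 34.46
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0 COMPLETE 34.06
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 48.77
Total COMPLETE 206.46
"""

# One row = name, status, core hours; header and separator lines do not match.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\d+(?:\.\d+)?)$")


def parse_summary(text):
    """Return ({experiment: (status, core_hours)}, reported_total)."""
    experiments = {}
    total = None
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        name, status, hours = match.group(1), match.group(2), float(match.group(3))
        if name == "Total":
            total = hours
        else:
            experiments[name] = (status, hours)
    return experiments, total


experiments, reported_total = parse_summary(SUMMARY)
computed_total = sum(hours for _, hours in experiments.values())
all_complete = all(status == "COMPLETE" for status, _ in experiments.values())
# Compare with a tolerance because the table values are rounded to two decimals.
totals_agree = abs(computed_total - reported_total) < 0.005

print(f"experiments: {len(experiments)}, all complete: {all_complete}, totals agree: {totals_agree}")
```

The same check can be pointed at the first fundamental-test table in this conversation by swapping in its rows.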
One test failed on Orion:
Rerunning the failed
Awaiting completion of tests on Gaea, Gaea C5, Hera, and Jet.
@MichaelLueken - is there anything else to be completed before merging?
@natalie-perlin - Tests are still running on Gaea and Gaea C5. Tests were hanging on the machine, but have moved into the Testing phase this morning. Once complete, this PR can be merged.
@natalie-perlin @ulmononian - The version of spack-stack used in this PR is different from the stack used in the weather model's PR #1784. Is there a plan to move the weather model to the version in this PR, or will the SRW need to move to use the weather model's version?
Gaea C5 WE2E coverage tests were manually run and all passed successfully:
Working through some issues that were experienced on Hera. Once the tests finish, I can move forward with this work. @jkbk2004 would like to ensure that both @natalie-perlin and @ulmononian are on the same page with respect to the different versions of the spack-stacks that are used for SRW and the weather model.
i am not sure where the stack used in this PR came from. it was not an official installation by the spack-stack group as far as i can recall. @natalie-perlin is there a reason why this PR does not use
Thanks, @ulmononian! @natalie-perlin - If this PR isn't using the official spack-stack from the spack-stack team, then I think we should hold off. We were originally planning on supporting hpc-stack for Derecho, Gaea C5, generic Linux, and MacOS. It's probably not a good idea to use an unsupported stack on Gaea C5, especially for a community release. We can hold off until we transfer to
Hera GNU WE2E coverage tests have completed:
Hera Intel WE2E tests have completed:
Rerun of the
@ulmononian @MichaelLueken - this is a spack-stack version that uses the same compiler version as the previous SRW version (intel-2023.1.0). Dom Heinzeller suggested installing this stack version in a standard EPIC location: , and keeping it alongside the previous one that used intel-2022.0.2: The latter was the stack that Ratko tested in the early stages of PR #913, and it did not work for SRW.
did anyone try anyway, if the only difference between
The Gaea tests have successfully completed:
@natalie-perlin and @ulmononian - The tests have completed on all platforms. It looks like @ulmononian is okay to move forward with merging this work for the release (since the only difference is the compiler version). I will go ahead and merge this PR now. Thank you both for the discussion in this PR!
Merged commit f142d38 into ufs-community:release/public-v2.2.0
DESCRIPTION OF CHANGES:
Integrate spack-stack/1.4.1 modules into Gaea C5 platform.
The spack-stack is built with the intel-classic/2023.1.0 compiler and cray-mpich/8.1.25; the stack environment is located at
/lustre/f2/dev/wpo/role.epic/contrib/spack-stack/c5/spack-stack-1.4.1/envs/unified-env-intel-2023.1.0/
Fundamental tests pass completely:
A detailed summary log, WE2E_summary_20231012211458.txt, is attached.
In the comprehensive test suite, some failures do occur, in tasks such as get_obs_<xxxx>, get_extrn_ics, and get_extrn_lbcs; comprehensive tests are still running.
Files changed:
modulefiles/build_gaea-c5_intel.lua (unload modules darshan-runtime, cray-pmi)
modulefiles/wflow_gaea-c5.lua
modulefiles/tasks/gaea-c5/python_srw.lua (loads module darshan-runtime)
modulefiles/tasks/gaea-c5/run_vx.local.lua
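As a rough aid for triaging the comprehensive-suite failures mentioned above, the sketch below separates failed tasks whose names match the data-retrieval tasks listed in this description from any other failures. Treating those tasks as expected failures on a platform without HPSS access is an assumption based on the discussion in this PR; the `split_failures` helper and the example task list are hypothetical.

```python
from fnmatch import fnmatchcase

# Task-name patterns taken from the description above (get_obs_<xxxx> written as a wildcard).
# Assumption: these tasks fail only because they need to pull data from HPSS.
DATA_RETRIEVAL_PATTERNS = ("get_obs_*", "get_extrn_ics", "get_extrn_lbcs")


def split_failures(failed_tasks, patterns=DATA_RETRIEVAL_PATTERNS):
    """Split failed task names into (expected, unexpected) lists."""
    expected, unexpected = [], []
    for task in failed_tasks:
        if any(fnmatchcase(task, pattern) for pattern in patterns):
            expected.append(task)
        else:
            unexpected.append(task)
    return expected, unexpected


# Hypothetical list of failed tasks, for illustration only.
failed = ["get_obs_example", "get_extrn_ics", "get_extrn_lbcs", "run_fcst"]
expected, unexpected = split_failures(failed)

print("expected (data retrieval):", expected)
print("needs a closer look:", unexpected)
```

Anything that lands in the second list would be worth checking in the task log before signing off on the run.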
Type of change
TESTS CONDUCTED:
DEPENDENCIES:
ISSUE:
Resolves the issue
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@RatkoVasic-NOAA
WE2E_summary_20231012211458.txt