[develop] Add Gaea C5 to supported platforms #898
@natalie-perlin - I was able to successfully clone your fork, check out the `develop_gaea_c5` branch, build the SRW App, and submit the fundamental tests. However, I should note that `source etc/lmod-setup.csh gaea_c5` failed because there is no file named `Lmod_init_C5.csh` in `/lustre/f2/dev/role.epic/contrib`. I also found it interesting that I had to load python before I could use `./manage_externals/checkout_externals`.
Similar to what you are encountering, the tests are failing in the `make_sfc_climo` task. I haven't seen this particular NetCDF error before, but looking up the error message:
FATAL ERROR: ERROR IN NF90_CREATE: Permission denied
STOP.
it looks like we don't have permission to create the necessary NetCDF file. It isn't clear to me why this would be the case, unless `make_sfc_climo` is attempting to create a file in the EPIC role account space, rather than in my own local directory.
Since `sfc_climo_gen.fd` is in UFS_UTILS, is it possible that we need to do something similar there as we had to do with `sorc/ufs-weather-model/cmake`?
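One quick way to test the hypothesis above (that the task is trying to write somewhere the current user cannot) is to probe the suspected output directory directly. The sketch below is not part of the SRW App or UFS_UTILS: `can_create_file` is a hypothetical helper, and the directory to check has to be supplied by whoever is debugging, since the thread does not say where `make_sfc_climo` writes.

```python
import os
import tempfile


def can_create_file(directory: str) -> bool:
    """Return True if the current user can actually create a file in `directory`.

    Creating (and immediately removing) a temporary file is a direct probe of
    the same operation that fails in NF90_CREATE; it also reflects ACLs and
    read-only mounts, which a plain permission-bit check can miss.
    """
    if not os.path.isdir(directory):
        return False
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError:
        return False
    return True
```

Calling `can_create_file(path)` on the directory the task writes to, run as the same user that submits the workflow, should return `False` if the "Permission denied" diagnosis is right.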
@MichaelLueken - fixed the file Lmod_init_C5.csh
Change "jet-epic" to "jet"
Update staged data locations for testing on Gaea C5
@natalie-perlin - Thanks for your work on porting the SRW App to Gaea C5! Your branch was cloned and built using the Jenkins build script (`.cicd/scripts/srw_build.sh`) and the `coverage.gaea_c5` test suite was run using the Jenkins test script (`.cicd/scripts/srw_test.sh`). All coverage tests successfully passed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
community | COMPLETE | 43.50
custom_ESGgrid_NewZealand_3km | COMPLETE | 48.17
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta | COMPLETE | 26.46
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP | COMPLETE | 30.58
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR | COMPLETE | 30.32
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson | COMPLETE | 312.60
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR | COMPLETE | 30.80
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta | COMPLETE | 274.60
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot | COMPLETE | 17.23
nco_ensemble | COMPLETE | 99.47
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom | COMPLETE | 307.34
----------------------------------------------------------------------------------------------------
Total | COMPLETE | 1221.07
Approving this PR now.
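For anyone who wants to double-check the reported total, the per-experiment core hours can be summed with a few lines of Python. This is only a sketch: `total_core_hours` is a hypothetical helper, not part of the SRW App's WE2E tooling, and it assumes each data row has exactly three fields (experiment name, status, core hours), separated by whitespace or by `|`.

```python
from decimal import Decimal, InvalidOperation

# Rows copied from the summary in this comment.
SUMMARY = """\
community COMPLETE 43.50
custom_ESGgrid_NewZealand_3km COMPLETE 48.17
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 26.46
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP COMPLETE 30.58
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR COMPLETE 30.32
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson COMPLETE 312.60
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 30.80
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 274.60
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot COMPLETE 17.23
nco_ensemble COMPLETE 99.47
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom COMPLETE 307.34
Total COMPLETE 1221.07
"""


def total_core_hours(summary: str) -> Decimal:
    """Sum the core-hours column, skipping rulers, headers, and the Total row."""
    total = Decimal("0")
    for line in summary.splitlines():
        # Treat '|' as whitespace so both row layouts parse the same way.
        parts = line.replace("|", " ").split()
        if len(parts) != 3 or parts[0] == "Total":
            continue
        try:
            total += Decimal(parts[2])
        except InvalidOperation:
            continue  # third field is not a number; ignore the row
    return total
```

`Decimal` is used instead of `float` so that the two-decimal values add up exactly and can be compared directly against the reported total.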
Tested on C5.
@natalie-perlin - While submitting the Jenkins tests, I noted that the
The WE2E coverage tests were successfully run on Derecho:
@MichaelLueken @RatkoVasic-NOAA - all instances of "gaea_c5" have been renamed to "gaea-c5" (whether in a file or directory name, or in a string inside a file).
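A rename like this is easy to verify by scanning the working tree for leftovers. The following is a minimal sketch, not the procedure actually used in this PR: `find_occurrences` is a hypothetical helper that reports paths whose name still contains the old string and text files whose contents still contain it.

```python
from pathlib import Path


def find_occurrences(root, needle="gaea_c5"):
    """Return (name_hits, content_hits) for `needle` under `root`.

    name_hits:    files or directories whose own name contains the needle.
    content_hits: readable UTF-8 text files whose contents contain the needle.
    """
    root = Path(root)
    name_hits, content_hits = [], []
    for path in sorted(root.rglob("*")):
        # Skip Git's internal bookkeeping directory.
        if ".git" in path.relative_to(root).parts:
            continue
        if needle in path.name:
            name_hits.append(path)
        if path.is_file():
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file; nothing to search
            if needle in text:
                content_hits.append(path)
    return name_hits, content_hits
```

Running `find_occurrences(".")` from the top of the clone should return two empty lists once the rename is complete.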
@natalie-perlin - Unfortunately, following the merge of PR #911, there is now a conflict with this PR in
@natalie-perlin - Thank you for merging the latest HEAD into your branch and correcting the conflicts! I have requeued the Jenkins tests for this PR and will let you know if there are any issues.
The Jenkins Hera Intel tests have completed:
A rerun on the
The Gaea and Hercules tests have successfully completed. The Jet tests are still running. The Gaea C5 and Orion tests have been requeued. Once the tests complete, I will move forward with merging this PR.
@natalie-perlin - The Jenkins tests successfully passed on Hera GNU, Jet, and Orion. The Gaea C5
It looks like you will also need to add the:
to the Gaea C5 section of
@MichaelLueken - done with changes for Gaea C5 in
@natalie-perlin - Thanks! Requeuing the Jenkins tests for Gaea C5 now. I'll let you know if any other issues arise.
The latest Jenkins tests successfully passed on Gaea C5:
Since Orion and Hercules are down, the manual submission of the Jenkins pipeline kicked off jobs on these machines, which were then aborted. Moving forward with merging this PR now.
I can confirm that the code in this PR builds the SRW App on gaea-c5 and produces an E2E plot.
Modulefiles and other configuration files that complete the port of the SRW App to the Gaea C5 system.
The software stacks used for testing are based on hdf5/1.14.0 and netcdf/4.9.2, similar to those used in #889.
DESCRIPTION OF CHANGES:
Add Gaea C5 at GFDL as a NOAA RDHPCS supported system
Type of change
TESTS CONDUCTED:
All fundamental tests pass successfully on Gaea C5.
All comprehensive tests pass except one (nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR), which also fails on some other platforms.
DEPENDENCIES:
Depends on #889 - MERGED
DOCUMENTATION:
ISSUE:
Fixes issue #886
CHECKLIST
LABELS (optional):
CONTRIBUTORS (optional):
@RatkoVasic-NOAA - thank you for your contribution!
WE2E_gaea_c5_fundamental_summary.txt