[develop] Adds a YAML interface for creating a Rocoto XML. #676
All config files and scripts were changed to use the FIRST and LAST cycle definitions, which now accept the cycle hour (HH). The cycle frequency is applied starting from those definitions in all relevant computations.
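To illustrate the idea, here is a minimal sketch of stepping from a first cycle to a last cycle at a fixed frequency when both definitions carry the hour. The function name `cycle_times`, its parameters, and the `YYYYMMDDHH` string format are assumptions made for this example, not names taken from the PR.

```python
from datetime import datetime, timedelta

# Assumed date format for this sketch: YYYYMMDDHH (the cycle hour is included).
CYCLE_FMT = "%Y%m%d%H"


def cycle_times(first_cycle, last_cycle, freq_hours):
    """Return all cycles from first_cycle to last_cycle, inclusive.

    Stepping starts at first_cycle, so the hour given there sets the
    hour pattern of every later cycle.
    """
    if freq_hours <= 0:
        raise ValueError("freq_hours must be positive")
    current = datetime.strptime(first_cycle, CYCLE_FMT)
    last = datetime.strptime(last_cycle, CYCLE_FMT)
    step = timedelta(hours=freq_hours)
    cycles = []
    while current <= last:
        cycles.append(current.strftime(CYCLE_FMT))
        current += step
    return cycles


if __name__ == "__main__":
    # Hypothetical dates, chosen only to show the stepping behavior.
    print(cycle_times("2019061518", "2019061618", 12))
```

With these hypothetical inputs the cycles start at hour 18 rather than hour 00, because the frequency is counted from the first cycle itself.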
Need to merge develop here.
Verification failures have been addressed. All comprehensive tests now pass on Hera. This PR can be merged.
The pipelines seem stuck this evening while monitoring them on Jenkins. I saw the failure on Gaea and pushed the change that should fix it, but didn't see a way to stop the current workflow and start a new one.
Note on testing: this PR appears to fix #688 as well.
@christinaholtNOAA After resubmitting the Jenkins tests this morning, I'm seeing the following:
Cheyenne:
Gaea:
Orion:
It looks like the issue from yesterday is still occurring within the Jenkins pipeline. I'll try running the Jenkins pipelines manually and see if anything jumps out as an issue.
@mkavulich and I are doing some rapid dev/test iterations to try to fix these Cheyenne issues. I think they should also take care of the issues on other platforms.
Ran manual fundamental tests on Cheyenne and Hera; all expected tests passed. Note that there is a known failure in develop for test grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR in the fundamental test on Hera; see #688 for more details. @MichaelLueken I think we're ready to kick off Jenkins tests again.
@MichaelLueken I spoke too soon, the final verification tasks are failing on the one Cheyenne verification test. Investigating now.
Okay, false alarm, the failure was due to running out of disk space. @MichaelLueken Please start up the Jenkins tests.
@mkavulich and @christinaholtNOAA The Jenkins tests have been resubmitted. I have also run the Jenkins fundamental tests manually on Jet and all tests passed, so I'm feeling positive about this batch of tests. Thanks!
@danielabdi-noaa Since you left several comments in this PR, I'd like to make sure that you are okay with these changes before moving forward. The Jenkins tests are still running on Gaea, but I would like to ensure that all parties are okay now. Thank you very much for your time!
@MichaelLueken Yes, my suggestions have been more or less addressed so approving it now.
@MichaelLueken Can we merge this PR while the Gaea tests are still stuck? I'd like to unblock the other PRs and address any remaining Gaea issues next week, separately, if they exist.
@christinaholtNOAA Sure. Since the manual tests on Hera and Jet passed and the Jenkins tests on Cheyenne and Orion have passed, I think we can move forward with these changes now and address Gaea issues on Monday. Have a great weekend!
* [develop] Adds a YAML interface for creating a Rocoto XML. (ufs-community#676)
  Refactors the creation of a Rocoto XML to use a very generic Jinja2 template that is flexible enough to meet the needs of various workflow configurations supported by SRW. Specifically, it allows for a completely arbitrary workflow to be created under SRW, which includes the addition of completely arbitrary tasks on top of the predefined ones.
  ---------
  Co-authored-by: Michael Kavulich <[email protected]>
* [develop] Change the build log output file extension from log to txt (ufs-community#690)
  When pipeline files are archived to s3 bucket, retrieving the file via a browser attempts to render/display files of known extensions. A browser doesn't generally understand what to do with a .log extension (e.g. build.log). For ease of use in the CI Dashboard, which is a static HTML page, the s3 archived build log needs a .txt extension (e.g. build.txt).
* Add "MET_TOOL" definitions to new XML definition YAMLs
* Fix incorrect YAML if block in config_defaults, remove non-needed "USCORE_ENSMEM_NAME_OR_NULL" variable
* - Convert new test "MET_ensemble_verification_only_vx" to new YAML format
  - Fix f-string for utils.py error message
* Fixing more failures (still more to go)
* More fixes, got stand-alone verification test to pass!
  - Fix copy-paste errors in parm/workflow yamls
  - Update corrected variables for new names in exscripts
* Improvement for monitor jobs script: if in debug mode, print the number of tasks that succeeded and failed for failed experiments
* Forgot to include VX_FCST_INPUT_DIR definition for MET_ensemble_verification_only_vx test
* Correct script for task_run_MET_EnsembleStat_vx_APCP
* Pull out CATE and ENSMEM_INDEX from default VX_FCST_INPUT_DIR. My naive attempt to simplify things was the root of all my problems!
* Everything working! Just need to solve problem of non-existent metatask dependencies!
* Fix last failing ensemble test, fundamental tests and all verification tests now pass on Hera!
  ---------
  Co-authored-by: Christina Holt <[email protected]>
  Co-authored-by: Bruce Kropp - Raytheon <[email protected]>
DESCRIPTION OF CHANGES:
Refactors the creation of a Rocoto XML to use a very generic Jinja2 template that is flexible enough to meet the needs of various workflow configurations supported by SRW. Specifically, it allows for a completely arbitrary workflow to be created under SRW, which includes the addition of completely arbitrary tasks on top of the predefined ones.
The workflow has been refactored to allow for the definition of a workflow given specific entries in the rocoto section of the configuration files. The paradigm shifts to telling the configuration which sets of tasks to run, and removing a section when certain pre-configured tasks should not run.
The high-level overview of changes includes:
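As a rough sketch of this paradigm, the example below merges user task entries into a set of defaults, drops a pre-configured task, adds an arbitrary new one, and serializes the result to a simplified Rocoto-style XML. It is not the PR's implementation: to stay self-contained it swaps in a plain Python dict for the parsed YAML and `xml.etree.ElementTree` for the Jinja2 template. The task names, script names, the helper functions, and the convention that a `None` value removes a task are all assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the parsed "rocoto" section of a config file.
# Task and script names are illustrative only.
DEFAULT_ROCOTO = {
    "tasks": {
        "make_grid": {"command": "run_make_grid.sh", "walltime": "00:20:00"},
        "run_fcst": {"command": "run_fcst.sh", "walltime": "04:30:00"},
        "run_post": {"command": "run_post.sh", "walltime": "00:15:00"},
    }
}


def apply_user_config(defaults, user_tasks):
    """Merge user task entries into a copy of the defaults.

    Assumed convention for this sketch: a value of None removes the task.
    """
    merged = copy.deepcopy(defaults)
    for name, settings in user_tasks.items():
        if settings is None:
            merged["tasks"].pop(name, None)
        else:
            merged["tasks"].setdefault(name, {}).update(settings)
    return merged


def build_workflow_xml(config):
    """Serialize the task dict to a simplified workflow XML string.

    Only <task>, <command>, and <walltime> are emitted; other elements and
    attributes of a complete Rocoto workflow document are omitted here.
    """
    root = ET.Element("workflow")
    for name, settings in config["tasks"].items():
        task = ET.SubElement(root, "task", name=name)
        for tag, value in settings.items():
            ET.SubElement(task, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    user_tasks = {
        "make_grid": None,  # skip a pre-configured task
        "my_plot": {"command": "plot.sh", "walltime": "00:05:00"},  # arbitrary new task
    }
    print(build_workflow_xml(apply_user_config(DEFAULT_ROCOTO, user_tasks)))
```

Because the serializer only loops over whatever tasks the configuration contains, adding or removing a task needs no change to the XML-generating code, which is the property a generic template is meant to provide.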
A follow-on PR will be needed to create the documentation necessary to support this change, although some documentation will be included with this PR.
I hope to prioritize this PR as it has been, and will continue to be, very difficult to support as changes are made to the existing workflow and scripts. I will work with others impacted by the change to ensure their workflow changes make it into the new workflow config files.
A demo of this tool is scheduled for this Thursday, March 16. It will be recorded.
The code has run through the comprehensive tests on Hera. Comparisons to the answers from the corresponding develop branch (up to date as of Mar 13) are underway and are being coordinated with @mkavulich. At the very least, fundamental tests should be run on every other machine to test the platform-specific changes.
Because there are tons of changes spanning many months of development here, I will leave the PR in a Draft state until it is sufficiently cleaned up and ready for review.
Type of change
TESTS CONDUCTED:
DEPENDENCIES:
None.
DOCUMENTATION:
More extensive documentation is planned as a follow-on PR due to the time-sensitive nature of getting this PR prepared and merged to reduce the overhead of syncing with new changes.
ISSUE:
Addresses Issue #463
CHECKLIST