[develop] Adds a YAML interface for creating a Rocoto XML. #676

Merged
merged 68 commits into from
Mar 31, 2023


christinaholtNOAA
Collaborator

@christinaholtNOAA commented Mar 15, 2023

DESCRIPTION OF CHANGES:

Refactors the creation of the Rocoto XML to use a generic Jinja2 template that is flexible enough to meet the needs of the various workflow configurations supported by SRW. Specifically, it allows an arbitrary workflow to be created under SRW, including arbitrary tasks added on top of the predefined ones.
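To illustrate the general idea of turning a YAML task entry into Rocoto XML, here is a minimal sketch. The PR does this with a Jinja2 template. This sketch instead uses Python's standard-library xml.etree.ElementTree so that it runs without extra dependencies. The dict keys (attrs, command, nnodes, ppn, walltime, envars), the helper name task_to_xml, and the nprocs variable name are all hypothetical and are not necessarily the schema used in parm/wflow. The sketch also shows the processor count being derived from the same values Rocoto uses to request resources and passed to the job as an environment variable.

```python
import xml.etree.ElementTree as ET


def task_to_xml(name, spec):
    """Render one task dict (hypothetical schema) as a Rocoto <task> element."""
    # Task-level settings such as cycledefs and maxtries become XML attributes.
    task = ET.Element("task", {"name": name, **spec.get("attrs", {})})
    ET.SubElement(task, "command").text = spec["command"]
    # Rocoto's node request syntax is "<nodes>:ppn=<processes per node>".
    nnodes, ppn = spec["nnodes"], spec["ppn"]
    ET.SubElement(task, "nodes").text = f"{nnodes}:ppn={ppn}"
    ET.SubElement(task, "walltime").text = spec["walltime"]
    # Pass the total processor count to the job as an environment variable
    # (hypothetical name "nprocs"), plus any user-supplied variables.
    envars = {"nprocs": nnodes * ppn, **spec.get("envars", {})}
    for var, value in envars.items():
        envar = ET.SubElement(task, "envar")
        ET.SubElement(envar, "name").text = var
        ET.SubElement(envar, "value").text = str(value)
    return task


spec = {
    "attrs": {"cycledefs": "at_start", "maxtries": "2"},
    "command": "run_make_grid.sh",  # hypothetical script name
    "nnodes": 2,
    "ppn": 24,
    "walltime": "00:20:00",
}
print(ET.tostring(task_to_xml("make_grid", spec), encoding="unicode"))
```

The printed element requests 2:ppn=24 nodes and carries an nprocs envar of 48. A template-based approach like the PR's keeps the XML layout in one generic file and leaves all task-specific values to the YAML.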

The workflow has been refactored so that it is defined by specific entries in the rocoto section of the configuration files. The paradigm shifts to telling the configuration which sets of tasks to run, and to removing a section when certain pre-configured tasks should not run.
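Here is a minimal sketch of task selection driven purely by what is present in the rocoto section. For illustration only, it assumes that tasks are keyed with a task_ prefix and that metatask_ entries nest further tasks. The helper enabled_tasks and the example entries are hypothetical, and the real SRW schema and code may differ.

```python
def enabled_tasks(section):
    """Recursively collect task names from a rocoto-style 'tasks' mapping."""
    names = []
    for key, value in section.items():
        if key.startswith("task_"):
            names.append(key[len("task_"):])
        elif key.startswith("metatask_") and isinstance(value, dict):
            # Metatasks nest further tasks; descend into them.
            names.extend(enabled_tasks(value))
        # Other keys (e.g. a metatask's "var" block) are not tasks.
    return names


config = {
    "rocoto": {
        "tasks": {
            "task_make_grid": {"walltime": "00:20:00"},
            "task_get_extrn_ics": {},
            "metatask_run_ensemble": {
                "var": {"mem": "001 002"},
                "task_run_fcst_mem#mem#": {},
            },
        }
    }
}
print(enabled_tasks(config["rocoto"]["tasks"]))

# Removing a section is all it takes to turn a task off; no switch is needed.
del config["rocoto"]["tasks"]["task_get_extrn_ics"]
print(enabled_tasks(config["rocoto"]["tasks"]))
```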

A high-level overview of the changes:

  • The move of batch-job-related config variables out of config_defaults.yaml and into the Rocoto config files for the corresponding job. This removed entire sections of config_defaults.yaml, and references to those removed sections were removed throughout SRW.
  • The transition of the FV3LAM_wflow.xml template to a fully generic interpretation of the YAML configs.
  • The addition of parm/wflow YAML files, each grouping tasks that are commonly run together. All the Rocoto-required information for each task now lives in these files.
  • A change in how varying forecast lengths are specified. It removes the need to write every cycle in the workflow to the config file. The FCST_LEN_CYCL array is expected to correspond to the hours of the day determined by the cycling frequency. For example, a list of 4 forecast lengths with a 6-hour cycling frequency gives the forecast lengths for the 0, 6, 12, and 18 UTC cycles.
  • The removal of the workflow_switches section. Whether a task runs is now determined by whether it is defined in the rocoto section of the config file.
  • The passing of the number of processors to each task as an environment variable via Rocoto, rather than computing it in the scripts, since the Rocoto configuration holds the source of that information.
  • Edits to the test config files to reflect how each test was originally intended to be run.
  • A new file written to the experiment directory, rocoto_defns.yaml, which contains the rocoto section of the config that has been removed from var_defns.yaml.
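Each FCST_LEN_CYCL entry is tied to an hour of the day rather than to a specific date, so the same short list applies to every day in the experiment. Below is a minimal sketch of that lookup. It assumes a hypothetical helper fcst_len_for_cycle and a cycling frequency in hours that divides 24 evenly. The length and cycle-hour checks are assumptions about how strictly the list might be validated, not a description of SRW's code.

```python
def fcst_len_for_cycle(cycle_hour, fcst_len_cycl, cycl_freq_hrs):
    """Return the forecast length for a cycle starting at cycle_hour UTC.

    Entry i of fcst_len_cycl applies to cycles at hour i * cycl_freq_hrs UTC.
    """
    # Assumed validation: the list covers exactly one day of cycles.
    if 24 % cycl_freq_hrs != 0:
        raise ValueError("cycling frequency must divide 24 hours evenly")
    n_cycles_per_day = 24 // cycl_freq_hrs
    if len(fcst_len_cycl) != n_cycles_per_day:
        raise ValueError(
            f"expected {n_cycles_per_day} forecast lengths, got {len(fcst_len_cycl)}"
        )
    hour = cycle_hour % 24  # any day maps onto the same hour-of-day slot
    if hour % cycl_freq_hrs != 0:
        raise ValueError(f"{cycle_hour} UTC is not a cycle hour")
    return fcst_len_cycl[hour // cycl_freq_hrs]


FCST_LEN_CYCL = [48, 12, 24, 12]  # example values for 00, 06, 12, 18 UTC
lengths = {h: fcst_len_for_cycle(h, FCST_LEN_CYCL, 6) for h in (0, 6, 12, 18)}
print(lengths)  # {0: 48, 6: 12, 12: 24, 18: 12}
```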

A follow-on PR will be needed to create the documentation necessary to support this change, although some documentation will be included with this PR.

I hope to prioritize this PR because it has been, and will continue to be, very difficult to maintain as changes are made to the existing workflow and scripts. I will work with others impacted by the change to ensure their workflow changes make it into the new workflow config files.

A demo of this tool is scheduled for this Thursday, March 16. It will be recorded.

The code has run through the comprehensive tests on Hera. Comparisons to the answers from the corresponding develop branch (up to date as of Mar 13) are underway and are being coordinated with @mkavulich. At a minimum, the fundamental tests should be run on each of the other platforms to verify the platform-specific changes.

Because there are tons of changes spanning many months of development here, I will leave the PR in a Draft state until it is sufficiently cleaned up and ready for review.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

None.

DOCUMENTATION:

More extensive documentation is planned as a follow-on PR. This PR is time-sensitive: preparing and merging it quickly reduces the overhead of syncing with new changes.

ISSUE:

Addresses Issue #463

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

Collaborator

@mkavulich left a comment


Verification failures have been addressed. All comprehensive tests now pass on Hera. This PR can be merged.

@christinaholtNOAA
Collaborator Author

The Jenkins pipelines I was monitoring this evening appear to be stuck. I saw the failure on Gaea and pushed a change that should fix it, but didn't see a way to stop the current workflow and start a new one.

@mkavulich removed the run_we2e_coverage_tests label (Run the coverage set of SRW end-to-end tests) Mar 30, 2023
@mkavulich
Collaborator

Note on testing: this PR appears to fix #688 as well.

@mkavulich added the run_we2e_coverage_tests label (Run the coverage set of SRW end-to-end tests) Mar 31, 2023
@MichaelLueken
Collaborator

@christinaholtNOAA After resubmitting the Jenkins tests this morning, I'm seeing the following:

Cheyenne:

03/31/23 08:53:28 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
03/31/23 08:53:30 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
Reading database for experiment grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16, updating experiment dictionary
03/31/23 08:53:35 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
03/31/23 08:53:37 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
Reading database for experiment grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR, updating experiment dictionary
03/31/23 08:53:40 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
03/31/23 08:53:42 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
Reading database for experiment grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta, updating experiment dictionary
03/31/23 08:53:45 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
03/31/23 08:53:47 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
Reading database for experiment grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16, updating experiment dictionary
03/31/23 08:53:50 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
03/31/23 08:53:53 MDT :: FV3LAM_wflow.xml :: qsub: Illegal attribute or resource value
Reading database for experiment grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta, updating experiment dictionary
Experiment grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 status is SUBMITTING
Experiment grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR status is SUBMITTING
Experiment grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta status is SUBMITTING
Experiment grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 status is SUBMITTING
Experiment grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta status is SUBMITTING

Gaea:

03/31/23 10:52:53 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:52:54 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:52:59 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:00 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta, updating experiment dictionary
03/31/23 10:53:06 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:07 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:14 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:15 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta, updating experiment dictionary
03/31/23 10:53:20 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:22 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:28 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:29 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2, updating experiment dictionary
03/31/23 10:53:35 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:36 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:42 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:44 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16, updating experiment dictionary
03/31/23 10:53:51 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:53 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:53:59 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:01 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR, updating experiment dictionary
03/31/23 10:54:07 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:13 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:15 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR, updating experiment dictionary
03/31/23 10:54:22 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:22 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:23 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:23 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:24 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:25 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:25 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:26 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:26 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem001 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:27 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem002 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:27 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem001 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:28 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem002 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:29 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem001 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:29 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem002 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:30 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem001 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:30 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem002 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:37 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:38 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:38 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:38 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:39 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:39 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:41 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:41 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:42 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem001 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:43 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem002 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:43 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem001 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:44 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem002 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:44 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem001 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:45 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem002 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:46 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem001 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:54:46 EDT :: FV3LAM_wflow.xml :: Submission of run_fcst_mem002 failed!  sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment nco_ensemble, updating experiment dictionary
03/31/23 10:54:54 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:55:00 EDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 10:55:02 EDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta, updating experiment dictionary
Experiment grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta status is SUBMITTING
Experiment grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta status is SUBMITTING
Experiment grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 status is SUBMITTING
Experiment grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 status is SUBMITTING
Experiment grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR status is SUBMITTING
Experiment grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR status is SUBMITTING
Experiment nco_ensemble status is SUBMITTING
Experiment grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta status is SUBMITTING

Orion:

03/31/23 09:57:07 CDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 09:57:07 CDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 09:57:08 CDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 09:57:08 CDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta, updating experiment dictionary
03/31/23 09:57:09 CDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 09:57:10 CDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 09:57:10 CDT :: FV3LAM_wflow.xml :: sbatch: error: Batch job submission failed: Node count specification invalid
03/31/23 09:57:11 CDT :: FV3LAM_wflow.xml :: WARNING: job submission failed: sbatch: error: Batch job submission failed: Node count specification invalid
Reading database for experiment grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta, updating experiment dictionary
Experiment grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta status is SUBMITTING
Experiment grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta status is SUBMITTING

It looks like the issue from yesterday is still occurring in the Jenkins pipeline. I'll try running the Jenkins pipelines manually and see if anything jumps out as an issue.

@christinaholtNOAA
Collaborator Author

@mkavulich and I are doing some rapid dev/test iterations to try to fix these Cheyenne issues. I think the fixes should also resolve the issues on the other platforms.

@mkavulich
Collaborator

Ran the fundamental tests manually on Cheyenne and Hera; all expected tests passed. Note that there is a known failure in develop for test grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR in the fundamental test suite on Hera; see #688 for more details.

@MichaelLueken I think we're ready to kick off Jenkins tests again.

@mkavulich
Collaborator

@MichaelLueken I spoke too soon: the final verification tasks are failing on the one Cheyenne verification test. Investigating now.

@mkavulich
Collaborator

Okay, false alarm: the failure was due to running out of disk space. @MichaelLueken, please start up the Jenkins tests.

@MichaelLueken
Collaborator

@mkavulich and @christinaholtNOAA The Jenkins tests have been resubmitted. I have also run the Jenkins fundamental tests manually on Jet and all tests successfully passed, so I'm feeling positive on this batch of tests. Thanks!

@MichaelLueken
Collaborator

@danielabdi-noaa Since you left several comments on this PR, I'd like to make sure that you are okay with these changes before moving forward. The Jenkins tests are still running on Gaea, but I would like to ensure that all parties are okay now. Thank you very much for your time!

Collaborator

@danielabdi-noaa left a comment


@MichaelLueken Yes, my suggestions have been more or less addressed, so I'm approving it now.

@christinaholtNOAA
Collaborator Author

@MichaelLueken Can we merge this PR while the Gaea tests are still stuck? I'd like to unblock the other PRs and address any remaining Gaea issues separately next week, if there are any.

@MichaelLueken
Collaborator

@christinaholtNOAA Sure. Since the manual tests on Hera and Jet passed and the Jenkins tests on Cheyenne and Orion have successfully passed, I think we can move forward with these changes now and address Gaea issues on Monday. Have a great weekend!

@MichaelLueken merged commit 276bdd6 into ufs-community:develop Mar 31, 2023
MichaelLueken pushed a commit that referenced this pull request Apr 25, 2023
… PR #676 (#722)

* Fix AQM configuration issues on both community and nco modes.
* Fix workflow entity issues on nco mode.
gsketefian pushed a commit to gsketefian/ufs-srweather-app that referenced this pull request May 4, 2023
* [develop] Adds a YAML interface for creating a Rocoto XML. (ufs-community#676)

Refactors the creation of a Rocoto XML to use a very generic Jinja2 template that is flexible enough to meet the needs of various workflow configurations supported by SRW. Specifically, it allows for a completely arbitrary workflow to be created under SRW, which includes the addition of completely arbitrary tasks on top of the predefined ones.

---------

Co-authored-by: Michael Kavulich <[email protected]>

* [develop] Change the build log output file extension from log to txt (ufs-community#690)

When pipeline files are archived to an s3 bucket, retrieving a file via a browser attempts to render/display files of known extensions. A browser doesn't generally understand what to do with a .log extension (e.g. build.log). For ease of use in the CI Dashboard, which is a static HTML page, the s3 archived build log needs a .txt extension (e.g. build.txt).

* Add "MET_TOOL" definitions to new XML definition YAMLs

* Fix incorrect YAML if block in config_defaults, remove non-needed "USCORE_ENSMEM_NAME_OR_NULL" variable

* - Convert new test "MET_ensemble_verification_only_vx" to new YAML format
 - Fix f-string for utils.py error message

* Fixing more failures (still more to go)

* More fixes, got stand-alone verification test to pass!

 - Fix copy-paste errors in parm/workflow yamls
 - Update corrected variables for new names in exscripts

* Improvement for monitor jobs script: if in debug mode, print the number of tasks that succeeded and failed for failed experiments

* Forgot to include VX_FCST_INPUT_DIR definition for MET_ensemble_verification_only_vx test

* Correct script for task_run_MET_EnsembleStat_vx_APCP

* Pull out CATE and ENSMEM_INDEX from default VX_FCST_INPUT_DIR. My naive attempt to simplify things was the root of all my problems!

* Everything working! Just need to solve problem of non-existent metatask dependencies!

* Fix last failing ensemble test, fundamental tests and all verification tests now pass on Hera!

---------

Co-authored-by: Christina Holt <[email protected]>
Co-authored-by: Bruce Kropp - Raytheon <[email protected]>
@christinaholtNOAA deleted the rocoto_yaml branch July 2, 2024 16:51
Labels
Priority: HIGH, run_we2e_coverage_tests (Run the coverage set of SRW end-to-end tests)

4 participants