Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop] Introduce DA data preprocessing to workflow #778

Merged

Conversation

mkavulich
Copy link
Collaborator

DESCRIPTION OF CHANGES:

This is a set of contributions as part of the RRFS/dev merge. The major capabilities include:

  • A new task (get_da_obs) that retrieves and stages observation files for the subsequent data processing tasks. Most of this logic is existing consolidated logic that has been moved out of the process* tasks.
  • Add workflow entries for running get_da_obs, process_radarref, process_lightning, and process_bufrobs tasks
  • Add a new config.rrfs.yaml file for RRFS-specific options; this will be expanded over time as more capabilities are added.
  • Add a new test (process_obs) to test these initial observation processing tasks. Note that in order to run this test, the app must be built with the rrfs_utils and gsi builds, which is not part of the default build at this time.
  • More verbose errors for run_WE2E_tests.py
  • Improvements to retrieve_data.py
    • Rename --file_type and --external_model arguments to --file_fmt and --data_type, respectively. These are less confusing option names since both model data and obs can be retrieved with this script.
    • Fix bug where script was not looking for files in additional locations if they exist
    • Fix bug where script failed if any files are missing from zip archive; this should not be a failure condition unless all are missing
    • Make some arguments optional if it's possible to continue without them
    • Give more helpful/verbose error messages if an invalid/unavailable data store was selected
    • Add new unit tests "test_rap_obs_from_hpss" and "test_rap_e_obs_from_hpss"

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

TESTS CONDUCTED:

Ran fundamental tests on Hera and Jet, all passed. Ran new "process_obs" test on Hera and Jet that just tests the observation preprocessing tasks, confirmed that all expected tasks ran and completed successfully.

  • hera.intel
    • fundamental test suite
  • jet.intel
    • fundamental test suite
    • test_retrieve_data.py unit tests

DEPENDENCIES:

None

DOCUMENTATION:

None

ISSUE:

Resolves #675

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes do not require updates to the documentation (explain).
    • RRFS documentations will be updated when merge complete
  • My changes generate no new warnings
  • New and existing tests pass with my changes

CONTRIBUTORS (optional):

Workflow changes contributed by @christinaholtNOAA.

mkavulich and others added 29 commits April 19, 2023 20:27
 - "archive_internal_dir" must be an empty string for top-level archive
   files
 - Rename "file_type" argument to "file_fmt"
 - Rename "external_model" argument to more generic/correct "data_type"
 - If only some files can not be un-zipped, do not fail. We'll handle
   this more gracefully later if it needs to be an error
 - Remove default location logic for now; couldn't get it working
 - Re-add explicit disk path requirement; may re-visit later
 - For fatal error, do not imply that script continues
 - Remove debug prints
specifically HPSS fails with some kind of logging error. Needs more
investigation.

This commit removes some un-needed, un-initialized variables, adds
modulefiles for the new task on each platform, and adds some unit tests
for retrieve_data.py to test new capabilities.

The new task entry has been added to parm/wflow/coldstart.yaml for
testing, but it should be removed prior to merging this PR since the
task is not integrated with the rest of the workflow yet.
… tasks going forward. Clearly the boundary conditions tasks will need to eventually change also.
 - Link obs needed for task_process_bufr
 - Put obs in $DATA/obs rather than $DATA/gsi
 - Put in logic for getting "rap_e" files at 00 and 12z cycles; I hope
we can extract this logic into the config file later with Jinja
 - Add "task_process_bufr" entry to warmstart.yaml with a dependency on
task get_da_obs
 - Add "CYCLE_TYPE" variable to config_defaults
 - Start a new config.rrfs.yaml file
 - Set machine-specific paths in machine files, this should have been
obvious...
 - Use RRFS_NA_13km domain for RRFS
 - Re-organize Jet and Hera machine files for section consistency
 - Fix grammar error in retrieve_data.py
 - Move date variables to J-job level
 - Keep da_obs in same directory for community and nco mode (since
process_bufr does the same)
 - In process_bufr, look for da_obs in staging directory
 - Add default directory to Jet machine file...only available there for
now
 - Add task_process_lightning to warmstart workflow
…ve to Hera. Also fix config.rrfs.yaml to better settings for test case.
…to avoid duplicate error messages, search for RRFS data on disk as well as hpss
Removing the CYCLE_TYPE from config_defaults because it is a run-time,
cycle-dependent setting, not an experiment level one.
This is similar to what is done for external model files so that we can
use the machine file information about which data stores a platform has
access to.
Matching new standard outlined in Issue ufs-community#633
…obs_task

Had to manually revert some renaming changes in ush/test_retrieve_data.py
@mkavulich mkavulich requested a review from gsketefian as a code owner May 8, 2023 00:58
@MichaelLueken
Copy link
Collaborator

@mkavulich Following the merging of PR #728 this morning, there is now a conflict in ush/test_retrieve_data.py. Please merge the latest develop into your feature/add_get_obs_task branch at your earliest convenience.

@mkavulich
Copy link
Collaborator Author

@MichaelLueken I have merged in the latest changes from develop, and fixed some problems with the merged unit tests. I can not figure out why the linter is unhappy (I ran the linter on Jet for develop and got 9.74/10, so I don't know what I'm supposed to be aiming for). I have asked @christinaholtNOAA to take a look.

@MichaelLueken
Copy link
Collaborator

@mkavulich It looks like the new linting GitHub Action test gave you a score of 9.98/10. I see the following message:

************* Module test_python.test_calculate_cost
tests/test_python/test_calculate_cost.py:1:0: R0801: Similar lines in 2 files
==test_python.test_set_FV3nml_ens_stoch_seeds:30
==test_python.test_set_FV3nml_sfc_climo_filenames:26
        test_dir = os.path.dirname(os.path.abspath(__file__))
        USHdir = os.path.join(test_dir, "..", "..", "ush")
        PARMdir = os.path.join(USHdir, "..", "parm")

        # Create a temporary experiment directory structure
        self.tmp_dir = tempfile.TemporaryDirectory(
            dir=os.path.dirname(__file__),
            prefix="expt",
            )
        EXPTDIR = self.tmp_dir.name (duplicate-code)

-----------------------------------

@christinaholtNOAA
Copy link
Collaborator

christinaholtNOAA commented May 10, 2023

@mkavulich may have been the first one to check my work on-prem. I've pinned the GitHub actions to pylint v2.6, which is consistent with what's available in an old version of regional_workflow on Jet/Hera. I need to open an Issue/PR and bump that version to the pylint version available in the appropriate supported environments. This should help folks debug locally before pushing.

I'm sorry for the inconvenience.

Edit: I opened Issue #787 to document the need for this change.

@EdwardSnyder-NOAA
Copy link
Collaborator

Ran the process_obs WE2E test, config.rrfs.yaml experiment, and test_retrieve_data.py unit tests on Hera without any issues. All code checks out as well. The only suggestion I have is to perhaps rename the config.rrfs.yaml to something else, since we aren't releasing the full RRFS app, and the naming could be deceiving.

@EdwardSnyder-NOAA
Copy link
Collaborator

Approving based on our talk yesterday that the config.rrfs.yaml will be renamed.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label May 17, 2023
@christinaholtNOAA
Copy link
Collaborator

It looks like many of the major platform/compiler combos passed the Jenkins tests and Gaea's lack of Intel compiler module held up the others (best I can tell).

Can anyone help us get this PR through the pipeline given that the PR itself looks fine, but not Gaea?

@natalie-perlin
Copy link
Collaborator

@christinaholtNOAA - Gaea has been updated, and new regression tests now pass with the newly updated stack for the C3 and C4 partitions, see WM issue-1755 (1)
, WM issue-1755 (2)

At this point it is a matter of updating the ufs-weather-model repository pointing to the new software stack.

@christinaholtNOAA
Copy link
Collaborator

Okay, but what does that mean for this PR?

@natalie-perlin
Copy link
Collaborator

@christinaholtNOAA - Could Gaea possibly be added to the testing process? (still pending updates to the stack location)

@christinaholtNOAA
Copy link
Collaborator

@natalie-perlin Are you suggesting that we update the version of the weather model in this PR to make the Gaea tests pass?

That is way outside the scope of the work in this PR, and it would be great if we could find a solution that lets us merge the current PR as-is since the failures are due to factors outside our control and we have the approvals.

We're also only adding new capabilities here, not modifying old ones.

@natalie-perlin
Copy link
Collaborator

@christinaholtNOAA - Oh, the question was whether any changes in machine file for Gaea are expected, but not implemented due to issues following the upgrade. If no changes are expected for Gaea, and it only holds the PR from being fully passed, then no objections!

@MichaelLueken
Copy link
Collaborator

On Cheyenne GNU, the nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km test failed intermittently in run_post. This is an expected failure.

The Gaea tests will fail until PR #799 is merged.

On Hera GNU, the get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS test failed in the get_extrn_ics/lbcs tasks. Since this test has been added, it fails intermittently with this same issue.

On Hera Intel, the grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta test is failing in the run_fcst task. This is an expected failure.

Since all of the failures are expected, I will go ahead and merge this work now.

@MichaelLueken MichaelLueken merged commit 4892952 into ufs-community:develop May 19, 2023
michelleharrold pushed a commit to michelleharrold/ufs-srweather-app that referenced this pull request Jun 7, 2023
…community#778)

* Construct var_defns components from dictionary.

* Bring back config_defaults.yaml

* Add support for sourcing yaml file into shell script.

* Remove newline for printing config, json config fix.

* Make QUILTING a sub-dictionary in predef_grids

* Reorganize config_defaults.yaml by task and feature.

* Bug fix with QUILTING=true.

* Structure a dictionary based on a template dictionary.

* Convert all WE2E config files to yaml.

* Take care of problematic chars when converting to shell string.

* Process only selected keys of config.

* Add symlinked yaml config files.

* Actually use yaml config files for WE2E tests.

* Delete all shell WE2E configs.

* Don't check for single quotes in test description.

* Make WE2E work with yaml configs.

* Make yaml default config format.

* Bug fix in run_WE2E script.

* Add utility to check validity of yaml config file.

* Add config utility interface in ush directory.

* Remove unused check_expt_config_vars script.

* Add description to default config.

* Reorganize source_config.

* Add XML as one of the config formats.

* Update custom_ESGgrid config.

* Bug fix due to update.

* Change ensemble seed.

* Change POST_OUTPUT group due to merge.

* Make xml and ini configs work.

* Maintain config structure down to var_defns.

* Add function to load structured shell config, put description under metadata

* Flatten dicts before importing env now that shell config is structured.

* Support python regex for selecting dict keys.

* Add capability of sourcing task specific portion of config file.

* Access var_defns via env variable.

* Make names of tasks consistent with ex- and j- job script names.

* Append pid to temp file.

* Prettify user config, don't use " in xml texts.

* Compare timestamp of csv vs all files instead of directory.

* Fixes for some pylint suggestions.

* Convert new configs to yaml.

* Format python files with black (no functional change).

* More readable yaml/json formats by using more data types.
Only datetime type is now in quotes.

* More readable yaml config files for WE2E and default configs.

* Make config_defaults itself more readable.

* Correct pyyaml list indentation issue.

* Fix indentation in all config files.

* Use unquoted WTIME in config_defaults

* Cosmotic changes.

* Fix due to merge.

* Make __init__.py clearer.

* Fixes due to merge.

* Minor edits of comments.

* Remove wcoss_dell_p3 from workflow (ufs-community#810)

* remove wcoss_dell_p3

* remove block for tide and gyre

* Replace deprecated NCAR python environment with conda on Cheyenne (ufs-community#812)

* Fix issue on get_extrn_lbcs when FCST_LEN_HRS>=40 with netcdf (ufs-community#814)

* activate b file on hpss for >40h

* add a new we2e test for fcst_len_hrs>40

* reduce fcst time for we2e

* Convert new test case to yaml.

* Fix formatting due to merge.

* Convert new test case to yaml.

* Fix unittest.

* Merge develop

* Remove exception logic from __init__.py

* Minor change to cmd concat.

* Make grid gen methods return dictionary, simplifis code a lot.

* Add a comment why we are suppressing yaml import exception.

* Minor change to beautify unittest output.

* Add status badge for functional tests.

* Reorder tasks in config_default and we2e test cases to match order in FV3LAM.xml

* Keep single quotes and newlines in we2e test description.

* Revert back to not rounding to 10 digits

Co-authored-by: Chan-Hoo.Jeon-NOAA <[email protected]>
Co-authored-by: Michael Kavulich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add get_obs tasks to workflow using retrieve_data.py.
5 participants