Feature #2460 allow missing input (#2493)
* changed template to use datetime format that works on MacOS

* update logic to only write a file list file if there is more than 1 file, updated unit tests to match new behavior, added exception handling to series analysis to prevent crash if file does not exist

* use getraw instead of getstr to prevent crash if providing a filename template tag to override a config variable on the command line

* Add optional argument to subset file function to always write a file list text file even if there is only 1 file found. Use this argument in UserScript wrapper so that the environment variables that contain paths to file list files are consistent in format for use in user scripts

* enhanced function to support different output variable types

* removed the need for overriding clear function in specific wrappers and added optional argument to skip clearing input file list

* clean up formatting

* per #2460, start to implement logic to prevent errors when some input files are not found

* isolate logic to find input files into find_input_files functions. clean up those functions to return boolean instead of sometimes returning None or a list of files to be consistent

* remove python embedding checks because MET is now smart enough to determine if a python script is used with always setting file_type

* turn on use cases to test error handling

* merge artifacts

* run only failed cases

* always run merge step

* run on a case that will succeed to test error log merge step

* only run error log merge step if there were 'Save error logs' jobs that succeeded

* run cases that will fail

* fix condition to merge error logs

* run group that will succeed but have diffs - check error logs doesn't fail

* testing - add use case group that will succeed but will cause diffs because there is no truth data - to confirm that the error log merge step behaves properly in this case

* run 3 jobs, 2 should error, to confirm that error_logs is created properly

* repeat diff no error test but with

* per dtcenter/MET#2796, fix error log artifact creation by merging error logs if any of the 'Save error logs' steps ran successfully

* run test to confirm diff does not cause merge error logs to fail

* Revert "run test to confirm diff does not cause merge error logs to fail"

This reverts commit ff2d1ca.

* run test to confirm error logs are merged properly when 2 use case groups have errors

* try checking output variable as string instead of boolean

* Revert "run test to confirm error logs are merged properly when 2 use case groups have errors"

This reverts commit 8106666.

* run test again

* test again

* move check for error logs for shell script and use github env vars

* Revert "run test again"

This reverts commit 7a0a99c.

* break 2 use cases to test that error logs are still created properly

* checkout repo to get script used to merge error logs

* Revert "break 2 use cases to test that error logs are still created properly"

This reverts commit cb6d0b4.

* test merge error log again on no error diff run

* fix script

* move merge error logic back to workflow

* break 2 use cases to test that error logs are still created properly

* Revert "break 2 use cases to test that error logs are still created properly"

This reverts commit 82aa0e1.

* remove testing use case group

* Revert "remove python embedding checks because MET is now smart enough to determine if a python script is used with always setting file_type"

This reverts commit de3b4b0.

* clean up lines

* update logic to check that python embedding is set up properly to only try to set file_type automatically if it is not already set and if the wrapper is a tool that supports multiple input files via python embedding (which require file_type to be set). also changed error if not set properly to warning and use PYTHON_NUMPY as a default

* remove run_count increment before run_at_time_once - set closer to find_input_files so run count and missing input count are consistent

* return boolean from find_input_files function to be consistent with other functions

* per #2460, warn instead of error if missing inputs are allowed, track counters for number of runs and missing inputs

* per #2460, added check to report error if allowed missing input threshold is met

* run clear before running plot_data_plane

* removed test group

* report warning instead of error if ALLOW_MISSING_INPUTS is True

* cleanup

* change function to pytest fixture so it can be used by other test scripts

* update ascii2nc test to process more than 1 time to ensure commands are built properly for each run

* add unit tests to ensure missing input file logic works properly for ascii2nc and grid_stat

* set variable to skip RuntimeFreq logic to find input files to prevent duplicate increment of run_count -- these will be removed when the wrapper has been updated to find files using RuntimeFreq logic

* remove unnecessary error checking

* cleanup

* call function to handle input templates that need to be handled separately for each item in the comma-separated list (for UserScript and GridDiag only)

* add time_info to ALL_FILES dictionaries to be consistent with other wrappers

* clean up logging for reporting error when missing inputs exceeds threshold

* added function to get files for a single run time to be consistent with other functions

* skip increment of run_count when FIND_FILES=True and RuntimeFreq input file logic is skipped to prevent duplicate increments

* added empty test files

* remove redundant variables

* view warnings on a failed test run

* add more empty test files

* added unit tests for missing input logic

* remove MANDATORY setting for EnsembleStat and GenEnsProd and instead pass mandatory argument to call to find model files so warnings/errors are properly displayed for other inputs

* cleanup

* remove allow missing input logic from ExtractTiles wrapper

* added functions to parse template/dir variables from config, removed explicit calls to read those variables from GridStat

* remove error if more labels than inputs are provided (for UserScript and GridDiag only) -- extra labels will just be ignored

* added required boolean for input templates

* per #2460, change warning messages to debug when checking a list of DA offsets since it is common that a given offset will not always be found in the files

* added tests for missing input logic for many wrappers

* cleanup

* fix increment of number of runs

* skip missing input logic

* change how required is handled for input templates

* warn instead of error if missing input is allowed

* remove increment of missing input counters because it is handled in RuntimeFreq

* check status of input files and increment counters in overridden run_once_per_lead. remove increment of missing input counters because it is handled in run_once_per_lead

* added unit tests for missing input logic

* skip missing input logic

* cleanup

* cleanup, use fixture for tests, add unit tests for missing input, bypass missing input logic on wrappers that don't need it

* removed file that is not needed

* added unit tests for pb2nc to test -valid_beg/end arguments and changes to properly support any runtime frequencies

* warn instead of error if allowing missing inputs

* cleanup

* implement changes to properly support all runtime frequencies for pb2nc. previously all files that matched a wildcard were used instead of selecting only files that fall within the specified time range. some functions moved into pb2nc wrapper will eventually be moved up so that they are used by all wrappers to be consistent

* added unit tests that will fail until wrapper is updated

* replace functions in RuntimeFreq wrapper used to find input files so they can be used by all wrappers, updated ioda2nc wrapper to find input files properly to fix tests

* cleanup

* removed mtd version of get_input_templates and added logic to RuntimeFreq's version to get the same behavior

* added unit tests for MTD missing input checks

* per #2491, add release notes for beta3
georgemccabe authored Feb 8, 2024
1 parent c67050a commit 0f5beca
Showing 83 changed files with 1,838 additions and 851 deletions.
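
The commit message describes warning instead of erroring when missing inputs are allowed, tracking counters for runs and missing inputs, and reporting an error when the allowed threshold is not met. The following is a minimal sketch of that accounting, inferred only from the expected values in the unit tests of this commit. The function name `missing_input_errors` is hypothetical, and the exact comparison (`>=` against the fraction of runs that found their inputs) is an assumption, not the wrapper's actual code.

```python
def missing_input_errors(run_count, missing_count, input_thresh, allow_missing):
    """Return the number of errors implied by missing inputs.

    Hypothetical helper: the real logic lives in the METplus wrappers and
    may differ in detail.
    """
    if not allow_missing:
        # Without ALLOW_MISSING_INPUTS, each run with missing input is an error.
        return missing_count
    if run_count == 0:
        return 0
    # Fraction of runs that found all of their required inputs.
    success_fraction = (run_count - missing_count) / run_count
    # Assumption: a single error is reported if the fraction falls
    # below INPUT_THRESH, otherwise the missing inputs are only warnings.
    return 0 if success_fraction >= input_thresh else 1


# Values taken from the ascii2nc test parameters: 1 of 3 runs missing input.
print(missing_input_errors(3, 1, 0.5, True))
print(missing_input_errors(3, 1, 0.8, True))
print(missing_input_errors(3, 1, 0.5, False))
```

This sketch reproduces the `errors` column for the ascii2nc cases and for the EnsembleStat cases other than `low_n_member`, where additional errors come from a separate ensemble member count check that is not modeled here.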
1 change: 0 additions & 1 deletion .github/workflows/testing.yml
@@ -238,7 +238,6 @@ jobs:
needs: use_case_tests
if: ${{ always() && needs.use_case_tests.result == 'failure' }}
steps:
- uses: actions/checkout@v4
- name: Check for error logs
id: check-for-error-logs
run: |
43 changes: 43 additions & 0 deletions docs/Users_Guide/release-notes.rst
@@ -30,6 +30,49 @@ When applicable, release notes are followed by the
`GitHub issue <https://github.com/dtcenter/METplus/issues>`__ number which
describes the bugfix, enhancement, or new feature.

METplus Version 6.0.0 Beta 3 Release Notes (2024-02-07)
-------------------------------------------------------

.. dropdown:: Enhancements

* Add support for MET land-mask settings in Point-Stat
(`#2334 <https://github.com/dtcenter/METplus/issues/2334>`_)
* Enhance the TC-Pairs wrapper to support the new diag_required and diag_min_req configuration options
(`#2430 <https://github.com/dtcenter/METplus/issues/2430>`_)
* Enhance the TC-Diag wrapper to support new configuration options added in MET-12.0.0-beta2
(`#2432 <https://github.com/dtcenter/METplus/issues/2432>`_)
* Prevent error if some input files are missing
(`#2460 <https://github.com/dtcenter/METplus/issues/2460>`_)

.. dropdown:: Bugfix

NONE

.. dropdown:: New Wrappers

* WaveletStat
(`#2252 <https://github.com/dtcenter/METplus/issues/2252>`_)

.. dropdown:: New Use Cases

* Verify Total Column Ozone against NASA's OMI dataset
(`#1989 <https://github.com/dtcenter/METplus/issues/1989>`_)
* RRFS reformatting, aggregating, and plotting use case
(`#2406 <https://github.com/dtcenter/METplus/issues/2406>`_)
* Satellite Altimetry data
(`#2383 <https://github.com/dtcenter/METplus/issues/2383>`_)

.. dropdown:: Documentation

* Create video to demonstrate how to update use cases that use deprecated environment variables
(`#2371 <https://github.com/dtcenter/METplus/issues/2371>`_)

.. dropdown:: Internal

* Update Documentation Overview and Conventions
(`#2454 <https://github.com/dtcenter/METplus/issues/2454>`_)


METplus Version 6.0.0 Beta 2 Release Notes (2023-11-14)
-------------------------------------------------------

Empty file. (repeated for 16 added files)
21 changes: 21 additions & 0 deletions internal/tests/pytests/conftest.py
@@ -113,6 +113,14 @@ def test_example(metplus_config):
if len(msg.args) != 0]
print("Tests raised the following errors:")
print("\n".join(err_msgs))
if config.logger.warning.call_args_list:
warn_msgs = [
str(msg.args[0])
for msg
in config.logger.warning.call_args_list
if len(msg.args) != 0]
print("\nTests raised the following warnings:")
print("\n".join(warn_msgs))
config.logger = old_logger
# don't remove output base if test fails
if request.node.rep_call.failed:
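
The added lines collect warning messages from a mocked logger so they can be printed when a test fails. As a standalone illustration of the same filtering pattern, the sketch below uses `unittest.mock.MagicMock` in place of the test configuration's logger. The message strings are made up for the example.

```python
from unittest import mock

# Stand-in for config.logger in the test fixture.
logger = mock.MagicMock()
logger.warning("input file not found")
logger.warning()  # a call with no positional args is skipped by the filter
logger.warning("missing inputs allowed: %s", True)

# Same comprehension shape as in conftest.py: keep the first positional
# argument of every recorded warning call that has one.
warn_msgs = [
    str(call.args[0])
    for call in logger.warning.call_args_list
    if len(call.args) != 0
]
print("\n".join(warn_msgs))
```

Only the unformatted message template is captured, since `call.args[0]` is the first argument as passed to the logger.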
@@ -185,3 +193,16 @@ def make_nc(tmp_path, lon, lat, z, data, variable='Temp', file_name='fake.nc'):
temp[0, :, :, :] = data

return file_name


@pytest.fixture(scope="function")
def get_test_data_dir():
    """!Get path to directory containing test data.
    """
    def get_test_data_path(subdir):
        internal_tests_dir = os.path.abspath(
            os.path.join(os.path.dirname(__file__), os.pardir)
        )
        return os.path.join(internal_tests_dir, 'data', subdir)

    return get_test_data_path
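
The fixture returns a function rather than a value, so each test can ask for a different data subdirectory. A rough standalone sketch of the path arithmetic is shown below. The helper `make_test_data_getter` and the `/repo` prefix are hypothetical, used only so the example runs outside pytest where `__file__` would otherwise refer to a different file.

```python
import os


def make_test_data_getter(conftest_path):
    """Build a getter like the one returned by the get_test_data_dir fixture.

    conftest_path stands in for __file__ inside conftest.py.
    """
    def get_test_data_path(subdir):
        # One directory above the directory that holds conftest.py.
        internal_tests_dir = os.path.abspath(
            os.path.join(os.path.dirname(conftest_path), os.pardir)
        )
        return os.path.join(internal_tests_dir, 'data', subdir)

    return get_test_data_path


getter = make_test_data_getter(
    os.path.join(os.sep, 'repo', 'internal', 'tests', 'pytests', 'conftest.py')
)
print(getter('ascii'))
```

With conftest.py located in internal/tests/pytests, the getter resolves `'ascii'` to a path ending in internal/tests/data/ascii.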
2 changes: 2 additions & 0 deletions internal/tests/pytests/util/run_util/test_run_util.py
@@ -48,6 +48,8 @@
'METPLUS_BASE',
'PARM_BASE',
'METPLUS_VERSION',
'ALLOW_MISSING_INPUTS',
'INPUT_THRESH',
]


56 changes: 46 additions & 10 deletions internal/tests/pytests/wrappers/ascii2nc/test_ascii2nc_wrapper.py
@@ -17,8 +17,8 @@ def ascii2nc_wrapper(metplus_config, config_overrides=None):
'LOOP_BY': 'VALID',
'VALID_TIME_FMT': '%Y%m%d%H',
'VALID_BEG': '2010010112',
'VALID_END': '2010010112',
'VALID_INCREMENT': '1M',
'VALID_END': '2010010118',
'VALID_INCREMENT': '6H',
'ASCII2NC_INPUT_TEMPLATE': '{INPUT_BASE}/met_test/data/sample_obs/ascii/precip24_{valid?fmt=%Y%m%d%H}.ascii',
'ASCII2NC_OUTPUT_TEMPLATE': '{OUTPUT_BASE}/ascii2nc/precip24_{valid?fmt=%Y%m%d%H}.nc',
'ASCII2NC_CONFIG_FILE': '{PARM_BASE}/met_config/Ascii2NcConfig_wrapped',
@@ -47,6 +47,36 @@ def ascii2nc_wrapper(metplus_config, config_overrides=None):
return ASCII2NCWrapper(config, instance=instance)


@pytest.mark.parametrize(
    'missing, run, thresh, errors, allow_missing', [
        (1, 3, 0.5, 0, True),
        (1, 3, 0.8, 1, True),
        (1, 3, 0.5, 1, False),
    ]
)
@pytest.mark.wrapper
def test_ascii2nc_missing_inputs(metplus_config, get_test_data_dir,
                                 missing, run, thresh, errors, allow_missing):
    config_overrides = {
        'INPUT_MUST_EXIST': True,
        'ASCII2NC_ALLOW_MISSING_INPUTS': allow_missing,
        'ASCII2NC_INPUT_THRESH': thresh,
        'ASCII2NC_INPUT_TEMPLATE': os.path.join(get_test_data_dir('ascii'), 'precip24_{valid?fmt=%Y%m%d%H}.ascii'),
        'VALID_END': '2010010200',
    }
    wrapper = ascii2nc_wrapper(metplus_config, config_overrides)
    assert wrapper.isOK

    all_cmds = wrapper.run_all_times()
    for cmd, _ in all_cmds:
        print(cmd)

    print(f'missing: {wrapper.missing_input_count} / {wrapper.run_count}, errors: {wrapper.errors}')
    assert wrapper.missing_input_count == missing
    assert wrapper.run_count == run
    assert wrapper.errors == errors
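
The expected `run` value of 3 in the test above follows from the time settings: `VALID_BEG` of 2010010112, the overridden `VALID_END` of 2010010200, and a 6 hour increment. The sketch below enumerates those valid times. The helper `run_times` is hypothetical and assumes the end time is inclusive, which is what the expected count implies.

```python
from datetime import datetime, timedelta


def run_times(beg, end, step_hours, fmt='%Y%m%d%H'):
    """List run times from beg to end (inclusive) every step_hours hours."""
    current = datetime.strptime(beg, fmt)
    stop = datetime.strptime(end, fmt)
    step = timedelta(hours=step_hours)
    times = []
    while current <= stop:
        times.append(current.strftime(fmt))
        current += step
    return times


valid_times = run_times('2010010112', '2010010200', 6)
print(valid_times)
print(len(valid_times))
```

Each of the three valid times maps to one `precip24_{valid?fmt=%Y%m%d%H}.ascii` input path, and the test expects exactly one of them to be missing from the test data directory.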


@pytest.mark.parametrize(
'config_overrides, env_var_values', [
({},
@@ -163,11 +193,13 @@ def test_ascii2nc_wrapper(metplus_config, config_overrides,

input_path = wrapper.config.getraw('config', 'ASCII2NC_INPUT_TEMPLATE')
input_dir = os.path.dirname(input_path)
input_file = 'precip24_2010010112.ascii'
input_file1 = 'precip24_2010010112.ascii'
input_file2 = 'precip24_2010010118.ascii'

output_path = wrapper.config.getraw('config', 'ASCII2NC_OUTPUT_TEMPLATE')
output_dir = os.path.dirname(output_path)
output_file = 'precip24_2010010112.nc'
output_file1 = 'precip24_2010010112.nc'
output_file2 = 'precip24_2010010118.nc'

all_commands = wrapper.run_all_times()
print(f"ALL COMMANDS: {all_commands}")
Expand All @@ -177,13 +209,17 @@ def test_ascii2nc_wrapper(metplus_config, config_overrides,
verbosity = f"-v {wrapper.c_dict['VERBOSITY']}"
config_file = wrapper.c_dict.get('CONFIG_FILE')

expected_cmd = (f"{app_path} "
f"{input_dir}/{input_file} "
f"{output_dir}/{output_file} "
f"-config {config_file} "
f"{verbosity}")
expected_cmds = [
(f"{app_path} {input_dir}/{input_file1} {output_dir}/{output_file1} "
f"-config {config_file} {verbosity}"),
(f"{app_path} {input_dir}/{input_file2} {output_dir}/{output_file2} "
f"-config {config_file} {verbosity}"),
]

assert all_commands[0][0] == expected_cmd
assert len(all_commands) == len(expected_cmds)
for (cmd, _), expected_cmd in zip(all_commands, expected_cmds):
# ensure commands are generated as expected
assert cmd == expected_cmd

env_vars = all_commands[0][1]

@@ -1199,17 +1199,20 @@ def test_errors_and_defaults(metplus_config):
assert actual == False
assert _in_last_err('Could not generate command', cb.logger)

# test python embedding error
# test python embedding check
with mock.patch.object(cb_wrapper, 'is_python_script', return_value=True):
actual = cb.check_for_python_embedding('FCST',{'fcst_name':'pyEmbed'})
assert actual == None
assert _in_last_err('must be set to a valid Python Embedding type', cb.logger)
assert actual == 'python_embedding'

cb.c_dict['FCST_INPUT_DATATYPE'] = 'PYTHON_XARRAY'
cb.env_var_dict['METPLUS_FCST_FILE_TYPE'] = "PYTHON_NUMPY"
with mock.patch.object(cb_wrapper, 'is_python_script', return_value=True):
actual = cb.check_for_python_embedding('FCST',{'fcst_name':'pyEmbed'})
assert actual == 'python_embedding'

with mock.patch.object(cb_wrapper, 'is_python_script', return_value=False):
actual = cb.check_for_python_embedding('FCST',{'fcst_name':'pyEmbed'})
assert actual == 'pyEmbed'

# test field_info not set
cb.c_dict['CURRENT_VAR_INFO'] = None
actual = cb.set_current_field_config()
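
The updated assertions reflect the behavior described in the commit message: the Python embedding check now sets the file type automatically only if it is not already set and the tool supports multiple input files via Python embedding, and it warns and falls back to `PYTHON_NUMPY` instead of reporting an error. The sketch below is a loose illustration of that decision, not the wrapper's code. The function `resolve_field_name`, its parameters, and the `VALID_TYPES` tuple are all assumptions made for the example.

```python
import logging

# Assumption: the set of accepted Python embedding type names.
VALID_TYPES = ('PYTHON_NUMPY', 'PYTHON_XARRAY', 'PYTHON_PANDAS')


def resolve_field_name(field_name, is_python_script, env_vars,
                       data_type='FCST', input_datatype=None,
                       multi_input_tool=True,
                       logger=logging.getLogger(__name__)):
    """Hypothetical stand-in for the python embedding check."""
    if not is_python_script:
        # Regular field: use the configured name unchanged.
        return field_name

    key = f'METPLUS_{data_type}_FILE_TYPE'
    # Only set the file type if it is unset and the tool needs it.
    if multi_input_tool and not env_vars.get(key):
        if input_datatype not in VALID_TYPES:
            logger.warning('%s_INPUT_DATATYPE not set to a Python Embedding '
                           'type, using PYTHON_NUMPY', data_type)
            input_datatype = 'PYTHON_NUMPY'
        env_vars[key] = input_datatype
    return 'python_embedding'


env = {}
print(resolve_field_name('pyEmbed', True, env))
print(env)
print(resolve_field_name('pyEmbed', False, {}))
```

The three cases mirror the test: an unset file type, a file type that is already set, and a field that is not a Python script.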
@@ -4,9 +4,6 @@

import os

from datetime import datetime


from metplus.wrappers.ensemble_stat_wrapper import EnsembleStatWrapper

fcst_dir = '/some/path/fcst'
@@ -27,7 +24,7 @@
run_times = ['2005080700', '2005080712']


def set_minimum_config_settings(config, set_fields=True):
def set_minimum_config_settings(config, set_fields=True, set_obs=True):
# set config variables to prevent command from running and bypass check
# if input files actually exist
config.set('config', 'DO_NOT_RUN_EXE', True)
@@ -46,11 +43,12 @@ def set_minimum_config_settings(config, set_fields=True):
config.set('config', 'ENSEMBLE_STAT_CONFIG_FILE',
'{PARM_BASE}/met_config/EnsembleStatConfig_wrapped')
config.set('config', 'FCST_ENSEMBLE_STAT_INPUT_DIR', fcst_dir)
config.set('config', 'OBS_ENSEMBLE_STAT_GRID_INPUT_DIR', obs_dir)
config.set('config', 'FCST_ENSEMBLE_STAT_INPUT_TEMPLATE',
'{init?fmt=%Y%m%d%H}/fcst_file_F{lead?fmt=%3H}')
config.set('config', 'OBS_ENSEMBLE_STAT_GRID_INPUT_TEMPLATE',
'{valid?fmt=%Y%m%d%H}/obs_file')
if set_obs:
config.set('config', 'OBS_ENSEMBLE_STAT_GRID_INPUT_DIR', obs_dir)
config.set('config', 'OBS_ENSEMBLE_STAT_GRID_INPUT_TEMPLATE',
'{valid?fmt=%Y%m%d%H}/obs_file')
config.set('config', 'ENSEMBLE_STAT_OUTPUT_DIR',
'{OUTPUT_BASE}/EnsembleStat/output')
config.set('config', 'ENSEMBLE_STAT_OUTPUT_TEMPLATE', '{valid?fmt=%Y%m%d%H}')
@@ -62,6 +60,74 @@ def set_minimum_config_settings(config, set_fields=True):
config.set('config', 'OBS_VAR1_LEVELS', obs_level)


@pytest.mark.parametrize(
    'allow_missing, optional_input, missing, run, thresh, errors', [
        (True, None, 3, 8, 0.4, 0),
        (True, None, 3, 8, 0.7, 1),
        (False, None, 3, 8, 0.7, 3),
        (True, 'obs_grid', 4, 8, 0.4, 0),
        (True, 'obs_grid', 4, 8, 0.7, 1),
        (False, 'obs_grid', 4, 8, 0.7, 4),
        (True, 'point_grid', 4, 8, 0.4, 0),
        (True, 'point_grid', 4, 8, 0.7, 1),
        (False, 'point_grid', 4, 8, 0.7, 4),
        (True, 'ens_mean', 4, 8, 0.4, 0),
        (True, 'ens_mean', 4, 8, 0.7, 1),
        (False, 'ens_mean', 4, 8, 0.7, 4),
        (True, 'ctrl', 4, 8, 0.4, 0),
        (True, 'ctrl', 4, 8, 0.7, 1),
        (False, 'ctrl', 4, 8, 0.7, 4),
        # still errors if more members than n_members found
        (True, 'low_n_member', 8, 8, 0.7, 6),
        (False, 'low_n_member', 8, 8, 0.7, 8),
    ]
)
@pytest.mark.wrapper_b
def test_ensemble_stat_missing_inputs(metplus_config, get_test_data_dir, allow_missing,
                                      optional_input, missing, run, thresh, errors):
    config = metplus_config
    set_minimum_config_settings(config, set_obs=False)
    config.set('config', 'INPUT_MUST_EXIST', True)
    config.set('config', 'ENSEMBLE_STAT_ALLOW_MISSING_INPUTS', allow_missing)
    config.set('config', 'ENSEMBLE_STAT_INPUT_THRESH', thresh)
    n_members = 4 if optional_input == 'low_n_member' else 6
    config.set('config', 'ENSEMBLE_STAT_N_MEMBERS', n_members)
    config.set('config', 'INIT_BEG', '2009123106')
    config.set('config', 'INIT_END', '2010010100')
    config.set('config', 'INIT_INCREMENT', '6H')
    config.set('config', 'LEAD_SEQ', '24H, 48H')
    config.set('config', 'FCST_ENSEMBLE_STAT_INPUT_DIR', get_test_data_dir('ens'))
    config.set('config', 'FCST_ENSEMBLE_STAT_INPUT_TEMPLATE',
               '{init?fmt=%Y%m%d%H}/arw-*-gep?/d01_{init?fmt=%Y%m%d%H}_{lead?fmt=%3H}00.grib')

    if optional_input == 'obs_grid':
        prefix = 'OBS_ENSEMBLE_STAT_GRID'
    elif optional_input == 'point_grid':
        prefix = 'OBS_ENSEMBLE_STAT_POINT'
    elif optional_input == 'ens_mean':
        prefix = 'ENSEMBLE_STAT_ENS_MEAN'
    elif optional_input == 'ctrl':
        prefix = 'ENSEMBLE_STAT_CTRL'
    else:
        prefix = None

    if prefix:
        config.set('config', f'{prefix}_INPUT_DIR', get_test_data_dir('obs'))
        config.set('config', f'{prefix}_INPUT_TEMPLATE', '{valid?fmt=%Y%m%d%H}_obs_file')

    wrapper = EnsembleStatWrapper(config)
    assert wrapper.isOK

    all_cmds = wrapper.run_all_times()
    for cmd, _ in all_cmds:
        print(cmd)

    print(f'missing: {wrapper.missing_input_count} / {wrapper.run_count}, errors: {wrapper.errors}')
    assert wrapper.missing_input_count == missing
    assert wrapper.run_count == run
    assert wrapper.errors == errors
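
Every parameter set above expects a `run` count of 8, which matches four init times (`INIT_BEG` 2009123106 through `INIT_END` 2010010100 every 6 hours) combined with two forecast leads (24H and 48H). The sketch below enumerates those combinations under the assumption that the wrapper runs once per init and lead pair. The helper `init_times` is hypothetical.

```python
from datetime import datetime, timedelta
from itertools import product

FMT = '%Y%m%d%H'


def init_times(beg, end, step_hours):
    """Enumerate init times from beg to end (inclusive) every step_hours."""
    current = datetime.strptime(beg, FMT)
    stop = datetime.strptime(end, FMT)
    times = []
    while current <= stop:
        times.append(current)
        current += timedelta(hours=step_hours)
    return times


inits = init_times('2009123106', '2010010100', 6)
leads = [24, 48]  # LEAD_SEQ = 24H, 48H

# One (init, lead, valid) tuple per expected run.
runs = [
    (init.strftime(FMT), lead, (init + timedelta(hours=lead)).strftime(FMT))
    for init, lead in product(inits, leads)
]
for run in runs:
    print(run)
print(len(runs))
```

The valid time in each tuple is what the `{valid?fmt=%Y%m%d%H}_obs_file` template would be filled with for the optional inputs.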


@pytest.mark.parametrize(
'config_overrides, expected_filename', [
# 0 - set forecast level
@@ -7,6 +7,7 @@

from metplus.wrappers.extract_tiles_wrapper import ExtractTilesWrapper


def extract_tiles_wrapper(metplus_config):
config = metplus_config
config.set('config', 'PROCESS_LIST', 'ExtractTiles')
@@ -22,10 +23,6 @@ def extract_tiles_wrapper(metplus_config):
config.set('config', 'EXTRACT_TILES_LAT_ADJ', '15')
config.set('config', 'EXTRACT_TILES_LON_ADJ', '15')
config.set('config', 'EXTRACT_TILES_FILTER_OPTS', '-basin ML')
config.set('config', 'FCST_EXTRACT_TILES_INPUT_TEMPLATE',
'gfs_4_{init?fmt=%Y%m%d}_{init?fmt=%H}00_{lead?fmt=%HHH}.grb2')
config.set('config', 'OBS_EXTRACT_TILES_INPUT_TEMPLATE',
'gfs_4_{valid?fmt=%Y%m%d}_{valid?fmt=%H}00_000.grb2')
config.set('config', 'EXTRACT_TILES_GRID_INPUT_DIR',
'{INPUT_BASE}/cyclone_track_feature/reduced_model_data')
config.set('config', 'EXTRACT_TILES_PAIRS_INPUT_DIR',