Internal: Add support for creating multiple input datasets for use case categories #1694

georgemccabe · 2022-07-11T23:22:39Z

Currently the automated tests are set up so that each model_applications category corresponds to an input data set that contains all of the data required to run all of the use cases in that category. The s2s input data set has become so large that, although it does not exceed the maximum allowable size for the Docker data volume that stores it for use in the tests, the use case test groups that use this data run out of disk space when they write output data from the use cases.

The use case groups that fail also use the Conda environments required for METplotpy and METcalcpy, which are very large due to their many Python package dependencies. These environments also count against the total disk space available in the test environment. The newly created metplotpy environment for #1566 is much larger than the existing environment, so it may cause disk space issues when that work is completed.

Size of current conda environments:

du -sh /usr/local/envs/*
897M    /usr/local/envs/metplotpy
185M    /usr/local/envs/metplus_base

Size of conda environments using Python 3.8.6 and updated package requirements:

du -sh /usr/local/envs/*
2.2G    /usr/local/envs/metplotpy.v5
168M    /usr/local/envs/metplus_base.v5
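
The size comparison above can also be scripted so it is easy to repeat when environments change. The following is a minimal sketch, not part of the METplus test suite: the function names are hypothetical, and it sums apparent file sizes, so the totals may differ somewhat from what `du -sh` reports (which counts disk blocks).

```python
import os


def dir_size_bytes(path):
    """Return the total apparent size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted at their target size
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def human_readable(num_bytes):
    """Format a byte count with binary units, similar in spirit to du -h."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"


def report_env_sizes(envs_dir):
    """Map each sub-directory name in envs_dir to its total size in bytes."""
    return {
        entry.name: dir_size_bytes(entry.path)
        for entry in os.scandir(envs_dir)
        if entry.is_dir()
    }


if __name__ == "__main__":
    # Path taken from the du commands above; adjust for other environments
    envs_dir = "/usr/local/envs"
    if os.path.isdir(envs_dir):
        for name, size in sorted(report_env_sizes(envs_dir).items()):
            print(f"{human_readable(size)}\t{os.path.join(envs_dir, name)}")
```

Running the same report against the input data and output directories would give the other two sizes listed below.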

We may need to reconsider new requirements of use cases and how to group them in the tests, including:

Size of input data
Size of output data generated
Size of conda environment required to run
Others?

Describe the Enhancement

Come up with a good naming convention for the additional input data sets. Currently they are named after the category, i.e. s2s. We will need another data set such as s2s_2.
Update the automated test logic to support multiple input data sets for a given category
Update the Contributor's Guide Add Use Cases chapter with the updated process for adding new data
Update User's Guide to describe how to find input data for use cases since they will not just correspond to the model_applications sub-directory name anymore

Time Estimate

1-3 days

Sub-Issues

Consider breaking the enhancement down into sub-issues.

Add a checkbox for each sub-issue here.

Relevant Deadlines

ASAP

Funding Source

2702691 2792541

Define the Metadata

Assignee

Select engineer(s) or no engineer required
Select scientist(s) or no scientist required

Labels

Select component(s)
Select priority
Select requestor(s)

Projects and Milestone

Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

METplus, MET, METdatadb, METviewer, METexpress, METcalcpy, METplotpy

Enhancement Checklist

See the METplus Workflow for details.


hankenstein2 · 2022-07-12T16:29:57Z

Look into paying for more disk space to host bigger data sets
Look into hosting a THREDDS Data Server (TDS) to dynamically host data, i.e. just grab what you want.
Probably need to split up datasets more starting with S2S

hankenstein2 · 2022-07-12T16:35:05Z

Try to split the s2s data into two parts named after descriptive s2s categories, e.g. s2s_ocean or s2s_fubar

hankenstein2 · 2022-07-12T16:44:01Z

Phase, OMI, RMM can be grouped (7-10) MJO
Blocking, Weather Regime (1-3, 11) Blocking
s2s_mjo
s2s_mid_lat
s2s - for all the rest (0,4-6,12-14)

georgemccabe · 2022-07-12T17:04:26Z

Proposed groupings for splitting up s2s use cases

s2s_mjo

UserScript_obsERA_obsOnly_PhaseDiagram
UserScript_fcstGFS_obsERA_OMI
UserScript_obsERA_obsOnly_OMI
UserScript_obsERA_obsOnly_RMM

s2s_mid_lat

UserScript_fcstGFS_obsERA_Blocking
UserScript_obsERA_obsOnly_Blocking
UserScript_obsERA_obsOnly_WeatherRegime
UserScript_fcstGFS_obsERA_WeatherRegime

s2s

GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast
TCGen_fcstGFSO_obsBDECKS_GDF_TDF
UserScript_obsPrecip_obsOnly_Hovmoeller
UserScript_obsPrecip_obsOnly_CrossSpectraPlot
UserScript_obsERA_obsOnly_Stratosphere
SeriesAnalysis_fcstCFSv2_obsGHCNCAMS_climoStandardized_MultiStatisticTool
GridStat_fcstCFSv2_obsGHCNCAMS_MultiTercile

Should the s2s group have an additional identifier? Should/can this group be divided into smaller groups?
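
To illustrate how the test logic could resolve a use case to its input data set once a category has more than one, here is a minimal sketch. The groupings are copied from the proposal above; the dictionary layout and the `dataset_for_use_case` function are hypothetical and are not how the METplus automated tests are actually implemented.

```python
# Proposed groupings from this issue: data set name -> use cases it serves
S2S_DATASETS = {
    "s2s_mjo": [
        "UserScript_obsERA_obsOnly_PhaseDiagram",
        "UserScript_fcstGFS_obsERA_OMI",
        "UserScript_obsERA_obsOnly_OMI",
        "UserScript_obsERA_obsOnly_RMM",
    ],
    "s2s_mid_lat": [
        "UserScript_fcstGFS_obsERA_Blocking",
        "UserScript_obsERA_obsOnly_Blocking",
        "UserScript_obsERA_obsOnly_WeatherRegime",
        "UserScript_fcstGFS_obsERA_WeatherRegime",
    ],
    "s2s": [
        "GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast",
        "TCGen_fcstGFSO_obsBDECKS_GDF_TDF",
        "UserScript_obsPrecip_obsOnly_Hovmoeller",
        "UserScript_obsPrecip_obsOnly_CrossSpectraPlot",
        "UserScript_obsERA_obsOnly_Stratosphere",
        "SeriesAnalysis_fcstCFSv2_obsGHCNCAMS_climoStandardized_MultiStatisticTool",
        "GridStat_fcstCFSv2_obsGHCNCAMS_MultiTercile",
    ],
}


def dataset_for_use_case(use_case, datasets=S2S_DATASETS):
    """Return the name of the single input data set that serves use_case.

    Raises ValueError if the use case is unknown or listed in more than
    one data set, since either would make the test setup ambiguous.
    """
    matches = [name for name, cases in datasets.items() if use_case in cases]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one data set for {use_case!r}, found {matches}"
        )
    return matches[0]


if __name__ == "__main__":
    print(dataset_for_use_case("UserScript_obsERA_obsOnly_RMM"))
```

Keeping the mapping explicit, rather than deriving the data set from the model_applications sub-directory name, is one way to let a category be split further later without renaming use cases.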

TaraJensen · 2022-07-12T17:12:28Z

@CPKalb @georgemccabe @j-opatz - please run these stratifications past CPC, PSL, etc... to see if they make sense to them. After all - they are the S2S community we are serving. Thanks!

CPKalb · 2022-07-12T18:23:39Z

I just heard back from Maria, and she thinks s2s_mjo is great! I'll let you know when I hear back about the blocking weather regime

CPKalb · 2022-07-12T21:17:26Z

I just heard back from Doug, and he thinks s2s_mid_lat is a good name as well.

* per #1694, moved 4 use cases from s2s to s2s_mjo, updated paths accordingly, and turned on all s2s use cases to test that they all run successfully after the changes
* per #1694, fixed paths to s2s_mjo conf files
* updated documentation for use cases that were moved from s2s to s2s_mjo
* attempt to free up unused disk space in GHA runner environment
* moved 4 s2s use cases into s2s_mid_lat
* added new model application categories to contrib guide for adding new use cases
* per #947, changed convection_allowing_models use cases to short_range
* changed which use case tests run to the ones that are failing and added other METdbLoad use case to see if that fails as well
* test to determine which files are preventing MySQL database from being created properly
* test 2 to determine which files are preventing MySQL database from being created properly
* test 3 to see if removing these files is not the cause of the METdbLoad failure
* updated references to METdatadb to METdataio since the repository was renamed
* fixed typo
* changed path to sql file needed to create database because it was moved from METviewer to METdataio
* fixed path to sql file that was moved from METviewer to METdataio
* removed temporary fix because metdataio conda env was created in the dtcenter/metplus-envs:metdataio Docker image
* added note to update path when METviewer Dockerfile changes to reflect METdatadb rename to METdataio, ci-skip-unit-tests
* fixed path to METdataio repo
* add back commands to free up disk space because issue with METdbLoad use case was likely not related, ci-skip-unit-tests
* run all tests with ci-run-all-diff
* remove use case group added for testing, ci-skip-all
* changed exit code for diff tests to 2 so it is easier to see if a use case test job failed due to an actual failure or due to differences in the output
* changed grouping of s2s mid lat use cases to original grouping to prevent warning that artifact contains more than 10,000 files. The 2 WeatherRegime use cases produce a lot of output files, so splitting them up should resolve this warning
* per #1694, changed all references to convection allowing models to short range in the Verification Datasets section of the documentation
* changed URLs to develop version of documentation to a URL relative to the current version of the documentation to match the quick search links from the METplus User's Guide
* per #947, changed references to convection_allowing_model (without the s) to short_range that were missed
* updated use case test scripts to rename convection_allowing_models to short_range and added note to alert developers that the list of use cases in the script is not maintained and therefore not complete

georgemccabe added this to the METplus-5.0.0 milestone Jul 11, 2022

georgemccabe self-assigned this Jul 11, 2022

georgemccabe added this to METplus-Wrappers-5.0.0-beta2 (8/3/22) Jul 11, 2022

georgemccabe moved this to Todo in METplus-Wrappers-5.0.0-beta2 (8/3/22) Jul 11, 2022

georgemccabe added a commit that referenced this issue Jul 12, 2022

per #1694, moved 4 use cases from s2s to s2s_mjo, updated paths accordingly, and turned on all s2s use cases to test that they all run successfully after the changes

900443a

georgemccabe added a commit that referenced this issue Jul 12, 2022

per #1694, fixed paths to s2s_mjo conf files

12bcfba

georgemccabe mentioned this issue Jul 13, 2022

Change all references from Convection Allowing Models to Short Range #947

Closed

21 tasks

georgemccabe moved this from Todo to In Progress in METplus-Wrappers-5.0.0-beta2 (8/3/22) Jul 13, 2022

georgemccabe linked a pull request Jul 14, 2022 that will close this issue

Feature 1694 s2s subgroups #1697

Merged

14 tasks

TaraJensen added the "reporting: DTC NCAR Base" (NCAR Base DTC Project) and "reporting: DTC NOAA BASE" (NOAA Office of Atmospheric Research DTC Project) labels and removed the "alert: NEED ACCOUNT KEY" (Need to assign an account key to this issue) label Jul 14, 2022

georgemccabe moved this from In Progress to Review in METplus-Wrappers-5.0.0-beta2 (8/3/22) Jul 14, 2022

georgemccabe removed the "alert: NEED MORE DEFINITION" (Not yet actionable, additional definition required) label Jul 14, 2022

georgemccabe added a commit that referenced this issue Jul 18, 2022

per #1694, changed all references to convection allowing models to short range in the Verification Datasets section of the documentation

89ea557

georgemccabe closed this as completed Jul 19, 2022

Repository owner moved this from Review to Done in METplus-Wrappers-5.0.0-beta2 (8/3/22) Jul 19, 2022
