Internal: Add support for creating multiple input datasets for use case categories #1694
Comments
Look into paying for more disk space to host bigger data sets.
Try to split the s2s data into two descriptive s2s categories, e.g. s2s_ocean or s2s_fubar.
Phase, OMI, and RMM (7-10) can be grouped as MJO.
Proposed groupings for splitting up s2s use cases:
* s2s_mjo
* s2s_mid_lat
* s2s
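Moving a use case into one of these groups means changing the category directory in its configuration path and updating every reference to it. A minimal sketch of that path update, assuming a `parm/use_cases/model_applications/<category>/<file>.conf` layout; the helper name `move_to_category` and the example file name are hypothetical:

```python
from pathlib import PurePosixPath


def move_to_category(conf_path: str, new_category: str) -> str:
    """Return conf_path with its model_applications category replaced.

    Hypothetical helper: assumes the category is the path component
    that immediately follows "model_applications".
    """
    parts = list(PurePosixPath(conf_path).parts)
    try:
        index = parts.index("model_applications")
    except ValueError:
        raise ValueError(f"not a model_applications path: {conf_path}") from None
    # The category must be a directory, i.e. not the final file component.
    if index + 1 >= len(parts) - 1:
        raise ValueError(f"no category directory in: {conf_path}")
    parts[index + 1] = new_category
    return str(PurePosixPath(*parts))
```

For example, `move_to_category("parm/use_cases/model_applications/s2s/UserScript_example.conf", "s2s_mjo")` returns the same path with `s2s` replaced by `s2s_mjo`.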
Should the s2s group have an additional identifier? Should/can this group be divided into smaller groups?
@CPKalb @georgemccabe @j-opatz, please run these stratifications past CPC, PSL, etc. to see if they make sense to them. After all, they are the S2S community we are serving. Thanks!
I just heard back from Maria, and she thinks s2s_mjo is great! I'll let you know when I hear back about the blocking weather regime.
I just heard back from Doug, and he thinks s2s_mid_lat is a good name as well.
* per #1694, moved 4 use cases from s2s to s2s_mjo, updated paths accordingly, and turned on all s2s use cases to test that they all run successfully after the changes
* per #1694, fixed paths to s2s_mjo conf files
* updated documentation for use cases that were moved from s2s to s2s_mjo
* attempt to free up unused disk space in GHA runner environment
* moved 4 s2s use cases into s2s_mid_lat
* added new model application categories to contrib guide for adding new use cases
* per #947, changed convection_allowing_models use cases to short_range
* changed which use case tests run to the ones that are failing and added other METdbLoad use case to see if that fails as well
* test to determine which files are preventing MySQL database from being created properly
* test 2 to determine which files are preventing MySQL database from being created properly
* test 3 to see if removing these files is not the cause of the METdbLoad failure
* updated references to METdatadb to METdataio since the repository was renamed
* fixed typo
* changed path to sql file needed to create database because it was moved from METviewer to METdataio
* fixed path to sql file that was moved from METviewer to METdataio
* removed temporary fix because metdataio conda env was created in the dtcenter/metplus-envs:metdataio Docker image
* added note to update path when METviewer Dockerfile changes to reflect METdatadb rename to METdataio, ci-skip-unit-tests
* fixed path to METdataio repo
* add back commands to free up disk space because issue with METdbLoad use case was likely not related, ci-skip-unit-tests
* run all tests with ci-run-all-diff
* remove use case group added for testing, ci-skip-all
* changed exit code for diff tests to 2 so it is easier to see if a use case test job failed due to an actual failure or due to differences in the output
* changed grouping of s2s mid lat use cases to original grouping to prevent warning that artifact contains more than 10,000 files. The 2 WeatherRegime use cases produce a lot of output files, so splitting them up should resolve this warning
* per #1694, changed all references to convection allowing models to short range in the Verification Datasets section of the documentation
* changed URLs to develop version of documentation to a URL relative to the current version of the documentation to match the quick search links from the METplus User's Guide
* per #947, changed references to convection_allowing_model (without the s) to short_range that were missed
* updated use case test scripts to rename convection_allowing_models to short_range and added note to alert developers that the list of use cases in the script is not maintained and therefore not complete
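The exit-code change in the commit log separates two outcomes of a diff test job: the use case itself failed, or it ran but produced output that differs from the truth data. A minimal sketch of that convention; only the value 2 for "output differs" comes from the log, while the other values and all names are assumptions:

```python
# Exit codes for a diff test job. Only EXIT_OUTPUT_DIFFERS = 2 is taken from
# the commit log; the remaining values and names are hypothetical.
EXIT_OK = 0
EXIT_USE_CASE_FAILED = 1
EXIT_OUTPUT_DIFFERS = 2


def diff_test_exit_code(use_case_succeeded: bool, outputs_match: bool) -> int:
    """Choose an exit code that distinguishes real failures from output diffs."""
    if not use_case_succeeded:
        # The use case failed, so any diff result is not meaningful.
        return EXIT_USE_CASE_FAILED
    if not outputs_match:
        # The use case ran, but its output differs from the truth data.
        return EXIT_OUTPUT_DIFFERS
    return EXIT_OK
```

A job wrapper could pass the result to `sys.exit`, so the CI log shows status 2 only when the use case ran successfully and the differences are the sole problem.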
Currently the automated tests are set up so that each model_applications category corresponds to an input data set containing all of the data required to run every use case in that category. The s2s input data set has become so large that, although it does not exceed the maximum allowable size of the Docker data volume that stores it for the tests, the use case test groups that use this data run out of disk space when they write output from the use cases.
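One way to surface this failure earlier is to check the free space on the output filesystem before a use case group starts writing. A minimal sketch using Python's `shutil.disk_usage`; the function name, the expected output size passed in, and the 1 GiB default reserve are assumptions, not part of the existing test scripts:

```python
import shutil


def has_room_for_output(output_dir: str, expected_bytes: int,
                        reserve_bytes: int = 1024**3) -> bool:
    """Return True if output_dir's filesystem can hold expected_bytes more.

    reserve_bytes is a safety margin (1 GiB by default, an arbitrary
    assumption) so the runner's disk is not filled completely.
    """
    if expected_bytes < 0 or reserve_bytes < 0:
        raise ValueError("sizes must be non-negative")
    free = shutil.disk_usage(output_dir).free
    return free >= expected_bytes + reserve_bytes
```

The expected output size would have to come from an earlier run of the same group; a check like this only reports the problem sooner and does not reduce the space the data and output need.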
The use case groups that fail also use the Conda environments required for METplotpy and METcalcpy, which are very large because of their many Python package dependencies. These environments further reduce the disk space available in the test environment. The newly created metplotpy environment for #1566 is much larger than the existing one, so it may cause disk space issues when that work is completed.
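Comparing environment sizes requires a consistent way to measure them. A minimal sketch that sums the sizes of regular files under one environment directory, assuming a Linux runner; it does not follow symbolic links and counts hard-linked files once within that directory, but files shared with the Conda package cache or other environments are still included, so the result is an upper bound on the space a single environment adds:

```python
import os
import stat


def directory_size_bytes(root: str) -> int:
    """Return the total size of regular files under root.

    Symbolic links are skipped (lstat reports them as links, not regular
    files), and files sharing an inode (hard links) are counted once.
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable
            if not stat.S_ISREG(info.st_mode):
                continue
            key = (info.st_dev, info.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += info.st_size
    return total
```

Calling it on each directory under the Conda `envs` directory would give per-environment totals for comparing the current environments with those built on updated package requirements.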
Size of current conda environments:
Size of conda environments using Python 3.8.6 and updated package requirements:
We may need to reconsider new requirements of use cases and how to group them in the tests, including:
Describe the Enhancement
Time Estimate
1-3 days
Sub-Issues
Consider breaking the enhancement down into sub-issues.
Relevant Deadlines
ASAP
Funding Source
2702691 2792541
Define the Metadata
Assignee
Labels
Projects and Milestone
Define Related Issue(s)
Consider the impact to the other METplus components.
Enhancement Checklist
See the METplus Workflow for details.
Branch name:
feature_<Issue Number>_<Description>
Pull request:
feature <Issue Number> <Description>
Select: Reviewer(s) and Linked issues
Select: Repository level development cycle Project for the next official release
Select: Milestone as the next official version
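The branch and pull request naming templates above can be filled in mechanically. A minimal sketch, where the helper names and the rule of replacing runs of non-alphanumeric characters with underscores are assumptions rather than documented METplus Workflow conventions; the description in the example is hypothetical:

```python
import re


def branch_name(issue_number: int, description: str) -> str:
    """Build a name following feature_<Issue Number>_<Description>."""
    # Assumed rule: collapse runs of non-alphanumeric characters to "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", description.strip()).strip("_")
    if not slug:
        raise ValueError("description must contain letters or digits")
    return f"feature_{issue_number}_{slug}"


def pull_request_title(issue_number: int, description: str) -> str:
    """Build a title following feature <Issue Number> <Description>."""
    return f"feature {issue_number} {description.strip()}"
```

For example, `branch_name(1694, "s2s use case groups")` gives `feature_1694_s2s_use_case_groups`.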