Testing suite #9
Conversation
…st. Replace the cice.<test_name>.csh scripts with cice.test.csh.
… file if the CICE_TEST variable is defined
…ation to the test_output file (only if CICE_TEST is defined)
… script. The cice.test.csh script adds a new environment variable (CICE_TEST) to cice.settings. The build process has been removed from cice.test.csh, and replaced with a check to see if the executable exists. If not, the script prints an error message and exits.
…ting_suite Conflicts: configuration/scripts/tests/cice.annual.csh configuration/scripts/tests/cice.smoke.csh
…e.batch.csh. Move job launch logic from cice.run.setup.csh and cice.test.csh to cice.launch.csh.
….case to generate a test
…in the case_dir name
… and -bd are not specified
…f smoke, 10day, annual, or restart
…er file. This is necessary for the exact restart test
…cases. Also add the testid field to README_v6
…emove writing to test_output from cice.run.setup.csh
…ut the test (baseline generating? baseline directory, etc.). Add output to test_output after call to cice.run. Add conditional to check for output data if the test is a baseline-generating test, and write PASS/FAIL to test_output
…ments. Remove 'baseline' from casename. Always add testid to casename if running a test and not generating a baseline dataset. Add new code to modify CICE_RUNDIR to the baseline directory if generating a baseline dataset. Add new code to modify CICE_BASELINE to the baseline directory if not generating a baseline dataset. Modification to how cice.test.csh is called.
…eplace calls to perl (when modifying namelists) with a call to parse_namelist.sh. Slight modifications to allow for the restart case to be re-run successfully.
…ile to casescripts directory
…or setting CICE_BASELINE, remove call to cice.restart.csh
…mbine the cice.restart.csh and cice.test.csh scripts into a single cice.test.csh script.
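The executable check mentioned in the commit messages above (which replaced the build step in cice.test.csh) could look like the following sketch. This is POSIX sh rather than the csh the actual scripts use, and the `check_exe` helper and path handling are illustrative, not the real implementation:

```shell
#!/bin/sh
# Sketch only: the real cice.test.csh derives the executable path from
# cice.settings; check_exe is a hypothetical helper for illustration.
check_exe() {
    if [ ! -x "$1" ]; then
        echo "Error: executable $1 not found; run cice.build first." >&2
        return 1
    fi
}

# Example: /bin/sh exists and is executable, so this prints a confirmation.
check_exe /bin/sh && echo "executable found"
```

Failing early here, instead of silently rebuilding, keeps the build and test steps cleanly separated.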
I have tested the tests. Some changes are needed in README.test for consistency:
Other comments/questions: It would be nice to set diagfreq to print diagnostics during the annual run, e.g. once a day, in case something goes wrong during the run and we want to see if anything looks particularly out of whack. But if this is inconvenient from a scripting point of view, it’s fine the way it is. We can easily change the namelist and re-run. How do we know what the version names are for regression testing, particularly the new code with our most recent changes, which has been committed to our fork but not yet pulled to the consortium?
The tests that I tried worked for me. I'm requesting that the README.test file be updated to be consistent with the scripts in this version.
README.test (Outdated)

files to a baseline dataset.
3. Exact restart - Ensures that the output from a cold-start simulation
                   and a restart simulation provide bit-for-bit identical
                   results.
State explicitly that a baseline does not need to be created for restart tests
README.test (Outdated)

Quickstart (example):

./create.case -t smoke -m conrad -b
cd smoke_baseline_gx3_conrad_4x1
remove _baseline from first directory and add .t00 to second one
README.test (Outdated)

to completion for a 1 day simulation. Validation is
determined by performing a binary comparison on restart
files to a baseline dataset.
2. 10-day test - Ensures that the model compiles successfully and runs
add the actual arguments needed for -t (smoke, 10day, restart, annual)
README.test (Outdated)

want the baseline dataset to be written. Without specifying '-bd', the
baseline dataset will be written to the default baseline directory found
in the env.<machine> file (CICE_MACHINE_BASELINE).
- If '-b' is not passed, the '-bd' option will specify the location of the
should this be -bc instead of -bd?
… argument passed to '-t' in order to create the test case
I just updated README.test per your comments. I also changed the diagfreq variable to 24 for the annual test case. Regarding the version names, I'm not sure of a good way to answer that. I would imagine that the regression tests would be performed against the most recent release version. But I'm not sure how we would want to handle version naming. Would it make sense to add an option to create.case that would print the available version names in the baseline directory?
These changes address my concerns.
This looks fine to me and I'm willing to do the pull request, partly so that I can start using these tests for changes that I make. However, I would like @dabail10 or @apcraig to weigh in. Tony had some questions about the -b, -bg, -bc flags. Dave, I didn't test the -bg and -bc flags, so it would be great if you could do that for at least one of the tests. I'll hold off on merging the pull request for now, but it can be done before any other reviews.
Is the intent to only have one increment of a test? I generated a baseline test:
smoke_gx3_cheyenne_4x1
and then the compare test:
smoke_gx3_cheyenne_4x1.t00
Then, I thought I would try to generate a second compare test and expected the t00 to increment to t01. However, it tried to generate t00 again.
The intent was originally to not have it auto-increment, but instead to error if there was already a case directory with that name. If we think auto-incrementing would be better for the test cases, I can implement that. As of right now, you would have to pass a different number to the -testid flag (and it doesn't have to be of the 't##' variety): ./create.case -m conrad -t smoke -testid pull01
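The error-instead-of-auto-increment behavior described here could be sketched as follows. This is a POSIX sh illustration, not the actual create.case code (which is csh), and the `make_case_dir` helper name is hypothetical:

```shell
#!/bin/sh
# Sketch only: illustrates refusing to reuse an existing case directory
# rather than auto-incrementing the testid. Not the real create.case logic.
make_case_dir() {
    if [ -d "$1" ]; then
        echo "Error: case directory $1 already exists." >&2
        echo "Pass a different -testid to create a new test case." >&2
        return 1
    fi
    mkdir -p "$1"
}
```

A second call with the same directory name fails, mirroring the behavior observed when t00 was requested twice.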
Excellent. I missed the -testid flag. When we generate test suites for the CESM, we give them ids of the form: YYYYMMDD_nnnnnn_ssssss. I don't think you need something this complicated, but it might be nice to have a date string so we understand where the baselines came from. I'm running the other test cases now to see that they work as expected on our machine.
For now, I will update README.test to include information about using the -testid flag (it should have been there). Perhaps on the next testing telecon (or maybe once more users test the testing scripts), we can iron out the finer details (such as the YYYYMMDD_nnnnnn_ssssss test id). We do have the date and time printed in README.case, which is within the case directory. This file gets a message printed to it when the case is created, when the build script completes, and when the run/test scripts are run.
Sounds good. These are more enhancement ideas. I am running through the tests to make sure they work as advertised. I should be able to submit my review today.
A couple enhancement suggestions as well, and I am happy to add these.
PASS 10day_gx3_cheyenne_4x1.t00 build

Note that the fails here were intentional. I changed ktherm to 1 in smoke_gx3_cheyenne_4x1.t00 and this caused the run to crash because it does not understand the initial files in this case. I then changed ktherm back to 2 and changed mu_rdg to 4. This runs, but causes the compare to fail. I am still testing the annual tests using the -bg and -bd flags. Note that these do nothing if you forget the -b flag. This I assume is the intended behavior.
I am happy to approve the changes to this point. This is a really nice functionality. I did test the -bg/-bc flags and these seem to work as advertised. In my case, I did a regression test of 6.0.1 versus 6.0.0. I did not have a baseline for 6.0.0, so it failed. I couldn't find log information to this effect though. This is what I got:
./cice.run:
CICE rundir is /glade/scratch/dbailey/CICE_BASELINE/cicev6.0.1/smoke_gx3_cheyenne_4x1
CICE log file is cice.runlog.170627-132201
CICE run started : Tue Jun 27 13:22:01 MDT 2017
CICE run finished: Tue Jun 27 13:22:05 MDT 2017
CICE COMPLETED SUCCESSFULLY
done ./cice.run
Performing binary comparison between files:
baseline: /glade/scratch/dbailey/CICE_BASELINE/cicev6.0.0/smoke_gx3_cheyenne_4x1/restart/iced.1998-01-02-00000.nc
test: /glade/scratch/dbailey/CICE_BASELINE/cicev6.0.1/smoke_gx3_cheyenne_4x1/restart/iced.1998-01-02-00000.nc
I would have liked a message saying that the one baseline did not exist.
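The requested behavior, reporting a missing baseline explicitly before attempting the binary comparison, might look like this sketch. The real scripts are csh and this `compare_restart` helper is hypothetical; `cmp -s` performs a bytewise comparison:

```shell
#!/bin/sh
# Sketch only: report a missing baseline explicitly instead of letting
# the binary comparison fail with no explanation.
compare_restart() {
    baseline="$1"; testfile="$2"
    if [ ! -f "$baseline" ]; then
        echo "FAIL: baseline file $baseline does not exist" >&2
        return 1
    fi
    if cmp -s "$baseline" "$testfile"; then
        echo "PASS: restart files are bit-for-bit identical"
    else
        echo "FAIL: restart files differ"
        return 1
    fi
}
```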
Thanks for the suggestions. The testing suite is still evolving, so it's a great time for suggestions! There are plans to add functionality to run a single script and have it generate an array of tests, although I hadn't thought about including the option to have it also generate baseline datasets. I have also thought about parsing the test_output files for the suites (once developed) to show the PASS/FAIL status of each test and give an overall score (something along the lines of "4 out of 5 tests passed"). You should not need to use the '-b' flag in order to use '-bc' or '-bg'. I am working on an update to create.case that removes this requirement. Regarding your regression test failing, there definitely should be a clear message stating why the test failed. I will add this. There is still a decent amount of development that needs to be done on the testing scripts, but if the pull request is merged then other users can start testing the new scripts. I can always create another pull request with the updates and new features.
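The "4 out of 5 tests passed" idea mentioned here could be a small awk pass over the test_output files. This is a sketch under the assumption that test_output lines have the form "STATUS casename step"; the `summarize` helper is hypothetical, not part of this PR:

```shell
#!/bin/sh
# Sketch only: tally PASS/FAIL lines from stdin and print an overall score.
summarize() {
    awk '$1 == "PASS" { p++ }
         $1 == "FAIL" { f++ }
         END { printf "%d of %d checks passed\n", p, p + f }'
}

# Example over inline data; in practice: cat */test_output | summarize
printf 'PASS smoke_gx3 build\nFAIL smoke_gx3 run\nPASS smoke_gx3 compare\n' | summarize
# prints: 2 of 3 checks passed
```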
Just a quick followup. I'm a bit out of the loop, but I was thinking we
would not have a create_test_suite separately, that we would just use
create.case -ts
for a test suite, -t for a single test, and -c for a case.
Regarding the test reporting, that is still a work in progress. There
are many things still to do.
Finally, just a comment about test suites and generating baselines. This
is part of the plan. It would look like
./create.case -ts biglist -testid 601 -bg cice.601 -bc cice.600
that would generate new baselines for "cice.601" and compare to the
prior baseline, "cice.600" for all tests in "biglist". Again, you could
use -bg and/or -bc or neither as needed.
tony.......
…On 6/27/17 12:07 PM, dabail10 wrote:
A couple enhancement suggestions as well, and I am happy to add these.
1.
A top level script called "create_test_suite" or something that
will generate the four tests. This can have the option to generate
baselines or not.
2.
Maybe a top level script that parses the "test_output" files. I
can sort of do this manually with "cat"
***@***.*** ~/CICE_matt]> cat */test_output
PASS 10day_gx3_cheyenne_4x1.t00 build
PASS 10day_gx3_cheyenne_4x1.t00 run
PASS 10day_gx3_cheyenne_4x1.t00 compare
PASS 10day_gx3_cheyenne_4x1 build
PASS 10day_gx3_cheyenne_4x1 run
PASS 10day_gx3_cheyenne_4x1 generate
PASS annual_gx3_cheyenne_4x1.t00 build
PASS annual_gx3_cheyenne_4x1 build
PASS restart_gx3_cheyenne_4x1.t00 build
PASS restart_gx3_cheyenne_4x1.t00 2-day-run
PASS restart_gx3_cheyenne_4x1.t00 restart-run
PASS restart_gx3_cheyenne_4x1.t00 compare
PASS restart_gx3_cheyenne_4x1 build
PASS restart_gx3_cheyenne_4x1 2-day-run
PASS restart_gx3_cheyenne_4x1 restart-run
PASS restart_gx3_cheyenne_4x1 compare
PASS smoke_gx3_cheyenne_4x1.t00 build
FAIL smoke_gx3_cheyenne_4x1.t00 run
PASS smoke_gx3_cheyenne_4x1.t00 build
PASS smoke_gx3_cheyenne_4x1.t00 run
FAIL smoke_gx3_cheyenne_4x1.t00 compare
PASS smoke_gx3_cheyenne_4x1 build
PASS smoke_gx3_cheyenne_4x1 run
PASS smoke_gx3_cheyenne_4x1 generate
Note that the fails here were intentional. I changed ktherm to 1 in
smoke_gx3_cheyenne_4x1.t00 and this caused the run to crash because it
does not understand the initial files in this case. I then changed
ktherm back to 2 and change mu_rdg to 4. This runs, but causes the
compare to fail.
I am still testing the annual tests using the -bg and -bd flags. Note
that these do nothing if you forget the -b flag. This I assume is the
intended behavior.
Modifications and enhancements to the CICE testing suite.