
Testing suite #9

Merged: 30 commits, Jun 27, 2017
Conversation

mattdturner
Contributor

Modifications and enhancements to the CICE testing suite.

turner and others added 26 commits June 12, 2017 15:35
…st. Replace the cice.<test_name>.csh scripts with cice.test.csh.
… file if the CICE_TEST variable is defined
…ation to the test_output file (only if CICE_TEST is defined)
… script. The cice.test.csh script adds a new environment variable (CICE_TEST) to cice.settings. The build process has been removed from cice.test.csh, and replaced with a check to see if the executable exists. If not, the script prints an error message and exits.
…ting_suite

Conflicts:
	configuration/scripts/tests/cice.annual.csh
	configuration/scripts/tests/cice.smoke.csh
…e.batch.csh. Move job launch logic from cice.run.setup.csh and cice.test.csh to cice.launch.csh.
…er file. This is necessary for the exact restart test
…cases. Also add the testid field to README_v6
…emove writing to test_output from cice.run.setup.csh
…ut the test (baseline generating? baseline directory, etc.). Add output to test_output after call to cice.run. Add conditional to check for output data if the test is a baseline-generating test, and write PASS/FAIL to test_output
…ments. Remove 'baseline' from casename. Always add testid to casename if running a test and not generating a baseline dataset. Add new code to modify CICE_RUNDIR to the baseline directory if generating a baseline dataset. Add new code to modify CICE_BASELINE to the baseline directory if not generating a baseline dataset. Modification to how cice.test.csh is called.
…eplace calls to perl (when modifying namelists) with a call to parse_namelist.sh. Slight modifications to allow for the restart case to be re-run successfully.
…or setting CICE_BASELINE, remove call to cice.restart.csh
…mbine the cice.restart.csh and cice.test.csh scripts into a single cice.test.csh script.
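
The executable-existence check described in the commit messages above (replacing the build step in cice.test.csh) might look like the following minimal csh sketch; the path and executable name here are assumptions, not the script's actual contents:

    # minimal sketch (csh), assuming the executable is ${CICE_RUNDIR}/cice
    if ( ! -x ${CICE_RUNDIR}/cice ) then
      echo "Error: ${CICE_RUNDIR}/cice not found -- run cice.build first"
      exit 1
    endif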
@eclare108213
Contributor

I have tested the following tests:
-t smoke (baseline and comparison)
-t 10day (baseline and comparison)
-t restart (comparison -- extra baseline is not needed)
-t annual (baseline only)
and also a default run without -t. This all looks good to me. I didn’t check the -bg and -bc flags, and I did not try to do a regression test with different versions of the code.
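
For reference, the corresponding create.case invocations would presumably be along these lines (machine conrad as in the README quickstart; comparison runs omit -b):

    ./create.case -m conrad -t smoke -b     # smoke baseline
    ./create.case -m conrad -t smoke        # smoke comparison
    ./create.case -m conrad -t restart      # restart (no separate baseline)
    ./create.case -m conrad -t annual -b    # annual baseline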

Some changes are needed in README.test for consistency:

  1. For the baseline quickstart, change "cd smoke_baseline_gx3_conrad_4x1" to "cd smoke_gx3_conrad_4x1"
  2. For the comparison quickstart, change "cd smoke_gx3_conrad_4x1" to "cd smoke_gx3_conrad_4x1.t00"
  3. Please provide the -t arguments for each test (smoke, 10day, restart, annual) so we don't have to look at the create.case script. You could suggest "create.case -h" for further info.
  4. State explicitly that a baseline does not need to be created separately for restart runs.
  5. Under “Additional Details”, should the second bullet be about -bc rather than -bd?

Other comments/questions:

It would be nice to set diagfreq to print diagnostics during the annual run, e.g. once a day, in case something goes wrong during the run and we want to see if anything looks particularly out of whack. But if this is inconvenient from a scripting point of view, it’s fine the way it is. We can easily change the namelist and re-run.

How do we know what the version names are for regression testing, particularly the new code with our most recent changes, which has been committed to our fork but not yet pulled to the consortium?

@eclare108213 left a comment

The tests that I tried worked for me. I'm requesting that the README.test file be updated to be consistent with the scripts in this version.

README.test Outdated
files to a baseline dataset.
3. Exact restart - Ensures that the output from a cold-start simulation
and a restart simulation provide bit-for-bit identical
results.

State explicitly that a baseline does not need to be created for restart tests

README.test Outdated
Quickstart (example):

./create.case -t smoke -m conrad -b
cd smoke_baseline_gx3_conrad_4x1

remove _baseline from first directory and add .t00 to second one
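
With those changes, the baseline quickstart reads:

    ./create.case -t smoke -m conrad -b
    cd smoke_gx3_conrad_4x1

and the comparison quickstart ends with cd smoke_gx3_conrad_4x1.t00.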

README.test Outdated
to completion for a 1 day simulation. Validation is
determined by performing a binary comparison on restart
files to a baseline dataset.
2. 10-day test - Ensures that the model compiles successfully and runs

add the actual arguments needed for -t (smoke, 10day, restart, annual)

README.test Outdated
want the baseline dataset to be written. Without specifying '-bd', the
baseline dataset will be written to the default baseline directory found
in the env.<machine> file (CICE_MACHINE_BASELINE).
- If '-b' is not passed, the '-bd' option will specify the location of the

should this be -bc instead of -bd?

… argument passed to '-t' in order to create the test case
@mattdturner
Contributor Author

I just updated README.test per your comments. I also changed the diagfreq variable to 24 for the annual test case.
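
(With the default one-hour timestep, diagfreq = 24 gives once-a-day diagnostics; in ice_in this would be a one-line namelist change, assuming the variable lives in setup_nml:

    &setup_nml
      diagfreq = 24
    /
)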

Regarding the version names, I'm not sure of a good way to answer that. I would imagine that the regression tests would be performed against the most recent release version. But I'm not sure how we would want to handle version naming. Would it make sense to add an option to create.case that would print the available version names in the baseline directory?

@eclare108213 left a comment

These changes address my concerns.

@eclare108213 requested a review from dabail10 June 23, 2017 00:47
@eclare108213
Contributor

This looks fine to me and I'm willing to do the pull request, partly so that I can start using these tests for changes that I make. However, I would like @dabail10 or @apcraig to weigh in. Tony had some questions about the -b, -bg, -bc flags. Dave, I didn't test the -bg and -bc flags, so it would be great if you could do that for at least one of the tests. I'll hold off on merging the pull request for now, but it can be done before any other reviews.

@dabail10 left a comment

Is the intent to only have one increment of a test? I generated a baseline test:

smoke_gx3_cheyenne_4x1

and then the compare test:

smoke_gx3_cheyenne_4x1.t00

Then, I thought I would try to generate a second compare test and expected the t00 to increment to t01. However, it tried to generate t00 again.

@mattdturner
Contributor Author

The intent was originally to not have it auto-increment, but instead error if there was already a case directory with that name. If we think auto-incrementing would be better for the test cases, I can implement that. As of right now, you would have to pass a different number to the -testid flag (and it doesn't have to be of the 't##' variety): ./create.case -m conrad -t smoke -testid pull01
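
If auto-incrementing turns out to be preferred, a hypothetical csh sketch of the logic inside create.case could look like this (nothing like it exists in the scripts yet; casename is a placeholder):

    # hypothetical sketch: find the first unused t## suffix for ${casename}
    set n = 0
    set testid = `printf "t%02d" $n`
    while ( -d ${casename}.${testid} )
      @ n = $n + 1
      set testid = `printf "t%02d" $n`
    end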

@dabail10
Contributor

Excellent. I missed the -testid flag. When we generate test suites for the CESM, we give them ids of the form: YYYYMMDD_nnnnnn_ssssss. I don't think you need something this complicated, but it might be nice to have a date string so we understand where the baselines came from. I'm running the other test cases now to see that they work as expected on our machine.
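
(A date-stamped id could presumably already be produced with the existing flag, e.g.:

    ./create.case -m cheyenne -t smoke -testid `date +%Y%m%d`
)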

@mattdturner
Contributor Author

For now, I will update README.test to include information about using the -testid flag (it should have been there). Perhaps on the next testing telecon (or maybe once more users test the testing scripts), we can iron out the finer details (such as the YYYYMMDD_nnnnnn_ssssss test id). We do have the date and time printed in README.case, which is within the case directory. This file gets a message printed to it when the case is created, when the build script completes, and when the run / test scripts are run.

@dabail10
Contributor

Sounds good. These are more enhancement ideas. I am running through the tests to make sure they work as advertised. I should be able to submit my review today.

@dabail10
Contributor

A couple of enhancement suggestions as well, and I am happy to add these.

  1. A top-level script called "create_test_suite" or something similar that will generate the four tests. This could have the option to generate baselines or not (a rough sketch appears at the end of this comment).

  2. Maybe a top-level script that parses the "test_output" files. I can sort of do this manually with "cat":
    [dbailey@cheyenne4 ~/CICE_matt]> cat */test_output

PASS 10day_gx3_cheyenne_4x1.t00 build
PASS 10day_gx3_cheyenne_4x1.t00 run
PASS 10day_gx3_cheyenne_4x1.t00 compare
PASS 10day_gx3_cheyenne_4x1 build
PASS 10day_gx3_cheyenne_4x1 run
PASS 10day_gx3_cheyenne_4x1 generate
PASS annual_gx3_cheyenne_4x1.t00 build
PASS annual_gx3_cheyenne_4x1 build
PASS restart_gx3_cheyenne_4x1.t00 build
PASS restart_gx3_cheyenne_4x1.t00 2-day-run
PASS restart_gx3_cheyenne_4x1.t00 restart-run
PASS restart_gx3_cheyenne_4x1.t00 compare
PASS restart_gx3_cheyenne_4x1 build
PASS restart_gx3_cheyenne_4x1 2-day-run
PASS restart_gx3_cheyenne_4x1 restart-run
PASS restart_gx3_cheyenne_4x1 compare
PASS smoke_gx3_cheyenne_4x1.t00 build
FAIL smoke_gx3_cheyenne_4x1.t00 run
PASS smoke_gx3_cheyenne_4x1.t00 build
PASS smoke_gx3_cheyenne_4x1.t00 run
FAIL smoke_gx3_cheyenne_4x1.t00 compare
PASS smoke_gx3_cheyenne_4x1 build
PASS smoke_gx3_cheyenne_4x1 run
PASS smoke_gx3_cheyenne_4x1 generate

Note that the failures here were intentional. I changed ktherm to 1 in smoke_gx3_cheyenne_4x1.t00, and this caused the run to crash because it does not understand the initial files in this case. I then changed ktherm back to 2 and changed mu_rdg to 4. This runs, but causes the compare to fail.

I am still testing the annual tests using the -bg and -bd flags. Note that these do nothing if you forget the -b flag. This I assume is the intended behavior.
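
A rough csh sketch of the create_test_suite idea from item 1 above, using only the create.case flags discussed in this thread (the machine name and loop body are illustrative, not a real script):

    #!/bin/csh
    # hypothetical create_test_suite: a baseline and a comparison case per test
    foreach test ( smoke 10day restart annual )
      ./create.case -m cheyenne -t $test -b   # baseline-generating case
      ./create.case -m cheyenne -t $test      # comparison case (gets a .t00 suffix)
    end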

@dabail10 left a comment

I am happy to approve the changes to this point. This is really nice functionality. I did test the -bg/-bc flags and these seem to work as advertised. In my case, I did a regression test of 6.0.1 versus 6.0.0. I did not have a baseline for 6.0.0, so it failed. I couldn't find log information to this effect, though. This is what I got:

./cice.run:

CICE rundir is /glade/scratch/dbailey/CICE_BASELINE/cicev6.0.1/smoke_gx3_cheyenne_4x1
CICE log file is cice.runlog.170627-132201
CICE run started : Tue Jun 27 13:22:01 MDT 2017
CICE run finished: Tue Jun 27 13:22:05 MDT 2017

CICE COMPLETED SUCCESSFULLY
done ./cice.run
Performing binary comparison between files:
baseline: /glade/scratch/dbailey/CICE_BASELINE/cicev6.0.0/smoke_gx3_cheyenne_4x1/restart/iced.1998-01-02-00000.nc
test: /glade/scratch/dbailey/CICE_BASELINE/cicev6.0.1/smoke_gx3_cheyenne_4x1/restart/iced.1998-01-02-00000.nc

I would have liked a message saying that the baseline did not exist.
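
A minimal sketch of the kind of guard that could produce such a message before the binary comparison (base_file and casename are placeholders, not the scripts' actual variable names):

    # hypothetical csh guard in the compare step
    if ( ! -e ${base_file} ) then
      echo "FAIL ${casename} compare (baseline ${base_file} does not exist)" >> test_output
      exit 1
    endif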

@mattdturner
Contributor Author

Thanks for the suggestions. The testing suite is still evolving, so it's a great time for suggestions!

There are plans to add functionality to run a single script and have it generate an array of tests, although I hadn't thought about including the option to have it also generate baseline datasets. I have also thought about parsing the test_output files for the suites (once developed) to show the PASS/FAIL status of each test and give an overall score (something along the lines of "4 out of 5 tests passed").
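
A rough sketch of that kind of summary over the test_output format shown earlier, in csh plus grep (entirely illustrative):

    #!/bin/csh
    # hypothetical sketch: tally PASS/FAIL lines across all test_output files
    set npass = `cat */test_output | grep -c '^PASS'`
    set nfail = `cat */test_output | grep -c '^FAIL'`
    @ ntot = $npass + $nfail
    echo "$npass of $ntot checks passed"
    grep '^FAIL' */test_output    # list any failures with their case names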

You should not need to use the '-b' flag in order to use '-bc' or '-bg'. I am working on an update to create.case that removes this requirement.

Regarding your regression test failing, there definitely should be a clear message stating why the test failed. I will add this.

There is still a decent amount of development that needs to be done on the testing scripts, but if the pull request is merged then other users can start testing the new scripts. I can always create another pull request with the updates and new features.

@eclare108213 merged commit 44af118 into CICE-Consortium:master Jun 27, 2017
@apcraig
Contributor

apcraig commented Jun 28, 2017 via email

@mattdturner deleted the testing_suite branch December 5, 2017 17:35