
Testing suite #9

Merged: 30 commits, Jun 27, 2017
Conversation

mattdturner
Contributor

Modifications and enhancements to the CICE testing suite.

turner and others added 26 commits June 12, 2017 15:35
…st. Replace the cice.<test_name>.csh scripts with cice.test.csh.
… file if the CICE_TEST variable is defined
…ation to the test_output file (only if CICE_TEST is defined)
… script. The cice.test.csh script adds a new environment variable (CICE_TEST) to cice.settings. The build process has been removed from cice.test.csh, and replaced with a check to see if the executable exists. If not, the script prints an error message and exits.
…ting_suite

Conflicts:
	configuration/scripts/tests/cice.annual.csh
	configuration/scripts/tests/cice.smoke.csh
…e.batch.csh. Move job launch logic from cice.run.setup.csh and cice.test.csh to cice.launch.csh.
…er file. This is necessary for the exact restart test
…cases. Also add the testid field to README_v6
…emove writing to test_output from cice.run.setup.csh
…ut the test (baseline generating? baseline directory, etc.). Add output to test_output after call to cice.run. Add conditional to check for output data if the test is a baseline-generating test, and write PASS/FAIL to test_output
…ments. Remove 'baseline' from casename. Always add testid to casename if running a test and not generating a baseline dataset. Add new code to modify CICE_RUNDIR to the baseline directory if generating a baseline dataset. Add new code to modify CICE_BASELINE to the baseline directory if not generating a baseline dataset. Modification to how cice.test.csh is called.
…eplace calls to perl (when modifying namelists) with a call to parse_namelist.sh. Slight modifications to allow for the restart case to be re-run successfully.
…or setting CICE_BASELINE, remove call to cice.restart.csh
…mbine the cice.restart.csh and cice.test.csh scripts into a single cice.test.csh script.
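
The executable-existence check described in the commit messages above (replacing the build step in cice.test.csh) might look like the following minimal csh sketch; the path and executable name here are assumptions, not the script's actual contents:

    # minimal sketch (csh), assuming the executable is ${CICE_RUNDIR}/cice
    if ( ! -x ${CICE_RUNDIR}/cice ) then
      echo "Error: ${CICE_RUNDIR}/cice not found -- run cice.build first"
      exit 1
    endif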
@eclare108213
Contributor

I have tested the following tests:
-t smoke (baseline and comparison)
-t 10day (baseline and comparison)
-t restart (comparison -- extra baseline is not needed)
-t annual (baseline only)
and also a default run without -t. This all looks good to me. I didn’t check the -bg and -bc flags, and I did not try to do a regression test with different versions of the code.
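
For reference, the corresponding create.case invocations would presumably be along these lines (machine conrad as in the README quickstart; comparison runs omit -b):

    ./create.case -m conrad -t smoke -b     # smoke baseline
    ./create.case -m conrad -t smoke        # smoke comparison
    ./create.case -m conrad -t restart      # restart (no separate baseline)
    ./create.case -m conrad -t annual -b    # annual baseline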

Some changes are needed in README.test for consistency:

  1. For the baseline quickstart, change "cd smoke_baseline_gx3_conrad_4x1" to "cd smoke_gx3_conrad_4x1"
  2. For the comparison quickstart, change "cd smoke_gx3_conrad_4x1" to "cd smoke_gx3_conrad_4x1.t00"
  3. Please provide the -t arguments for each test (smoke, 10day, restart, annual) so we don't have to look at the create.case script. You could suggest "create.case -h" for further info.
  4. State explicitly that a baseline does not need to be created separately for restart runs.
  5. Under “Additional Details”, should the second bullet be about -bc rather than -bd?

Other comments/questions:

It would be nice to set diagfreq to print diagnostics during the annual run, e.g. once a day, in case something goes wrong during the run and we want to see if anything looks particularly out of whack. But if this is inconvenient from a scripting point of view, it’s fine the way it is. We can easily change the namelist and re-run.

How do we know what the version names are for regression testing, particularly the new code with our most recent changes, which has been committed to our fork but not yet pulled to the consortium?

@eclare108213 left a comment

The tests that I tried worked for me. I'm requesting that the README.test file be updated to be consistent with the scripts in this version.

README.test Outdated
files to a baseline dataset.
3. Exact restart - Ensures that the output from a cold-start simulation
and a restart simulation provide bit-for-bit identical
results.

State explicitly that a baseline does not need to be created for restart tests

README.test Outdated
Quickstart (example):

./create.case -t smoke -m conrad -b
cd smoke_baseline_gx3_conrad_4x1

remove _baseline from first directory and add .t00 to second one
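
With those changes, the baseline quickstart reads:

    ./create.case -t smoke -m conrad -b
    cd smoke_gx3_conrad_4x1

and the comparison quickstart ends with cd smoke_gx3_conrad_4x1.t00.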

README.test Outdated
to completion for a 1 day simulation. Validation is
determined by performing a binary comparison on restart
files to a baseline dataset.
2. 10-day test - Ensures that the model compiles successfully and runs

add the actual arguments needed for -t (smoke, 10day, restart, annual)

README.test Outdated
want the baseline dataset to be written. Without specifying '-bd', the
baseline dataset will be written to the default baseline directory found
in the env.<machine> file (CICE_MACHINE_BASELINE).
- If '-b' is not passed, the '-bd' option will specify the location of the

should this be -bc instead of -bd?

… argument passed to '-t' in order to create the test case
@mattdturner
Contributor Author

I just updated README.test per your comments. I also changed the diagfreq variable to 24 for the annual test case.
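
(With the default one-hour timestep, diagfreq = 24 gives once-a-day diagnostics; in ice_in this would be a one-line namelist change, assuming the variable lives in setup_nml:

    &setup_nml
      diagfreq = 24
    /
)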

Regarding the version names, I'm not sure of a good way to answer that. I would imagine that the regression tests would be performed against the most recent release version. But I'm not sure how we would want to handle version naming. Would it make sense to add an option to create.case that would print the available version names in the baseline directory?

@eclare108213 left a comment

These changes address my concerns.

@eclare108213 requested a review from dabail10 June 23, 2017 00:47
@eclare108213
Contributor

This looks fine to me and I'm willing to do the pull request, partly so that I can start using these tests for changes that I make. However, I would like @dabail10 or @apcraig to weigh in. Tony had some questions about the -b, -bg, -bc flags. Dave, I didn't test the -bg and -bc flags, so it would be great if you could do that for at least one of the tests. I'll hold off on merging the pull request for now, but it can be done before any other reviews.

@dabail10 left a comment

Is the intent to only have one increment of a test? I generated a baseline test:

smoke_gx3_cheyenne_4x1

and then the compare test:

smoke_gx3_cheyenne_4x1.t00

Then, I thought I would try to generate a second compare test and expected the t00 to increment to t01. However, it tried to generate t00 again.

@mattdturner
Contributor Author

The intent was originally to not have it auto-increment, but instead error if there was already a case directory with that name. If we think auto-incrementing would be better for the test cases, I can implement that. As of right now, you would have to pass a different number to the -testid flag (and it doesn't have to be of the 't##' variety): ./create.case -m conrad -t smoke -testid pull01
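
If auto-incrementing turns out to be preferred, a hypothetical csh sketch of the logic inside create.case could look like this (nothing like it exists in the scripts yet; casename is a placeholder):

    # hypothetical sketch: find the first unused t## suffix for ${casename}
    set n = 0
    set testid = `printf "t%02d" $n`
    while ( -d ${casename}.${testid} )
      @ n = $n + 1
      set testid = `printf "t%02d" $n`
    end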

@dabail10
Contributor

Excellent. I missed the -testid flag. When we generate test suites for the CESM, we give them ids of the form: YYYYMMDD_nnnnnn_ssssss. I don't think you need something this complicated, but it might be nice to have a date string so we understand where the baselines came from. I'm running the other test cases now to see that they work as expected on our machine.
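
(A date-stamped id could presumably already be produced with the existing flag, e.g.:

    ./create.case -m cheyenne -t smoke -testid `date +%Y%m%d`
)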

@mattdturner
Contributor Author

For now, I will update README.test to include information about using the -testid flag (it should have been there). Perhaps on the next testing telecon (or maybe once more users test the testing scripts), we can iron out the finer details (such as the YYYYMMDD_nnnnnn_ssssss test id). We do have the date and time printed in README.case, which is within the case directory. This file gets a message printed to it when the case is created, when the build script completes, and when the run / test scripts are run.

@dabail10
Contributor

Sounds good. These are more enhancement ideas. I am running through the tests to make sure they work as advertised. I should be able to submit my review today.

@dabail10
Contributor

A couple of enhancement suggestions as well, and I am happy to add these.

  1. A top-level script called "create_test_suite" or something similar that will generate the four tests. This could have the option to generate baselines or not (a rough sketch appears at the end of this comment).

  2. Maybe a top-level script that parses the "test_output" files. I can sort of do this manually with "cat":
    [dbailey@cheyenne4 ~/CICE_matt]> cat */test_output

PASS 10day_gx3_cheyenne_4x1.t00 build
PASS 10day_gx3_cheyenne_4x1.t00 run
PASS 10day_gx3_cheyenne_4x1.t00 compare
PASS 10day_gx3_cheyenne_4x1 build
PASS 10day_gx3_cheyenne_4x1 run
PASS 10day_gx3_cheyenne_4x1 generate
PASS annual_gx3_cheyenne_4x1.t00 build
PASS annual_gx3_cheyenne_4x1 build
PASS restart_gx3_cheyenne_4x1.t00 build
PASS restart_gx3_cheyenne_4x1.t00 2-day-run
PASS restart_gx3_cheyenne_4x1.t00 restart-run
PASS restart_gx3_cheyenne_4x1.t00 compare
PASS restart_gx3_cheyenne_4x1 build
PASS restart_gx3_cheyenne_4x1 2-day-run
PASS restart_gx3_cheyenne_4x1 restart-run
PASS restart_gx3_cheyenne_4x1 compare
PASS smoke_gx3_cheyenne_4x1.t00 build
FAIL smoke_gx3_cheyenne_4x1.t00 run
PASS smoke_gx3_cheyenne_4x1.t00 build
PASS smoke_gx3_cheyenne_4x1.t00 run
FAIL smoke_gx3_cheyenne_4x1.t00 compare
PASS smoke_gx3_cheyenne_4x1 build
PASS smoke_gx3_cheyenne_4x1 run
PASS smoke_gx3_cheyenne_4x1 generate

Note that the failures here were intentional. I changed ktherm to 1 in smoke_gx3_cheyenne_4x1.t00, and this caused the run to crash because it does not understand the initial files in this case. I then changed ktherm back to 2 and changed mu_rdg to 4. This runs, but causes the compare to fail.

I am still testing the annual tests using the -bg and -bd flags. Note that these do nothing if you forget the -b flag. This I assume is the intended behavior.
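
A rough csh sketch of the create_test_suite idea from item 1 above, using only the create.case flags discussed in this thread (the machine name and loop body are illustrative, not a real script):

    #!/bin/csh
    # hypothetical create_test_suite: a baseline and a comparison case per test
    foreach test ( smoke 10day restart annual )
      ./create.case -m cheyenne -t $test -b   # baseline-generating case
      ./create.case -m cheyenne -t $test      # comparison case (gets a .t00 suffix)
    end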

@dabail10 left a comment

I am happy to approve the changes to this point. This is really nice functionality. I did test the -bg/-bc flags and these seem to work as advertised. In my case, I did a regression test of 6.0.1 versus 6.0.0. I did not have a baseline for 6.0.0, so it failed. I couldn't find log information to this effect, though. This is what I got:

./cice.run:

CICE rundir is /glade/scratch/dbailey/CICE_BASELINE/cicev6.0.1/smoke_gx3_cheyenne_4x1
CICE log file is cice.runlog.170627-132201
CICE run started : Tue Jun 27 13:22:01 MDT 2017
CICE run finished: Tue Jun 27 13:22:05 MDT 2017

CICE COMPLETED SUCCESSFULLY
done ./cice.run
Performing binary comparison between files:
baseline: /glade/scratch/dbailey/CICE_BASELINE/cicev6.0.0/smoke_gx3_cheyenne_4x1/restart/iced.1998-01-02-00000.nc
test: /glade/scratch/dbailey/CICE_BASELINE/cicev6.0.1/smoke_gx3_cheyenne_4x1/restart/iced.1998-01-02-00000.nc

I would have liked a message saying that the baseline did not exist.
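
A minimal sketch of the kind of guard that could produce such a message before the binary comparison (base_file and casename are placeholders, not the scripts' actual variable names):

    # hypothetical csh guard in the compare step
    if ( ! -e ${base_file} ) then
      echo "FAIL ${casename} compare (baseline ${base_file} does not exist)" >> test_output
      exit 1
    endif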

@mattdturner
Contributor Author

Thanks for the suggestions. The testing suite is still evolving, so it's a great time for suggestions!

There are plans to add functionality to run a single script and have it generate an array of tests, although I hadn't thought about including the option to have it also generate baseline datasets. I have also thought about parsing the test_output files for the suites (once developed) to show the PASS/FAIL status of each test and give an overall score (something along the lines of "4 out of 5 tests passed").
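
A rough sketch of that kind of summary over the test_output format shown earlier, in csh plus grep (entirely illustrative):

    #!/bin/csh
    # hypothetical sketch: tally PASS/FAIL lines across all test_output files
    set npass = `cat */test_output | grep -c '^PASS'`
    set nfail = `cat */test_output | grep -c '^FAIL'`
    @ ntot = $npass + $nfail
    echo "$npass of $ntot checks passed"
    grep '^FAIL' */test_output    # list any failures with their case names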

You should not need to use the '-b' flag in order to use '-bc' or '-bg'. I am working on an update to create.case that removes this requirement.

Regarding your regression test failing, there definitely should be a clear message stating why the test failed. I will add this.

There is still a decent amount of development that needs to be done on the testing scripts, but if the pull request is merged then other users can start testing the new scripts. I can always create another pull request with the updates and new features.

@eclare108213 merged commit 44af118 into CICE-Consortium:master Jun 27, 2017
@apcraig
Contributor

apcraig commented Jun 28, 2017 via email

@mattdturner deleted the testing_suite branch December 5, 2017 17:35