
Need better tests of system tests #1640

Closed · rljacob opened this issue Jun 5, 2017 · 10 comments

@rljacob
Member

rljacob commented Jun 5, 2017

The discussion in issue #1638 shows that it's possible to put changes on master that will break a test without detection, in this case the widely used ERS test.

We need better "tests of system tests". We currently test the ERS.py test by deploying it with a couple of A cases in cime_developer. But if the test itself fails to exercise what it's supposed to, while still running, then it will always seem to "pass". If the models under test with ERS actually break restart, it could be weeks or months before anyone notices.

What we need is a case that by design will fail an ERS test. ERS detecting "fail" is then a pass for that test. We may need that for other system tests as well.
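
For illustration, such a "designed to fail" check could look roughly like the sketch below; run_ers_compare_phase() is a made-up stand-in for actually building and running an ERS test through create_test and reading its TestStatus.

```python
# Illustrative sketch only: run_ers_compare_phase() stands in for building and
# running an ERS test via create_test and reading its TestStatus.
def run_ers_compare_phase(case_name):
    """Pretend to run an ERS test on the given case and return the status of
    its restart-comparison phase ("PASS" or "FAIL")."""
    # For this sketch, assume the broken-restart case is correctly detected.
    return "FAIL" if case_name == "broken_restart_case" else "PASS"

def test_ers_detects_broken_restart():
    # ERS should report FAIL on a case designed to break exact restart;
    # if it reports PASS, the ERS test itself has stopped exercising restarts.
    status = run_ers_compare_phase("broken_restart_case")
    assert status == "FAIL", \
        "ERS passed on a case designed to break restart -- the ERS test is broken"

if __name__ == "__main__":
    test_ers_detects_broken_restart()
    print("meta-test passed: ERS reported FAIL for the broken-restart case")
```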

@billsacks
Member

I think the problem was ERP, not ERS.

See also the discussion in #290 - which I guess shouldn't have been closed, given that at least the ERP test is still prone to being broken in this way. I think I closed it because we couldn't come up with an easy way to test the system tests in this way.

Much of my motivation for creating the SystemTestsCompareTwo infrastructure was to allow us to test that infrastructure extensively (via unit tests) and then build some simple tests on top of it. It feels to me that the tests that most often break are ones that have two different configurations but are not handled via SystemTestsCompareTwo - and so require some resetting between each build/run. However, we never came up with a way to handle tests that require a restart using that framework (see #448 and #824). This leaves ERP as one of the most error-prone tests, which is unfortunate because it is also one of the most widely used tests.
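
For reference, a test built on SystemTestsCompareTwo only has to describe how its two cases differ; the shared infrastructure (which is what the unit tests cover) handles the two builds/runs and the history comparison. The sketch below follows the pattern of the real subclasses (ERP, PEM, etc.), but MYT is a made-up test type and the constructor arguments follow the system_tests_compare_two module of this era, so details may not match the current API exactly.

```python
# Sketch of a compare-two test; MYT is a made-up test type, and constructor
# arguments may not match the current SystemTestsCompareTwo API exactly.
from CIME.SystemTests.system_tests_compare_two import SystemTestsCompareTwo

class MYT(SystemTestsCompareTwo):
    def __init__(self, case):
        SystemTestsCompareTwo.__init__(
            self, case,
            separate_builds=False,
            run_two_suffix="modpes",
            run_one_description="default PE layout",
            run_two_description="halved MPI task counts (should be bit-for-bit)")

    def _case_one_setup(self):
        pass  # case one runs with out-of-the-box settings

    def _case_two_setup(self):
        # Halve the MPI task count for each component; answers should be
        # bit-for-bit with case one. (Roughly what the real PEM test does;
        # the real tests also re-run case_setup after changing the layout.)
        for comp in self._case.get_values("COMP_CLASSES"):
            ntasks = self._case.get_value("NTASKS_{}".format(comp))
            if ntasks > 1:
                self._case.set_value("NTASKS_{}".format(comp), ntasks // 2)
```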

@rljacob
Member Author

rljacob commented Jun 6, 2017

Can testmods include source code mods?

@billsacks
Member

Can testmods include source code mods?

Yes. But, as discussed in #290, this carries a maintenance cost, since the SourceMods will need to be updated whenever the production code is updated. (As an aside, we've had some discussions about storing SourceMods as diffs rather than the full file, which could then be applied via a patch command. That would help somewhat - but would require developing some infrastructure to accomplish it.)
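
As a rough sketch of that idea (none of this infrastructure exists today; the function and directory layout below are hypothetical): keep only *.patch files under SourceMods and apply them to the source tree with the patch command.

```python
# Hypothetical sketch -- this infrastructure does not exist in CIME today.
# Assumes SourceMods holds *.patch files (diffs against the production code)
# instead of full copies of modified source files.
import glob
import os
import subprocess

def apply_sourcemod_patches(caseroot, srcroot):
    """Apply every *.patch file under the case's SourceMods to the source tree."""
    patches = glob.glob(os.path.join(caseroot, "SourceMods", "src.*", "*.patch"))
    for patch_file in patches:
        # -p1 assumes the diffs were generated from the top of the source tree
        subprocess.check_call(["patch", "-p1", "-i", patch_file], cwd=srcroot)
```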

@rljacob
Member Author

rljacob commented Jun 8, 2017

I think it might be possible to generate an ERP fail with CIME code: Random numbers in the o-a flux. Since those are also part of CIME, it can all get updated at once.

@billsacks
Member

billsacks commented Jun 8, 2017

I think it might be possible to generate an ERP fail with CIME code: Random numbers in the o-a flux. Since those are also part of CIME, it can all get updated at once.

If I understand your suggestion, that would generate a fail in any test that compares two runs. This would test some aspects of the system test infrastructure, and so could be valuable in some respects. However, this wouldn't test the ERP-specific logic - for example, the bug that was introduced recently in the ERP test would still have gone undetected even with this in place. (This could still be worth doing; I'm just trying to point out what would & wouldn't be covered by this.)

@jgfouca jgfouca self-assigned this Jul 19, 2017
jgfouca added a commit that referenced this issue Aug 23, 2017
compare two test case names match

This PR makes the case2 casename in system_tests_compare_two match the case1 casename and nests both the case directory and the output directories inside that of case1. This is a precursor for a PR to address issue #1640

Addresses #1640

Test suite: scripts_regression_tests.py
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes

User interface changes?:

Update gh-pages html (Y/N)?: N

Code review:
jgfouca added a commit that referenced this issue Oct 17, 2017
…(PR #1640)

Update netcdf for sandia machines.

This is necessary to support the cdf5 file type which is needed
for internal CIME testing.

Machines that do not use PIO in pnetcdf mode do not need to worry
about updating netcdf. If they do, netcdf 4.4 is needed.

[BFB]

* jgfouca/update_netcdf_for_sandia_machines:
  Point to new cprnc build
  Update netcdf for sandia machines.
@jgfouca
Contributor

jgfouca commented Nov 13, 2017

@billsacks, I'm thinking this task may not be necessary now that nearly all the key test types use the compare-two infrastructure. Compare-two is well tested; does it include tests for failed comparisons? If so, I'm tempted to close this.

@billsacks
Member

@jgfouca I agree that this may not be necessary. Have you also checked with @rljacob since he's the one who opened this issue?

However, on this point:

Compare-two is well tested; does it include tests for failed comparisons?

All it's testing right now is that _component_compare_test is actually called at the right time. I was assuming that the logic in _component_compare_test was tested elsewhere.

However: Based on your question, I have decided to go one level further in this testing. I just made some changes (which I'll submit in a PR tomorrow, once scripts_regression_tests finishes) that add unit tests to cover _component_compare_test (success or failure) from system_tests_compare_two.

These new tests will cover the relevant logic in system_tests_compare_two and system_tests_common. However, they will NOT cover the logic in hist_utils: I'm stubbing out the actual call into hist_utils, under the assumption that this is - or should be - covered by other tests.

But I'm not sure if hist_utils is actually covered sufficiently by unit tests. If it isn't, it should be, and maybe that could be a specific issue opened to replace this one. There are some test files in tools/cprnc/test_inputs that could possibly be used for this purpose: pre-staging these files into the appropriate location, or (maybe better) generating files like this on the fly, then calling one of the routines in hist_utils.py.
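
For concreteness, the stubbing pattern described above looks roughly like the sketch below. The class and method names are illustrative stand-ins, not the actual code in the PR; the point is just that the hist_utils call is isolated behind a small hook that test subclasses can override, rather than relying on the mock module.

```python
# Illustrative only: these class and method names are made up, not the actual
# code in the PR. The hist_utils call is isolated behind a small hook that
# test subclasses override.
import unittest

class MinimalCompareTwo(object):
    """Toy stand-in for the compare-two flow, with the hist_utils call
    isolated in _compare_hists() so tests can stub it out."""

    def __init__(self):
        self.phase_status = {}

    def _compare_hists(self, suffix1, suffix2):
        # The real code would call into hist_utils here.
        raise NotImplementedError

    def run_compare_phase(self):
        success, _comments = self._compare_hists("base", "rest")
        self.phase_status["COMPARE"] = "PASS" if success else "FAIL"

class StubComparePass(MinimalCompareTwo):
    def _compare_hists(self, suffix1, suffix2):
        return True, "histories match"

class StubCompareFail(MinimalCompareTwo):
    def _compare_hists(self, suffix1, suffix2):
        return False, "histories differ"

class TestComparePassFail(unittest.TestCase):
    def test_pass_recorded(self):
        test = StubComparePass()
        test.run_compare_phase()
        self.assertEqual("PASS", test.phase_status["COMPARE"])

    def test_fail_recorded(self):
        test = StubCompareFail()
        test.run_compare_phase()
        self.assertEqual("FAIL", test.phase_status["COMPARE"])

if __name__ == "__main__":
    unittest.main()
```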

@rljacob
Member Author

rljacob commented Nov 14, 2017

Maybe this can be replaced with a bunch of more specific issues. We still don't have a test that could have caught the failure discussed in #1638: a change to the test's python code turned it into a test that would always pass. A test that forces a fail would have noticed that.

@billsacks
Member

@rljacob While I generally agree with the sentiment of making sure our tests are tested, in particular to make sure they pick up fails, I also feel that, now that most of our non-trivial tests use system_tests_compare_two, these sorts of problems are less likely to occur. (In particular, the error in #1638 involves code that no longer exists after the conversion of erp to system_tests_compare_two.) I feel that the best way to handle this is to have near-100% unit test coverage of the relevant code. I feel good about the unit test coverage of system_tests_compare_two itself, but I suspect we still need additional unit tests of some of the things it relies on (like hist_utils).

@jgfouca
Contributor

jgfouca commented Nov 14, 2017

I agree with @billsacks. I think hist_utils is pretty well covered in scripts_regression_tests. We do lots of tests that do hist comparisons and check for success, and we also do a few that verify failure (diff).

jgfouca added a commit that referenced this issue Nov 14, 2017
…ests

Add unit tests of compare_two's handling of pass/fail in comparison

Previously, I had figured that it was sufficient to ensure that
_component_compare_test was actually being called, on the assumption
that tests of _component_compare_test belong elsewhere. But, since this
is such a critical aspect of the system_tests_compare_two
infrastructure, I'm adding some tests covering _component_compare_test
here.

These new tests cover the logic related to in-test comparisons in
system_tests_compare_two and system_tests_common. However, they will
NOT cover the logic in hist_utils: I'm stubbing out the actual call into
hist_utils, under the assumption that this is - or should be - covered
by other tests. But I'm not sure if hist_utils is actually covered
sufficiently by unit tests. If it isn't, it should be.

This also required some minor refactoring of system_tests_common in
order to allow stubbing out the call into hist_utils. (This refactoring
would not have been needed if we allowed use of the mock module: see
#2056.)

Test suite: scripts_regression_tests on yellowstone
Passes other than the issues documented in #2057 and #2059

Also ensured that comparison failures are still reported correctly by
running a REP test that forces a comparison failure (due to missing
cprnc).
Test baseline: n/a
Test namelist changes: none
Test status: bit for bit

Fixes: Partially addresses #1640

User interface changes?: none

Update gh-pages html (Y/N)?: N

Code review: