
Add retry capability to create_test #2034

Merged: 2 commits into master on Nov 9, 2017
Conversation

@jgfouca (Contributor) commented Nov 7, 2017

Plus new regression test to exercise this capability.

Test suite: scripts_regression_tests --fast
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #1865

User interface changes?: Yes, new --retry option to create_test

Update gh-pages html (Y/N)?: N

Code review: @jedwards4b @billsacks

@billsacks (Member)

@jgfouca - Thanks a lot for taking this on!

My understanding of the requests from #1865 - particularly from @jedwards4b and @gold2718 - was that people wanted the test system to create entirely new versions of the failed tests, rather than just rerunning out of the existing test directories. As far as I can tell, this PR doesn't do that. I don't personally have strong feelings about it, but I'd like to let them comment on whether they feel it's important.

@jgfouca (Contributor, Author) commented Nov 8, 2017

@billsacks that would make for a much more complicated PR. I'd prefer to stay with this approach unless people really don't want it. The logs will contain the old failures if they happened.

@jedwards4b (Contributor)

I think this approach is fine, thanks.

@billsacks (Member)

Okay, that's fine with me, then. I'll give @gold2718 a little time to reply in case he feels strongly.

I have a few other questions:

(1) Just for my own understanding: What are the implications of setting --wait? Does this simply mean that the create_test process doesn't exit until all batch jobs have completed (by doing something like periodically polling the jobs it has spawned)?

And two questions regarding testing:

Test suite: scripts_regression_tests --fast

(2) It seems like the full scripts_regression_tests should be run, without '--fast', right?

(3) I appreciate that you have added an automated test of this feature. However, I don't love seeing test-specific code in the production code (it makes it harder to understand both the tests and the production code, increases the chances that test logic would accidentally pollute the production environment, etc.). Would it be feasible to do these tests by introducing new fake test classes that fail the first time (either in the build or run) but pass the second time? I don't think it would work to store state in the object, but could you do this with something like the following?

import os

class TESTBUILDFAIL_THEN_PASS:

    def build_phase(self, sharedlib_only=False, model_only=False):
        already_run_filename = os.path.join(self._case.get_value("CASEROOT"), "already_run")
        if os.path.isfile(already_run_filename):
            pass  # Do whatever is needed to pass the test
        else:
            # First invocation: record that we ran, then fail the test
            open(already_run_filename, 'a').close()
            # Do whatever is needed to fail the test

I'd imagine having three classes like that:

(a) One that fails the build the first time, passes on subsequent runs

(b) One that fails the run the first time, passes on subsequent runs

(c) One that PASSES the run the first time, FAILS on subsequent runs.

(a) and (b) reimplement what you already have, but note that (c) goes beyond the tests you already have, and I feel this would be a helpful addition to ensure that already-passed tests are NOT rerun.

I don't feel super-strongly about the need to reimplement the tests this way, though I would like to see the addition of something like (c) using some mechanism.
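For concreteness, here is a sketch of variant (c) in the same style as the snippet above. The class name, the run_phase signature, and the failure mechanism are illustrative assumptions, not CIME's actual API; a real fake test would derive from the appropriate system-test base class and use the proper pass/fail hooks.

import os

class TESTRUNPASS_THEN_FAIL:

    def run_phase(self):
        already_run_filename = os.path.join(self._case.get_value("CASEROOT"), "already_run")
        if os.path.isfile(already_run_filename):
            # A retry should never rerun an already-passed test, so a
            # second invocation is itself the failure condition
            raise RuntimeError("already-passed test was rerun")
        # First invocation: record that we ran, then do whatever is
        # needed to pass the test
        open(already_run_filename, 'a').close()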

@gold2718 commented Nov 8, 2017

While I think having the new test version would be good, I do not feel it is a high priority item. My usual workaround is to move the failed test so that a new test is created. Can that work with test suites in this system?

@jgfouca (Contributor, Author) commented Nov 8, 2017

@billsacks

What are the implications of setting --wait?

It means create_test will wait until all tests have gone through the batch system. This was necessary in order to support retrying tests that had RUN failures.
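(As a rough illustration only, the waiting amounts to a polling loop like the sketch below. The squeue invocation is an assumption standing in for whatever batch-system query create_test actually performs; it is not CIME's real implementation.)

import subprocess
import time

def job_still_queued(job_id):
    # Hypothetical check against SLURM's squeue: a nonempty reply means
    # the job is still pending or running
    result = subprocess.run(["squeue", "-h", "-j", str(job_id)],
                            capture_output=True, text=True)
    return bool(result.stdout.strip())

def wait_for_jobs(job_ids, poll_interval=60):
    # Poll until every spawned batch job has left the queue; this is
    # roughly what --wait implies for the create_test process
    pending = set(job_ids)
    while pending:
        pending = {jid for jid in pending if job_still_queued(jid)}
        if pending:
            time.sleep(poll_interval)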

It seems like the full scripts_regression_tests should be run, without '--fast'

I think only a code check and a test of the retry capability are necessary. There are plenty of tests in the --fast suite to make sure that normal create_test usage isn't broken.

However, I don't love seeing test-specific code in the production code

Yeah, I didn't like that either. But it was sooo much simpler than doing it another way.

@jgfouca (Contributor, Author) commented Nov 8, 2017

@gold2718 , I'm not sure I understand the question. If you turn on retry, you won't have to move anything.

@billsacks (Member)

Thanks for the replies, @jgfouca

However, I don't love seeing test-specific code in the production code

Yeah, I didn't like that either. But it was sooo much simpler than doing it another way.

I'm willing to hold my nose and accept that.

The one thing I would like to see added before accepting this PR is a test of:

(c) One that PASSES the run the first time, FAILS on subsequent runs... to ensure that already-passed tests are NOT rerun

Since the --retry mechanism mostly just leverages the --use-existing mechanism, I don't care whether this test is done for --retry or as part of O_TestTestScheduler:test_c_use_existing.
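(Conceptually, the relationship between the two mechanisms can be pictured as the loop below. run_tests is a hypothetical stand-in for the test scheduler, not CIME's real interface; the point is only that --retry re-invokes the scheduler on the failed subset while reusing the existing test directories.)

def run_tests(test_names, use_existing):
    # Hypothetical stand-in for the scheduler: runs the named tests and
    # returns the subset that failed. Stubbed here to always pass.
    return []

def run_with_retries(test_names, retries):
    # --retry N amounts to rerunning only the failed tests, up to N
    # times, out of their existing test directories
    failed = run_tests(test_names, use_existing=False)
    for _ in range(retries):
        if not failed:
            break
        failed = run_tests(failed, use_existing=True)
    return failed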

@jgfouca (Contributor, Author) commented Nov 8, 2017

@billsacks , that's a good idea.

@gold2718 commented Nov 8, 2017

@jgfouca: Sorry, I do the move so I can compare the two runs. Sometimes two failures still point at a system problem, so I like to keep the first failed run around. It is not (IMHO) a requirement for CIME.

@jgfouca (Contributor, Author) commented Nov 8, 2017

@gold2718 , oh, I see. The current approach will preserve the log info from the original failed run, so that info is not lost.

@gold2718 commented Nov 8, 2017

Well, if you insist on making my life easier, I suppose I can adapt.

@jgfouca (Contributor, Author) commented Nov 9, 2017

@billsacks , I added the test you asked for.

@billsacks (Member)

Okay, I'm happy with this now. Thanks again, @jgfouca !

@jedwards4b merged commit 426f61c into master on Nov 9, 2017
@jgfouca deleted the jgfouca/create_test_retry branch on November 15, 2017