
Add retry capability to create_test #2034

Merged: 2 commits into master on Nov 9, 2017
Conversation

@jgfouca (Contributor) commented Nov 7, 2017

Plus new regression test to exercise this capability.

Test suite: scripts_regression_tests --fast
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #1865

User interface changes?: Yes, new --retry option to create_test

Update gh-pages html (Y/N)?: N

Code review: @jedwards4b @billsacks

@billsacks (Member)

@jgfouca - Thanks a lot for taking this on!

My understanding of the requests from #1865 - particularly from @jedwards4b and @gold2718 - was that people wanted the test system to create entirely new versions of the failed tests, rather than just rerunning out of the existing test directories. As far as I can tell, this PR doesn't do that. I don't personally have strong feelings about it, but I'd like to let them comment on whether they feel it's important.

@jgfouca (Contributor, Author) commented Nov 8, 2017

@billsacks that would make for a much more complicated PR. I'd prefer to stay with this approach unless people really don't want it. The logs will contain the old failures if they happened.

@jedwards4b (Contributor)

I think this approach is fine, thanks.

@billsacks (Member)

Okay, that's fine with me, then. I'll give @gold2718 a little time to reply in case he feels strongly.

I have a few other questions:

(1) Just for my own understanding: What are the implications of setting --wait? Does this simply mean that the create_test process doesn't exit until all batch jobs have completed (by doing something like periodically polling the jobs it has spawned)?

And two questions regarding testing:

Test suite: scripts_regression_tests --fast

(2) It seems like the full scripts_regression_tests should be run, without '--fast', right?

(3) I appreciate that you have added an automated test of this feature. However, I don't love seeing test-specific code in the production code (it makes it harder to understand both the tests and the production code, increases the chances that test logic would accidentally pollute the production environment, etc.). Would it be feasible to do these tests by introducing new fake test classes that fail the first time (either in the build or run) but pass the second time? I don't think it would work to store state in the object, but could you do this with something like the following?

import os

class TESTBUILDFAIL_THEN_PASS:

    def build_phase(self, sharedlib_only=False, model_only=False):
        already_run_filename = os.path.join(self._case.get_value("CASEROOT"), "already_run")
        if os.path.isfile(already_run_filename):
            pass  # Do whatever is needed to pass the test
        else:
            # First invocation: record that we ran, then fail the test
            open(already_run_filename, 'a').close()
            # Do whatever is needed to fail the test

I'd imagine having three classes like that:

(a) One that fails the build the first time, passes on subsequent runs

(b) One that fails the run the first time, passes on subsequent runs

(c) One that PASSES the run the first time, FAILS on subsequent runs.

(a) and (b) reimplement what you already have, but note that (c) goes beyond the tests you already have, and I feel this would be a helpful addition to ensure that already-passed tests are NOT rerun.

I don't feel super-strongly about the need to reimplement the tests this way, though I would like to see the addition of something like (c) using some mechanism.
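For concreteness, here is a sketch of variant (c) in the same style as the snippet above. The class name, the run_phase signature, and the failure mechanism are illustrative assumptions, not CIME's actual API; a real fake test would derive from the appropriate system-test base class and use the proper pass/fail hooks.

import os

class TESTRUNPASS_THEN_FAIL:

    def run_phase(self):
        already_run_filename = os.path.join(self._case.get_value("CASEROOT"), "already_run")
        if os.path.isfile(already_run_filename):
            # A retry should never rerun an already-passed test, so a
            # second invocation is itself the failure condition
            raise RuntimeError("already-passed test was rerun")
        # First invocation: record that we ran, then do whatever is
        # needed to pass the test
        open(already_run_filename, 'a').close()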

@gold2718 commented Nov 8, 2017

While I think having the new test version would be good, I do not feel it is a high priority item. My usual workaround is to move the failed test so that a new test is created. Can that work with test suites in this system?

@jgfouca (Contributor, Author) commented Nov 8, 2017

@billsacks

What are the implications of setting --wait?

It means create_test will wait until all tests have gone through the batch system. This was necessary in order to support retrying tests that had RUN failures.
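(As a rough illustration only, the waiting amounts to a polling loop like the sketch below. The squeue invocation is an assumption standing in for whatever batch-system query create_test actually performs; it is not CIME's real implementation.)

import subprocess
import time

def job_still_queued(job_id):
    # Hypothetical check against SLURM's squeue: a nonempty reply means
    # the job is still pending or running
    result = subprocess.run(["squeue", "-h", "-j", str(job_id)],
                            capture_output=True, text=True)
    return bool(result.stdout.strip())

def wait_for_jobs(job_ids, poll_interval=60):
    # Poll until every spawned batch job has left the queue; this is
    # roughly what --wait implies for the create_test process
    pending = set(job_ids)
    while pending:
        pending = {jid for jid in pending if job_still_queued(jid)}
        if pending:
            time.sleep(poll_interval)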

It seems like the full scripts_regression_tests should be run, without '--fast'

I think only a code check and a test of the retry capability are necessary. There are plenty of tests in the --fast suite to make sure that normal create_test usage isn't broken.

However, I don't love seeing test-specific code in the production code

Yeah, I didn't like that either. But it was sooo much simpler than doing it another way.

@jgfouca (Contributor, Author) commented Nov 8, 2017

@gold2718 , I'm not sure I understand the question. If you turn on retry, you won't have to move anything.

@billsacks (Member)

Thanks for the replies, @jgfouca

However, I don't love seeing test-specific code in the production code

Yeah, I didn't like that either. But it was sooo much simpler than doing it another way.

I'm willing to hold my nose and accept that.

The one thing I would like to see added before accepting this PR is a test of:

(c) One that PASSES the run the first time, FAILS on subsequent runs... to ensure that already-passed tests are NOT rerun

Since the --retry mechanism mostly just leverages the --use-existing mechanism, I don't care whether this test is done for --retry or as part of O_TestTestScheduler:test_c_use_existing.
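(Conceptually, the relationship between the two mechanisms can be pictured as the loop below. run_tests is a hypothetical stand-in for the test scheduler, not CIME's real interface; the point is only that --retry re-invokes the scheduler on the failed subset while reusing the existing test directories.)

def run_tests(test_names, use_existing):
    # Hypothetical stand-in for the scheduler: runs the named tests and
    # returns the subset that failed. Stubbed here to always pass.
    return []

def run_with_retries(test_names, retries):
    # --retry N amounts to rerunning only the failed tests, up to N
    # times, out of their existing test directories
    failed = run_tests(test_names, use_existing=False)
    for _ in range(retries):
        if not failed:
            break
        failed = run_tests(failed, use_existing=True)
    return failed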

@jgfouca (Contributor, Author) commented Nov 8, 2017

@billsacks , that's a good idea.

@gold2718 commented Nov 8, 2017

@jgfouca: Sorry, I do the move so I can compare the two runs. Sometimes two failures still point at a system problem, so I like to keep the first failed run around. It is not (IMHO) a requirement for CIME.

@jgfouca (Contributor, Author) commented Nov 8, 2017

@gold2718 , oh, I see. The current approach will preserve the log info from the original failed run, so that info is not lost.

@gold2718 commented Nov 8, 2017

Well, if you insist on making my life easier, I suppose I can adapt.

@jgfouca (Contributor, Author) commented Nov 9, 2017

@billsacks , I added the test you asked for.

@billsacks (Member)

Okay, I'm happy with this now. Thanks again, @jgfouca !

@jedwards4b merged commit 426f61c into master on Nov 9, 2017
@jgfouca deleted the jgfouca/create_test_retry branch on November 15, 2017