Add retry capability to create_test #2034
Conversation
@jgfouca - Thanks a lot for taking this on! My understanding of the requests from #1865 - particularly from @jedwards4b and @gold2718 - was that people wanted the ability for the test system to create totally new versions of the failed tests, rather than just rerunning out of the existing test directories. My understanding is that that isn't done here. I don't personally have strong feelings about that, but I'd like to let them comment on whether they feel that's important.
@billsacks that would make for a much more complicated PR. I'd prefer to stay with this approach unless people really don't want it. The logs will contain the old failures if they happened.
I think this approach is fine, thanks.
Okay, that's fine with me, then. I'll give @gold2718 a little time to reply in case he feels strongly. I have a few other questions:

(1) Just for my own understanding: what are the implications of setting

And two questions regarding testing:

(2) It seems like the full scripts_regression_tests should be run, without '--fast', right?

(3) I appreciate that you have added an automated test of this feature. However, I don't love seeing test-specific code in the production code (it makes it harder to understand both the tests and the production code, increases the chances that test logic would accidentally pollute the production environment, etc.). Would it be feasible to do these tests by introducing new fake test classes that fail the first time (either in the build or run) but pass the second time? I don't think it would work to store state in the object, but could you do this with something like this?

    class TESTBUILDFAIL_THEN_PASS:
        def build_phase(self, sharedlib_only=False, model_only=False):
            already_run_filename = os.path.join(self._case.get_value("CASEROOT"), "already_run")
            if os.path.isfile(already_run_filename):
                # Do whatever is needed to pass the test
                pass
            else:
                open(already_run_filename, 'a').close()
                # Do whatever is needed to fail the test

I'd imagine having three classes like that: (a) one that fails the build the first time, passes on subsequent runs; (b) one that fails the run the first time, passes on subsequent runs; (c) one that PASSES the run the first time, FAILS on subsequent runs. (a) and (b) reimplement what you already have, but note that (c) goes beyond the tests you already have, and I feel this would be a helpful addition to ensure that already-passed tests are NOT rerun. I don't feel super-strongly about the need to reimplement the tests this way, though I would like to see the addition of something like (c) using some mechanism.
While I think having the new test version would be good, I do not feel it is a high priority item. My usual workaround is to move the failed test so that a new test is created. Can that work with test suites in this system?
It means create_test will wait until all tests have gone through the batch system. This was necessary in order to support retrying tests that had RUN failures.
I think only a code-check and testing retry is necessary. There are plenty of tests in the --fast suite to make sure that normal create_test usage isn't broken.
Yeah, I didn't like that either. But it was sooo much simpler than doing it another way.
@gold2718 , I'm not sure I understand the question. If you turn on retry, you won't have to move anything. |
Thanks for the replies, @jgfouca.
I'm willing to hold my nose and accept that. The one thing I would like to see added before accepting this PR is a test of:
Since the
@billsacks , that's a good idea.
@jgfouca: Sorry, I do the move so I can compare the two runs. Sometimes, two failures still point at a system problem, so I like to keep the first failed run around. It is not (IMHO) a requirement for CIME.
@gold2718 , oh, I see. The current approach will preserve the log info from the original failed run, so that info is not lost.
Well, if you insist on making my life easier, I suppose I can adapt.
@billsacks , I added the test you asked for.
Okay, I'm happy with this now. Thanks again, @jgfouca!
Plus new regression test to exercise this capability.
Test suite: scripts_regression_tests --fast
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #1865
User interface changes?: Yes, new --retry option to create_test
Update gh-pages html (Y/N)?: N
Code review: @jedwards4b @billsacks
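The retry flow discussed in this PR (rerun failures in place, keep the old logs, never rerun tests that already passed) can be sketched generically. This is a hedged illustration of the control flow only, not the actual create_test implementation; `run_with_retries`, `run_test`, and the boolean pass/fail convention are hypothetical names chosen for the sketch.

```python
def run_with_retries(tests, run_test, retries):
    """Run each test once, then rerun only the failures, up to `retries`
    extra passes. Tests that already passed are never rerun. Returns a
    name -> passed mapping with the final status of every test."""
    results = {name: run_test(name) for name in tests}
    for _ in range(retries):
        failed = [name for name, ok in results.items() if not ok]
        if not failed:
            break  # everything passed; no further reruns needed
        for name in failed:
            # Rerun in the existing test directory; earlier failure
            # logs are preserved rather than discarded.
            results[name] = run_test(name)
    return results
```

For example, with `--retry 1` a test whose RUN phase failed once due to a transient batch-system problem would be resubmitted a single time, while tests that passed on the first attempt are left alone.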