Modify 'make baseline' to continue running when a baseline fails. #3044
Conversation
…sh without running other tests.
I like this change. It can be useful on testbed machines to make it through the baseline process even if a test fails. What do you think of cherry-picking this related commit into the PR? It disables the SL test when SL isn't built. This fixes the problem of running this test on a GPU machine. SL doesn't yet run end-to-end on GPU: ambrad@ef95295
I suppose a counterargument is that someone might think the fact that
Edit: I withdraw this comment. The output will show the error messages, so it will be clear there was a failure.
I think it's good that each baseline test gets a chance to run; it will cover more code, at a minimum. Plus, if a test is known to fail, like the SL test on GPU, that shouldn't stop other devs from moving forward. I'll add "Test failed" to the output to make sure it's clear. I agree with cherry-picking the above SL commit.
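The continue-on-failure behavior with an explicit "Test failed" message can be sketched roughly as below. This is a hypothetical shell sketch, not the actual `make baseline` implementation; `run_one_test` and the test names are stand-ins:

```shell
#!/bin/sh
# Hypothetical sketch of the continue-on-failure pattern; run_one_test and
# the test names are illustrative stand-ins, not actual HOMME baseline targets.
run_one_test() {
  # Pretend the second test fails (as the SL test does on GPU).
  [ "$1" != "test2" ]
}

failures=0
for t in test1 test2 test3; do
  if run_one_test "$t"; then
    echo "PASS: $t"
  else
    echo "Test failed: $t"     # report the failure but keep going
    failures=$((failures + 1))
  fi
done

# Aggregate result: nonzero when any baseline failed, so callers still see it.
if [ "$failures" -eq 0 ]; then overall=0; else overall=1; fi
echo "$failures test(s) failed; overall exit code would be $overall"
```

Each failing test is reported as it happens, later tests still run, and the loop's failure count can be turned into a nonzero exit code at the end so the overall run is still marked as failed.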
What if we are making baselines on some machine through ./create_test ...? Does this new change mean case status will not show that some baseline was not generated? If there is nothing alarming in the case status, there is no reason to read log files closely. I guess I can test this by messing with some namelist to cause it to crash...
…' into pbosler/homme/make-baseline-crash-fix
Sample output:
Pete, re: Oksana's comment. You could move the
With the new change, the error return is back:
Oksana and Mark should have the final say, but I think this looks like a great change, and the error output seems good to me.
Add a fix for #3049. Can now use
@oksanaguba, are you OK with this PR?
@jgfouca not yet. I want to run this from CIME and confirm it behaves as it should. This is a low-priority feature, not a bug (I will change labels).
It is fixing a bug, so I am adding back the bug-fix label.
@rljacob this PR fixes infrastructure issues; one is about nvcc-wrapper, which maybe can be considered a bug; I only meant it does not fix bugs in Homme as in the dycore.
@rljacob @oksanaguba: I originally added the "bug fix" label because I thought it was a bug that one baseline failure would prevent all of the others from running. Depending on the order of the tests relative to the failure, this could be a large number. In my case, for example, a test we expected to fail did fail; it was number 8 of 91, and its failure prevented the other 83 baselines from being run. To me that seemed like a bug, since we depend on those baselines to verify subsequent PRs. I have no problem changing labels if "bug fix" is reserved for the Homme source code or if another label is a better fit.
@mt5555 please update your review if your changes have been made.
I still need to test it, so it is not ready.
This PR is a month old already. Make time to test it.
I am trying this now: I merged it and broke the 1st run for baroCamMoist by meddling with the namelist. The 1st run for this set did not run; the rest ran (because they are OMP runs with different namelists). Total output is
I moved on to breaking a simpler test, baro2b. It did not run, the movies folder is empty, and there is an error, but the output at the end contains
So, for now the code completes baselines (the next test, baro2c, has movies), but it also shows PASS.
I think it should show FAIL.
@pbosler ^^
Since in this version the case status returns PASS even when some baselines failed (which can lead to broken baselines), could I suggest that the change about the SL test be moved to a new PR and that this PR not be merged for now?
@pbosler, @oksanaguba what is the status of this PR?
@oksanaguba The reason the test passes is because pass/fail is determined by the executable's return value to the shell, e.g., the
Hence, in order to show failure, the test must crash, throw an exception, or otherwise not |
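The point above is that the harness keys PASS/FAIL off the shell exit status, not the text in the log. A minimal illustration of that behavior (the commands here are hypothetical, not the actual test harness):

```shell
#!/bin/sh
# A command can print error text and still exit 0; a harness that only
# inspects the exit status will then report PASS despite the error message.
sh -c 'echo "error: baseline mismatch"; exit 0'
status=$?

if [ "$status" -eq 0 ]; then result=PASS; else result=FAIL; fi
echo "$result (exit status $status)"
```

So a test that logs errors but still returns 0 to the shell is reported as PASS, which is why the failure must surface as a nonzero exit status for the case status to show FAIL.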
@oksanaguba what is the status of this PR? |
I did not have time to work on this.
But what's the status going forward? |
Closing until this can be worked on. |
Fixed an issue where a failed test would cause 'make baseline' to crash without running other tests.
Fixes #3049
[BFB]