Upgrade history tools to python #413
Conversation
Full change list:
1) Remove old shell-based tools and update calls to use python versions
2) Move most functionality in old shell tools into hist_utils.py
3) Make thin python wrapper programs to access hist_utils from the command line
4) Do st_archive as LAST step in run_indv so that coupler_log_path is not needed
5) Fix ERR test
6) Update fake tests to create a fake hist file
7) Large refactor of bless_test_results
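The thin-wrapper pattern from item 3 can be sketched as follows. This is an illustrative stand-in, not the actual CIME API: the function names, arguments, and the stubbed-out comparison logic are all assumptions.

```python
import argparse

def compare_test(caseroot, suffix1, suffix2):
    # Stand-in for the real comparison logic that lives in hist_utils.py
    return True, "history files match"

def parse_command_line(argv, description):
    # The wrapper only handles argument parsing and exit codes;
    # all real work is delegated to the library layer.
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("caseroot", help="Case directory to check")
    parser.add_argument("suffix1", help="First history-file suffix")
    parser.add_argument("suffix2", help="Second history-file suffix")
    args = parser.parse_args(argv)
    return args.caseroot, args.suffix1, args.suffix2

def main(argv):
    caseroot, suffix1, suffix2 = parse_command_line(
        argv, "Compare two sets of history files")
    success, comments = compare_test(caseroot, suffix1, suffix2)
    print(comments)
    return 0 if success else 1

print(main(["/path/to/case", "base", "rest"]))
```

Keeping the wrappers this thin is what lets the same hist_utils code be driven from both the test infrastructure and the command line.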
@@ -28,7 +28,7 @@ def run_phase(self):
         dout_s_root = self._case.get_value("DOUT_S_ROOT")
         rundir = self._case.get_value("RUNDIR")
         logger.info("staging files from archive %s" % dout_s_root)
-        for item in glob.glob(os.path.join(dout_s_root, "rest", "*", "*")):
+        for item in glob.glob(os.path.join(dout_s_root, "*", "hist", "*base")):
Important change: I think ERR was broken before this PR, we just didn't notice it. The *base file did not exist in the "rest" subdirectory before or after this PR, so the "rest" vs. "base" comparison in ers_second_phase should not have worked.
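The difference between the two glob patterns can be demonstrated with a small sketch. The directory and file names below are an illustrative stand-in for a real DOUT_S_ROOT archive, not output from an actual run:

```python
import glob
import os
import tempfile

# Build a fake short-term archive: a rest/ subdirectory holding a restart
# file, and a component hist/ subdirectory holding a "*.base" history file.
dout_s_root = tempfile.mkdtemp()
os.makedirs(os.path.join(dout_s_root, "rest", "0001-01-06-00000"))
open(os.path.join(dout_s_root, "rest", "0001-01-06-00000", "case.cpl.r.nc"), "w").close()
os.makedirs(os.path.join(dout_s_root, "lnd", "hist"))
open(os.path.join(dout_s_root, "lnd", "hist", "case.clm2.h0.nc.base"), "w").close()

# Looking under rest/ finds no *base files, because they are never staged there
old_base = glob.glob(os.path.join(dout_s_root, "rest", "*", "*base"))
# The new pattern finds them under each component's hist/ subdirectory
new_base = glob.glob(os.path.join(dout_s_root, "*", "hist", "*base"))

print(len(old_base), len(new_base))
```

This is why a rest-vs-base comparison keyed off the rest/ directory could never have matched anything.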
thanks
I'd like to take a close look at this, but don't have time this week. If this can wait, I'd like to review it early next week.
Have you done some system tests where you force a difference in the history files, to confirm that this is picked up and reported correctly? I'd like to make sure that has been tested both for the in-test comparisons and for baseline comparisons.
I am ready to accept this PR. It removes and replaces some of the last functionality still written in shell script.
Before this PR is accepted, I think that Bill's question should be answered.
@billsacks, yes we have our TESTRUNDIFF fake test case that generates a DIFF and confirms that it was caught. I hope the new python-based infrastructure is more robust than the old; there is some evidence that it is, because I caught some longstanding problems when I made the switch.
    CIME.utils.handle_standard_logging_options(args)

    return args.caseroot
You only return args.caseroot, yet the caller expects caseroot and baseline_dir
Thanks
    cprnc_exe = case.get_value("CCSM_CPRNC")
    basename = os.path.basename(file1)
    stat, out, _ = run_cmd("%s %s %s 2>&1 | tee %s/%s.cprnc.out" % (cprnc_exe, file1, file2, rundir, basename))
    return (stat == 0 and "IDENTICAL" in out, out)
This is not a new issue (it looks like this is just replicating the old behavior), so it doesn't necessarily need to be addressed in this PR, but if it isn't, I'd like to open a separate issue to have this fixed: It looks like the comparison passes as long as the string "IDENTICAL" is found anywhere in the cprnc output. In the event that there is a history field with IDENTICAL in its name, the comparison would always look successful. It would be more robust to check against the exact string - i.e., looking for "diff_test: the two files seem to be IDENTICAL".
Yeah, I'd rather check return code, but I don't think I can currently rely on cprnc to have a sane return code. I'll look for "seem to be IDENTICAL"
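A minimal sketch of the more robust check being discussed. The matched phrase is the one quoted in the review above; the helper name is hypothetical:

```python
def cprnc_files_identical(out):
    # Match cprnc's exact conclusion line rather than the bare substring
    # "IDENTICAL", which could also appear in a history field's name.
    return "the two files seem to be IDENTICAL" in out

print(cprnc_files_identical("diff_test: the two files seem to be IDENTICAL"))  # True
print(cprnc_files_identical("RMS IDENTICAL_FLAG  1.0e-3"))                     # False
```

Checking cprnc's return code would be even cleaner, but as noted above, its return codes cannot currently be relied upon.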
I would definitely believe that the new infrastructure is more robust than the old, and I like the way you have changed things to be robust to some additional cases. (I was going to make a request yesterday to add some more robustness in this respect, and then saw that you just added it - so thank you!) I also think that it's great to have this TESTRUNDIFF fake test case.
However, I still feel like these unit tests need to be accompanied by some manual system tests for changes of this magnitude, and @mvertens also agreed in a conversation I just had with her. It sounds like @mvertens is in the process of doing some manual tests.
Brainstorming what I think should be verified, at the request of @mvertens (it's possible that enough confidence can be gained from unit tests for some of these, but it's not clear to me how many of these are covered by unit tests right now... and it would be good to supplement unit tests with manual comparisons for at least the most critical of these, like (2)).
In hist_utils.py, _iter_model_file_substrs() is using drv_comp.get_valid_model_components
I think I have a fix, testing now.
Return non-zero if there were significant problems during bless
    test_hists.sort()
    return test_hists

def move(case, suffix):
For this and other functions in this file - at least for public functions: Can you please add documentation? At the least, I'd like to have documentation on each of the arguments to each function.
Pfffft, documentation... hah.
Done.
    # Make sure user knows that some tests were not compared
    success = True
    for broken_compare, reason in broken_compares:
        logging.warning("COMPARE FAILED FOR TEST: %s, reason %s" % (broken_compare, reason))
Ideally I would want the output from this tool to look basically the same as the output written to the TestStatus file - although just for the COMPARE_baseline phase. That is, I'd want to see something like:
PASS SOMETEST BASELINE_COMPARE
or
FAIL SOMETEST BASELINE_COMPARE
That's approximately what happened with the earlier version of these tools.
This comment applies both to the main output from the comparison, and also to various other extraneous information that is printed (or not). So, for example, I'd prefer not to have this output for every test:
logging.info("Comparing results for test: %s, most recent result: %s" % (test_name, overall_result))
-- because I'd like to be able to parse this output in the same way that I parse the normal TestStatus or cs.status output.
However, I agree that it does make sense to report on failed tests here. One possibility would be to print out the testStatus output for each test that failed.
Actually, maybe you could accomplish all of this cleanly by making more use of the TestStatus class here: Rather than having this function be responsible for output, it could instead just update the ts object with the results of the BASELINE_COMPARE phase (I know, there isn't a separate BASELINE_COMPARE phase yet... but the idea still applies). Then you can output the updated TestStatus results.
One complication with this is that I don't think you want to change the TestStatus file in the test itself (it seems like a Bad Idea to have some external script change the TestStatus file). In this usage, we instead would want the TestStatus to be written for each test via the logger. I think this could be accomplished by adding a __str__ method to TestStatus. This would contain the bulk of what is currently TestStatus.flush, and then TestStatus.flush could simply write out the results of str(self).
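That suggested refactor could look roughly like this. The class below is a simplified stand-in for CIME's real TestStatus, and the phase and result strings are illustrative:

```python
import io

class TestStatus:
    def __init__(self, test_name):
        self.test_name = test_name
        self._results = []  # ordered (phase, result) pairs

    def set_status(self, phase, result):
        self._results.append((phase, result))

    def __str__(self):
        # The bulk of what flush() used to do: render the status lines
        return "".join("%s %s %s\n" % (result, self.test_name, phase)
                       for phase, result in self._results)

    def flush(self, fileobj):
        # flush() now simply writes out str(self)
        fileobj.write(str(self))

ts = TestStatus("SOMETEST")
ts.set_status("COMPARE_baseline", "PASS")
buf = io.StringIO()
ts.flush(buf)
print(buf.getvalue(), end="")  # -> PASS SOMETEST COMPARE_baseline
```

With this split, an external script like bless_test_results could render a test's status through the logger without ever rewriting the TestStatus file on disk.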
There actually is a BASELINE_COMPARE phase; it is called "COMPARE_baseline", though.
I agree we do not want this script to change TestStatus files. Leveraging the TestStatus class for output is an interesting idea, but I'm starting to get a bit overwhelmed by the number of changes being requested. Is there any way we could punt this thought to a later ticket/PR?
Yes, I'm okay with that. Do you mean punting on the whole reworking of the output format? (I'm okay with that; I just want to make sure I open an appropriate ticket.)
Yes, the ticket should be to rework the output of compare/bless test results and to change TestStatus class to be able to write to logging.
def compare_history(testcase_dir_for_test, baseline_name):
###############################################################################
    with Case(testcase_dir_for_test) as case:
        baseline_full_dir = os.path.join(case.get_value("BASELINE_ROOT"), case.get_value("COMPILER"), baseline_name, case.get_value("CASEBASEID"))
In the CESM workflow, the COMPILER doesn't enter in to the baseline path. At least, it didn't with cime4, and still doesn't seem to in the latest baselines (cesm2_0_alpha01e, which uses some version of cime5).
Would it be OK if the CESM workflow was modified in this way? It does make it much easier to maintain multiple sets of baselines for multiple compilers.
I can't answer that myself. It would affect many people in CSEG, so I imagine needs some broader buy-in. Personally, I can't think of any major objections, but I also don't see the advantage, since our test names have the compiler in them. So it seems to work just fine to have a baseline directory that contains, say, SMS.f10_f10.ICLM45.yellowstone_intel, SMS.f10_f10.ICLM45.yellowstone_pgi and SMS.f10_f10.ICLM45.yellowstone_gnu.
I'll defer to @mvertens , @jedwards4b and other CSEG members on this question.
I do see that there are other places in the scripts that assume the ACME organization in this respect: at least bless_test_results.py, and maybe test_scheduler.py and scripts_regression_tests.py (I just did a simple grep: grep -i 'path.join.*compiler').
Looking at my current baselines, I see directory names like "ERS.f09_g16.I1850CLM45CN.melvin_gnu", so the compiler is clearly included in the name and therefore runs with different compilers would not conflict; so maybe adding compiler to the path is not necessary. This was a change we made in ACME a long time ago back when we were working with an ancient version of CIME, so it's possible it's not necessary anymore. For now, I can do an if/else based on model here and in bless_test_results. I don't think anywhere else is impacted.
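A sketch of that if/else. The function name and model tags here are illustrative, not the real bless_test_results code:

```python
import os

def baseline_root_dir(baseline_root, model, compiler, baseline_name, casebaseid):
    # ACME inserts the compiler into the baseline path; the CESM layout
    # omits it, relying on the compiler already being part of the test name.
    if model == "acme":
        return os.path.join(baseline_root, compiler, baseline_name, casebaseid)
    return os.path.join(baseline_root, baseline_name, casebaseid)

case = "ERS.f09_g16.I1850CLM45CN.melvin_gnu"
print(baseline_root_dir("/baselines", "cesm", "gnu", "tags/cime5", case))
print(baseline_root_dir("/baselines", "acme", "gnu", "master", case))
```

Because the compiler appears in the test name either way, the two layouts never collide; the model-based branch is only needed to keep existing baseline trees readable.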
Should have read your post more closely. Since this path system is built into test_scheduler, then CESM must be using it already. I guess they like it since I haven't seen any complaints ;)
def parse_command_line(args, description):
###############################################################################
    parser = argparse.ArgumentParser(
        usage="""\n%s [-n] [-r <TESTROOT>] [-b <BRANCH>] [-c <COMPILER>] [<TEST> <TEST> ...] [--verbose]
This usage message is wrong: it's missing -t, and I don't see any documentation of what the -n option means.
You're right, fix incoming.
I believe all concerns have been addressed. I'm going to merge.
Upgrade history tools to python

Full change list:
1) Remove old shell-based tools and update calls to use python versions
2) Move most functionality in old shell tools into hist_utils.py
3) Make thin python wrapper programs to access hist_utils from the command line
4) Do st_archive as LAST step in run_indv so that coupler_log_path is not needed
5) Fix ERR test
6) Update fake tests to create a fake hist file
7) Large refactor of bless_test_results
8) Add new compare_test_results, counterpart to bless_test_results.

Test suite: scripts_regression_tests
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #332
User interface changes?: Significant changes to compare_* scripts
Code review: @jedwards4b @mvertens @billsacks @gold2718

* jgfouca/hist_tools_conv_to_python:
  Make comparison matchups more robust
  Fix user docs for compare_test_results
  improved reporting of baseline file count mismatch
  correct location of debug log in help message, store baselines with original filename
  Add usage example for typical CESM workflow
  Get rid of pdb trace that I believe was mistakenly left in
  Make a very obvious simplification to code
  Remove unneeded global
  Update hist infra to better-support user-chosen baseline_root
  minor help string fix
  More fixes from review
  fix issue in component_generate_baseline, get only most recent files
  Remove last cwd default args
  Remove dangerous cwd defaults, add documentation to hist_utils public API
  Add new compare_test_results, counterpart to bless_test_results
  bless_test_results: Need sane error code
  remove check for None
  fixes in hist_utils
  Fix mistake caught by code review
  Upgrade history tools to python

Conflicts:
  utils/python/CIME/check_lockedfiles.py
  utils/python/CIME/test_status.py
Are we merging our own pull requests now?
@gold2718 Once the OK has been given by the reviewers, does it matter who clicks the button?
@gold2718 To clarify, there's no such thing as an integrator role in CIME like there is in ACME.
Understood, it's just that there was no time to review your significant change to the compare code before the merge.
Ah, sorry about that. You can see the changes here: 164cfd9. I'll be sure to implement whatever changes you'd like in another PR.