
Upgrade history tools to python #413

Merged: 20 commits, Aug 19, 2016

Conversation

@jgfouca (Contributor, Author) commented Aug 16, 2016:

Full change list:

  1. Remove old shell-based tools and update calls to use python versions
  2. Move most functionality in old shell tools into hist_utils.py
  3. Make thin python wrapper programs to access hist_utils from the command line
  4. Do st_archive as LAST step in run_indv so that coupler_log_path is not needed
  5. Fix ERR test
  6. Update fake tests to create a fake hist file
  7. Large refactor of bless_test_results
  8. Add new compare_test_results, counterpart to bless_test_results.

Test suite: scripts_regression_tests
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #332

User interface changes?: Significant changes to compare_* scripts

Code review: @jedwards4b @mvertens @billsacks @gold2718

@@ -28,7 +28,7 @@ def run_phase(self):
        dout_s_root = self._case.get_value("DOUT_S_ROOT")
        rundir = self._case.get_value("RUNDIR")
        logger.info("staging files from archive %s" % dout_s_root)
-       for item in glob.glob(os.path.join(dout_s_root, "rest", "*", "*")):
+       for item in glob.glob(os.path.join(dout_s_root, "*", "hist", "*base")):
jgfouca (Contributor, Author):

Important change: I think ERR was broken before this PR, we just didn't notice it. The *base file did not exist in the "rest" subdirectory before or after this PR, so the "rest" vs. "base" comparison in ers_second_phase should not have worked.
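For illustration, a minimal sketch of the corrected staging glob (the directory layout and "*base" suffix come from the diff above; the function name is hypothetical):

```python
import glob
import os

def find_base_hist_files(dout_s_root):
    # History files carrying the "base" suffix live under each component's
    # hist subdirectory, e.g. <DOUT_S_ROOT>/atm/hist/<case>.cam.h0.<date>.nc.base.
    # The old pattern searched under rest/, where no *base files existed.
    return sorted(glob.glob(os.path.join(dout_s_root, "*", "hist", "*base")))
```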

Contributor:

thanks

@billsacks (Member):

I'd like to take a close look at this, but don't have time this week. If this can wait, I'd like to review it early next week.

@billsacks (Member):

Have you done some system tests where you force a difference in the history files, to confirm that this is picked up and reported correctly? I'd like to make sure that has been tested both for the in-test comparisons and for baseline comparisons.

@jedwards4b (Contributor):

I am ready to accept this PR. It removes and replaces some of the last functionality still in sh.

@mvertens (Contributor):

Before this PR is accepted, I think that Bill's question should be answered. I feel that these types of tests should be run to confirm the robustness of the PR, if they have not been run.


@jgfouca (Contributor, Author) commented Aug 17, 2016:

@billsacks , yes we have our TESTRUNDIFF fake test case that generates a DIFF and confirms that it was caught. I hope the new python-based infrastructure is more robust than the old; there is some evidence that it is, because I caught some longstanding problems when I made the switch.


CIME.utils.handle_standard_logging_options(args)

return args.caseroot
Member:

You only return args.caseroot, yet the caller expects caseroot and baseline_dir

jgfouca (Contributor, Author):

Thanks
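The fix for the return-value mismatch flagged above is straightforward; a hypothetical sketch (the argument names and parser layout are assumptions, not the actual script's interface):

```python
import argparse

def parse_command_line(args):
    parser = argparse.ArgumentParser(description="Compare a case against baselines")
    parser.add_argument("caseroot", help="Path to the case directory")
    parser.add_argument("-b", "--baseline-dir", default=None,
                        help="Directory holding baseline history files")
    parsed = parser.parse_args(args)
    # Return BOTH values the caller unpacks, not just caseroot
    return parsed.caseroot, parsed.baseline_dir
```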

@rljacob rljacob added this to the CIME5.1.0 milestone Aug 17, 2016
cprnc_exe = case.get_value("CCSM_CPRNC")
basename = os.path.basename(file1)
stat, out, _ = run_cmd("%s %s %s 2>&1 | tee %s/%s.cprnc.out" % (cprnc_exe, file1, file2, rundir, basename))
return (stat == 0 and "IDENTICAL" in out, out)
Member:

This is not a new issue (it looks like this is just replicating the old behavior), so it doesn't necessarily need to be addressed in this PR, but if it isn't, I'd like to open a separate issue to have this fixed: It looks like the comparison passes as long as the string "IDENTICAL" is found anywhere in the cprnc output. In the event that there is a history field with IDENTICAL in its name, the comparison would always look successful. It would be more robust to check against the exact string - i.e., looking for "diff_test: the two files seem to be IDENTICAL".

jgfouca (Contributor, Author):

Yeah, I'd rather check the return code, but I don't think I can currently rely on cprnc to have a sane return code. I'll look for "seem to be IDENTICAL".
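A sketch of the stricter check discussed above (the sentinel phrase comes from billsacks' comment; the function name and the `(stat, out)` inputs mirror the snippet earlier in this thread but are otherwise illustrative):

```python
def cprnc_files_identical(stat, out):
    # Match cprnc's full verdict phrase rather than the bare word
    # "IDENTICAL", which could appear in a history field's name.
    return stat == 0 and "the two files seem to be IDENTICAL" in out
```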

@billsacks (Member):

@billsacks , yes we have our TESTRUNDIFF fake test case that generates a DIFF and confirms that it was caught. I hope the new python-based infrastructure is more robust than the old; there is some evidence that it is, because I caught some longstanding problems when I made the switch.

I would definitely believe that the new infrastructure is more robust than the old, and I like the way you have changed things to be robust to some additional cases. (I was going to make a request yesterday to add some more robustness in this respect, and then saw that you just added it - so thank you!) I also think that it's great to have this TESTRUNDIFF fake test case.

However, I still feel like these unit tests need to be accompanied by some manual system tests for changes of this magnitude, and @mvertens also agreed in a conversation I just had with her. It sounds like @mvertens is in the process of doing some manual tests.

Brainstorming what I think should be verified, at the request of @mvertens (it's possible that enough confidence can be gained from unit tests for some of these, but it's not clear to me how many of these are covered by unit tests right now... and it would be good to supplement unit tests with manual comparisons for at least the most critical of these like (2)).

  1. Passing in-test comparison
    a. Ensure that all history files were compared in a fully-coupled case; it's probably sufficient to ensure that all of the expected cprnc files were created.
  2. Failing in-test comparison: can be verified with an ERS test using the diffs I gave in #295 (ERP test does not fail when it should)
  3. Baselines generated correctly: ensure that all expected files show up in the baseline directory
  4. Results reported correctly when a file exists in the baseline directory but not in the test case, for some / all of the history files
    a. This can be tested by adding some fake file to the baseline directory
    b. I can't remember how this was reported in the past
  5. Results reported correctly when a file exists in the test case but not in the baseline directory, for some / all of the history files
    a. i.e., baseline directory exists, but some / all history files are missing from it
  6. Results reported correctly when the baseline directory is missing entirely
    a. e.g., if you're comparing some test ERS.f09_g16.X.yellowstone_intel in baseline tag cesm2_0_alpha01e, I'm imagining that the cesm2_0_alpha01e directory exists, but there is no ERS.f09_g16.X.yellowstone_intel subdirectory
  7. Some history files identical but some differ
    a. From examining the code, I'm pretty confident this works correctly, and it's probably hard to force with a manual test, so I'm okay skipping it. (But it would be a good target for a unit test if there isn't already one like that.)
  8. What about the various _N2, NCK, etc. cases that were handled specially in the old component_compare_test.sh? I can't say I understand the old logic well, so it's hard for me to tell what is needed in the new code to reproduce the necessary functionality.

@jedwards4b (Contributor):

In hist_utils.py _iter_model_file_substrs() is using drv_comp.get_valid_model_components
which is a list of component class names (atm, lnd, etc) but it should use the list
of specific component names (cam, clm, etc)

@jedwards4b (Contributor):

I think I have a fix, testing now.
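A hypothetical sketch of the distinction jedwards4b describes (the mapping and function name are illustrative; the real lookup lives in the case/driver XML objects):

```python
def iter_model_file_substrs(components):
    # History filenames embed the specific component name (cam, clm, ...),
    # not the component class (atm, lnd, ...), so iterate the former.
    # `components` maps class -> specific name, e.g. {"atm": "cam"}.
    for _comp_class, comp_name in sorted(components.items()):
        yield comp_name
```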

jedwards4b and others added 3 commits August 17, 2016 16:46
Return non-zero if there were significant problems during bless
test_hists.sort()
return test_hists

def move(case, suffix):
Member:

For this and other functions in this file - at least for public functions: Can you please add documentation? At the least, I'd like to have documentation on each of the arguments to each function.

jgfouca (Contributor, Author):

Pfffft, documentation... hah.

jgfouca (Contributor, Author):

Done.

# Make sure user knows that some tests were not compared
success = True
for broken_compare, reason in broken_compares:
logging.warning("COMPARE FAILED FOR TEST: %s, reason %s" % (broken_compare, reason))
@billsacks (Member), Aug 18, 2016:

Ideally I would want the output from this tool to look basically the same as the output put in the TestStatus file - although just for the COMPARE_baseline phase. That is, I'd want to see something like:

PASS SOMETEST BASELINE_COMPARE

or

FAIL SOMETEST BASELINE_COMPARE

That's approximately what happened with the earlier version of these tools.

This comment applies both to the main output from the comparison, and also to various other extraneous information that is printed (or not). So, for example, I'd prefer not to have this output for every test:

                logging.info("Comparing results for test: %s, most recent result: %s" % (test_name, overall_result))

-- because I'd like to be able to parse this output in the same way that I parse the normal TestStatus or cs.status output.

However, I agree that it does make sense to report on failed tests here. One possibility would be to print out the testStatus output for each test that failed.

Actually, maybe you could accomplish all of this cleanly by making more use of the TestStatus class here: Rather than having this function be responsible for output, it could instead just update the ts object with the results of the BASELINE_COMPARE phase (I know, there isn't a separate BASELINE_COMPARE phase yet... but the idea still applies). Then you can output the updated TestStatus results.

One complication with this is that I don't think you want to change the TestStatus file in the test itself (it seems like a Bad Idea to have some external script change the TestStatus file). In this usage, we instead would want the TestStatus to be written for each test via the logger. I think this could be accomplished by adding a __str__ method to TestStatus. This would contain the bulk of what is currently TestStatus.flush, and then TestStatus.flush could simply write out the results of str(self).

jgfouca (Contributor, Author):

There actually is a BASELINE_COMPARE phase, it would be called "COMPARE_baseline" though.

I agree we do not want this script to change TestStatus files. Leveraging the TestStatus class for output is an interesting idea, but I'm starting to get a bit overwhelmed by the number of changes being requested. Is there any way we could punt this thought to a later ticket/PR?

Member:

Yes, I'm okay with that. Do you mean punting on the whole reworking of the output format? (I'm okay with that; I just want to make sure I open an appropriate ticket.)

@jgfouca (Contributor, Author), Aug 18, 2016:

Yes, the ticket should be to rework the output of compare/bless test results and to change TestStatus class to be able to write to logging.
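A minimal sketch of the __str__ idea billsacks proposes above (the class layout and method names here are assumptions; the real TestStatus tracks more state):

```python
class TestStatus:
    def __init__(self, test_name, filename):
        self._test_name = test_name
        self._filename = filename
        self._phases = []  # list of (phase, status) in execution order

    def set_status(self, phase, status):
        self._phases.append((phase, status))

    def __str__(self):
        # One "STATUS TESTNAME PHASE" line per phase, TestStatus-file style
        return "\n".join("%s %s %s" % (status, self._test_name, phase)
                         for phase, status in self._phases)

    def flush(self):
        # flush becomes a thin wrapper around __str__, so external tools
        # (e.g. compare_test_results) can log str(ts) without touching the file
        with open(self._filename, "w") as fd:
            fd.write(str(self) + "\n")
```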

def compare_history(testcase_dir_for_test, baseline_name):
###############################################################################
with Case(testcase_dir_for_test) as case:
baseline_full_dir = os.path.join(case.get_value("BASELINE_ROOT"), case.get_value("COMPILER"), baseline_name, case.get_value("CASEBASEID"))
Member:

In the CESM workflow, the COMPILER doesn't enter in to the baseline path. At least, it didn't with cime4, and still doesn't seem to in the latest baselines (cesm2_0_alpha01e, which uses some version of cime5).

jgfouca (Contributor, Author):

Would it be OK if the CESM workflow was modified in this way? It does make it much easier to maintain multiple sets of baselines for multiple compilers.

Member:

I can't answer that myself. It would affect many people in CSEG, so I imagine needs some broader buy-in. Personally, I can't think of any major objections, but I also don't see the advantage, since our test names have the compiler in them. So it seems to work just fine to have a baseline directory that contains, say, SMS.f10_f10.ICLM45.yellowstone_intel, SMS.f10_f10.ICLM45.yellowstone_pgi and SMS.f10_f10.ICLM45.yellowstone_gnu.

I'll defer to @mvertens , @jedwards4b and other CSEG members on this question.

I do see that there are other places in the scripts that assume the ACME organization in this respect: at least bless_test_results.py and maybe test_scheduler.py and scripts_regression_tests.py (I just did a simple grep -i 'path.join.*compiler').

jgfouca (Contributor, Author):

Looking at my current baselines, I see directory names like "ERS.f09_g16.I1850CLM45CN.melvin_gnu", so the compiler is clearly included in the name and therefore runs with different compilers would not conflict; so maybe adding compiler to the path is not necessary. This was a change we made in ACME a long time ago back when we were working with an ancient version of CIME, so it's possible it's not necessary anymore. For now, I can do an if/else based on model here and in bless_test_results. I don't think anywhere else is impacted.

jgfouca (Contributor, Author):

Should have read your post more closely. Since this path system is built into test_scheduler, then CESM must be using it already. I guess they like it since I haven't seen any complaints ;)
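The model-dependent if/else jgfouca proposes might look like this (the model identifiers and function name are illustrative assumptions, not CIME's actual API):

```python
import os

def baseline_full_dir(baseline_root, model, compiler, baseline_name, casebaseid):
    # ACME keys baselines by compiler; CESM relies on the compiler already
    # being embedded in the test name (e.g. SMS.f10_f10.ICLM45.yellowstone_intel)
    if model == "acme":
        return os.path.join(baseline_root, compiler, baseline_name, casebaseid)
    else:
        return os.path.join(baseline_root, baseline_name, casebaseid)
```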

def parse_command_line(args, description):
###############################################################################
parser = argparse.ArgumentParser(
usage="""\n%s [-n] [-r <TESTROOT>] [-b <BRANCH>] [-c <COMPILER>] [<TEST> <TEST> ...] [--verbose]
Member:

This usage message is wrong: it's missing -t, and I don't see any documentation / meaning of the -n option

jgfouca (Contributor, Author):

You're right, fix incoming.
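One way to keep the usage message from drifting out of sync with the options, as it did here, is to let argparse generate it from the declared arguments (the flag meanings below are assumptions for illustration, not the tool's actual semantics):

```python
import argparse

def parse_command_line(args, description):
    # No hand-written usage string: argparse derives it from the options,
    # so a newly added flag like -t can never be missing from it.
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("-r", "--test-root", help="Root directory containing the tests")
    parser.add_argument("-b", "--branch", help="Baseline name to compare against")
    parser.add_argument("-c", "--compiler", help="Compiler used for the tests")
    parser.add_argument("-t", "--test-id", help="Limit to tests with this test-id")
    parser.add_argument("tests", nargs="*", help="Specific tests to process")
    return parser.parse_args(args)
```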

@jgfouca (Contributor, Author) commented Aug 19, 2016:

I believe all concerns have been addressed. I'm going to merge.

@jgfouca merged commit 164cfd9 into master on Aug 19, 2016
jgfouca added a commit that referenced this pull request Aug 19, 2016
Upgrade history tools to python


* jgfouca/hist_tools_conv_to_python:
  Make comparison matchups more robust
  Fix user docs for compare_test_results
  improved reporting of baseline file count mismatch
  correct location of debug log in help message, store baselines with original filename
  Add usage example for typical CESM workflow
  Get rid of pdb trace that I believe was mistakenly left in
  Make a very obvious simplification to code
  Remove unneeded global
  Update hist infra to better-support user-chosen baseline_root
  minor help string fix
  More fixes from review
  fix issue in component_generate_baseline, get only most recent files
  Remove last cwd default args
  Remove dangerous cwd defaults, add documentation to hist_utils public API
  Add new compare_test_results, counterpart to bless_test_results
  bless_test_results: Need sane error code
  remove check for None
  fixes in hist_utils
  Fix mistake caught by code review
  Upgrade history tools to python

Conflicts:
	utils/python/CIME/check_lockedfiles.py
	utils/python/CIME/test_status.py
@jgfouca jgfouca deleted the jgfouca/hist_tools_conv_to_python branch August 19, 2016 21:56
@gold2718:

Are we merging our own pull requests now?

@jgfouca (Contributor, Author) commented Aug 21, 2016:

@gold2718 Once the OK has been given by the reviewers, does it matter who clicks the button?

@jgfouca (Contributor, Author) commented Aug 21, 2016:

@gold2718 To clarify, there's no such thing as an integrator role in CIME like there is in ACME.

@gold2718:

Understood, it's just that there was no time to review your significant change to the compare code before the merge.

@jgfouca (Contributor, Author) commented Aug 21, 2016:

Ah, sorry about that. You can click here to see the changes: 164cfd9

I'll be sure to implement whatever changes you'd like in another PR.
