
Upgrade history tools to python #413

Merged: 20 commits, Aug 19, 2016

Conversation

@jgfouca (Contributor, Author) commented Aug 16, 2016:

Full change list:

  1. Remove old shell-based tools and update calls to use python versions
  2. Move most functionality in old shell tools into hist_utils.py
  3. Make thin python wrapper programs to access hist_utils from the command line
  4. Do st_archive as LAST step in run_indv so that coupler_log_path is not needed
  5. Fix ERR test
  6. Update fake tests to create a fake hist file
  7. Large refactor of bless_test_results
  8. Add new compare_test_results, counterpart to bless_test_results.

Test suite: scripts_regression_tests
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #332

User interface changes?: Significant changes to compare_* scripts

Code review: @jedwards4b @mvertens @billsacks @gold2718

@@ -28,7 +28,7 @@ def run_phase(self):
        dout_s_root = self._case.get_value("DOUT_S_ROOT")
        rundir = self._case.get_value("RUNDIR")
        logger.info("staging files from archive %s" % dout_s_root)
-       for item in glob.glob(os.path.join(dout_s_root, "rest", "*", "*")):
+       for item in glob.glob(os.path.join(dout_s_root, "*", "hist", "*base")):
jgfouca (Contributor, Author):

Important change: I think ERR was broken before this PR, we just didn't notice it. The *base file did not exist in the "rest" subdirectory before or after this PR, so the "rest" vs. "base" comparison in ers_second_phase should not have worked.
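For illustration, a minimal sketch of the corrected staging glob (the directory layout and "*base" suffix come from the diff above; the function name is hypothetical):

```python
import glob
import os

def find_base_hist_files(dout_s_root):
    # History files carrying the "base" suffix live under each component's
    # hist subdirectory, e.g. <DOUT_S_ROOT>/atm/hist/<case>.cam.h0.<date>.nc.base.
    # The old pattern searched under rest/, where no *base files existed.
    return sorted(glob.glob(os.path.join(dout_s_root, "*", "hist", "*base")))
```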

Contributor:

thanks

@billsacks (Member):

I'd like to take a close look at this, but don't have time this week. If this can wait, I'd like to review it early next week.

@billsacks (Member):

Have you done some system tests where you force a difference in the history files, to confirm that this is picked up and reported correctly? I'd like to make sure that has been tested both for the in-test comparisons and for baseline comparisons.

@jedwards4b (Contributor):

I am ready to accept this PR. It removes and replaces some of the last functionality still in sh.

@mvertens (Contributor):

Before this PR is accepted, I think that Bill's question should be answered. I feel that these types of tests should be run to confirm the robustness of the PR, if they have not been run.


@jgfouca (Contributor, Author) commented Aug 17, 2016:

@billsacks , yes we have our TESTRUNDIFF fake test case that generates a DIFF and confirms that it was caught. I hope the new python-based infrastructure is more robust than the old; there is some evidence that it is, because I caught some longstanding problems when I made the switch.


CIME.utils.handle_standard_logging_options(args)

return args.caseroot
Member:

You only return args.caseroot, yet the caller expects caseroot and baseline_dir

jgfouca (Contributor, Author):

Thanks
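The fix for the return-value mismatch flagged above is straightforward; a hypothetical sketch (the argument names and parser layout are assumptions, not the actual script's interface):

```python
import argparse

def parse_command_line(args):
    parser = argparse.ArgumentParser(description="Compare a case against baselines")
    parser.add_argument("caseroot", help="Path to the case directory")
    parser.add_argument("-b", "--baseline-dir", default=None,
                        help="Directory holding baseline history files")
    parsed = parser.parse_args(args)
    # Return BOTH values the caller unpacks, not just caseroot
    return parsed.caseroot, parsed.baseline_dir
```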

@rljacob rljacob added this to the CIME5.1.0 milestone Aug 17, 2016
cprnc_exe = case.get_value("CCSM_CPRNC")
basename = os.path.basename(file1)
stat, out, _ = run_cmd("%s %s %s 2>&1 | tee %s/%s.cprnc.out" % (cprnc_exe, file1, file2, rundir, basename))
return (stat == 0 and "IDENTICAL" in out, out)
Member:

This is not a new issue (it looks like this is just replicating the old behavior), so it doesn't necessarily need to be addressed in this PR, but if it isn't, I'd like to open a separate issue to have this fixed: It looks like the comparison passes as long as the string "IDENTICAL" is found anywhere in the cprnc output. In the event that there is a history field with IDENTICAL in its name, the comparison would always look successful. It would be more robust to check against the exact string - i.e., looking for "diff_test: the two files seem to be IDENTICAL".

jgfouca (Contributor, Author):

Yeah, I'd rather check the return code, but I don't think I can currently rely on cprnc to have a sane return code. I'll look for "seem to be IDENTICAL".
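A sketch of the stricter check discussed above (the sentinel phrase comes from billsacks' comment; the function name and the `(stat, out)` inputs mirror the snippet earlier in this thread but are otherwise illustrative):

```python
def cprnc_files_identical(stat, out):
    # Match cprnc's full verdict phrase rather than the bare word
    # "IDENTICAL", which could appear in a history field's name.
    return stat == 0 and "the two files seem to be IDENTICAL" in out
```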

@billsacks (Member):

@billsacks , yes we have our TESTRUNDIFF fake test case that generates a DIFF and confirms that it was caught. I hope the new python-based infrastructure is more robust than the old; there is some evidence that it is, because I caught some longstanding problems when I made the switch.

I would definitely believe that the new infrastructure is more robust than the old, and I like the way you have changed things to be robust to some additional cases. (I was going to make a request yesterday to add some more robustness in this respect, and then saw that you just added it - so thank you!) I also think that it's great to have this TESTRUNDIFF fake test case.

However, I still feel like these unit tests need to be accompanied by some manual system tests for changes of this magnitude, and @mvertens also agreed in a conversation I just had with her. It sounds like @mvertens is in the process of doing some manual tests.

Brainstorming what I think should be verified, at the request of @mvertens (it's possible that enough confidence can be gained from unit tests for some of these, but it's not clear to me how many of these are covered by unit tests right now... and it would be good to supplement unit tests with manual comparisons for at least the most critical of these like (2)).

  1. Passing in-test comparison
    a. Ensure that all history files were compared in a fully-coupled case; it's probably sufficient to ensure that all of the expected cprnc files were created.
  2. Failing in-test comparison: can be verified with an ERS test using the diffs I gave in #295 (ERP test does not fail when it should)
  3. Baselines generated correctly: ensure that all expected files show up in the baseline directory
  4. Results reported correctly when a file exists in the baseline directory but not in the test case, for some / all of the history files
    a. This can be tested by adding some fake file to the baseline directory
    b. I can't remember how this was reported in the past
  5. Results reported correctly when a file exists in the test case but not in the baseline directory, for some / all of the history files
    a. i.e., baseline directory exists, but some / all history files are missing from it
  6. Results reported correctly when the baseline directory is missing entirely
    a. e.g., if you're comparing some test ERS.f09_g16.X.yellowstone_intel in baseline tag cesm2_0_alpha01e, I'm imagining that the cesm2_0_alpha01e directory exists, but there is no ERS.f09_g16.X.yellowstone_intel subdirectory
  7. Some history files identical but some differ
    a. From examining the code, I'm pretty confident this works correctly, and it's probably hard to force with a manual test, so I'm okay skipping it. (But it would be a good target for a unit test if there isn't already one like that.)
  8. What about the various _N2, NCK, etc. cases that were handled specially in the old component_compare_test.sh? I can't say I understand the old logic well, so it's hard for me to tell what is needed in the new code to reproduce the necessary functionality.

@jedwards4b (Contributor):

In hist_utils.py _iter_model_file_substrs() is using drv_comp.get_valid_model_components
which is a list of component class names (atm, lnd, etc) but it should use the list
of specific component names (cam, clm, etc)

@jedwards4b (Contributor):

I think I have a fix, testing now.
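A hypothetical sketch of the distinction jedwards4b describes (the mapping and function name are illustrative; the real lookup lives in the case/driver XML objects):

```python
def iter_model_file_substrs(components):
    # History filenames embed the specific component name (cam, clm, ...),
    # not the component class (atm, lnd, ...), so iterate the former.
    # `components` maps class -> specific name, e.g. {"atm": "cam"}.
    for _comp_class, comp_name in sorted(components.items()):
        yield comp_name
```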

jedwards4b and others added 3 commits August 17, 2016 16:46
Return non-zero if there were significant problems during bless
test_hists.sort()
return test_hists

def move(case, suffix):
Member:

For this and other functions in this file - at least for public functions: Can you please add documentation? At the least, I'd like to have documentation on each of the arguments to each function.

jgfouca (Contributor, Author):

Pfffft, documentation... hah.

jgfouca (Contributor, Author):

Done.

# Make sure user knows that some tests were not compared
success = True
for broken_compare, reason in broken_compares:
logging.warning("COMPARE FAILED FOR TEST: %s, reason %s" % (broken_compare, reason))
@billsacks (Member), Aug 18, 2016:

Ideally I would want the output from this tool to look basically the same as the output put in the TestStatus file - although just for the COMPARE_baseline phase. That is, I'd want to see something like:

PASS SOMETEST BASELINE_COMPARE

or

FAIL SOMETEST BASELINE_COMPARE

That's approximately what happened with the earlier version of these tools.

This comment applies both to the main output from the comparison, and also to various other extraneous information that is printed (or not). So, for example, I'd prefer not to have this output for every test:

                logging.info("Comparing results for test: %s, most recent result: %s" % (test_name, overall_result))

-- because I'd like to be able to parse this output in the same way that I parse the normal TestStatus or cs.status output.

However, I agree that it does make sense to report on failed tests here. One possibility would be to print out the testStatus output for each test that failed.

Actually, maybe you could accomplish all of this cleanly by making more use of the TestStatus class here: Rather than having this function be responsible for output, it could instead just update the ts object with the results of the BASELINE_COMPARE phase (I know, there isn't a separate BASELINE_COMPARE phase yet... but the idea still applies). Then you can output the updated TestStatus results.

One complication with this is that I don't think you want to change the TestStatus file in the test itself (it seems like a Bad Idea to have some external script change the TestStatus file). In this usage, we instead would want the TestStatus to be written for each test via the logger. I think this could be accomplished by adding a __str__ method to TestStatus. This would contain the bulk of what is currently TestStatus.flush, and then TestStatus.flush could simply write out the results of str(self).

jgfouca (Contributor, Author):

There actually is a BASELINE_COMPARE phase, it would be called "COMPARE_baseline" though.

I agree we do not want this script to change TestStatus files. Leveraging the TestStatus class for output is an interesting idea, but I'm starting to get a bit overwhelmed by the number of changes being requested. Is there any way we could punt this thought to a later ticket/PR?

Member:

Yes, I'm okay with that. Do you mean punting on the whole reworking of the output format? (I'm okay with that; I just want to make sure I open an appropriate ticket.)

@jgfouca (Contributor, Author), Aug 18, 2016:

Yes, the ticket should be to rework the output of compare/bless test results and to change TestStatus class to be able to write to logging.
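A minimal sketch of the __str__ idea billsacks proposes above (the class layout and method names here are assumptions; the real TestStatus tracks more state):

```python
class TestStatus:
    def __init__(self, test_name, filename):
        self._test_name = test_name
        self._filename = filename
        self._phases = []  # list of (phase, status) in execution order

    def set_status(self, phase, status):
        self._phases.append((phase, status))

    def __str__(self):
        # One "STATUS TESTNAME PHASE" line per phase, TestStatus-file style
        return "\n".join("%s %s %s" % (status, self._test_name, phase)
                         for phase, status in self._phases)

    def flush(self):
        # flush becomes a thin wrapper around __str__, so external tools
        # (e.g. compare_test_results) can log str(ts) without touching the file
        with open(self._filename, "w") as fd:
            fd.write(str(self) + "\n")
```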

def compare_history(testcase_dir_for_test, baseline_name):
###############################################################################
with Case(testcase_dir_for_test) as case:
baseline_full_dir = os.path.join(case.get_value("BASELINE_ROOT"), case.get_value("COMPILER"), baseline_name, case.get_value("CASEBASEID"))
Member:

In the CESM workflow, the COMPILER doesn't enter in to the baseline path. At least, it didn't with cime4, and still doesn't seem to in the latest baselines (cesm2_0_alpha01e, which uses some version of cime5).

jgfouca (Contributor, Author):

Would it be OK if the CESM workflow was modified in this way? It does make it much easier to maintain multiple sets of baselines for multiple compilers.

Member:

I can't answer that myself. It would affect many people in CSEG, so I imagine needs some broader buy-in. Personally, I can't think of any major objections, but I also don't see the advantage, since our test names have the compiler in them. So it seems to work just fine to have a baseline directory that contains, say, SMS.f10_f10.ICLM45.yellowstone_intel, SMS.f10_f10.ICLM45.yellowstone_pgi and SMS.f10_f10.ICLM45.yellowstone_gnu.

I'll defer to @mvertens , @jedwards4b and other CSEG members on this question.

I do see that there are other places in the scripts that assume the ACME organization in this respect: at least bless_test_results.py and maybe test_scheduler.py and scripts_regression_tests.py (I just did a simple grep -i 'path.join.*compiler').

jgfouca (Contributor, Author):

Looking at my current baselines, I see directory names like "ERS.f09_g16.I1850CLM45CN.melvin_gnu", so the compiler is clearly included in the name and therefore runs with different compilers would not conflict; so maybe adding compiler to the path is not necessary. This was a change we made in ACME a long time ago back when we were working with an ancient version of CIME, so it's possible it's not necessary anymore. For now, I can do an if/else based on model here and in bless_test_results. I don't think anywhere else is impacted.

jgfouca (Contributor, Author):

Should have read your post more closely. Since this path system is built into test_scheduler, then CESM must be using it already. I guess they like it since I haven't seen any complaints ;)
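The model-dependent if/else jgfouca proposes might look like this (the model identifiers and function name are illustrative assumptions, not CIME's actual API):

```python
import os

def baseline_full_dir(baseline_root, model, compiler, baseline_name, casebaseid):
    # ACME keys baselines by compiler; CESM relies on the compiler already
    # being embedded in the test name (e.g. SMS.f10_f10.ICLM45.yellowstone_intel)
    if model == "acme":
        return os.path.join(baseline_root, compiler, baseline_name, casebaseid)
    else:
        return os.path.join(baseline_root, baseline_name, casebaseid)
```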

def parse_command_line(args, description):
###############################################################################
parser = argparse.ArgumentParser(
usage="""\n%s [-n] [-r <TESTROOT>] [-b <BRANCH>] [-c <COMPILER>] [<TEST> <TEST> ...] [--verbose]
Member:

This usage message is wrong: it's missing -t, and I don't see any documentation / meaning of the -n option

jgfouca (Contributor, Author):

You're right, fix incoming.
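One way to keep the usage message from drifting out of sync with the options, as it did here, is to let argparse generate it from the declared arguments (the flag meanings below are assumptions for illustration, not the tool's actual semantics):

```python
import argparse

def parse_command_line(args, description):
    # No hand-written usage string: argparse derives it from the options,
    # so a newly added flag like -t can never be missing from it.
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("-r", "--test-root", help="Root directory containing the tests")
    parser.add_argument("-b", "--branch", help="Baseline name to compare against")
    parser.add_argument("-c", "--compiler", help="Compiler used for the tests")
    parser.add_argument("-t", "--test-id", help="Limit to tests with this test-id")
    parser.add_argument("tests", nargs="*", help="Specific tests to process")
    return parser.parse_args(args)
```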

@jgfouca (Contributor, Author) commented Aug 19, 2016:

I believe all concerns have been addressed. I'm going to merge.

@jgfouca merged commit 164cfd9 into master on Aug 19, 2016
jgfouca added a commit that referenced this pull request Aug 19, 2016
Upgrade history tools to python


* jgfouca/hist_tools_conv_to_python:
  Make comparison matchups more robust
  Fix user docs for compare_test_results
  improved reporting of baseline file count mismatch
  correct location of debug log in help message, store baselines with original filename
  Add usage example for typical CESM workflow
  Get rid of pdb trace that I believe was mistakenly left in
  Make a very obvious simplification to code
  Remove unneeded global
  Update hist infra to better-support user-chosen baseline_root
  minor help string fix
  More fixes from review
  fix issue in component_generate_baseline, get only most recent files
  Remove last cwd default args
  Remove dangerous cwd defaults, add documentation to hist_utils public API
  Add new compare_test_results, counterpart to bless_test_results
  bless_test_results: Need sane error code
  remove check for None
  fixes in hist_utils
  Fix mistake caught by code review
  Upgrade history tools to python

Conflicts:
	utils/python/CIME/check_lockedfiles.py
	utils/python/CIME/test_status.py
@jgfouca jgfouca deleted the jgfouca/hist_tools_conv_to_python branch August 19, 2016 21:56
@gold2718:

Are we merging our own pull requests now?

@jgfouca (Contributor, Author) commented Aug 21, 2016:

@gold2718 Once the OK has been given by the reviewers, does it matter who clicks the button?

@jgfouca (Contributor, Author) commented Aug 21, 2016:

@gold2718 To clarify, there's no such thing as an integrator role in CIME like there is in ACME.

@gold2718:

Understood, it's just that there was no time to review your significant change to the compare code before the merge.

@jgfouca (Contributor, Author) commented Aug 21, 2016:

Ah, sorry about that. You can click here to see the changes: 164cfd9

I'll be sure to implement whatever changes you'd like in another PR.
