mv erp test to compare two #1845

Closed
wants to merge 3 commits into from


jedwards4b
Contributor

Rewrite erp test using the compare two paradigm

Test suite: scripts_regression_tests.py, ERP_Ln9.f09_f09_mg17.F2000_DEV.cheyenne_intel.cam-outfrq9s
Test baseline: cesm2_0_alpha07d
Test namelist changes:
Test status: bit for bit

Addresses #1647

User interface changes?:

Update gh-pages html (Y/N)?:

Code review:

@@ -78,11 +78,11 @@
 <hist_file_extension>\.h.*.nc$|\.d[dovt]\.</hist_file_extension>
 <rest_history_varname>unset</rest_history_varname>
 <rpointer>
-<rpointer_file>rpointer.ocn.restart$NINST_STRING</rpointer_file>
+<rpointer_file>rpointer.ocn$NINST_STRING.restart</rpointer_file>
Contributor Author

This is unrelated to this PR; it's a fix for test IRT.f09_g17.B1850.

rundir2 = self._case2.get_value("RUNDIR")
case = self._case1.get_value("CASE")
datenames = _get_datenames(self._case1)
for file_ in glob.iglob(os.path.join(rundir1,"*")):
Contributor Author

This copies the rpointer files and links the restart and hist restart files from the restart time to the case2 run directory.
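
For readers following along, the copy-and-link step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `stage_restart_files` and its parameters are hypothetical, and it matches the restart date against the file's base name rather than the full path.

```python
import glob
import os
import shutil


def stage_restart_files(rundir1, rundir2, case, restart_date):
    """Copy rpointer files and symlink restart-dated files into rundir2.

    Hypothetical helper for illustration only. Returns two lists of
    base names: the files that were copied and the files that were linked.
    """
    copied, linked = [], []
    for path in sorted(glob.iglob(os.path.join(rundir1, "*"))):
        name = os.path.basename(path)
        dest = os.path.join(rundir2, name)
        if name.startswith("rpointer"):
            # rpointer files are small and may be rewritten by case2, so copy them
            shutil.copy(path, dest)
            copied.append(name)
        elif name.startswith(case) and restart_date in name:
            # files carrying the restart date are linked rather than copied
            if os.path.lexists(dest):
                os.remove(dest)
            os.symlink(path, dest)
            linked.append(name)
    return copied, linked
```

Copying the rpointer files keeps case1's pointers untouched if case2 rewrites them, while linking avoids duplicating potentially large restart files.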

Member

Thank you for coming up with this solution that avoids running the full short-term archiver.

Could the bulk of this function be put in some shared location so that it can be reused by other tests if needed? This could go in scripts/lib/CIME/SystemTests/test_utils/.

Member

@billsacks left a comment

Thanks, @jedwards4b. The conversion to system_tests_compare_two looks good. I have some line comments related to the linking to restart files.

rundir1 = self._case1.get_value("RUNDIR")
rundir2 = self._case2.get_value("RUNDIR")
case = self._case1.get_value("CASE")
datenames = _get_datenames(self._case1)
Member

_get_datenames should have the leading underscore removed since it is no longer private.

if os.path.basename(file_).startswith("rpointer"):
    logger.info("Copy {} to {}".format(file_, rundir2))
    shutil.copy(file_, rundir2)
elif os.path.basename(file_).startswith(case) and datenames[0] in file_:
Member

Is this guaranteed to pick up everything that's needed? There's a lot of complexity in _archive_restarts that isn't captured here. For the most part, I don't see what this might be missing, except that it seems not to capture any unfinished history files that the short-term archiver would capture with get_histfiles_for_restarts.

(This is why I was hoping we could still reuse pieces of the case_st_archive code....)

@jedwards4b
Contributor Author

@billsacks I've added your suggestions.

@billsacks
Member

@jedwards4b I'm playing with this a bit to see if I can accomplish the refactor I was envisioning of case_st_archive, in order to remove the test-specific logic you've added. I'll let you know if I have any luck.

@billsacks
Member

With the current code, plugged into

https://svn-ccsm-models.cgd.ucar.edu/clm2/trunk_tags/clm4_5_16_r253

I get a failure in this test: ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default:

2017-08-30 09:38:05: Exception during run:
`/glade/scratch/sacks/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp/run/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp.clm2.h0.0001-01-04-00000.nc` and `/glade/scratch/sacks/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp/case2/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp/run/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp.clm2.h0.0001-01-04-00000.nc` are the same file
Traceback (most recent call last):
  File "/glade2/scratch2/sacks/cesm_code/clm4_5_16_r253_newCime/cime/scripts/Tools/../../scripts/lib/CIME/SystemTests/system_tests_common.py", line 148, in run
    self.run_phase()
  File "/glade2/scratch2/sacks/cesm_code/clm4_5_16_r253_newCime/cime/scripts/Tools/../../scripts/lib/CIME/SystemTests/system_tests_compare_two.py", line 213, in run_phase
    self._case_one_custom_postrun_action()
  File "/glade2/scratch2/sacks/cesm_code/clm4_5_16_r253_newCime/cime/scripts/Tools/../../scripts/lib/CIME/SystemTests/erp.py", line 65, in _case_one_custom_postrun_action
    self.setup_restart()
  File "/glade2/scratch2/sacks/cesm_code/clm4_5_16_r253_newCime/cime/scripts/Tools/../../scripts/lib/CIME/SystemTests/system_tests_compare_two.py", line 264, in setup_restart
    shutil.copy(os.path.join(rundir1,histfile), rundir2)
  File "/usr/lib64/python2.7/shutil.py", line 119, in copy
    copyfile(src, dst)
  File "/usr/lib64/python2.7/shutil.py", line 69, in copyfile
    raise Error("`%s` and `%s` are the same file" % (src, dst))
Error: `/glade/scratch/sacks/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp/run/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp.clm2.h0.0001-01-04-00000.nc` and `/glade/scratch/sacks/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp/case2/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp/run/ERP_P24x2_Ld5.f10_f10.I1850Clm50Bgc.cheyenne_intel.clm-default.20170830_091616_gns0xp.clm2.h0.0001-01-04-00000.nc` are the same file

It's fine with me if you wait to look into this until I've gotten a chance to play around with a possible refactor that might make this issue irrelevant.

@jedwards4b
Contributor Author

This is a case where the history file has the restart date in its name and so is linked into rundir2. We need to add code that specifically excludes history files from the link step and copies them only if needed in the next step.
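
One possible shape for that fix is sketched below, under stated assumptions. The names `is_history_file` and `copy_if_distinct` are hypothetical, and the history-file pattern is a simplified guess at names like `<case>.clm2.h0.<date>.nc`; real components may need the `hist_file_extension` patterns from the archive configuration instead. The guard also covers the failure in the traceback above, where `shutil.copy` raises because the destination is already a link to the source.

```python
import os
import re
import shutil

# Assumed, simplified pattern for history file names such as
# "<case>.clm2.h0.<date>.nc". It deliberately does not match restart
# files (".r.") or history restart files (".rh0.").
HIST_RE = re.compile(r"\.h\d*\..*\.nc$")


def is_history_file(name):
    """Return True if the base name looks like a history file."""
    return HIST_RE.search(name) is not None


def copy_if_distinct(src, dest_dir):
    """Copy src into dest_dir unless the destination already resolves to src.

    Returns True if a copy was made, False if it was skipped.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    if os.path.exists(dest) and os.path.samefile(src, dest):
        # dest is a link (or hard link) to src; shutil.copy would raise here
        return False
    shutil.copy(src, dest)
    return True
```

With helpers like these, the link step could skip any name for which `is_history_file` is true, and the later history-file step could call `copy_if_distinct` instead of `shutil.copy` directly.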

@jedwards4b jedwards4b closed this Aug 31, 2017
jgfouca added a commit that referenced this pull request Nov 7, 2017
ESMCI Version: cime5.4.0-alpha.06 10/25/2017

New user interface features
case.submit now supports arbitrary prerequisites
Add option to define test types in query_testlists
Add ability to cancel batch jobs to the system
Add --baseline-root argument to case.cmpgen_baselines
Use keyword-value pairs for create_clone call

Changes to case control system.
Fix COST_PES and TOTALPES for titan (aprun)
Rename PES_PER_NODE to MAX_MPITASKS_PER_NODE.
Resolve confusion between components and component classes
Add support for python3!
Allow certain batch systems (PBS) to pass flags to case.run/case.test via environment var
Refactor queue selection, use nodemin/nodemax instead of jobmin jobmax
Fix E3SM idmap check.
ERR test rework.
New error trap if invalid "idmap" file is present in seq_maps.rc
Fix generate and compare baseline functions
Add a derived variable NTASKS_PER_INST_COMP where COMP is the component name (ATM, LND, etc)
New batch_env optional entry in config_batch
new batch_cancel field in config_batch
Fix tests for small systems.
Refactor CPLHIST mode and add DATM CPLHIST topo capability
Cleanup of the compvar implementation
Add COMP_ROOT_DIR_ variables so that a component can be moved with the change of a single cime variable
Make sure user changes to wallclock/queue are not lost

New tools
Rewrite of load balancing tool (from CMDV)

Coupler/driver changes
only run driver build namelist after first da cycle

Fixes #1845
Fixes #1426

[BFB]

* jgfouca/branch-for-to-acme-10-25-2017-pr: (437 commits)
  Fixes for 2-case building
  Critical bug fix correcting unicode/str confusion
  Add mpas rpointer fix back in
  Fix config mistakes in config_batch
  Fix archive configuration problem for acme
  Fix R test-opt
  Fix duplication of entries
  Minor config_machines fixup
  Remove extraneous logger.info and change a warning to logger.warning
  Make the prereq test more flexible and correct
  Implement a much simpler script_regression_test tests
  Partial implementation of the prereq test. Still need to find out a generic way of verifying that the dependant job actually depends on the other one
  Revert change from afterok to afterany; this should only be done if the user specifically requests it
  Implements the prereq argument for case.submit, allowing the user to specify jobs which should finish (not necessarily successfully) before running the current job
  Update ChangeLog
  Change case.st_archive dependency string to not include strings indicating unsupported logic. Fix cobalt depend_string. Add depend_separator field to support LSF
  update mpt and pnetcdf on cheyenne
  Update description for TOTALPES
  Include spare nodes in COST_PES
  Update ChangeLog
  ...
@jedwards4b jedwards4b deleted the erp2systemcomparetwo branch November 16, 2017 19:56
jgfouca added a commit that referenced this pull request Feb 23, 2018
jgfouca added a commit that referenced this pull request Mar 13, 2018