Commit
Merge pull request #1370 from ESMCI/jgfouca/minor_fix_for_cheyenne
Fix NODEFAIL test on cheyenne.
jedwards4b authored Apr 18, 2017
2 parents a99eab3 + cedf8f0 commit 9d9f09c
Showing 3 changed files with 52 additions and 2 deletions.
4 changes: 3 additions & 1 deletion config/cesm/machines/config_machines.xml
@@ -219,8 +219,10 @@
<mpirun mpilib="default">
<executable>mpiexec_mpt</executable>
<arguments>
<arg name="anum_tasks"> -np $TOTALPES</arg>
<arg name="labelstdout">-p "%g:"</arg>
<arg name="threadplacement"> omplace </arg>
<!-- the omplace argument needs to be last -->
<arg name="zthreadplacement"> omplace </arg>
</arguments>
</mpirun>
<mpirun mpilib="mpi-serial">
42 changes: 41 additions & 1 deletion config/config_tests.xml
@@ -166,6 +166,45 @@ PRE pause-resume test: by default a BFB test of pause-resume cycling
LII CLM initial condition interpolation test
======================================================================
Infrastructural tests for CIME. These are used by scripts_regression_tests.
Users won't generally run these.
======================================================================
TESTBUILDFAIL Insta-fail build step. Used to confirm that failed
builds are caught and reported correctly.
TESTBUILDFAILEXC Insta-fail build step by failing to init. Used to test
correct behavior when exceptions are generated.
TESTRUNFAIL Insta-fail run step. Used to confirm that model run
failures are caught and reported correctly.
TESTRUNFAILEXC Insta-fail run step via exception. Used to test
correct behavior when exceptions are generated.
TESTRUNPASS Insta-pass run step. Used to test that runs that work
are reported correctly.
TESTMEMLEAKFAIL Insta-fail memleak step. Used to test that memleaks are
detected and reported correctly.
TESTMEMLEAKPASS Insta-pass memleak step. Used to test that non-memleaks are
reported correctly.
TESTRUNDIFF Produces a canned hist file. Env var TESTRUNDIFF_ALTERNATE can
be used to cause a DIFF. Used to check that baseline diffs are
detected and reported correctly.
TESTTESTDIFF Simulates internal test diff (non baseline). Used to check that
internal comparison failures are detected and reported correctly.
TESTRUNSLOWPASS After 5 minutes of sleep, pass run step. Used to test timeouts
and kills.
NODEFAIL Tests restart upon detected node failure. Generates fake failures,
the number of which is controlled by NODEFAIL_NUM_FAILS.
-->

@@ -366,7 +405,8 @@ LII CLM initial condition interpolation test
<test NAME="NODEFAIL">
<DESC>For testing infra only. Tests restart upon detected node failure</DESC>
<INFO_DBUG>1</INFO_DBUG>
<STOP_OPTION>ndays</STOP_OPTION>
<STOP_OPTION>nsteps</STOP_OPTION>
<OCN_NCPL>$ATM_NCPL</OCN_NCPL>
<STOP_N>11</STOP_N>
<REST_N>$STOP_N / 2 + 1</REST_N>
<REST_OPTION>$STOP_OPTION</REST_OPTION>
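
The REST_N expression above can be worked through by hand. The sketch below assumes the expression is evaluated with integer (floor) division after `$STOP_N` is substituted; that evaluation rule is an assumption, not something this diff confirms, and `rest_n` is a hypothetical helper name used only for illustration.

```python
def rest_n(stop_n):
    """Mirror the expression `$STOP_N / 2 + 1`.

    Assumption: the division is integer (floor) division.
    """
    return stop_n // 2 + 1


# With STOP_N = 11 and STOP_OPTION = nsteps, and under the assumption
# above, the restart interval lands just past the midpoint of the run.
restart_interval = rest_n(11)
```

Under that assumption the restart point falls after the halfway mark, so a restart file exists before the run ends.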
8 changes: 8 additions & 0 deletions scripts/lib/CIME/SystemTests/nodefail.py
@@ -56,8 +56,16 @@ def _restart_fake_phase(self):
        env_mach_specific.set_value("run_exe", fake_exe_file)
        self._case.flush(flushall=True)

        # This flag is needed by mpt to run a script under mpiexec
        mpilib = self._case.get_value("MPILIB")
        if mpilib == "mpt":
            os.environ["MPI_SHEPHERD"] = "true"

        self.run_indv(suffix=None)

        if mpilib == "mpt":
            del os.environ["MPI_SHEPHERD"]

        env_mach_specific = self._case.get_env("mach_specific")
        env_mach_specific.set_value("run_exe", prev_run_exe)
        self._case.flush(flushall=True)
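
The change above sets MPI_SHEPHERD before the run and deletes it afterwards. If the run raises, the variable stays set. A minimal sketch of one way to make the cleanup unconditional follows; it is not part of this commit, and `temporary_env` and `run_fake_phase` are hypothetical names introduced only for illustration.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set an environment variable for the duration of a with-block.

    Hypothetical helper, not part of CIME. The previous value, if any,
    is restored on exit, even when the body raises.
    """
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


def run_fake_phase(mpilib, run):
    """Hypothetical stand-in for the run step in the hunk above."""
    if mpilib == "mpt":
        with temporary_env("MPI_SHEPHERD", "true"):
            return run()
    return run()
```

The `finally` clause is the point of the design: the environment is restored whether the run succeeds or fails, so later phases in the same process do not inherit the flag.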