Merge branch 'worleyph/cime/save_timing_dir_projects' into master (PR #1950)

Limit performance archiving to specific projects on each system

To allow for automatic performance archiving on ACME production
platforms for ACME allocations without also forcing this on non-ACME
allocations, add a new XML variable to config_machines.xml and
env_run.xml:

<SAVE_TIMING_DIR_PROJECTS>proj1,proj2</SAVE_TIMING_DIR_PROJECTS>

If (a) PROJECT is one of the projects in this list and (b) SAVE_TIMING
is true and (c) SAVE_TIMING_DIR is a legal location, then
performance archiving into SAVE_TIMING_DIR will take place. Otherwise
it will not.

If the first element of the list is ANY, then any allocation will
pass test (a). If SAVE_TIMING_DIR_PROJECTS is missing for a given
machine in config_machines.xml, then no allocation will pass test (a).
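
The three tests above can be sketched as a single predicate. This is a minimal illustration, not the code in this commit: `should_archive_timing` and its parameters are hypothetical names. Test (a) is written with the per-entry regular-expression match that `is_save_timing_dir_project` in this merge uses, which is why the machine entries in this diff use `.*` to admit every project.

```python
import os
import re


def should_archive_timing(project, save_timing, save_timing_dir, projects_value):
    """Return True only if tests (a), (b) and (c) all pass.

    All names are hypothetical; they mirror the XML variables PROJECT,
    SAVE_TIMING, SAVE_TIMING_DIR and SAVE_TIMING_DIR_PROJECTS.
    """
    # (a) The project must match one entry of the comma-separated list.
    #     A missing or empty list admits no project.
    if not projects_value or not project:
        return False
    in_list = any(re.match(entry, project) for entry in projects_value.split(","))

    # (b) SAVE_TIMING must be true.
    # (c) SAVE_TIMING_DIR must be an existing directory.
    dir_ok = save_timing_dir is not None and os.path.isdir(save_timing_dir)
    return in_list and bool(save_timing) and dir_ok
```

If any one of the three tests fails, the predicate returns False and no archiving into SAVE_TIMING_DIR would take place.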

Fixes #1949

BFB

* origin/worleyph/cime/save_timing_dir_projects:
  put SAVE_TIMING_DIR_PROJECTS in env_run.xml
  simplify implementation of is_save_timing_dir_project
  fix typo in comment in machines.py
  change logger.warning to logger.info
  modify logger output when archiving performance data
  Improve support for project-specific performance archiving
jgfouca committed Dec 7, 2017
2 parents e62ada8 + acdd673 commit 26164a2
Showing 8 changed files with 59 additions and 17 deletions.
13 changes: 12 additions & 1 deletion config/acme/machines/config_machines.xml
@@ -79,6 +79,7 @@
<BASELINE_ROOT>/project/projectdirs/acme/baselines/$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/project/projectdirs/acme/tools/cprnc.edison/cprnc</CCSM_CPRNC>
<SAVE_TIMING_DIR>/project/projectdirs/acme</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>acme,m2830,m2833</SAVE_TIMING_DIR_PROJECTS>
<TEST_TPUT_TOLERANCE>0.1</TEST_TPUT_TOLERANCE>
<mpirun mpilib="default">
<executable>srun</executable>
@@ -204,6 +205,7 @@
<BASELINE_ROOT>/project/projectdirs/acme/baselines/$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/project/projectdirs/acme/tools/cprnc.cori/cprnc</CCSM_CPRNC>
<SAVE_TIMING_DIR>/project/projectdirs/acme</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>acme,m2830,m2833</SAVE_TIMING_DIR_PROJECTS>
<OS>CNL</OS>
<BATCH_SYSTEM>slurm</BATCH_SYSTEM>
<SUPPORTED_BY>acme</SUPPORTED_BY>
@@ -348,6 +350,7 @@
<BASELINE_ROOT>/project/projectdirs/acme/baselines/$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/project/projectdirs/acme/tools/cprnc.cori/cprnc</CCSM_CPRNC>
<SAVE_TIMING_DIR>/project/projectdirs/acme</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>acme,m2830,m2833</SAVE_TIMING_DIR_PROJECTS>
<OS>CNL</OS>
<BATCH_SYSTEM>slurm</BATCH_SYSTEM>
<SUPPORTED_BY>acme</SUPPORTED_BY>
@@ -564,6 +567,7 @@
<DOUT_L_MSROOT>csm/$CASE</DOUT_L_MSROOT>
<BASELINE_ROOT>/sems-data-store/ACME/baselines/$COMPILER</BASELINE_ROOT>
<SAVE_TIMING_DIR>/sems-data-store/ACME/timings</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>.*</SAVE_TIMING_DIR_PROJECTS>
<CCSM_CPRNC>/sems-data-store/ACME/cprnc/build.new/cprnc</CCSM_CPRNC>
<SUPPORTED_BY>jgfouca at sandia dot gov</SUPPORTED_BY>
<!-- <GMAKE>make</GMAKE> <- this doesn't actually work! -->
@@ -764,6 +768,7 @@
<BASELINE_ROOT>/projects/ccsm/ccsm_baselines/$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/projects/ccsm/cprnc/build.toss3/cprnc_wrap</CCSM_CPRNC> <!-- path to the cprnc tool used to compare netcdf history files in testing -->
<SAVE_TIMING_DIR>/projects/ccsm/timings</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>.*</SAVE_TIMING_DIR_PROJECTS>
<BATCH_SYSTEM>slurm</BATCH_SYSTEM>
<SUPPORTED_BY>jgfouca at sandia dot gov</SUPPORTED_BY>
<GMAKE_J>8</GMAKE_J>
@@ -1003,6 +1008,7 @@
<MPILIBS>mvapich,openmpi</MPILIBS>
<CIME_OUTPUT_ROOT>/lcrc/group/acme/$USER/acme_scratch</CIME_OUTPUT_ROOT>
<SAVE_TIMING_DIR>/lcrc/group/acme</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>.*</SAVE_TIMING_DIR_PROJECTS>
<RUNDIR>$CIME_OUTPUT_ROOT/$CASE/run</RUNDIR>
<EXEROOT>$CIME_OUTPUT_ROOT/$CASE/bld</EXEROOT>
<DIN_LOC_ROOT>/home/ccsm-data/inputdata</DIN_LOC_ROOT>
@@ -1449,6 +1455,7 @@
<BASELINE_ROOT>/projects/ccsm/ccsm_baselines//$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/projects/ccsm/tools/cprnc/cprnc</CCSM_CPRNC>
<SAVE_TIMING_DIR>/projects/$PROJECT</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>ClimateEnergy_2</SAVE_TIMING_DIR_PROJECTS>
<OS>BGQ</OS>
<BATCH_SYSTEM>cobalt</BATCH_SYSTEM>
<SUPPORTED_BY> mickelso -at- mcs.anl.gov</SUPPORTED_BY>
@@ -1511,7 +1518,8 @@
<DOUT_L_MSROOT>$CIME_OUTPUT_ROOT/csm/$CASE</DOUT_L_MSROOT>
<BASELINE_ROOT>/projects/$PROJECT/acme/baselines/$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/projects/$PROJECT/acme/tools/cprnc/cprnc</CCSM_CPRNC>
-<SAVE_TIMING_DIR>/projects/OceanClimate</SAVE_TIMING_DIR>
+<SAVE_TIMING_DIR>/projects/$PROJECT</SAVE_TIMING_DIR>
+<SAVE_TIMING_DIR_PROJECTS>OceanClimate</SAVE_TIMING_DIR_PROJECTS>
<OS>CNL</OS>
<BATCH_SYSTEM>cobalt_theta</BATCH_SYSTEM>
<SUPPORTED_BY>acme</SUPPORTED_BY>
@@ -2011,6 +2019,7 @@
<BASELINE_ROOT>/lustre/atlas1/cli900/world-shared/cesm/baselines/$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/lustre/atlas1/cli900/world-shared/cesm/tools/cprnc/cprnc.titan</CCSM_CPRNC>
<SAVE_TIMING_DIR>$ENV{PROJWORK}/$PROJECT</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>cli115,cli127,cli106,csc190</SAVE_TIMING_DIR_PROJECTS>
<OS>CNL</OS>
<BATCH_SYSTEM>pbs</BATCH_SYSTEM>
<ALLOCATE_SPARE_NODES>TRUE</ALLOCATE_SPARE_NODES>
@@ -2192,6 +2201,7 @@
<BASELINE_ROOT>/lustre/atlas1/cli900/world-shared/cesm/baselines/$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/lustre/atlas1/cli900/world-shared/cesm/tools/cprnc/cprnc.eos</CCSM_CPRNC>
<SAVE_TIMING_DIR>$ENV{PROJWORK}/$PROJECT</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>cli115,cli127,cli106,csc190</SAVE_TIMING_DIR_PROJECTS>
<OS>CNL</OS>
<BATCH_SYSTEM>pbs</BATCH_SYSTEM>
<SUPPORTED_BY>acme</SUPPORTED_BY>
@@ -2786,6 +2796,7 @@
<BASELINE_ROOT>/lustre/atlas1/cli900/world-shared/cesm/baselines/$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/lustre/atlas1/cli900/world-shared/cesm/tools/cprnc/cprnc</CCSM_CPRNC>
<SAVE_TIMING_DIR>/lustre/atlas/proj-shared/$PROJECT</SAVE_TIMING_DIR>
<SAVE_TIMING_DIR_PROJECTS>cli115,cli127,cli106,csc190</SAVE_TIMING_DIR_PROJECTS>
<OS>LINUX</OS>
<BATCH_SYSTEM>lsf</BATCH_SYSTEM>
<SUPPORTED_BY>acme</SUPPORTED_BY>
5 changes: 4 additions & 1 deletion config/xml_schemas/config_machines.xsd
@@ -24,6 +24,7 @@
<xs:element name="OS" type="xs:NCName"/>
<xs:element name="PROXY" type="xs:string"/>
<xs:element name="SAVE_TIMING_DIR" type="xs:string" />
<xs:element name="SAVE_TIMING_DIR_PROJECTS" type="xs:string" />
<xs:element name="PROJECT" type="xs:NCName" />
<xs:element name="CHARGE_ACCOUNT" type="xs:NCName" />
<xs:element name="COMPILERS" type="xs:string"/>
@@ -82,8 +83,10 @@
<!-- CHARGE_ACCOUNT: The name of the account to charge for batch jobs
can be overridden in environment or $HOME/.cime/config -->
<xs:element ref="CHARGE_ACCOUNT" minOccurs="0" maxOccurs="1"/>
-<!-- SAVE_TIMING_DIR: (Acme only) directory to write timing output to -->
+<!-- SAVE_TIMING_DIR: (Acme only) directory for archiving timing output -->
<xs:element ref="SAVE_TIMING_DIR" minOccurs="0" maxOccurs="1"/>
+<!-- SAVE_TIMING_DIR_PROJECTS: (Acme only) projects whose jobs archive timing output -->
+<xs:element ref="SAVE_TIMING_DIR_PROJECTS" minOccurs="0" maxOccurs="1"/>
<!-- CIME_OUTPUT_ROOT: Base directory for case output,
the bld and run directories are written below here -->
<xs:element ref="CIME_OUTPUT_ROOT" minOccurs="1" maxOccurs="1"/>
5 changes: 4 additions & 1 deletion config/xml_schemas/config_machines_template.xml
@@ -39,9 +39,12 @@
can be overridden in environment or $HOME/.cime/config -->
<CHARGE_ACCOUNT></CHARGE_ACCOUNT>

-<!-- SAVE_TIMING_DIR: (Acme only) directory to write timing output to -->
+<!-- SAVE_TIMING_DIR: (Acme only) directory for archiving timing output -->
<SAVE_TIMING_DIR> </SAVE_TIMING_DIR>

+<!-- SAVE_TIMING_DIR_PROJECTS: (Acme only) projects whose jobs archive timing output -->
+<SAVE_TIMING_DIR_PROJECTS> </SAVE_TIMING_DIR_PROJECTS>

<!-- CIME_OUTPUT_ROOT: Base directory for case output,
the case/bld and case/run directories are written below here -->
<CIME_OUTPUT_ROOT>/glade/scratch/$USER</CIME_OUTPUT_ROOT>
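
How a machine entry's project list might be read can be sketched as follows. This uses Python's standard `xml.etree.ElementTree` rather than CIME's own XML classes. The function `timing_projects_for_machine`, the `MACH` attribute lookup, and the sample `<machine>` entries are illustrative assumptions. An absent or blank element yields an empty list, which corresponds to no project being allowed to archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-machine sample; not taken from config_machines.xml.
SAMPLE = """
<config_machines>
  <machine MACH="example-a">
    <SAVE_TIMING_DIR>/projects/example</SAVE_TIMING_DIR>
    <SAVE_TIMING_DIR_PROJECTS>proj1,proj2</SAVE_TIMING_DIR_PROJECTS>
  </machine>
  <machine MACH="example-b">
    <SAVE_TIMING_DIR> </SAVE_TIMING_DIR>
  </machine>
</config_machines>
"""


def timing_projects_for_machine(xml_text, mach):
    """Return the list of project patterns for one machine, or []."""
    root = ET.fromstring(xml_text)
    for machine in root.findall("machine"):
        if machine.get("MACH") == mach:
            # findtext returns the default when the element is absent.
            text = machine.findtext("SAVE_TIMING_DIR_PROJECTS", default="")
            return [p.strip() for p in text.split(",") if p.strip()]
    return []


print(timing_projects_for_machine(SAMPLE, "example-a"))  # ['proj1', 'proj2']
print(timing_projects_for_machine(SAMPLE, "example-b"))  # []
```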
3 changes: 2 additions & 1 deletion doc/source/users_guide/porting-cime.rst
@@ -103,7 +103,8 @@ Each ``<machine>`` tag requires the following input:
- ``COMPILERS``: compilers supported on the machine, in comma-separated list, default first
- ``MPILIBS``: mpilibs supported on the machine, in comma-separated list, default first
- ``PROJECT``: a project or account number used for batch jobs; can be overridden in environment or in **$HOME/.cime/config**
-- ``SAVE_TIMING_DIR``: (ACME only) target directory for writing timing output
+- ``SAVE_TIMING_DIR``: (ACME only) target directory for archiving timing output
+- ``SAVE_TIMING_DIR_PROJECTS``: (ACME only) projects whose jobs archive timing output
- ``CIME_OUTPUT_ROOT``: Base directory for case output; the **bld** and **run** directories are written below here
- ``DIN_LOC_ROOT``: location of the input data directory
- ``DIN_LOC_ROOT_CLMFORC``: optional input location for clm forcing data
17 changes: 17 additions & 0 deletions scripts/lib/CIME/case.py
@@ -1457,3 +1457,20 @@ def create_clone(self, newcase, keepexe=False, mach_dir=None, project=None,
project=project, cime_output_root=cime_output_root,
exeroot=exeroot, rundir=rundir,
user_mods_dir=user_mods_dir)

def is_save_timing_dir_project(self,project):
    """
    Check whether the project is permitted to archive performance data in the location
    specified for the current machine
    """
    save_timing_dir_projects = self.get_value("SAVE_TIMING_DIR_PROJECTS")
    if not save_timing_dir_projects:
        return False
    else:
        save_timing_dir_projects = save_timing_dir_projects.split(",")
        for save_timing_dir_project in save_timing_dir_projects:
            regex = re.compile(save_timing_dir_project)
            if regex.match(project):
                return True

        return False
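
Because each list entry is compiled as a regular expression and tested with `match`, an entry matches any project name that begins with text the pattern accepts, and `.*` admits every project. The standalone sketch below reproduces that rule outside the `Case` class to make the behavior visible. `matches_any` and `matches_any_exact` are hypothetical helpers, and the `fullmatch` variant is shown only as a possible stricter alternative, not something this commit does.

```python
import re


def matches_any(project, projects_value):
    """Same matching rule as the method above: re.match per entry."""
    if not projects_value:
        return False
    return any(re.compile(entry).match(project)
               for entry in projects_value.split(","))


def matches_any_exact(project, projects_value):
    """Hypothetical stricter variant: the whole project name must match."""
    if not projects_value:
        return False
    return any(re.fullmatch(entry, project)
               for entry in projects_value.split(","))


print(matches_any("m2830", "acme,m2830,m2833"))     # True
print(matches_any("acme_dev", "acme,m2830,m2833"))  # True: "acme" matches as a prefix
print(matches_any_exact("acme_dev", "acme,m2830"))  # False
print(matches_any("cli115", ".*"))                  # True
print(matches_any("cli115", ""))                    # False
```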
15 changes: 11 additions & 4 deletions scripts/lib/CIME/provenance.py
@@ -94,12 +94,16 @@ def save_build_provenance(case, lid=None):
_save_build_provenance_cesm(case, lid)

def _save_prerun_timing_acme(case, lid):
+    project = case.get_value("PROJECT", subgroup="case.run")
+    if not case.is_save_timing_dir_project(project):
+        return
+
     timing_dir = case.get_value("SAVE_TIMING_DIR")
     if timing_dir is None or not os.path.isdir(timing_dir):
-        logger.warning("SAVE_TIMING_DIR {} is not valid. E3SM requires a valid SAVE_TIMING_DIR to be set in order to archive timings. Skipping archive of timing data.".format(timing_dir))
+        logger.warning("SAVE_TIMING_DIR {} is not valid. E3SM requires a valid SAVE_TIMING_DIR to archive timing data.".format(timing_dir))
         return

-    logger.info("timing dir is {}".format(timing_dir))
+    logger.info("Archiving timing data and associated provenance in {}.".format(timing_dir))
     rundir = case.get_value("RUNDIR")
     blddir = case.get_value("EXEROOT")
     caseroot = case.get_value("CASEROOT")
@@ -258,15 +262,13 @@ def _save_postrun_provenance_cesm(case, lid):
save_timing = case.get_value("SAVE_TIMING")
if save_timing:
rundir = case.get_value("RUNDIR")
timing_dir = case.get_value("SAVE_TIMING_DIR")
timing_dir = os.path.join(timing_dir, case.get_value("CASE"))
shutil.move(os.path.join(rundir,"timing"),
os.path.join(timing_dir,"timing."+lid))

def _save_postrun_timing_acme(case, lid):
caseroot = case.get_value("CASEROOT")
rundir = case.get_value("RUNDIR")
timing_dir = case.get_value("SAVE_TIMING_DIR")

# tar timings
rundir_timing_dir = os.path.join(rundir, "timing." + lid)
@@ -282,6 +284,11 @@ def _save_postrun_timing_acme(case, lid):
timing_saved_file = "timing.%s.saved" % lid
touch(os.path.join(caseroot, "timing", timing_saved_file))

project = case.get_value("PROJECT", subgroup="case.run")
if not case.is_save_timing_dir_project(project):
return

timing_dir = case.get_value("SAVE_TIMING_DIR")
if timing_dir is None or not os.path.isdir(timing_dir):
return

9 changes: 9 additions & 0 deletions src/drivers/mct/cime_config/config_component_acme.xml
@@ -30,6 +30,15 @@
<desc>Where to auto archive timing data</desc>
</entry>

<entry id="SAVE_TIMING_DIR_PROJECTS">
<type>char</type>
<valid_values></valid_values>
<default_value></default_value>
<group>run_flags</group>
<file>env_run.xml</file>
<desc> A comma-separated list of projects that are allowed to auto archive timing data in SAVE_TIMING_DIR</desc>
</entry>

<entry id="TIMER_DETAIL">
<type>integer</type>
<default_value>12</default_value>
9 changes: 0 additions & 9 deletions src/drivers/mct/cime_config/config_component_cesm.xml
@@ -46,15 +46,6 @@
<desc>logical to save timing files in rundir</desc>
</entry>

<entry id="SAVE_TIMING_DIR">
<type>char</type>
<valid_values></valid_values>
<default_value>timing</default_value>
<group>run_flags</group>
<file>env_run.xml</file>
<desc>Where to auto archive timing data</desc>
</entry>

<entry id="TPROF_TOTAL">
<type>integer</type>
<default_value>0</default_value>