From 555fcc748d2cbb1c68d7175b7a10b2ed0147ad50 Mon Sep 17 00:00:00 2001
From: Patrick Worley <worleyph@ornl.gov>
Date: Tue, 31 Jan 2017 23:31:56 -0500
Subject: [PATCH] fix calculation of TOT Run Time in get_timing.py

get_timing.py is used to generate the performance summary
file acme_timing.$case.$lid from the raw global performance data
acme_timing_stats.$lid . The computation of TOT Run Time uses
the formula:

        tmax = tmax + wtmax + correction

where

        tmax  = self.gettime(' CPL:RUN_LOOP ')[1]
        wtmax = self.gettime(' CPL:TPROF_WRITE ')[1]
        correction = max(0, ocnrunitime - ocnwaittime)

Here tmax is the maximum time any process spends in the RUN
loop. wtmax is the maximum time any process spends in the phase where
checkpoint timing data is output, including the barrier waiting for
all processes to enter this phase. If one component is running on
nodes separate from the other components and takes very little time,
wtmax will reflect the time that this component is waiting at the
barrier while the other components are in the RUN loop, double
counting this time after they are summed. This error is not
significant during typical production runs, but it does affect short
benchmark runs where checkpoint  performance data is written
frequently, which is the type of runs used to evaluate PE layouts and
set performance optimization targets. As such it is important to fix
this as soon as possible.

The solution proposed here is to use

        wtmin = self.gettime(' CPL:TPROF_WRITE ')[0]
        tmax = tmax + wtmin + correction

Since CPL:TPROF_WRITE includes barriers before and after the performance
data write (t_prf), the minimum will capture the cost of the t_prf call
even if the process achieving the minimum is not the one that spends
the most time in t_prf.

Note that I do not understand the role of 'correction' in the above
formula - it is perhaps  extrapolating what the TOT time would be if
the OCN is simulated the same  amount of time as the ATM (it is
typically a little less), and do not know who wrote this script. In
the cases used to diagnose the 'tmax + wtmax' issue, 'correction' was
zero. With the current coupling frequency, I do not expect
'correction' to be very large in any case, but it would be worth while
querying the author as to the intent, but that is distinct from
resolving this issue with double counting RUN loop time.
---
 utils/python/CIME/get_timing.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/utils/python/CIME/get_timing.py b/utils/python/CIME/get_timing.py
index 8f1f592e168..b69b1637964 100644
--- a/utils/python/CIME/get_timing.py
+++ b/utils/python/CIME/get_timing.py
@@ -256,7 +256,7 @@ def getTiming(self):
 
         nmax  = self.gettime(' CPL:INIT ')[1]
         tmax  = self.gettime(' CPL:RUN_LOOP ')[1]
-        wtmax = self.gettime(' CPL:TPROF_WRITE ')[1]
+        wtmin = self.gettime(' CPL:TPROF_WRITE ')[0]
         fmax  = self.gettime(' CPL:FINAL ')[1]
         for k in components:
             if k != "CPL":
@@ -281,7 +281,7 @@ def getTiming(self):
 
         correction = max(0, ocnrunitime - ocnwaittime)
 
-        tmax = tmax + wtmax + correction
+        tmax = tmax + wtmin + correction
         ocn.tmax += ocnrunitime
 
         for m in self.models.values():