-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add benchmark scripts to scrape timing information from log files #319
Conversation
gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py - Added this script to scrape benchmark timers into a table with to display timing info from Ref & Dev models. NOTE: More work will be needed to refine this in subsequent commits. Signed-off-by: Bob Yantosca <[email protected]>
gcpy/util.py - Added function "replace_whitespace", which replaces whitespace in a string with another character (default is "_"). This will be used to make sure the benchmark filenames and version labels do not have spaces. CHANGELOG.md - Updated accordingly Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py - Benchmark module that can scrape the timers information from either gcclassic_timers.json or GEOS-Chem Classic log files. If multiple files are supplied as input, the timers information will be summed together. CHANGELOG.md - Updated accordingly Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py - Script that scrapes the timing information at the end of the GCHP log file gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py - Updated docstring comments CHANGELOG.md - Updated accordingly Signed-off-by: Bob Yantosca <[email protected]>
@lizziel @msulprizio: I've now modified this so that we scrape the GCHPchem and summary timers: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% GCHP Classic Benchmark Timing Information
%%%
%%% Ref = GCHP_14.4.0_list_input
%%% Dev = GCHP_14.4.0_str_input
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
GCHPchem Timer Ref [s] Dev [s]
-------------------------------------------------------------------------------
GCHPchem 250.080 125.040
--SetService 0.760 0.380
----generic 0.000 0.000
--Initialize 169.860 84.930
----INITIALIZE 169.860 84.930
------generic 10.020 5.010
--Record 0.000 0.000
----generic 0.000 0.000
--Run 73.180 36.590
----GenRunMine 73.180 36.590
------RUN 72.820 36.410
--------CP_BFRE 0.020 0.010
--------DO_CHEM 72.660 36.330
----------GC_CONV 3.500 1.750
----------GC_DRYDEP 0.080 0.040
----------GC_EMIS 1.900 0.950
----------GC_TURB 9.260 4.630
----------GC_CHEM 48.340 24.170
----------GC_WETDEP 3.700 1.850
----------GC_DIAGN 1.440 0.720
--------CP_AFTR 0.000 0.000
--Finalize 4.000 2.000
----generic 4.000 2.000
Summary Ref [s] Dev [s]
-------------------------------------------------------------------------------
All 522.512 261.256
--SetService 1.310 0.655
----GCHP 1.286 0.643
------GCHPctmEnv 0.016 0.008
------GCHPchem 0.776 0.388
------DYNAMICS 0.146 0.073
----HIST 0.000 0.000
----EXTDATA 0.000 0.000
--Initialize 234.744 117.372
----GCHP 195.176 97.588
------GCHPctmEnv 0.016 0.008
------GCHPchem 194.014 97.007
------DYNAMICS 0.014 0.007
----HIST 5.370 2.685
----EXTDATA 33.394 16.697
--Run 281.364 140.682
----EXTDATA 87.364 43.682
----GCHP 115.162 57.581
------GCHPctmEnv 0.382 0.191
------GCHPchem 83.698 41.849
------DYNAMICS 31.078 15.539
----HIST 6.686 3.343
--Finalize 4.900 2.450
----GCHP 4.870 2.435
------GCHPctmEnv 0.006 0.003
------GCHPchem 4.858 2.429
------DYNAMICS 0.004 0.002
----HIST 0.008 0.004
----EXTDATA 0.004 0.002 |
gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py - Modified to scrape the GCHPchem timers as well as the summary timers. This involves: - Reading the file forwards (instead of backwards) - Using "." as the delimiter in the flattened dictionary - Rewriting the algorithm to parse timer lines from the log file - Now loop over dictionary keys in print_timers function Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py - Rename driver program to "make_benchmark_gcclassic_timing_table" - Remove if __name__ == "__main__": block gcpy/benchmark/modules/benchmark_scrape_gchp_timers,py - Rename driver program to "make_benchmark_gchp_timing_table" - Add an error check to exit after the last summary timer is found (this only affects GCHP log files from AWS cloud benchmarks) - Remove if __name__ == "__main__": block gcpy/benchmark/modules/benchmark_utils.py - Added gcc_vs_gcc_dirs, gchp_vs_gcc_dirs, gchp_vs_gcc_dirs, and get_log_filenames to abstract repetitive code out of the 1-year benchmark scripts gcpy/benchmark/modules/run_1yr_fullchem_benchmark.py - Call make_benchmark_gcclassic_timing_table to produce the GCC vs. GCC timing table - Call make_benchmark_gchp_timing_table to produce the GCHP vs. GCHP timing table CHANGELOG.md - Updated accordingly TODO: Add a "% diff" column to the timing table output
gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py - In function print_timer: - Compute % difference between ref & dev (or assign NaN if it would result in a div-by-zero) - Decrease width of timer name column from 25 to 22 spaces - Change format of Ref & Dev columns from 20.3f to 18.3f - Add percent diff column as 12.3e format (right-aligned) - In function display_timers: - Change width of column headers accordingly Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/cloud/template.1hr_benchmark.yml gcpy/benchmark/cloud/template.1mo_benchmark.yml gcpy/benchmark/config/1mo_benchmark.yml gcpy/benchmark/config/1yr_fullchem_benchmark.yml gcpy/benchmark/config/1yr_tt_benchmark.yml - Add "logs_subdir" and "logs_template" tags to GCC/GCHP Ref & Dev - Add "timing_table" to "outputs" section CHANGELOG.md - Updated accordingly gcpy/benchmark/run_benchmark.py gcpy/benchmark/modules/run_1yr_fullchem_benchmark.py gcpy/benchmark/modules/run_1yr_tt_benchmark.py - Import & call make_benchmark_gcclassic_scrape_timers to create GCClassic vs. GCClassic timing information table Import & call make_benchmark_gchp_scrape_timers to create GCHP vs. GCHP timing information table Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/cloud/template.1hr_benchmark.yml gcpy/benchmark/cloud/template.1mo_benchmark.yml - Change logs_subdir YAML tag to ".", since the log file is placed in the run directory but not in the OutputDir subdirectory gcpy/benchmark/modules/benchmark_utils.py - Place the "timestamps" argument in a list if there is only one timestamp before trying to iterate over it with a for loop. gcpy/benchmark/run_benchmark.py - We had mistakely placed the GCHP timing table in the GCHP vs. GCC section. Move this to the GCHP vs. GCHP section.
@msulprizio @lizziel: I have added a % diff column to the timing output: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% GEOS-Chem Classic Benchmark Timing Information
%%%
%%% Ref = gcc-14.3.0-rc.0
%%% Dev = gcc-14.3.0-rc.0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Timer Ref [s] Dev [s] % Diff
-------------------------------------------------------------------------------
GEOS-Chem 97.297 97.297 0.000e+00
HEMCO 37.978 37.978 0.000e+00
All chemistry 10.909 10.909 0.000e+00
=> Gas-phase chem 7.007 7.007 0.000e+00
=> Photolysis 1.034 1.034 0.000e+00
=> Aerosol chem 2.018 2.018 0.000e+00
=> Linearized chem 0.148 0.148 0.000e+00
Transport 2.218 2.218 0.000e+00
Convection 3.283 3.283 0.000e+00
Boundary layer mixing 2.980 2.980 0.000e+00
Dry deposition 0.069 0.069 0.000e+00
Wet deposition 0.994 0.994 0.000e+00
Diagnostics 9.111 9.111 0.000e+00
Unit conversions 1.886 1.886 0.000e+00
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% GCHP Benchmark Timing Information
%%%
%%% Ref = gchp-14.3.0-rc.0
%%% Dev = gchp-14.3.0-rc.0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
GCHPchem Timer Ref [s] Dev [s] % Diff
-------------------------------------------------------------------------------
GCHPchem 47.400 47.400 0.000e+00
--SetService 0.290 0.290 0.000e+00
----generic 0.000 0.000 nan
--Initialize 32.550 32.550 0.000e+00
----INITIALIZE 32.550 32.550 0.000e+00
------generic 14.270 14.270 0.000e+00
--Record 0.000 0.000 nan
----generic 0.000 0.000 nan
--Run 8.860 8.860 0.000e+00
----GenRunMine 8.860 8.860 0.000e+00
------RUN 8.760 8.760 0.000e+00
--------CP_BFRE 0.000 0.000 nan
--------DO_CHEM 8.730 8.730 0.000e+00
----------GC_CONV 0.510 0.510 0.000e+00
----------GC_DRYDEP 0.010 0.010 0.000e+00
----------GC_EMIS 0.350 0.350 0.000e+00
----------GC_TURB 0.530 0.530 0.000e+00
----------GC_CHEM 6.060 6.060 0.000e+00
----------GC_WETDEP 0.580 0.580 0.000e+00
----------GC_DIAGN 0.040 0.040 0.000e+00
--------CP_AFTR 0.000 0.000 nan
--Finalize 3.290 3.290 0.000e+00
----generic 3.280 3.280 0.000e+00
Summary Ref [s] Dev [s] % Diff
-------------------------------------------------------------------------------
All 234.247 234.247 0.000e+00
--SetService 1.359 1.359 0.000e+00
----GCHP 1.335 1.335 0.000e+00
------GCHPctmEnv 0.000 0.000 nan
------GCHPchem 0.292 0.292 0.000e+00
------DYNAMICS 0.982 0.982 0.000e+00
----HIST 0.000 0.000 nan
----EXTDATA 0.000 0.000 nan
--Initialize 108.330 108.330 0.000e+00
----GCHP 70.908 70.908 0.000e+00
------GCHPctmEnv 0.013 0.013 0.000e+00
------GCHPchem 64.561 64.561 0.000e+00
------DYNAMICS 4.592 4.592 0.000e+00
----HIST 7.443 7.443 0.000e+00
----EXTDATA 28.791 28.791 0.000e+00
--Run 121.038 121.038 0.000e+00
----EXTDATA 83.261 83.261 0.000e+00
----GCHP 21.975 21.975 0.000e+00
------GCHPctmEnv 0.063 0.063 0.000e+00
------GCHPchem 14.193 14.193 0.000e+00
------DYNAMICS 7.717 7.717 0.000e+00
----HIST 4.342 4.342 0.000e+00
--Finalize 3.399 3.399 0.000e+00
----GCHP 3.374 3.374 0.000e+00
------GCHPctmEnv 0.009 0.009 0.000e+00
------GCHPchem 3.348 3.348 0.000e+00
------DYNAMICS 0.009 0.009 0.000e+00
----HIST 0.011 0.011 0.000e+00
----EXTDATA 0.011 0.011 0.000e+00 |
gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py - Set timers that did not run to np.nan before trying to parse timing information gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py - Fixed typo: "GCHP Classic" -> "GCHP" Signed-off-by: Bob Yantosca <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These updates look good to merge. I just have one question about the assumed log file name.
@@ -47,13 +47,17 @@ data: | |||
dir: GCC_ref | |||
outputs_subdir: OutputDir | |||
restarts_subdir: Restarts | |||
logs_subdir: . | |||
logs_template: "log.%Y%m%d" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to assume a specific log file name (which obviously the user can edit). Has the analogous change to the log file name been made to the sample run scripts in the geos-chem repository?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @msulprizio. I could name the log file here GC.log
to correspond to the 1-month benchmark default file name. This file isn't used as much since the 1-month benchmarks are done on the cloud (but sometimes we might need to run one manually).
gcpy/benchmark/config/1mo_benchmark.yml - Renamed "log.%Y%m%d" to "GC.log", which is the same name used in the geoschem.benchmark.run for GEOS-Chem Classic benchmarks. - NOTE: This YAML file is only needed when we run 1-month benchmarks manually. The automatic cloud benchmarks use the template files in gcpy/benchmark/cloud. Signed-off-by: Bob Yantosca <[email protected]>
Name and Institution (Required)
Name: Bob Yantosca
Institution: Harvard + GCST
Confirm you have reviewed the following documentation
Describe the update
This PR adds 2 scripts to scrape the GEOS-Chem timers information from benchmark logs:
gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py
gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py
The scripts can take as input one or more log files. If more than one log file is supplied, the scripts will compute the total timing information. This will allow us to include timing tables in the standard benchmark outputs.
Here is an example of what the output looks like, using log files taken from integration test output.
GEOS-Chem Classic
GCHP
Expected changes
This is a zero-diff output that just adds new functionality.
Related Github Issue(s)
Tagging @msulprizio @lizziel