Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmark scripts to scrape timing information from log files #319

Merged
merged 11 commits into from
May 13, 2024

Conversation

yantosca
Copy link
Contributor

@yantosca yantosca commented May 3, 2024

Name and Institution (Required)

Name: Bob Yantosca
Institution: Harvard + GCST

Confirm you have reviewed the following documentation

Describe the update

This PR adds 2 scripts to scrape the GEOS-Chem timers information from benchmark logs:

  • gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py
  • gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py

The scripts can take as input one or more log files. If more than one log file is supplied, the scripts will compute the total timing information. This will allow us to include timing tables in the standard benchmark outputs.

Here is an example of what the output looks like, using log files taken from integration test output.

GEOS-Chem Classic

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% GEOS-Chem Classic Benchmark Timing Information
%%%
%%% Ref = GCC_14.4.0_json
%%% Dev = GCC_14.4.0_log
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Timer                                   Ref [s]               Dev [s]
-------------------------------------------------------------------------------
GEOS-Chem                               389.250               194.625
HEMCO                                   109.376                54.688
All chemistry                            60.750                30.375
=> Gas-phase chem                        39.000                19.500
=> Photolysis                            10.500                 5.250
=> Aerosol chem                           8.500                 4.250
=> Linearized chem                        0.624                 0.312
Transport                                19.124                 9.562
Convection                                5.376                 2.688
Boundary layer mixing                    12.250                 6.125
Dry deposition                            0.376                 0.188
Wet deposition                            4.376                 2.188
Diagnostics                              23.624                11.812
Unit conversions                          2.000                 1.000

GCHP

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% GCHP Classic Benchmark Timing Information
%%%
%%% Ref = GCHP_14.4.0_list_input
%%% Dev = GCHP_14.4.0_str_input
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Timer                                   Ref [s]               Dev [s]
-------------------------------------------------------------------------------
All                                     522.512               261.256
--SetService                              1.310                 0.655
----GCHP                                  1.286                 0.643
------GCHPctmEnv                          0.016                 0.008
------GCHPchem                            0.776                 0.388
------DYNAMICS                            0.146                 0.073
--Initialize                            234.744               117.372
----GCHP                                195.176                97.588
------GCHPctmEnv                          0.016                 0.008
------DYNAMICS                            0.014                 0.007
----EXTDATA                              33.394                16.697
----HIST                                  5.370                 2.685
--Run                                   281.364               140.682
----GCHP                                115.162                57.581
------GCHPctmEnv                          0.382                 0.191
------GCHPchem                           83.698                41.849
------DYNAMICS                           31.078                15.539
----EXTDATA                              87.364                43.682
----HIST                                  6.686                 3.343
--Finalize                                4.900                 2.450
----GCHP                                  4.870                 2.435
------GCHPctmEnv                          0.006                 0.003
------GCHPchem                            4.858                 2.429
------DYNAMICS                            0.004                 0.002
----EXTDATA                               0.004                 0.002
----HIST                                  0.008                 0.004

Expected changes

This is a zero-diff output that just adds new functionality.

Related Github Issue(s)

Tagging @msulprizio @lizziel

gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py
- Added this script to scrape benchmark timers into a table
  with to display timing info from Ref & Dev models.

NOTE: More work will be needed to refine this in subsequent commits.

Signed-off-by: Bob Yantosca <[email protected]>
gcpy/util.py
- Added function "replace_whitespace", which replaces whitespace in
  a string with another character (default is "_").  This will be used
  to make sure the benchmark filenames and version labels do not have
  spaces.

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py
- Benchmark module that can scrape the timers information from either
  gcclassic_timers.json or GEOS-Chem Classic log files.  If multiple
  files are supplied as input, the timers information will be summed
  together.

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py
- Script that scrapes the timing information at the end of the
  GCHP log file

gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py
- Updated docstring comments

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca yantosca added category: Feature Request New feature or request topic: Benchmark Plots and Tables Issues pertaining to generating plots/tables from benchmark output labels May 3, 2024
@yantosca yantosca added this to the 1.5.0 milestone May 3, 2024
@yantosca yantosca self-assigned this May 3, 2024
@yantosca
Copy link
Contributor Author

yantosca commented May 6, 2024

@lizziel @msulprizio: I've now modified this so that we scrape the GCHPchem and summary timers:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% GCHP Classic Benchmark Timing Information
%%%
%%% Ref = GCHP_14.4.0_list_input
%%% Dev = GCHP_14.4.0_str_input
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


GCHPchem Timer                          Ref [s]               Dev [s]
-------------------------------------------------------------------------------
GCHPchem                                250.080               125.040
--SetService                              0.760                 0.380
----generic                               0.000                 0.000
--Initialize                            169.860                84.930
----INITIALIZE                          169.860                84.930
------generic                            10.020                 5.010
--Record                                  0.000                 0.000
----generic                               0.000                 0.000
--Run                                    73.180                36.590
----GenRunMine                           73.180                36.590
------RUN                                72.820                36.410
--------CP_BFRE                           0.020                 0.010
--------DO_CHEM                          72.660                36.330
----------GC_CONV                         3.500                 1.750
----------GC_DRYDEP                       0.080                 0.040
----------GC_EMIS                         1.900                 0.950
----------GC_TURB                         9.260                 4.630
----------GC_CHEM                        48.340                24.170
----------GC_WETDEP                       3.700                 1.850
----------GC_DIAGN                        1.440                 0.720
--------CP_AFTR                           0.000                 0.000
--Finalize                                4.000                 2.000
----generic                               4.000                 2.000


Summary                                 Ref [s]               Dev [s]
-------------------------------------------------------------------------------
All                                     522.512               261.256
--SetService                              1.310                 0.655
----GCHP                                  1.286                 0.643
------GCHPctmEnv                          0.016                 0.008
------GCHPchem                            0.776                 0.388
------DYNAMICS                            0.146                 0.073
----HIST                                  0.000                 0.000
----EXTDATA                               0.000                 0.000
--Initialize                            234.744               117.372
----GCHP                                195.176                97.588
------GCHPctmEnv                          0.016                 0.008
------GCHPchem                          194.014                97.007
------DYNAMICS                            0.014                 0.007
----HIST                                  5.370                 2.685
----EXTDATA                              33.394                16.697
--Run                                   281.364               140.682
----EXTDATA                              87.364                43.682
----GCHP                                115.162                57.581
------GCHPctmEnv                          0.382                 0.191
------GCHPchem                           83.698                41.849
------DYNAMICS                           31.078                15.539
----HIST                                  6.686                 3.343
--Finalize                                4.900                 2.450
----GCHP                                  4.870                 2.435
------GCHPctmEnv                          0.006                 0.003
------GCHPchem                            4.858                 2.429
------DYNAMICS                            0.004                 0.002
----HIST                                  0.008                 0.004
----EXTDATA                               0.004                 0.002

yantosca added 5 commits May 6, 2024 14:05
gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py
- Modified to scrape the GCHPchem timers as well as the
  summary timers.  This involves:
  - Reading the file forwards (instead of backwards)
  - Using "." as the delimiter in the flattened dictionary
  - Rewriting the algorithm to parse timer lines from the log file
  - Now loop over dictionary keys in print_timers function

Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py
- Rename driver program to "make_benchmark_gcclassic_timing_table"
- Remove if __name__ == "__main__": block

gcpy/benchmark/modules/benchmark_scrape_gchp_timers,py
- Rename driver program to "make_benchmark_gchp_timing_table"
- Add an error check to exit after the last summary timer is found
  (this only affects GCHP log files from AWS cloud benchmarks)
- Remove if __name__ == "__main__": block

gcpy/benchmark/modules/benchmark_utils.py
- Added gcc_vs_gcc_dirs, gchp_vs_gcc_dirs, gchp_vs_gcc_dirs, and
  get_log_filenames to abstract repetitive code out of the
  1-year benchmark scripts

gcpy/benchmark/modules/run_1yr_fullchem_benchmark.py
- Call make_benchmark_gcclassic_timing_table to produce the
  GCC vs. GCC timing table
- Call make_benchmark_gchp_timing_table to produce the
  GCHP vs. GCHP timing table

CHANGELOG.md
- Updated accordingly

TODO: Add a "% diff" column to the timing table output
gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py
gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py
- In function print_timer:
  - Compute % difference between ref & dev (or assign NaN if
    it would result in a div-by-zero)
  - Decrease width of timer name column from 25 to 22 spaces
  - Change format of Ref & Dev columns from 20.3f to 18.3f
  - Add percent diff column as 12.3e format (right-aligned)
- In function display_timers:
  - Change width of column headers accordingly

Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/cloud/template.1hr_benchmark.yml
gcpy/benchmark/cloud/template.1mo_benchmark.yml
gcpy/benchmark/config/1mo_benchmark.yml
gcpy/benchmark/config/1yr_fullchem_benchmark.yml
gcpy/benchmark/config/1yr_tt_benchmark.yml
- Add "logs_subdir" and "logs_template" tags to GCC/GCHP Ref & Dev
- Add "timing_table" to "outputs" section

CHANGELOG.md
- Updated accordingly

gcpy/benchmark/run_benchmark.py
gcpy/benchmark/modules/run_1yr_fullchem_benchmark.py
gcpy/benchmark/modules/run_1yr_tt_benchmark.py
- Import & call make_benchmark_gcclassic_scrape_timers to create
  GCClassic vs. GCClassic timing information table
 Import & call make_benchmark_gchp_scrape_timers to create
  GCHP vs. GCHP timing information table

Signed-off-by: Bob Yantosca <[email protected]>
gcpy/benchmark/cloud/template.1hr_benchmark.yml
gcpy/benchmark/cloud/template.1mo_benchmark.yml
- Change logs_subdir YAML tag to ".", since the log file is placed
  in the run directory but not in the OutputDir subdirectory

gcpy/benchmark/modules/benchmark_utils.py
- Place the "timestamps" argument in a list if there is only one
  timestamp before trying to iterate over it with a for loop.

gcpy/benchmark/run_benchmark.py
- We had mistakely placed the GCHP timing table in the GCHP vs. GCC
  section.  Move this to the GCHP vs. GCHP section.
@yantosca
Copy link
Contributor Author

yantosca commented May 10, 2024

@msulprizio @lizziel: I have added a % diff column to the timing output:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% GEOS-Chem Classic Benchmark Timing Information
%%%
%%% Ref = gcc-14.3.0-rc.0
%%% Dev = gcc-14.3.0-rc.0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Timer                              Ref [s]             Dev [s]         % Diff
-------------------------------------------------------------------------------
GEOS-Chem                           97.297              97.297      0.000e+00
HEMCO                               37.978              37.978      0.000e+00
All chemistry                       10.909              10.909      0.000e+00
=> Gas-phase chem                    7.007               7.007      0.000e+00
=> Photolysis                        1.034               1.034      0.000e+00
=> Aerosol chem                      2.018               2.018      0.000e+00
=> Linearized chem                   0.148               0.148      0.000e+00
Transport                            2.218               2.218      0.000e+00
Convection                           3.283               3.283      0.000e+00
Boundary layer mixing                2.980               2.980      0.000e+00
Dry deposition                       0.069               0.069      0.000e+00
Wet deposition                       0.994               0.994      0.000e+00
Diagnostics                          9.111               9.111      0.000e+00
Unit conversions                     1.886               1.886      0.000e+00

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% GCHP Benchmark Timing Information
%%%
%%% Ref = gchp-14.3.0-rc.0
%%% Dev = gchp-14.3.0-rc.0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


GCHPchem Timer                     Ref [s]             Dev [s]         % Diff
-------------------------------------------------------------------------------
GCHPchem                            47.400              47.400      0.000e+00
--SetService                         0.290               0.290      0.000e+00
----generic                          0.000               0.000            nan
--Initialize                        32.550              32.550      0.000e+00
----INITIALIZE                      32.550              32.550      0.000e+00
------generic                       14.270              14.270      0.000e+00
--Record                             0.000               0.000            nan
----generic                          0.000               0.000            nan
--Run                                8.860               8.860      0.000e+00
----GenRunMine                       8.860               8.860      0.000e+00
------RUN                            8.760               8.760      0.000e+00
--------CP_BFRE                      0.000               0.000            nan
--------DO_CHEM                      8.730               8.730      0.000e+00
----------GC_CONV                    0.510               0.510      0.000e+00
----------GC_DRYDEP                  0.010               0.010      0.000e+00
----------GC_EMIS                    0.350               0.350      0.000e+00
----------GC_TURB                    0.530               0.530      0.000e+00
----------GC_CHEM                    6.060               6.060      0.000e+00
----------GC_WETDEP                  0.580               0.580      0.000e+00
----------GC_DIAGN                   0.040               0.040      0.000e+00
--------CP_AFTR                      0.000               0.000            nan
--Finalize                           3.290               3.290      0.000e+00
----generic                          3.280               3.280      0.000e+00


Summary                            Ref [s]             Dev [s]         % Diff
-------------------------------------------------------------------------------
All                                234.247             234.247      0.000e+00
--SetService                         1.359               1.359      0.000e+00
----GCHP                             1.335               1.335      0.000e+00
------GCHPctmEnv                     0.000               0.000            nan
------GCHPchem                       0.292               0.292      0.000e+00
------DYNAMICS                       0.982               0.982      0.000e+00
----HIST                             0.000               0.000            nan
----EXTDATA                          0.000               0.000            nan
--Initialize                       108.330             108.330      0.000e+00
----GCHP                            70.908              70.908      0.000e+00
------GCHPctmEnv                     0.013               0.013      0.000e+00
------GCHPchem                      64.561              64.561      0.000e+00
------DYNAMICS                       4.592               4.592      0.000e+00
----HIST                             7.443               7.443      0.000e+00
----EXTDATA                         28.791              28.791      0.000e+00
--Run                              121.038             121.038      0.000e+00
----EXTDATA                         83.261              83.261      0.000e+00
----GCHP                            21.975              21.975      0.000e+00
------GCHPctmEnv                     0.063               0.063      0.000e+00
------GCHPchem                      14.193              14.193      0.000e+00
------DYNAMICS                       7.717               7.717      0.000e+00
----HIST                             4.342               4.342      0.000e+00
--Finalize                           3.399               3.399      0.000e+00
----GCHP                             3.374               3.374      0.000e+00
------GCHPctmEnv                     0.009               0.009      0.000e+00
------GCHPchem                       3.348               3.348      0.000e+00
------DYNAMICS                       0.009               0.009      0.000e+00
----HIST                             0.011               0.011      0.000e+00
----EXTDATA                          0.011               0.011      0.000e+00

gcpy/benchmark/modules/benchmark_scrape_gcclassic_timers.py
- Set timers that did not run to np.nan before trying to parse
  timing information

gcpy/benchmark/modules/benchmark_scrape_gchp_timers.py
- Fixed typo: "GCHP Classic" -> "GCHP"

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca yantosca marked this pull request as ready for review May 10, 2024 20:03
@yantosca yantosca requested review from msulprizio and lizziel May 10, 2024 20:03
Copy link
Contributor

@msulprizio msulprizio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These updates look good to merge. I just have one question about the assumed log file name.

@@ -47,13 +47,17 @@ data:
dir: GCC_ref
outputs_subdir: OutputDir
restarts_subdir: Restarts
logs_subdir: .
logs_template: "log.%Y%m%d"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to assume a specific log file name (which obviously the user can edit). Has the analogous change to the log file name been made to the sample run scripts in the geos-chem repository?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @msulprizio. I could name the log file here GC.log to correspond to the 1-month benchmark default file name. This file isn't used as much since the 1-month benchmarks are done on the cloud (but sometimes we might need to run one manually).

gcpy/benchmark/config/1mo_benchmark.yml
- Renamed "log.%Y%m%d" to "GC.log", which is the same name used in
  the geoschem.benchmark.run for GEOS-Chem Classic benchmarks.
- NOTE: This YAML file is only needed when we run 1-month benchmarks
  manually.  The automatic cloud benchmarks use the template
  files in gcpy/benchmark/cloud.

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca yantosca merged commit 4be43b5 into dev May 13, 2024
@yantosca yantosca deleted the feature/scrape-benchmark-timers branch May 13, 2024 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Feature Request New feature or request topic: Benchmark Plots and Tables Issues pertaining to generating plots/tables from benchmark output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Add a script to scrape timing info from benchmark simulation log files
2 participants