Now allow specification of the number of cores for benchmark plotting jobs #246

Merged 7 commits on Aug 30, 2023


@yantosca (Contributor) commented Aug 2, 2023

Description

This PR does the following:

  1. Adds the options.n_cores YAML tag to benchmark YAML configuration files, in the benchmark/config and benchmark/cloud folders, e.g.:
#
# options: Customizes the benchmark plot output.
#
options:
  #
  # bmk_type: Specifies the type of benchmark.
  #
  bmk_type: FullChemBenchmark
  #
  # comparisons: Specifies the comparisons to perform.
  #
  comparisons:
  ... etc not shown ...
  #
  # outputs: Specifies the plots and tables to generate.
  #
  outputs:
  ... etc not shown ...
  #
  # n_cores: Specify the number of cores to use:
  # -1: Use as many cores as possible
  #  N: Use N cores
  #  1: Disable parallelization (use a single core)
  #
  n_cores: -1
  2. Reads options.n_cores into the config["options"]["n_cores"] dict entry.

  3. Passes n_job=config["options"]["n_cores"] to the relevant benchmark plotting routines in the following modules:

  • benchmark/run_benchmark.py
  • benchmark/modules/run_1yr_fullchem_benchmark.py
  • benchmark/modules/run_1yr_tt_benchmark.py
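
As a rough illustration of steps 2 and 3, the sketch below reads `n_cores` from an already-parsed config dict and forwards it as the `n_job` keyword. The names `read_n_cores` and `plot_stub` are hypothetical and are not GCPy functions; the default of -1 and the rejection of 0 are assumptions made for this example.

```python
# Sketch only: read_n_cores and plot_stub are hypothetical names,
# not functions from GCPy.

def read_n_cores(config, default=-1):
    """Return options.n_cores from a parsed YAML config dict."""
    n_cores = config.get("options", {}).get("n_cores", default)
    # bool is a subclass of int, so reject it explicitly
    if isinstance(n_cores, bool) or not isinstance(n_cores, int):
        raise TypeError(f"n_cores must be an integer, got {n_cores!r}")
    # Assumption: 0 is treated as invalid (joblib also rejects n_jobs=0)
    if n_cores == 0:
        raise ValueError("n_cores must be a nonzero integer")
    return n_cores


def plot_stub(n_job=-1):
    """Stand-in for a benchmark plotting routine that accepts n_job."""
    return f"plotting with n_job={n_job}"


# Dict mirroring what a YAML parser would return for the options section
config = {"options": {"bmk_type": "FullChemBenchmark", "n_cores": -1}}
message = plot_stub(n_job=read_n_cores(config))
```

Validating the value once, right after the config is loaded, keeps each plotting call a simple keyword pass-through.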

Related issues:

benchmark/cloud/template.1hr_benchmark.yml
benchmark/cloud/template.1mo_benchmark.yml
benchmark/config/1mo_benchmark.yml
benchmark/config/1yr_ch4_benchmark.yml
benchmark/config/1yr_fullchem_benchmark.yml
benchmark/config/1yr_tt_benchmark.yml
- Update comments
- Add n_cores: -1 which will by default turn on parallelization
  and use as many cores as possible.

Signed-off-by: Bob Yantosca <[email protected]>
benchmark/modules/run_1yr_fullchem_benchmark.py
benchmark/modules/run_1yr_tt_benchmark.py
benchmark/run_benchmark.py
- Read the number of cores to use in plot jobs from the YAML config
  files into the config["options"]["n_cores"] dict entry.
- Pass n_job=config["options"]["n_cores"] into plotting routines
  so that the number of cores to use in parallel can be specified
  (or to disable parallelization completely)

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca added the "category: Feature Request" and "topic: Benchmark Plots and Tables" labels Aug 2, 2023
@yantosca added this to the 1.4.0 milestone Aug 2, 2023
@yantosca requested a review from msulprizio August 2, 2023 15:47
@yantosca self-assigned this Aug 2, 2023
benchmark/modules/run_1yr_tt_benchmark.py:
- Added missing comments after the cmpres argument in a few calls
  to plotting code.

Signed-off-by: Bob Yantosca <[email protected]>
benchmark/cloud/template.1hr_benchmark.yml
benchmark/cloud/template.1mo_benchmark.yml
benchmark/config/1mo_benchmark.yml
benchmark/config/1yr_ch4_benchmark.yml
benchmark/config/1yr_fullchem_benchmark.yml
benchmark/config/1yr_tt_benchmark.yml
- Fix comments, now state that n_cores = -1 will use all available cores,
  n_cores = -N will use N cores (it must be negative), and n_cores = 1
  will disable parallelization
- Indent comments to line up with YAML tags where necessary

Signed-off-by: Bob Yantosca <[email protected]>
@msulprizio requested a review from lizziel August 7, 2023 12:08
@yantosca (Contributor, Author) commented

@laestrada and I have discovered that the joblib Parallel function can mask errors raised in other areas of the code. For this reason, I propose rewriting the code with an if block that runs without parallelization when n_job=1 and calls the joblib Parallel function otherwise. Stay tuned.

This merge brings updates from PR #251 (Update gcpy_env environment
to install with MambaForge and to use latest package versions; Update
scripts accordingly, by @yantosca) into the
feature/disable-parallelization branch.

This will facilitate merging into dev at a later time.

Signed-off-by: Bob Yantosca <[email protected]>

--
gcpy.benchmark.py
benchmark/modules/run_1yr_fullchem_benchmark.py
benchmark/modules/run_1yr_tt_benchmark.py
- Instead of relying on joblib.Parallel(n_jobs=1) to disable executing
  commands in parallel, we have added if blocks that execute non-parallel
  code when n_jobs=1.  This will help with debugging because error
  messages from upstream packages often get obfuscated by the
  joblib.Parallel() command.

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca (Contributor, Author) commented

We have now added if blocks around the calls to joblib.Parallel() so that separate, non-parallelized code executes when n_job==1, as shown in the example below:

        # ==================================================================
        # GCC vs GCC operations budgets tables
        # ==================================================================
        if config["options"]["outputs"]["ops_budget_table"]:
            print("\n%%% Creating GCC vs. GCC operations budget tables %%%")

            def gcc_vs_gcc_ops_budg(mon):
                """
                Create budget table for each benchmark month m in parallel
                """

                # Filepaths
                refpath = get_filepath(
                    gcc_vs_gcc_refdir,
                    "Budget",
                    bmk_mons_ref[mon]
                )
                devpath = get_filepath(
                    gcc_vs_gcc_devdir,
                    "Budget",
                    bmk_mons_dev[mon]
                )

                # Create tables
                bmk.make_benchmark_operations_budget(
                    config["data"]["ref"]["gcc"]["version"],
                    refpath,
                    config["data"]["dev"]["gcc"]["version"],
                    devpath,
                    sec_per_month_ref[mon],
                    sec_per_month_dev[mon],
                    benchmark_type=bmk_type,
                    label=f"at 01{bmk_mon_yr_strs_dev[mon]}",
                    dst=gcc_vs_gcc_tablesdir,
                )

            # Create tables in parallel
            # Turn off parallelization if n_cores == 1
            if config["options"]["n_cores"] != 1:
                results = Parallel(n_jobs=config["options"]["n_cores"])(
                    delayed(gcc_vs_gcc_ops_budg)(mon)
                    for mon in range(bmk_n_months)
                )
            else:
                for mon in range(bmk_n_months):
                    results = gcc_vs_gcc_ops_budg(mon)

@laestrada and I have found that joblib.Parallel() often obfuscates error messages from other Python packages. For the sake of easy debugging, we no longer rely on joblib.Parallel(n_jobs=1) to disable parallelization.
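
The same pattern can be factored into a small helper. This is a minimal sketch, not code from this PR: `run_tasks` and `make_table` are hypothetical names, and the lazy import of joblib is a design choice made here so that the serial path runs without it.

```python
def run_tasks(func, items, n_jobs=-1):
    """Call func on each item, serially when n_jobs == 1."""
    if n_jobs == 1:
        # Plain loop: exceptions surface with an ordinary traceback
        return [func(item) for item in items]
    # Imported lazily so the serial path does not need joblib
    from joblib import Parallel, delayed
    return Parallel(n_jobs=n_jobs)(delayed(func)(item) for item in items)


def make_table(mon):
    """Hypothetical stand-in for a per-month table routine."""
    return f"table for month {mon}"


# Serial execution, as used when debugging
tables = run_tasks(make_table, range(3), n_jobs=1)
```

Keeping the branch in one helper avoids repeating the if/else around every call site, at the cost of one extra level of indirection.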

@yantosca (Contributor, Author) commented

This PR should be merged before PR #250.

@yantosca removed the request for review from lizziel August 17, 2023 21:22
benchmark/cloud/template.1hr_benchmark.yml
benchmark/cloud/template.1mo_benchmark.yml
benchmark/config/1yr_ch4_benchmark.yml
benchmark/config/1yr_fullchem_benchmark.yml
benchmark/config/1yr_tt_benchmark.yml
- Update the comments about n_cores to more accurately reflect
  its functionality.
    -1 uses all $OMP_NUM_THREADS cores,
    -N uses $OMP_NUM_THREADS - (N-1) cores
     1 disables parallelization entirely.

Signed-off-by: Bob Yantosca <[email protected]>
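
The rule in the commit message above can be written out as a short function. This is a sketch under stated assumptions: `effective_cores` is a hypothetical name, `available` stands in for the $OMP_NUM_THREADS value, and the floor of one core for large negative values follows joblib's behaviour for negative n_jobs.

```python
def effective_cores(n_cores, available):
    """Return how many cores a given n_cores setting would use.

    available is the number of cores on offer (assumed here to be
    the value of $OMP_NUM_THREADS).
    """
    if n_cores == 0:
        raise ValueError("n_cores must be nonzero")
    if n_cores > 0:
        # Positive values are taken literally; 1 disables parallelization
        return n_cores
    # -1 -> available, -N -> available - (N - 1), never below one core
    return max(available + 1 + n_cores, 1)


all_cores = effective_cores(-1, 8)   # every available core
one_fewer = effective_cores(-2, 8)   # all but one core
serial = effective_cores(1, 8)       # single core, no parallelization
```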
@yantosca requested a review from msulprizio August 30, 2023 15:51
@yantosca merged commit 0ec45b6 into dev Aug 30, 2023
@yantosca deleted the feature/disable-parallelization branch August 30, 2023 18:48