From dc64199329be4dce46b24339b2872a4213782a8d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 11:35:45 +0200 Subject: [PATCH 01/17] Remove header --- create_scaling_table_per_exp.py | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/create_scaling_table_per_exp.py b/create_scaling_table_per_exp.py index bc22906..dae00c6 100755 --- a/create_scaling_table_per_exp.py +++ b/create_scaling_table_per_exp.py @@ -1,17 +1,4 @@ #!/usr/bin/python -# -# Script to parse all the slurm or echam6.log files for one experiment (runned on different number of nodes) -# to extract the wallclock time. It creates a table containing wallclock time and associated scaling data -# (Efficiency, Speed-up, NH,...). -# -# -#Example : create_scaling_table_per_exp.py -e my_exp -m icon -y 1 -# -# C. Siegenthaler (C2SM) , July 2015 -# C. Siegenthaler (C2SM) : adaptation for ICON, November 2017 -# C. Siegenthaler (C2SM) : modifications, December 2019 -# -############################################################################################ import numpy as np import os From 935beb37aae5f878a5888fc0b14ca1e94baeb2c6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 11:47:34 +0200 Subject: [PATCH 02/17] Remove ECHAM info in README --- README.md | 69 +++++-------------------------------------------------- 1 file changed, 6 insertions(+), 63 deletions(-) diff --git a/README.md b/README.md index 2c78b43..80b27dc 100644 --- a/README.md +++ b/README.md @@ -1,31 +1,12 @@ -# Toolset to perform scaling analysis of ICON(-HAM), ECHAM(-HAM) and MPI-ESM(-HAM) +# Toolset to perform scaling analysis of ICON It has been tested on Piz Daint (CSCS) to produce the technical part of production projects at CSCS. -On Euler (ETHZ) only limited functionality is provided for the analysis of Icon. + +On Euler (ETHZ), only limited functionality is provided for the analysis of Icon. See [Limitations on Euler](#limitations-on-euler) for more information. Below is a description of each script and a recipe. -- Original devleopment: Colombe Siegenthaler (2020-01) -- Maintainted by Michael Jähn from 2021-03 on - -## Table of contents - - [Recipe for scaling analysis with ECHAM/ICON-(HAM)](#recipe-for-scaling-analysis-with-echamicon-ham) - - [1. Configure and compile your model as usual.](#1-configure-and-compile-your-model-as-usual) - - [2. Prepare your running script](#2-prepare-your-running-script) - - [ICON](#icon) - - [ECHAM](#echam) - - [3. Create and launch different running scripts based on my_exp, but using different numbers of nodes.](#3-create-and-launch-different-running-scripts-based-on-my_exp-but-using-different-numbers-of-nodes) - - [ICON](#icon) - - [ECHAM](#echam) - - [4. When all the runs are finished, read all the slurm/log files to get the Wallclock for each run, and put them into a table:](#4-when-all-the-runs-are-finished-read-all-the-slurmlog-files-to-get-the-wallclock-for-each-run-and-put-them-into-a-table) - - [ICON](#icon) - - [ECHAM](#echam) - - [5. Create a summary plot and table of the variable you wish (Efficiency, NH, Wallclock) for different experiments with respect to the number of nodes.](#5-create-a-summary-plot-and-table-of-the-variable-you-wish-efficiency-nh-wallclock-for-different-experiments-with-respect-to-the-number-of-nodes) - - [Limitations on Euler](#limitations-on-euler) - -## Recipe for scaling analysis with ECHAM/ICON-(HAM) - ### 1. Configure and compile your model as usual. ### 2. 
Prepare your running script
@@ -39,25 +20,13 @@ $ conda env create -f environment.yaml
 To load your environment, simply type:
 
 ```console
-$ conda env create -f environment.yaml
+$ conda activate scaling_analysis
 ```
-
-#### ICON
-
-Prepare your machine-independent setting file "my_exp" (e.g. exp.atm_amip, without the '.run').
-
-#### ECHAM
-Prepare your setting file as usual with the jobscript toolkit:
-
-```console
-$ prepare_run -r [path_to_your_setting_folder] my_exp
-```
+Prepare your machine-independent setting file `my_exp` (e.g. `exp.atm_amip`, without the `.run`).
 
 ### 3. Create and launch different running scripts based on my_exp, but using different numbers of nodes.
 
-#### ICON
-
 Use `send_several_run_ncpus_perf_ICON.py`.
 
 For example, for running `my_exp` on 1, 10, 12 and 15 nodes:
 
 ```console
 $ python [path_to_scaling_analysis_tool]/send_several_run_ncpus_perf_ICON.py -e my_exp -n 1 10 12 15
 ```
 
 With the command above, 4 running scripts will be created (`exp.my_exp_nnodes1.run`, `exp.my_exp_nnodes10.run`,
 `exp.my_exp_nnodes12.run` and `exp.my_exp_nnodes15.run`), and each of them will be launched.
 
 To send several experiments on different node numbers at once, use: `send_analyse_different_exp_at_once_ICON.py`
 from inside ``:
 
 ```console
 $ python send_analyse_different_exp_at_once_ICON.py
 ```
 
 The script `send_analyse_different_exp_at_once_ICON.py` (n_step = 1) is a wrapper which calls
 `send_several_run_ncpus_perf_ICON.py` for different experiments (for example different set-ups, or compilers).
 
 The script `send_analyse_different_exp_at_once_ICON.py` (n_step = 2) is a wrapper which gets
 the wallclocks from the log files for different experiments (for example different set-ups, or compilers) (point 4 of this README).
 
-#### ECHAM
-
-Use `send_several_run_ncpus_perf.py` which creates and sends several running scripts using the option -o of the jobsubm_echam script.
-For example, sending the my_exp run on 1, 10, 12 and 15 nodes:
-
-```console
-$ python [path_to_scaling_analysis_tool]/send_several_run_ncpus_perf.py -b [path_to_echam-ham_folder]/my_experiments/my_exp -n 1 10 12 15
-```
-
-With the command above, 4 running folders will be created based on the running folder `my_exp`
-(`my_exp_cpus12`, `my_exp_cpus120`, `my_exp_cpus144and my_exp_cpus180`), and each of them will be launched.
-
 ### 4. When all the runs are finished, read all the slurm/log files to get the Wallclock for each run, and put them into a table:
 
-#### ICON
-
 Use the option `-m icon`:
 
 ```console
@@ -105,16 +60,6 @@ $ python [path_to_scaling_analysis_tool]/create_scaling_table_per_exp.py -e my_e
 
 or for different experiments at once: `send_analyse_different_exp_at_once_ICON.py` (n_step = 2) (cf point 3)
 
-#### ECHAM
-
-Use the option `-m icon`
-
-```console
-$ python [path_to_scaling_analysis_tool]/create_scaling_table_per_exp.py -e my_exp -m echam-ham
-```
-
-For both model types, this creates a table `my_exp.csv`, which contains the wallclock, efficiency and NH for each run.
-
 ### 5. Create a summary plot and table of the variable you wish (Efficiency, NH, Wallclock) for different experiments with respect to the number of nodes.
 
 If needed, you can define the line properties of each experiment in `def_exps_plot.py`.
 
@@ -126,9 +71,7 @@ $ python [path_to_scaling_analysis_tool]/plot_perfs.py
 ## Limitations on Euler
 
 * The scaling analysis tools were tested for Icon only.
-* Because of differing nodes-architectures on Euler, the number of nodes passed via the -n option
-corresponds to the number of Euler-cores.
-* Parsing the logfiles only works using the --no_sys_report option.
+* Because of differing node architectures on Euler, the number of nodes passed via the -n option corresponds to the number of Euler-cores.
 * In order to have nice plots, the number of Euler-cores needs to be divided by 12.
 * Automatic runtime-specification is not as smooth as on Daint -> a minimum of 20 min is requested in any case. 
From 09f46b753871f75fdd2ce53335abc954a2858e7c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 11:55:35 +0200 Subject: [PATCH 03/17] Remove no_sys_report option --- create_scaling_table_per_exp.py | 162 ++++++++------------------------ 1 file changed, 38 insertions(+), 124 deletions(-) diff --git a/create_scaling_table_per_exp.py b/create_scaling_table_per_exp.py index dae00c6..7aa06a3 100755 --- a/create_scaling_table_per_exp.py +++ b/create_scaling_table_per_exp.py @@ -52,9 +52,6 @@ type = int,\ help = 'factor to multiply for getting NH per year') - parser.add_argument('--no_sys_report', action='store_true',\ - help = 'no time report provided by the system, per default, the wallclock will be taken from this report. If this option enabled, the wallclock will computed in a different way') - parser.add_argument('--no_x', action='store_false',\ help = 'some model logs have a "set -x" in the first line, therefore the "Script run successfully: OK" string is contained twice in the logfile. Passing this argument assumes NO "set -x" set.') @@ -257,67 +254,6 @@ def get_date_from_echam_slurm_file(filename): return date_run - def get_wallclock_Nnodes_gen_daint(filename, - string_sys_report="Elapsed", - use_timer_report=False): - # Find report - summary_in_file = grep(string_sys_report, filename) - if summary_in_file['success']: - summary_line = summary_in_file["line"][0] - summary_iline = summary_in_file["iline"][0] - - f = open(filename) - lines = f.readlines() - - line_labels = [s.strip() for s in summary_line.split()] - ind_start = line_labels.index('Start') - ind_end = line_labels.index('End') - - # For summary_iline + x had to be subtracted by one - line_time = [ - lines[summary_iline + 2].split()[i] - for i in [ind_start, ind_end] - ] - time_arr = [ - datetime.datetime.strptime(s.strip(), '%Y-%m-%dT%H:%M:%S') - for s in line_time - ] - - if use_timer_report: - string_timer_report = '# calls' - timer_in_file = grep(string_timer_report, filename) - if timer_in_file['success']: - timer_line = timer_in_file["line"][0] - timer_iline = timer_in_file["iline"][0] - string_timer_firstrow = 'total ' - first_row = grep(string_timer_firstrow, filename) - first_row_line = first_row["line"][0] - first_row_iline = first_row["iline"][0] - time_str = lines[first_row_iline].split()[-1] - wallclock = datetime.timedelta(seconds=float(time_str)) - else: - wallclock, nnodes, time_arr = set_default_error_slurm_file( - "Warning : Timer output report is not present or the word {} is not found" - .format(filename, string_timer_report)) - else: - # find index of "start" and "end" in the report line - wallclock = time_arr[-1] - time_arr[0] - - # Nnodes - line_labels_n = [ - s.strip() for s in lines[summary_iline + 7].split() - ] - ind_nodes = line_labels_n.index('NNodes') - nodes = int(lines[summary_iline + 9].split()[ind_nodes]) - - f.close() - else: - wallclock, nnodes, time_arr = set_default_error_slurm_file( - "Warning : Batch summary report is not present or the word {} is not found" - .format(filename, string_sys_report)) - - return {"n": nodes, "wc": wallclock, "st": time_arr[0]} - # security. 
If not file found, exit if len(slurm_files) == 0: print("No slurm file founded with this basis name") @@ -332,24 +268,17 @@ def get_wallclock_Nnodes_gen_daint(filename, if args.mod.upper() == "ICON": if check_icon_finished(filename) or args.ignore_errors: # get # nodes and wallclock - if args.no_sys_report: - # infer nnodes from MPI-procs in ICON output - nodes_line = grep( - "mo_mpi::start_mpi ICON: Globally run on", - filename)["line"][0] - nnodes = int(nodes_line.split(' ')[6]) - - nnodes = nnodes // args.cpu_per_node - - wallclock = get_wallclock_icon( - filename, args.no_x)["wc"].total_seconds() - date_run = get_wallclock_icon(filename, args.no_x)["st"] - else: - n_wc_st = get_wallclock_Nnodes_gen_daint( - filename, use_timer_report=True) - nnodes = n_wc_st["n"] - wallclock = n_wc_st["wc"].total_seconds() - date_run = n_wc_st["st"] + # infer nnodes from MPI-procs in ICON output + nodes_line = grep( + "mo_mpi::start_mpi ICON: Globally run on", + filename)["line"][0] + nnodes = int(nodes_line.split(' ')[6]) + + nnodes = nnodes // args.cpu_per_node + + wallclock = get_wallclock_icon( + filename, args.no_x)["wc"].total_seconds() + date_run = get_wallclock_icon(filename, args.no_x)["st"] else: wallclock, nnodes, date_run = set_default_error_slurm_file( "Warning : Run did not finish properly") @@ -361,31 +290,24 @@ def get_wallclock_Nnodes_gen_daint(filename, if check_icon_finished(filename, success_message) or args.ignore_errors: # get # nodes and wallclock - if args.no_sys_report: - # infer nnodes from MPI-procs in ICON output - nodes_line = grep( - "mo_mpi::start_mpi ICON: Globally run on", - filename)["line"][0] - nnodes = int(nodes_line.split(' ')[6]) - - nnodes = nnodes // args.cpu_per_node - - wallclock = get_wallclock_icon( - filename, - args.no_x, - num_ok=1, - success_message=success_message)["wc"].total_seconds() - date_run = get_wallclock_icon( - filename, - args.no_x, - num_ok=1, - success_message=success_message)["st"] - else: - n_wc_st = get_wallclock_Nnodes_gen_daint( - filename, use_timer_report=True) - nnodes = n_wc_st["n"] - wallclock = n_wc_st["wc"].total_seconds() - date_run = n_wc_st["st"] + # infer nnodes from MPI-procs in ICON output + nodes_line = grep( + "mo_mpi::start_mpi ICON: Globally run on", + filename)["line"][0] + nnodes = int(nodes_line.split(' ')[6]) + + nnodes = nnodes // args.cpu_per_node + + wallclock = get_wallclock_icon( + filename, + args.no_x, + num_ok=1, + success_message=success_message)["wc"].total_seconds() + date_run = get_wallclock_icon( + filename, + args.no_x, + num_ok=1, + success_message=success_message)["st"] else: wallclock, nnodes, date_run = set_default_error_slurm_file( "Warning : Run did not finish properly") @@ -395,24 +317,17 @@ def get_wallclock_Nnodes_gen_daint(filename, print(jobnumber) if args.mod.upper() == "ICON-HAM": # get # nodes and wallclock - if args.no_sys_report: - # infer nnodes from MPI-procs in ICON output - nodes_line = grep("mo_mpi::start_mpi ICON: Globally run on", - filename)["line"][0] - nnodes = int(nodes_line.split(' ')[6]) + # infer nnodes from MPI-procs in ICON output + nodes_line = grep("mo_mpi::start_mpi ICON: Globally run on", + filename)["line"][0] + nnodes = int(nodes_line.split(' ')[6]) - nnodes = nnodes // args.cpu_per_node + nnodes = nnodes // args.cpu_per_node - wallclock = get_wallclock_icon(filename, args.no_x, - num_ok=0)["wc"].total_seconds() - date_run = get_wallclock_icon(filename, args.no_x, - num_ok=0)["st"] - else: - n_wc_st = get_wallclock_Nnodes_gen_daint(filename, - 
use_timer_report=True) - nnodes = n_wc_st["n"] - wallclock = n_wc_st["wc"].total_seconds() - date_run = n_wc_st["st"] + wallclock = get_wallclock_icon(filename, args.no_x, + num_ok=0)["wc"].total_seconds() + date_run = get_wallclock_icon(filename, args.no_x, + num_ok=0)["st"] # get job number jobnumber = float(filename.split('.')[-2]) @@ -471,4 +386,3 @@ def get_wallclock_Nnodes_gen_daint(filename, index=False, float_format="%.2f") -################################################################################ From d2ce25f2bf86ff825830e1c0953d6da4d9fba14d Mon Sep 17 00:00:00 2001 From: github-actions Date: Thu, 16 May 2024 09:56:00 +0000 Subject: [PATCH 04/17] GitHub Action: Apply Pep8-formatting --- create_scaling_table_per_exp.py | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/create_scaling_table_per_exp.py b/create_scaling_table_per_exp.py index 7aa06a3..d354ee3 100755 --- a/create_scaling_table_per_exp.py +++ b/create_scaling_table_per_exp.py @@ -269,9 +269,8 @@ def get_date_from_echam_slurm_file(filename): if check_icon_finished(filename) or args.ignore_errors: # get # nodes and wallclock # infer nnodes from MPI-procs in ICON output - nodes_line = grep( - "mo_mpi::start_mpi ICON: Globally run on", - filename)["line"][0] + nodes_line = grep("mo_mpi::start_mpi ICON: Globally run on", + filename)["line"][0] nnodes = int(nodes_line.split(' ')[6]) nnodes = nnodes // args.cpu_per_node @@ -291,9 +290,8 @@ def get_date_from_echam_slurm_file(filename): success_message) or args.ignore_errors: # get # nodes and wallclock # infer nnodes from MPI-procs in ICON output - nodes_line = grep( - "mo_mpi::start_mpi ICON: Globally run on", - filename)["line"][0] + nodes_line = grep("mo_mpi::start_mpi ICON: Globally run on", + filename)["line"][0] nnodes = int(nodes_line.split(' ')[6]) nnodes = nnodes // args.cpu_per_node @@ -319,15 +317,14 @@ def get_date_from_echam_slurm_file(filename): # get # nodes and wallclock # infer nnodes from MPI-procs in ICON output nodes_line = grep("mo_mpi::start_mpi ICON: Globally run on", - filename)["line"][0] + filename)["line"][0] nnodes = int(nodes_line.split(' ')[6]) nnodes = nnodes // args.cpu_per_node wallclock = get_wallclock_icon(filename, args.no_x, - num_ok=0)["wc"].total_seconds() - date_run = get_wallclock_icon(filename, args.no_x, - num_ok=0)["st"] + num_ok=0)["wc"].total_seconds() + date_run = get_wallclock_icon(filename, args.no_x, num_ok=0)["st"] # get job number jobnumber = float(filename.split('.')[-2]) @@ -385,4 +382,3 @@ def get_date_from_echam_slurm_file(filename): sep=';', index=False, float_format="%.2f") - From 9978627ebd04ed68331aa2e0ffbf20a6e10cc48a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 13:10:21 +0200 Subject: [PATCH 05/17] Remove ECHAM from script --- create_scaling_table_per_exp.py | 67 ++------------------------------- 1 file changed, 4 insertions(+), 63 deletions(-) diff --git a/create_scaling_table_per_exp.py b/create_scaling_table_per_exp.py index 7aa06a3..1badd1f 100755 --- a/create_scaling_table_per_exp.py +++ b/create_scaling_table_per_exp.py @@ -39,8 +39,8 @@ help='resolution(with ocean) eg T63L31GR15 ') parser.add_argument('--mod','-m', dest = 'mod',\ - default='echam-ham',\ - help='model type (echam-ham, icon, icon-ham)') + default='icon',\ + help='model type (icon, icon-ham, icon-clm)') parser.add_argument('--cpu_per_node', dest = 'cpu_per_node',\ default = 12,\ @@ -97,14 +97,6 @@ for n in nodes_to_proceed ] slurm_files = 
list(itertools.chain.from_iterable(slurm_files_ar)) - elif args.mod.upper() == "ECHAM-HAM": - slurm_files_ar = [ - glob.glob("{}/{}_cpus{}/slurm*".format(path_exps_dir, - args.basis_name, - n * args.cpu_per_node)) - for n in nodes_to_proceed - ] - slurm_files = list(itertools.chain.from_iterable(slurm_files_ar)) # 3rd possibility : use all the slurm files containing the basis name if (not l_cpus_def): @@ -114,9 +106,6 @@ elif args.mod.upper().startswith("ICON"): slurm_files = glob.glob("{}/LOG.exp.{}*.run.*".format( path_exps_dir, args.basis_name, args.basis_name)) - elif args.mod.upper() == "ECHAM-HAM": - slurm_files = glob.glob("{}/{}*/slurm_{}*".format( - path_exps_dir, args.basis_name, args.basis_name)) # fill up array #----------------------------------------------------------------------------------------------- @@ -217,42 +206,6 @@ def set_default_error_slurm_file(txt_message="Problem in the slurm file"): return (wallclock, nnodes, date_run) - def get_date_from_echam_slurm_file(filename): - string_timer_report = 'Submit Eligible' - summary_in_file = grep(string_timer_report, filename) - if summary_in_file['success']: - summary_line = summary_in_file["line"][0] - summary_iline = summary_in_file["iline"][0] - f = open(filename) - lines = f.readlines() - - line_labels = [s.strip() for s in summary_line.split()] - ind_start = line_labels.index('Start') - ind_end = line_labels.index('End') - - line_time = [ - lines[summary_iline + 2].split()[i] - for i in [ind_start, ind_end] - ] - first_row = grep(string_timer_report, filename) - first_row_line = first_row["line"][0] - first_row_iline = first_row["iline"][0] - ind_start = line_labels.index('Start') - ind_end = line_labels.index('End') - line_time = [ - lines[first_row_iline + 2].split()[i] - for i in [ind_start, ind_end] - ] - time_arr = [ - datetime.datetime.strptime(s.strip(), '%Y-%m-%dT%H:%M:%S') - for s in line_time - ] - date_run = time_arr[0] - else: - print("Warning: Cannot get date from slurm file %s" % filename) - date_run = default_wallclock['date_run'] - - return date_run # security. 
If not file found, exit if len(slurm_files) == 0: @@ -285,7 +238,7 @@ def get_date_from_echam_slurm_file(filename): # get job number jobnumber = float(filename.split('.')[-2]) - if args.mod.upper() == "ICON-CLM": + elif args.mod.upper() == "ICON-CLM": success_message = "ICON experiment FINISHED" if check_icon_finished(filename, success_message) or args.ignore_errors: @@ -315,7 +268,7 @@ def get_date_from_echam_slurm_file(filename): # get job number jobnumber = filename[-8:] print(jobnumber) - if args.mod.upper() == "ICON-HAM": + elif args.mod.upper() == "ICON-HAM": # get # nodes and wallclock # infer nnodes from MPI-procs in ICON output nodes_line = grep("mo_mpi::start_mpi ICON: Globally run on", @@ -332,18 +285,6 @@ def get_date_from_echam_slurm_file(filename): # get job number jobnumber = float(filename.split('.')[-2]) - elif args.mod.upper() == "ECHAM-HAM": - ncpus_line = grep("Total number of PEs", filename)["line"][0] - ncpus = int(ncpus_line.split(':')[1].split()[0].strip()) - nnodes = ncpus / float(args.cpu_per_node) - - wallclock_line = grep("Wallclock", filename)["line"][0] - wallclock = float(wallclock_line.split(':')[1].strip()[:-1]) - - date_run = get_date_from_echam_slurm_file(filename) - - jobnumber = float(filename.replace('_', '.').split('.')[-2]) - # fill array in np_2print.append([nnodes, wallclock, jobnumber, date_run]) From d79d9f20227f5fada3b8e0ffa64a3e9b1a2da9cb Mon Sep 17 00:00:00 2001 From: github-actions Date: Thu, 16 May 2024 11:11:00 +0000 Subject: [PATCH 06/17] GitHub Action: Apply Pep8-formatting --- create_scaling_table_per_exp.py | 1 - 1 file changed, 1 deletion(-) diff --git a/create_scaling_table_per_exp.py b/create_scaling_table_per_exp.py index dc59c74..5eca2b0 100755 --- a/create_scaling_table_per_exp.py +++ b/create_scaling_table_per_exp.py @@ -206,7 +206,6 @@ def set_default_error_slurm_file(txt_message="Problem in the slurm file"): return (wallclock, nnodes, date_run) - # security. 
If not file found, exit
     if len(slurm_files) == 0:
         print("No slurm file founded with this basis name")

From 6c80d0d762d5f7cceac2f3bee5a608545d8d370f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michael=20J=C3=A4hn?=
Date: Thu, 16 May 2024 14:41:27 +0200
Subject: [PATCH 07/17] Remaining bugfixes

---
 create_scaling_table_per_exp.py | 225 +++++++++++++++++---------------
 1 file changed, 120 insertions(+), 105 deletions(-)

diff --git a/create_scaling_table_per_exp.py b/create_scaling_table_per_exp.py
index dc59c74..37cca5c 100755
--- a/create_scaling_table_per_exp.py
+++ b/create_scaling_table_per_exp.py
@@ -15,8 +15,109 @@
     'date_run': datetime.datetime(1900, 1, 1).strftime("%Y-%m-%d %H:%M:%S")
 }
 
-if __name__ == "__main__":
 
+def grep(string, filename):
+    # returns lines of file_name where string appears
+    # mimic the "grep" function
+
+    # initialisation
+    # list of lines where string is found
+    list_line = []
+    list_iline = []
+    lo_success = False
+    file = open(filename, 'r')
+    count = 0
+    while True:
+        try:  # Some lines are read in as binary with the pgi compilation
+            line = file.readline()
+            count += 1
+            if string in line:
+                list_line.append(line)
+                list_iline.append(count)
+                lo_success = True
+            if not line:
+                break
+        except Exception as e:
+            continue
+    file.close()
+    return {"success": lo_success, "iline": list_iline, "line": list_line}
+
+
+def extract_line(filename, line_number):
+    # Open the file in read mode
+    with open(filename, 'r') as file:
+        # Read all lines into a list
+        lines = file.readlines()
+
+    # Check if the line number is valid
+    if 1 <= line_number <= len(lines):
+        # Extract the content of the specified line
+        content = lines[line_number - 1]
+        return content.strip()  # Strip any leading/trailing whitespace
+    else:
+        print("Error: Line number is out of range.")
+        return None
+
+
+def extract_job_id(filename, prefix="slurm-", suffix=".out"):
+    # Find the starting index of "slurm-" and ".out" (find() returns -1 if absent)
+    start_index = filename.find(prefix)
+    end_index = filename.find(suffix)
+
+    # Extract the job ID substring
+    if start_index != -1 and end_index != -1:
+        job_id = filename[start_index + len(prefix):end_index]
+        return job_id
+    else:
+        print("Error: Filename format is incorrect.")
+        return None
+
+
+def get_wallclock_icon(filename, no_x, num_ok=1, success_message=None):
+
+    required_ok_streams = num_ok
+    if success_message:
+        OK_streams = grep(success_message, filename)["line"]
+    else:
+        OK_streams = grep('Script run successfully: OK', filename)["line"]
+
+    if len(OK_streams) >= required_ok_streams:
+        total_grep = grep("total ", filename)["line"]
+        wallclock = float(total_grep[0].split()[-2])
+        line_times = grep(" Elapsed", filename)["iline"][0] + 2
+        date_run = extract_line(filename, line_times).split()[2]
+    else:
+        print("file {} did not finish properly".format(filename))
+        print("Set Wallclock = 0")
+        wallclock = datetime.timedelta(0)
+        date_run = default_wallclock['date_run']  # ensure date_run is defined
+    return wallclock, date_run
+
+
+def check_icon_finished(filename,
+                        string_sys_report='Script run successfully: OK'):
+    # return True if icon finished properly
+
+    # initialisation
+    lo_finished_ok = False
+
+    # look for ok_line
+    if grep(string_sys_report, filename)['success']:
+        lo_finished_ok = True
+
+    return (lo_finished_ok)
+
+
+def set_default_error_slurm_file(txt_message="Problem in the slurm file"):
+    # error in the slurm file, set default values
+
+    wallclock = default_wallclock['wallclock']
+    nnodes = default_wallclock['nnodes']
+    date_run = default_wallclock['date_run']
+    print(txt_message)
+    print("Set Wallclock = {} , and nodes = 
{}".format(wallclock, nnodes))
+
+    return (wallclock, nnodes, date_run)
+
+
+if __name__ == "__main__":
     # parsing arguments
     parser = argparse.ArgumentParser()
     parser.add_argument('--exp', '-e', dest = 'basis_name',\
@@ -42,10 +143,10 @@
                         default='icon',\
                         help='model type (icon, icon-ham, icon-clm)')
 
-    parser.add_argument('--cpu_per_node', dest = 'cpu_per_node',\
-                        default = 12,\
+    parser.add_argument('--mpi_procs_per_node', dest = 'mpi_procs_per_node',\
+                        default = 1,\
                         type = int,\
-                        help = 'numper of CPUs per node')
+                        help = 'number of MPI procs per node')
 
     parser.add_argument('--fact_nh_yr', '-y', dest = 'factor_nh_year',\
                         default = 12,\
@@ -85,7 +186,7 @@
     if l_cpus_def:
         if args.mod.upper().startswith("ICON-CLM"):
             slurm_files_ar = [
-                glob.glob("{}/{}_nnodes{}_*.o*".format(path_exps_dir,
+                glob.glob("{}/{}_nnodes{}/slurm-*.out".format(path_exps_dir,
                                                        args.basis_name, n))
                 for n in nodes_to_proceed
             ]
@@ -101,8 +202,8 @@
     # 3rd possibility : use all the slurm files containing the basis name
     if (not l_cpus_def):
         if args.mod.upper().startswith("ICON-CLM"):
-            slurm_files = glob.glob("{}/*{}*.o*".format(
-                path_exps_dir, args.basis_name, args.basis_name))
+            slurm_files = sorted(glob.glob("{}/{}_nnodes*/slurm-*.out".format(
+                path_exps_dir, args.basis_name, args.basis_name)))
         elif args.mod.upper().startswith("ICON"):
             slurm_files = glob.glob("{}/LOG.exp.{}*.run.*".format(
                 path_exps_dir, args.basis_name, args.basis_name))
@@ -127,96 +228,18 @@
 
     #ilin = 0
 
-    def grep(string, filename):
-        # returns lines of file_name where string appears
-        # mimic the "grep" function
-
-        # initialisation
-        # list of lines where string is found
-        list_line = []
-        list_iline = []
-        lo_success = False
-        file = open(filename, 'r')
-        count = -1
-        while True:
-            try:  # Some lines are read in as binary with the pgi compilation
-                line = file.readline()
-                count += 1
-                if string in line:
-                    list_line.append(line)
-                    list_iline.append(count)
-                    lo_success = True
-                if not line:
-                    break
-            except Exception as e:
-                continue
-        file.close()
-        return {"success": lo_success, "iline": list_iline, "line": list_line}
-
-    def get_wallclock_icon(filename, no_x, num_ok=1, success_message=None):
-
-        required_ok_streams = num_ok
-        if success_message:
-            OK_streams = grep(success_message, filename)["line"]
-        else:
-            OK_streams = grep('Script run successfully: OK', filename)["line"]
-
-        if len(OK_streams) >= required_ok_streams:
-            timezone = 'CEST'
-            time_grep = grep(timezone, filename)["line"]
-            if not time_grep:
-                timezone = 'CET'
-                time_grep = grep(timezone, filename)["line"]
-
-            time_arr = [
-                datetime.datetime.strptime(
-                    s.strip(), '%a %b %d %H:%M:%S ' + timezone + ' %Y')
-                for s in time_grep
-            ]
-
-            wallclock = time_arr[-1] - time_arr[0]
-        else:
-            print("file {} did not finish properly".format(filename))
-            print("Set Wallclock = 0")
-            wallclock = datetime.timedelta(0)
-
-        return {"wc": wallclock, "st": time_arr[0]}
-
-    def check_icon_finished(filename,
-                            string_sys_report='Script run successfully: OK'):
-        # return True if icon finished properly
-
-        # initilisation
-        lo_finished_ok = False
-
-        # look for ok_line
-        if grep(string_sys_report, filename)['success']:
-            lo_finished_ok = True
-
-        return (lo_finished_ok)
-
-    def set_default_error_slurm_file(txt_message="Problem in the slurm file"):
-        # error in the slurm file, set default values
-
-        wallclock = default_wallclock['wallclock']
-        nnodes = default_wallclock['nnodes']
-        date_run = default_wallclock['date_run']
-        print(txt_message)
-        print("Set Wallclock = {} , and nodes = {}".format(wallclock, 
nnodes)) - - return (wallclock, nnodes, date_run) # security. If not file found, exit if len(slurm_files) == 0: - print("No slurm file founded with this basis name") + print("No slurm file found with this basis name") print("Exiting") exit() # loop over number of cpus to be lauched for filename in slurm_files: - print("Read file : {}".format(os.path.basename(filename))) + print(f"Read file: {filename}") # read nnodes and wallclock from file if args.mod.upper() == "ICON": if check_icon_finished(filename) or args.ignore_errors: @@ -225,8 +248,7 @@ def set_default_error_slurm_file(txt_message="Problem in the slurm file"): nodes_line = grep("mo_mpi::start_mpi ICON: Globally run on", filename)["line"][0] nnodes = int(nodes_line.split(' ')[6]) - - nnodes = nnodes // args.cpu_per_node + nnodes = nnodes // args.mpi_procs_per_node wallclock = get_wallclock_icon( filename, args.no_x)["wc"].total_seconds() @@ -238,7 +260,7 @@ def set_default_error_slurm_file(txt_message="Problem in the slurm file"): # get job number jobnumber = float(filename.split('.')[-2]) elif args.mod.upper() == "ICON-CLM": - success_message = "ICON experiment FINISHED" + success_message = "----- ICON finished" if check_icon_finished(filename, success_message) or args.ignore_errors: # get # nodes and wallclock @@ -246,34 +268,27 @@ def set_default_error_slurm_file(txt_message="Problem in the slurm file"): nodes_line = grep("mo_mpi::start_mpi ICON: Globally run on", filename)["line"][0] nnodes = int(nodes_line.split(' ')[6]) + nnodes = nnodes // args.mpi_procs_per_node - nnodes = nnodes // args.cpu_per_node - - wallclock = get_wallclock_icon( + wallclock, date_run = get_wallclock_icon( filename, args.no_x, num_ok=1, - success_message=success_message)["wc"].total_seconds() - date_run = get_wallclock_icon( - filename, - args.no_x, - num_ok=1, - success_message=success_message)["st"] + success_message=success_message) + print(f"Simulation on {nnodes} nodes launched at: {date_run}") else: wallclock, nnodes, date_run = set_default_error_slurm_file( "Warning : Run did not finish properly") # get job number - jobnumber = filename[-8:] - print(jobnumber) + jobnumber = extract_job_id(filename) elif args.mod.upper() == "ICON-HAM": # get # nodes and wallclock # infer nnodes from MPI-procs in ICON output nodes_line = grep("mo_mpi::start_mpi ICON: Globally run on", filename)["line"][0] nnodes = int(nodes_line.split(' ')[6]) - - nnodes = nnodes // args.cpu_per_node + nnodes = nnodes // args.mpi_procs_per_node wallclock = get_wallclock_icon(filename, args.no_x, num_ok=0)["wc"].total_seconds() @@ -317,8 +332,8 @@ def set_default_error_slurm_file(txt_message="Problem in the slurm file"): perf_sorted.to_csv(filename_out, columns=[ 'Date', 'Jobnumber', 'N_Nodes', 'Wallclock', - 'Wallclock_hum', 'Speedup', 'Node_hours', - 'Efficiency', 'NH_year' + 'Wallclock_hum', 'Speedup', 'Efficiency', + 'Node_hours', 'NH_year' ], sep=';', index=False, From 0f6e0c12fe516becb3ae3cee7c42840aaf9e5fd9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 14:52:08 +0200 Subject: [PATCH 08/17] Add gitignore --- .gitignore | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 .gitignore diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..385eead --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ +*.csv +*.pdf +__pycache__/ \ No newline at end of file From d1abbb2cbd9bafc091964722341e8665caffc0e1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 14:52:23 +0200 Subject: [PATCH 09/17] 
Fix plotting and adapt for ICON-CLM --- def_exps_plot.py | 4 ++-- plot_perfs.py | 15 +++++++-------- 2 files changed, 9 insertions(+), 10 deletions(-) diff --git a/def_exps_plot.py b/def_exps_plot.py index 780f1d5..7ea48bf 100644 --- a/def_exps_plot.py +++ b/def_exps_plot.py @@ -31,8 +31,8 @@ def __init__(self, color='#253494', linestyle='-') -daint_01 = experiment(name='icon_cordex_12km_era5_gpu_20230222', - label='CORDEX-12km', +daint_01 = experiment(name='icon-clm_scaling', + label='EUR-12km', bestconf=36, marker='>', color='#253494', diff --git a/plot_perfs.py b/plot_perfs.py index 3d7f60d..6c9bce0 100755 --- a/plot_perfs.py +++ b/plot_perfs.py @@ -150,14 +150,13 @@ best_n = exp.bestconf if best_n in dt.N_Nodes.values: if var_to_plot == 'Efficiency': - perf_chosen = float( - dt[dt.N_Nodes == best_n].Efficiency) - if var_to_plot == 'Wallclock': - perf_chosen = float(dt[dt.N_Nodes == best_n].Wallclock) - if var_to_plot == 'Speedup': - perf_chosen = float(dt[dt.N_Nodes == best_n].Speedup) - if var_to_plot == 'NH_year': - perf_chosen = float(dt[dt.N_Nodes == best_n].NH_year) + perf_chosen = float(dt.loc[dt.N_Nodes == best_n, 'Efficiency'].iloc[0]) + elif var_to_plot == 'Wallclock': + perf_chosen = float(dt.loc[dt.N_Nodes == best_n, 'Wallclock'].iloc[0]) + elif var_to_plot == 'Speedup': + perf_chosen = float(dt.loc[dt.N_Nodes == best_n, 'Speedup'].iloc[0]) + elif var_to_plot == 'NH_year': + perf_chosen = float(dt.loc[dt.N_Nodes == best_n, 'NH_year'].iloc[0]) ax.scatter(best_n, perf_chosen, s=120., From 123ea267892b2ca9152ea89f845563fc33557875 Mon Sep 17 00:00:00 2001 From: github-actions Date: Thu, 16 May 2024 12:52:59 +0000 Subject: [PATCH 10/17] GitHub Action: Apply Pep8-formatting --- create_scaling_table_per_exp.py | 16 +++++++++------- plot_perfs.py | 12 ++++++++---- 2 files changed, 17 insertions(+), 11 deletions(-) diff --git a/create_scaling_table_per_exp.py b/create_scaling_table_per_exp.py index f02c929..44e8833 100755 --- a/create_scaling_table_per_exp.py +++ b/create_scaling_table_per_exp.py @@ -15,6 +15,7 @@ 'date_run': datetime.datetime(1900, 1, 1).strftime("%Y-%m-%d %H:%M:%S") } + def grep(string, filename): # returns lines of file_name where string appears # mimic the "grep" function @@ -41,7 +42,7 @@ def grep(string, filename): file.close() return {"success": lo_success, "iline": list_iline, "line": list_line} - + def extract_line(filename, line_number): # Open the file in read mode with open(filename, 'r') as file: @@ -92,6 +93,7 @@ def get_wallclock_icon(filename, no_x, num_ok=1, success_message=None): return wallclock, date_run + def check_icon_finished(filename, string_sys_report='Script run successfully: OK'): # return True if icon finished properly @@ -105,6 +107,7 @@ def check_icon_finished(filename, return (lo_finished_ok) + def set_default_error_slurm_file(txt_message="Problem in the slurm file"): # error in the slurm file, set default values @@ -186,8 +189,8 @@ def set_default_error_slurm_file(txt_message="Problem in the slurm file"): if l_cpus_def: if args.mod.upper().startswith("ICON-CLM"): slurm_files_ar = [ - glob.glob("{}/{}_nnodes{}/slurm-*.out".format(path_exps_dir, - args.basis_name, n)) + glob.glob("{}/{}_nnodes{}/slurm-*.out".format( + path_exps_dir, args.basis_name, n)) for n in nodes_to_proceed ] slurm_files = list(itertools.chain.from_iterable(slurm_files_ar)) @@ -202,8 +205,9 @@ def set_default_error_slurm_file(txt_message="Problem in the slurm file"): # 3rd possibility : use all the slurm files containing the basis name if (not l_cpus_def): if 
args.mod.upper().startswith("ICON-CLM"): - slurm_files = sorted(glob.glob("{}/{}_nnodes*/slurm-*.out".format( - path_exps_dir, args.basis_name, args.basis_name))) + slurm_files = sorted( + glob.glob("{}/{}_nnodes*/slurm-*.out".format( + path_exps_dir, args.basis_name, args.basis_name))) elif args.mod.upper().startswith("ICON"): slurm_files = glob.glob("{}/LOG.exp.{}*.run.*".format( path_exps_dir, args.basis_name, args.basis_name)) @@ -227,8 +231,6 @@ def set_default_error_slurm_file(txt_message="Problem in the slurm file"): # performs the analysis (create a csv file) #ilin = 0 - - # security. If not file found, exit if len(slurm_files) == 0: print("No slurm file found with this basis name") diff --git a/plot_perfs.py b/plot_perfs.py index 6c9bce0..67484a5 100755 --- a/plot_perfs.py +++ b/plot_perfs.py @@ -150,13 +150,17 @@ best_n = exp.bestconf if best_n in dt.N_Nodes.values: if var_to_plot == 'Efficiency': - perf_chosen = float(dt.loc[dt.N_Nodes == best_n, 'Efficiency'].iloc[0]) + perf_chosen = float(dt.loc[dt.N_Nodes == best_n, + 'Efficiency'].iloc[0]) elif var_to_plot == 'Wallclock': - perf_chosen = float(dt.loc[dt.N_Nodes == best_n, 'Wallclock'].iloc[0]) + perf_chosen = float(dt.loc[dt.N_Nodes == best_n, + 'Wallclock'].iloc[0]) elif var_to_plot == 'Speedup': - perf_chosen = float(dt.loc[dt.N_Nodes == best_n, 'Speedup'].iloc[0]) + perf_chosen = float(dt.loc[dt.N_Nodes == best_n, + 'Speedup'].iloc[0]) elif var_to_plot == 'NH_year': - perf_chosen = float(dt.loc[dt.N_Nodes == best_n, 'NH_year'].iloc[0]) + perf_chosen = float(dt.loc[dt.N_Nodes == best_n, + 'NH_year'].iloc[0]) ax.scatter(best_n, perf_chosen, s=120., From 5594425b817dd319f545f49ea8ae0e92c78058cc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 14:54:42 +0200 Subject: [PATCH 11/17] Remove old config --- def_exps_plot_2019.py | 117 ------------------------------------------ 1 file changed, 117 deletions(-) delete mode 100644 def_exps_plot_2019.py diff --git a/def_exps_plot_2019.py b/def_exps_plot_2019.py deleted file mode 100644 index 869b00d..0000000 --- a/def_exps_plot_2019.py +++ /dev/null @@ -1,117 +0,0 @@ -# definition of the object "experiment". 
It contains mostly the potting properties - -import numpy as np - - -class experiment: - - def __init__(self, - name, - label=None, - bestconf=np.nan, - linewidth=1., - **kwargs): - self.name = name - if label is None: - self.label = name - else: - self.label = label - self.bestconf = bestconf - - self.line_appareance = kwargs - self.line_appareance['linewidth'] = linewidth - - -icon_amip_intel = experiment( - name='atm_amip_intel', - label='ICON intel', - bestconf=64, - marker='<', - color='Red') #,linestyle = '-')#, marker = 'd', linestyle = '-') -icon_amip_6h_intel = experiment( - name='atm_amip_intel_6h', - label='ICON intel 6h', - bestconf=64, - marker='.', - color='Red') #,linestyle = '--')#, marker = 'd', linestyle = '-') -icon_amip_1m_intel = experiment(name='atm_amip_intel_1m', - label='ICON intel 1m', - bestconf=40, - marker='*', - color='Red') #, marker = 'd', linestyle = '-') - -icon_amip_6h_cray = experiment( - name='atm_amip_6h', - label='ICON cray 6h', - bestconf=24, - marker='.', - color='Green', - linestyle='--') #, marker = 'c', linestyle = '-') -icon_amip_1m_cray = experiment( - name='atm_amip_1m', - label='ICON cray 1m', - bestconf=24, - marker='*', - color='LightGreen') #, marker = '*', linestyle = '-') -icon_lam_init_cray = experiment(name='ICON_limarea_Bernhard_init', - label='ICON-LAM init cray', - bestconf=24, - color='Green', - marker='.') #, linestyle = '-') -icon_lam_cray = experiment(name='ICON_limarea_Bernhard_7d', - label='ICON-LAM cray', - bestconf=128, - color='Green', - marker='s') #, linestyle = '-') - -icon_amip_6h_final = experiment( - name='atm_amip_intel_6h', - label='ICON 6h', - bestconf=64, - marker='.', - color='Magenta', - linestyle='--') #, marker = 'd', linestyle = '-') -icon_amip_1m_final = experiment( - name='atm_amip_intel_1m', - label='ICON 1m', - bestconf=40, - marker='.', - color='Purple') #, marker = 'd', linestyle = '-') -icon_lam_final = experiment(name='ICON_limarea_Bernhard_7d', - label='ICON-LAM', - bestconf=128, - color='Red', - marker='>', - markersize=4, - linestyle='--') -icon_ham_final = experiment(name='atm_amip_hammoz_marc', - label='ICON-HAM', - bestconf=24, - marker='.', - color='Blue', - linestyle='-') -e63h23_1m_final = experiment(name='e63ham_1m', - label='ECHAM-HAM', - bestconf=36, - color='Green', - marker='*', - linestyle='-') -esmham_1m_final = experiment(name='mpiesm-ham_1m', - label='MPI-ESM-HAM', - bestconf=48, - color='LightGreen', - marker='d', - markersize=4, - linestyle='-') - -icon_amip_6h_pgi = experiment(name='atm_amip_pgi_6h', - label='ICON PGI 6h', - bestconf=24, - marker='.', - color='darkviolet', - linestyle='--') -icon_amip_1m_pgi = experiment(name='atm_amip_pgi_1m', - label='ICON PGI 1m', - bestconf=24, - marker='*', - color='plum') From 76b4abe343c9a4600321268fbb2b6b22dd6a09e3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 14:56:20 +0200 Subject: [PATCH 12/17] Remove craypat parser --- parse_craypat.py | 197 ----------------------------------------------- 1 file changed, 197 deletions(-) delete mode 100644 parse_craypat.py diff --git a/parse_craypat.py b/parse_craypat.py deleted file mode 100644 index d03157e..0000000 --- a/parse_craypat.py +++ /dev/null @@ -1,197 +0,0 @@ -#!/usr/bin/python - -# Parse the craypat analysis files to extract the info CSCS ask and create a unique csv file -# The script will read recursively all the files named "summary*.txt" in the current directory - -# Colombe Siegenthaler C2SM (ETHZ) , 2018-10 - -import numpy as np -import pandas 
as pd # needs pythn modules to be loaded: module load cray-python/3.6.5.1 PyExtensions/3.6.5.1-CrayGNU-18.08 -import os -import glob -import argparse -import subprocess -import sys - - -def decide_summary_filename(runtime_file, path_dir_for_sumfile, exp_name): - # create filename of teh summary file - # for MPI-ESM, there are two models running in paralell, each of them create a runtime file, - # so each of them needs a summary file - - if len(glob.glob('{}/*+*'.format(path_dir_for_sumfile))) > 1: - mod = os.path.relpath(runtime_file, path_dir_for_sumfile).split('+')[0] - filename_sum = 'summary_{}_{}.txt'.format(exp_name, mod) - else: - filename_sum = 'summary_{}.txt'.format(exp_name) - - out_summary_file = os.path.join(path_dir_for_sumfile, filename_sum) - - return (out_summary_file) - - -def create_summary_file(runtime_file, path_dir_for_sumfile, exp_name): - # Create summary_[label_model_name].txt from RUNTIME (written by craypat tool) file - # input is runtime file - - # final summary file for this exp - out_summary_file = decide_summary_filename(runtime_file, - path_dir_for_sumfile, exp_name) - - if not os.path.isfile(runtime_file): - print('Warning: Runtime file is not a proper file : {}'.format( - runtime_file)) - - # copy part of runtime file into summary file - with open(runtime_file) as fin, open(out_summary_file, 'w') as fout: - for line in fin: - # get starting point - if line.startswith("#"): - continue - - # copy the line into fout - fout.write(line) - - # ending point - if line.startswith('I/O Write Rate'): - break - - return (out_summary_file) - - -def extract_dir_exp(runtime_file): - # get the general path to exp dir - - path_dir = os.path.dirname(runtime_file.split('+')[0]) - exp_name = os.path.basename(path_dir) - - return (path_dir, exp_name) - - -def get_slurm_file_dep_mod(path_dir): - # get the path to the slurm file depending on the model family - - gen_mod_family = os.path.join(path_dir).split('/')[-2].upper() - if gen_mod_family.startswith('ICON'): - slurm_file_path = glob.glob('{}/LOG*.o'.format(path_dir)) - elif (gen_mod_family.startswith('ECHAM') - or gen_mod_family.startswith('MPI-ESM')): - slurm_file_path = glob.glob('{}/slurm*.txt'.format(path_dir)) - else: - print("Warning: No rule for finding the slurm filefor the file {}.". 
- format(filename)) - print( - "Rules for finding slurm files are only defined for module family : ECHAM, MPI-ESM or ICON " - ) - print('The family model found is : {}'.format(gen_mod_family)) - slurm_file_path = [] - - return (slurm_file_path) - - -def get_jobnumber_from_slurmfile(slurm_file_path): - # get the jobnumber from the slurm filename - - if not len(slurm_file_path) == 1: - print("Warning, several or no slurm file.") - print("The following files are found:{}".format( - "\n".join(slurm_file_path))) - print("Set job number to 0") - jobnumber = "0" - else: - jobnumber = os.path.basename(slurm_file_path[0]).split('.')[-2] - - # remove the submission number (especially for echam run) - jobnumber = jobnumber.split('_')[-1] - - return (jobnumber) - - -def get_jobnumber(path_dir): - - # add look for Jobnumber of the craypat run - > need to find slurm file - slurm_file_path = get_slurm_file_dep_mod(path_dir) - - # get the jobnumber from the slurm filename - jobnumber = get_jobnumber_from_slurmfile(slurm_file_path) - - return (jobnumber) - - -if __name__ == "__main__": - # parsing arguments - parser = argparse.ArgumentParser() - parser.add_argument('--exclude', '-e', dest = 'exclude_dir',\ - default = [],\ - nargs = '*',\ - help='folders to exclude.') - parser.add_argument('--out_f', '-o', dest = 'out_f',\ - default = 'Craypat_table',\ - help = 'filename of the output.') - args = parser.parse_args() - - # get current directory - pwd = os.getcwd() - - # find all the summary files - all_files = glob.glob('{}/**/RUNTIME.rpt'.format(pwd), recursive=True) - - # definition of teh directories to exclude - #exclude_dir = ['before_update_Oct2018'] - files_to_exclude = [] - for filename in all_files: - if any([s in filename for s in args.exclude_dir]): - files_to_exclude.append(filename) - - # exclude files - for f in files_to_exclude: - all_files.remove(f) - - # define dataframe for output - data_global = pd.DataFrame(columns=['Variable']) - - # parse each file of the list - for ifile, filename in enumerate(all_files): - print( - '----------------------------------------------------------------------' - ) - print('Parsing file {}'.format(filename)) - - path_dir, exp_name = extract_dir_exp(filename) - - # creation of a summary file from the report file (written by Craypat tool) - summary_file_exp = create_summary_file(filename, path_dir, exp_name) - - # read file - data_single = pd.read_csv(summary_file_exp, sep=':', header=None) - - # rename first column into 'Variable' - data_single.rename(columns={0: "Variable"}, inplace=True) - - # retrieve number of columns - ncol = len(data_single.columns) - - # combine all the columns instead Variable together (the HH:MM:SS were separated by mistake ) - data_single[exp_name] = data_single[1] - del data_single[1] - for icol in np.arange(2, ncol): - data_single[exp_name] = data_single[exp_name] + ':' + data_single[ - icol].fillna("") - del data_single[icol] - - # delete '::' in case it was added in teh column combination - data_single[exp_name] = data_single[exp_name].str.rstrip(':') - - # get the jobnumber from the slurm filename in the directory - jobnumber = get_jobnumber(path_dir) - - # add the jobnumber in the dtaaframe as a new line - data_single.loc[len(data_single)] = ['Job Number', jobnumber] - - # fill the outter dataframe - data_global = pd.merge(data_global, \ - data_single.rename(columns={exp_name:exp_name.replace('_',' ')}), \ - how='outer', on=['Variable']) - - # write out the global dataframe - data_global.to_csv('{}.csv'.format(args.out_f), 
sep=',', index=False) From 6b4d7433354cd7b99f8092a70189fecb7a2a355d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 14:56:50 +0200 Subject: [PATCH 13/17] Remove old config --- def_exps_plot_2021.py | 70 ------------------------------------------- 1 file changed, 70 deletions(-) delete mode 100644 def_exps_plot_2021.py diff --git a/def_exps_plot_2021.py b/def_exps_plot_2021.py deleted file mode 100644 index f7b9cb4..0000000 --- a/def_exps_plot_2021.py +++ /dev/null @@ -1,70 +0,0 @@ -# definition of the object "experiment". It contains mostly the potting properties - -import numpy as np - - -class experiment: - - def __init__(self, - name, - label=None, - bestconf=np.nan, - linewidth=1., - **kwargs): - self.name = name - if label is None: - self.label = name - else: - self.label = label - self.bestconf = bestconf - - self.line_appareance = kwargs - self.line_appareance['linewidth'] = linewidth - - -# Color palette -#fdcc8a -#fc8d59 -#d7301f -#bdc9e1 -#67a9cf -#02818a - -# Definition of each experiment properties (colors, labels,ect) - -echam_ham_amip_T63L47 = experiment(name='ECHAM-HAM_amip_T63L47', - label='ECHAM-HAM 1M (cpu, intel)', - bestconf=36, - marker='>', - color='#fdcc8a', - linestyle='-') -icon_ham_amip = experiment(name='ICON-HAM_amip', - label='ICON-HAM 1M (cpu, pgi)', - bestconf=19, - marker='<', - color='#fc8d59', - linestyle='-') -icon_cpu_gcc_amip = experiment(name='ICON_cpu_gcc_amip', - label='ICON 1M (cpu, gcc)', - bestconf=43, - marker='x', - color='#bdc9e1', - linestyle='--') -icon_cpu_pgi_amip = experiment(name='ICON_cpu_pgi_amip', - label='ICON 1M (cpu, pgi)', - bestconf=42, - marker='x', - color='#67a9cf', - linestyle='-') -icon_gpu_pgi_amip_rte = experiment(name='ICON_gpu_pgi_amip_rte', - label='ICON 1M (gpu, pgi)', - bestconf=4, - marker='v', - color='#02818a', - linestyle='-') -icon_r2b9 = experiment(name='ICON_R2B9', - label='ICON@R2B9 1h (gpu, pgi)', - bestconf=1692, - marker='.', - color='#b30000', - linestyle='-') From e2bffa0576bf45bbc4923b4ba9ff72217d658d2a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 15:00:43 +0200 Subject: [PATCH 14/17] Remove ECHAM script --- send_several_run_ncpus_perf.py | 174 --------------------------------- 1 file changed, 174 deletions(-) delete mode 100755 send_several_run_ncpus_perf.py diff --git a/send_several_run_ncpus_perf.py b/send_several_run_ncpus_perf.py deleted file mode 100755 index 16a0d32..0000000 --- a/send_several_run_ncpus_perf.py +++ /dev/null @@ -1,174 +0,0 @@ -#!/usr/bin/python - -# Wrapper to send several ECHAM-(HAM) runs using the jobscriptoolkit -# for performance anaylsis with different number of cpus -# -# Usage: go into your echam run directory, prepare the basis run set-up -# call the present script -# -# C. Siegenthaler (C2SM) , July 2015 -# -############################################################################################ - -import numpy as np -import os -import datetime -import argparse - -if __name__ == "__main__": - - # parsing arguments - parser = argparse.ArgumentParser() - parser.add_argument('--basis_folder', '-b', dest = 'basis_folder',\ - help='basis folder run containing the configuration as to use as template. 
The name has to finish with "_cpusXX".') - parser.add_argument('--ncpus_incr', dest = 'cpus_incr',\ - default = 16,\ - type = int,\ - help = 'increment of cpus number between each simulation.') - parser.add_argument('--niter', dest = 'niter',\ - default = 10,\ - type = int,\ - help = 'number of iterations (niter simulations will be performed with the number of cpus for each simulation is [1,2,....,niter-1]*ncpus_incr.') - - parser.add_argument('--nbeg_iter', dest = 'nbeg_iter',\ - default = 1,\ - type = int,\ - help = 'begining of the iteration (the simulations will be performed with the number of cpus for each simulation is [nbeg_iter,nbeg_iter+1....,niter-1]*ncpus_incr.') - - parser.add_argument('--ncpus', dest = 'cpus_to_proceed',\ - default = [],\ - type = int,\ - nargs = '*',\ - help = 'cups number of the simulation to analyse.This have priority over -ncpus_incr, -niter and -nbeg_iter') - parser.add_argument('--nnodes', '-n', dest = 'nodes_to_proceed',\ - default = [],\ - type = int,\ - nargs = '*',\ - help = 'nodes number of the simulation to analyse. This have priority over -ncpus_incr, -niter and -nbeg_iter') - parser.add_argument('--cpu_per_node', dest = 'cpu_per_node',\ - default = 12,\ - type = int,\ - help = 'numper of CPUs per node') - parser.add_argument('-d', action='store_true',\ - help = 'perform dry run, i.e. run the script competely, but do not send the jobs to the batch queue') - parser.add_argument('-dw', action='store_true',\ - help = 'redifine walltime') - - args = parser.parse_args() - - # define number of cpus for which experiment should be sent - #------------------------------------------------------------- - l_cpus_def = False - - if (len(args.cpus_to_proceed) > 0) and (len(args.nodes_to_proceed) > 0): - print( - 'You can specify either the number of cpus or the number of nodes, not both.' - ) - print('Exiting') - exit(1) - - if (len(args.nodes_to_proceed) > 0): - args.cpus_to_proceed = args.cpu_per_node * np.array( - args.nodes_to_proceed) - l_cpus_def = True - - if len(args.cpus_to_proceed) > 0: - l_cpus_def = True - - if not l_cpus_def: - args.cpus_to_proceed = (np.arange(args.nbeg_iter, args.niter) * - args.cpus_incr) - l_cpus_def = True - - # define new experiment name - #-------------------------------------------------------------- - # experiment name basis exp - exp_name_bas_exp = os.path.basename(args.basis_folder) - - # setting filename basis exp - setting_bas_exp = os.path.join(args.basis_folder, - 'settings_{}'.format(exp_name_bas_exp)) - - # check if the basis name is finishing by "cpusXX" and assign kernel name of teh new experiments - if (exp_name_bas_exp.split('_')[-1].startswith('cpus')): - exp_name_nucl = '_'.join( - exp_name_bas_exp.split('_')[:-1]) # name of the new experiments - else: - exp_name_nucl = exp_name_bas_exp - - # get walltime and cpus from basis exp, for computning later the new walltime - #-------------------------------------------------------------- - def grep(string, filename): - # returns lines of file_name where string appears - # mimic the "grep" function - - # list of lines where string is found - list_line = [] - - for line in open(filename): - if string in line: - list_line.append(line) - return list_line - - def value_string_file(string, filename): - # returns the value of a variable defined in a file - # e.g. 
for walltime, returns 8:00:00 if if filename walltime=8:00:00 - - #initialisation - values = [] - - # list of occurences found by "grep" - occurences = grep(string, setting_bas_exp) - - for occ in occurences: - - #remove comments - line_wo_comment = occ.split('#')[0] - - # get value - def_split = [s.strip() for s in line_wo_comment.split('=')] - - # do not consider variable if string is in the middle of another word - if string == def_split[0]: - values.append(def_split[1]) - return (values) - - walltime_bas = value_string_file("walltime", setting_bas_exp)[0].strip('"') - ncpus_bas = int(value_string_file("ncpus", setting_bas_exp)[0].strip('"')) - - #time in datetime format - basis_day = "2000-01-01" - walltime_datetime = datetime.datetime.strptime('{} {}'.format(basis_day,walltime_bas), '%Y-%m-%d %H:%M:%S') - \ - datetime.datetime.strptime(basis_day, '%Y-%m-%d') - - # send experiments - #--------------------------------------------------------------- - - # change directory to be in the basis folder - # os.chdir(args.basis_folder) - - # loop over number of cpus to be lauched - for ncpus in args.cpus_to_proceed: - - # define name of the new experiment - new_exp_name = '%s_cpus%i' % (exp_name_nucl, ncpus - ) # new experiment name - - # new walltime - comp_walltime = (walltime_datetime * ncpus_bas / ncpus) - new_walltime = datetime.timedelta( - seconds=round(comp_walltime.total_seconds())) - - # job definition and submission - string_to_overwrite = "ncpus={};exp={}".format(ncpus, new_exp_name) - if args.dw: - string_to_overwrite += ";walltime={}".format(new_walltime) - - print('jobsubm_echam.sh -o "{}" {}'.format(string_to_overwrite, - setting_bas_exp)) - print( - '--------------------------------------------------------------------------------------------------------' - ) - if not args.d: - os.system('jobsubm_echam.sh -o "{}" {}'.format( - string_to_overwrite, setting_bas_exp)) From df77b57432ea4d33b9aeaee4e55a21eb67cb8288 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 15:00:58 +0200 Subject: [PATCH 15/17] Remove echam from script --- send_analyse_different_exp_at_once_ICON.py | 3 --- 1 file changed, 3 deletions(-) diff --git a/send_analyse_different_exp_at_once_ICON.py b/send_analyse_different_exp_at_once_ICON.py index 1cf8093..9b2bdd5 100755 --- a/send_analyse_different_exp_at_once_ICON.py +++ b/send_analyse_different_exp_at_once_ICON.py @@ -70,9 +70,6 @@ def __init__(self, name, path, mod=None, factor=None, comp=None): '-e', exp.name,'-o', exp.comp, \ '-NH','6', '-n', '1', '12','16','36','48']) elif lo_send_batch: - print( - 'WARNING : Sending different experiments with different numbers of nodes for ECHAM_HAM has not been implemented yet' - ) print('The experiment {} is not done asssociated is : {}'.format( os.path.join(exp.path, 'run', exp.name), exp.mod.upper())) else: From 9b07185c385c2b65ba8337aa3258617dc0e53ae8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Thu, 16 May 2024 15:01:39 +0200 Subject: [PATCH 16/17] Remove icon from script filenames --- ...t_exp_at_once_ICON.py => send_analyse_different_exp_at_once.py | 0 ...veral_run_ncpus_perf_ICON.py => send_several_run_ncpus_perf.py | 0 2 files changed, 0 insertions(+), 0 deletions(-) rename send_analyse_different_exp_at_once_ICON.py => send_analyse_different_exp_at_once.py (100%) rename send_several_run_ncpus_perf_ICON.py => send_several_run_ncpus_perf.py (100%) diff --git a/send_analyse_different_exp_at_once_ICON.py b/send_analyse_different_exp_at_once.py similarity index 
100%
rename from send_analyse_different_exp_at_once_ICON.py
rename to send_analyse_different_exp_at_once.py
diff --git a/send_several_run_ncpus_perf_ICON.py b/send_several_run_ncpus_perf.py
similarity index 100%
rename from send_several_run_ncpus_perf_ICON.py
rename to send_several_run_ncpus_perf.py

From 69f94e3ec93aaa6545f450a9b674e9cd2065e7f0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michael=20J=C3=A4hn?=
Date: Thu, 16 May 2024 15:02:49 +0200
Subject: [PATCH 17/17] Adaptations for README

---
 README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 80b27dc..4c5cf9c 100644
--- a/README.md
+++ b/README.md
@@ -27,27 +27,27 @@ Prepare your machine-independent setting file `my_exp` (e.g. `exp.atm_amip`, wit
 
 ### 3. Create and launch different running scripts based on my_exp, but using different numbers of nodes.
 
-Use `send_several_run_ncpus_perf_ICON.py`.
+Use `send_several_run_ncpus_perf.py`.
 
 For example, for running `my_exp` on 1, 10, 12 and 15 nodes:
 
 ```console
-$ python [path_to_scaling_analysis_tool]/send_several_run_ncpus_perf_ICON.py -e my_exp -n 1 10 12 15
+$ python [path_to_scaling_analysis_tool]/send_several_run_ncpus_perf.py -e my_exp -n 1 10 12 15
 ```
 
 With the command above, 4 running scripts will be created (`exp.my_exp_nnodes1.run`, `exp.my_exp_nnodes10.run`,
 `exp.my_exp_nnodes12.run` and `exp.my_exp_nnodes15.run`), and each of them will be launched.
 
-To send several experiments on different node numbers at once, use: `send_analyse_different_exp_at_once_ICON.py`
+To send several experiments on different node numbers at once, use: `send_analyse_different_exp_at_once.py`
 from inside ``:
 
 ```console
-$ python send_analyse_different_exp_at_once_ICON.py
+$ python send_analyse_different_exp_at_once.py
 ```
 
-The script `send_analyse_different_exp_at_once_ICON.py` (n_step = 1) is a wrapper which calls
-`send_several_run_ncpus_perf_ICON.py` for different experiments (for example different set-ups, or compilers).
+The script `send_analyse_different_exp_at_once.py` (n_step = 1) is a wrapper which calls
+`send_several_run_ncpus_perf.py` for different experiments (for example different set-ups, or compilers).
 
-The script `send_analyse_different_exp_at_once_ICON.py` (n_step = 2) is a wrapper which gets
+The script `send_analyse_different_exp_at_once.py` (n_step = 2) is a wrapper which gets
 the wallclocks from the log files for different experiments (for example different set-ups, or compilers) (point 4 of this README).
 
 ### 4. When all the runs are finished, read all the slurm/log files to get the Wallclock for each run, and put them into a table:
@@ -58,7 +58,7 @@ Use the option `-m icon`:
 $ python [path_to_scaling_analysis_tool]/create_scaling_table_per_exp.py -e my_exp -m icon
 ```
 
-or for different experiments at once: `send_analyse_different_exp_at_once_ICON.py` (n_step = 2) (cf point 3)
+or for different experiments at once: `send_analyse_different_exp_at_once.py` (n_step = 2) (cf point 3)
 
 ### 5. Create a summary plot and table of the variable you wish (Efficiency, NH, Wallclock) for different experiments with respect to the number of nodes.