
Chpl experiment #22873 (open)

stonea wants to merge 59 commits into main

Conversation
@stonea (Contributor) commented Aug 3, 2023

Introduction to chplExperiment

This PR introduces a framework for running experiments (on Chapel programs or the chpl compiler itself) and producing plots. The framework was designed with internal use in mind, though it may well develop into something that folks outside Chapel's core implementors find useful as well.

My hope is that the framework ends up being generally useful beyond myself, but it was specifically motivated by the fact that I kept having to regenerate this plot:
image
Why did I have to do that so often? Well, sometimes because we want to measure the impact of:

  • new chapel versions,
  • new experimental features, or
  • a new system or GPU.

Adding to the complexity we want to consider:

  • performance across multiple vendors (NVIDIA and AMD), and
  • multiple memory modes (array_on_device and unified).

There are other similar plots we'd like to generate. Our nightly performance graph infrastructure works well for tracking the performance-over-time of an individual test with a fixed set of parameters, but it doesn't do well for other types of plots, such as those looking at scalability or exploring across one or more parameters.

So, traditionally, for a scalability plot like the one above we would generate it by either:

  • manually gathering data and then passing it along to matplotlib/excel/gnuplot --- which is tedious, error-prone, and hard to replicate, or
  • creating ad-hoc scripts --- which helps automate things but often comes with its own issues (where to post it? will people find it? is it really reproducible? will the scripts work across machines? are hidden assumptions baked into the scripts?).

So, it would be nice if we had a more general framework for creating scripts to run these sorts of experiments and generate plots.

That's what this PR does: it adds in a set of scripts forming a general framework for running experiments and generating plots. Under this framework, users are expected to write three kinds of scripts:

  • (1) a driver script,
  • (2) a gather script, and
  • (3) a paint script.

These scripts, respectively:

  • define the set of experiments to run,
  • conduct the experiment (under the assumption that Chapel has already been configured and built), and
  • produce the plot using the data the previous script produced and extracted.

Before digging into more detail, if you want to see some examples, scroll to the bottom of this PR and look under the "Want some more examples?" section.

Architecture

So what is this framework? Well, here's an illustration of the architecture (each arrow shows what uses what or, in the case of the gray boxes, what produces/consumes what):

image

The orange boxes show the scripts users need to create, and the green boxes show the various tools/scripts/libraries that form the "chplExperiment" framework. To create a new experiment/plot, users are expected to write three scripts:

  • The drive script -- defines the set of experiments to run.
  • The gather script -- assumes that Chapel has already been configured and built for a given experiment, then does whatever is necessary to conduct the experiment and dump the results to a table stored in a .dat file.
  • The paint script -- reads in the resulting .dat files and produces one or more .png files containing the plots of interest.

I'll go into more detail about what these three scripts are meant to look like. But first, here's an overview of the underlying framework scripts/tools provided to the user:

  • chplExperiment takes a series of command line arguments forming a "table" of experiments to run. Each "row" of this table is an experiment and defines: (1) the name of the experiment, (2) a set of features to enable in Chapel (e.g. AMD GPU support, GASNet, etc.), and (3) a script to run to gather results.
  • chplSetup takes a list of features (for example those that were passed to chplExperiment for a particular experiment) and sets the user's environment up so that Chapel can be built with those features enabled. chplSetup works based on a series of "cascading setup scripts", which define how to enable these features and tailor them for specific machines. This means if the user runs chplSetup nvidia (to set up NVIDIA GPU support) on a given machine, it will set machine-specific variables like CHPL_GPU_ARCH based on the machine it's run on. For more information on chplSetup, see https://github.hpe.com/hpe/chplSetup
  • chpl_plot is a Python library that takes data output from a gather script and produces a plot. The chpl_plot library has various functions to aid with manipulating data and formatting/tweaking the to-be-generated plot.

The drive script

To get a sense of how this all fits together, let's consider the drive script for generating the stream plot shown above, stream.plot.drive.bash:

#!/usr/bin/env bash

set -x -e
"$CHPL_HOME/util/test/chplExperiment" \
   --skip-if-config-error --paint-with ./stream.plot.paint.py \
   \
  `#name           features  options          command`                         \
  `#-------------------------------------------------------------------------` \
   cuda_baseline   nvidia   --no-build-chpl                                    \
                            --skip-if-errs "nvcc --version"                    \
                                             ./stream.plot.gather.cuda.bash    \
                                                                               \
   hip_baseline    amd      --no-build-chpl                                    \
                            --skip-if-errs "hipcc --version"                   \
                                             ./stream.plot.gather.hip.bash     \
                                                                               \
   nvidia          nvidia                    ./stream.plot.gather.chpl.bash    \
                                                                               \
   nvidia_aod      nvidia:aod                ./stream.plot.gather.chpl.bash    \
                                                                               \
   amd             amd                       ./stream.plot.gather.chpl.bash    \
                                                                               \
   amd_aod         amd   --prebuild "export CHPL_GPU_MEM_STRATEGY=array_on_device" \
                                             ./stream.plot.gather.chpl.bash    \

Notice that chplExperiment is passed a number of command line options.

  • --skip-if-config-error is used to "skip" an experiment (rather than erroring out) if printchplenv errors out after configuring Chapel. This is necessary because we want this script to run on machines that may have NVIDIA GPUs, AMD GPUs, or both, and to gather whatever subset of experiments is applicable.
  • --paint-with ./stream.plot.paint.py tells the script that once all experiments have finished, it should run this Python script to produce the plots. This argument can be omitted and the "./stream.plot.paint.py" script run manually; it is provided here as a convenience.
  • The next two lines of the form `# ... text` serve as a kind of comment within the Bash command (these can be removed without any effect). These comments are meant to serve as column headers, illustrating the "tabular" nature of the chplExperiment command.
  • The next several commands form "rows" in the experiments table. Each row has: (1) an experiment name, (2) a colon-separated list of features to pass to chplSetup, (3) some optional arguments, and (4) a script to use to conduct the experiment. As far as optional arguments:
    • The --prebuild argument takes a command to run before building Chapel. In this example, I show using --prebuild to set the environment variable CHPL_GPU_MEM_STRATEGY=array_on_device. In this particular example I could just as well have used the feature set amd:aod (similar to what's done above for nvidia), but I do it this alternative way to demonstrate that the option is available.
    • The --skip-if-errs argument gives an optional command to run before building Chapel. If the command returns a non-zero value then the experiment will be skipped. In this example we use this to skip the nvidia/hip baselines based on the absence of nvcc/hipcc commands.
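The --skip-if-errs behavior boils down to "run a probe command; if it exits non-zero, skip the experiment". A minimal Python sketch of that check (the function name and return convention here are hypothetical, not part of chplExperiment, which is shell-based):

```python
import subprocess

def should_skip(probe_cmd: str) -> bool:
    """Mirror of the --skip-if-errs check: run the probe command and
    report whether it failed (non-zero exit), in which case the
    experiment would be skipped."""
    result = subprocess.run(probe_cmd, shell=True, capture_output=True)
    return result.returncode != 0

# e.g. should_skip("nvcc --version") is True on machines without nvcc
```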

You'll notice this script dispatches to various gather scripts, specifically: stream.plot.gather.cuda.bash, stream.plot.gather.hip.bash, and stream.plot.gather.chpl.bash.

The gather script

Let's consider stream.plot.gather.chpl.bash:

#!/usr/bin/env bash

# sets 'datFile', 'logDir', 'experimentName', and 'runLog'
source $CHPL_HOME/util/test/chplExperimentGatherUtils/boilerplate.bash $@

sizes=( 1 2 4 8 16 32 64 128)

# -----------------------------------------------------------------------------
# Build Chapel code
# -----------------------------------------------------------------------------
chpl stream.chpl --fast -M../../../release/examples/benchmarks/hpcc

# -----------------------------------------------------------------------------
# Run Chapel trials
# -----------------------------------------------------------------------------

echo "" > $runLog
for x in "${sizes[@]}"; do
  size=$((x * 1024 * 1024))
  ./stream --useGpuDiags=false --SI=false --doValidation=false --m=$size | tee -a $runLog
done

# -----------------------------------------------------------------------------
# Collect data; store in results.dat
# -----------------------------------------------------------------------------

chpl_data=$(cat $runLog | sed -r -n 's/Performance \(GiB\/s\) = (.*)/\1/p')

set +x
echo -e "\t$experimentName" > $datFile
paste \
  <(printf "%s\n" "${sizes[@]}") \
  <(printf "%s\n" "${chpl_data[@]}") >> $datFile

Whenever a gather script is run it is passed the following as command line arguments:

  • $1: The path to a dat file we expect the script to output
  • $2: The path to a directory we expect the script to dump any log or intermediate files it wants to have
  • $3: The name of the experiment

It's good practice to then:

  • put these values into variables,
  • give those variables default values in case the script is run standalone without the arguments being passed (sometimes it's useful to do this rather than going through the drive script),
  • create the logs directory if it doesn't already exist (if the script is invoked via chplExperiment this directory will already exist),
  • calculate the name for a "runlog" file that will store the output from any invocations of the program we're trying to time/analyze
  • start logging and cause the script to fail if any command the script invokes fails (e.g. set -e -x).

Given that these steps will be done in most gather scripts, they have been factored out into a "boilerplate" script that is sourced like so: source $CHPL_HOME/util/test/chplExperimentGatherUtils/boilerplate.bash $@.

The next notable thing to point out is that when running the experiment we:

  • Pipe the output of our runs through | tee -a $runLog, which appends each command's output to a running log
  • We later cat $runLog, and run it through sed to extract the data we're gathering
  • Finally we dump this data into $datFile. The dat file is meant to contain a space separated table of values where the first column is unlabeled.
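The same extraction the sed command performs can be sketched in Python, which may be handy if you'd rather postprocess the run log from the paint side (this is an illustrative equivalent, not part of the framework):

```python
import re

def extract_perf(log_text: str) -> list:
    """Pull throughput values out of lines like
    'Performance (GiB/s) = 18.4969', mirroring the sed command
    in the gather script above."""
    return [float(m.group(1))
            for m in re.finditer(r"Performance \(GiB/s\) = (.*)", log_text)]

sample_log = ("starting trial...\n"
              "Performance (GiB/s) = 16.9227\n"
              "Performance (GiB/s) = 17.6425\n")
# extract_perf(sample_log) -> [16.9227, 17.6425]
```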

Coming out of the script the resulting .dat file might look like this:

        amd
1       16.9227
2       17.6425
4       18.0499
8       18.4969
16      18.4767
32      18.5464
64      18.5996
128     18.5687
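Since the .dat format is just a whitespace-separated table whose first column is unlabeled, it's easy to consume from other tools as well. A minimal, hypothetical parser (not the real chpl_plot loader) might look like:

```python
def parse_dat(text: str):
    """Parse a chplExperiment-style .dat table: a header row of column
    labels (first column unlabeled) followed by numeric rows.
    Lines starting with '#' (decoration comments) are ignored."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith('#')]
    header = lines[0].split()                    # e.g. ['amd']
    rows = [[float(v) for v in ln.split()] for ln in lines[1:]]
    return header, rows

sample = "\tamd\n1\t16.9227\n2\t17.6425\n4\t18.0499\n"
header, rows = parse_dat(sample)
# header -> ['amd']; rows[0] -> [1.0, 16.9227]
```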

If the gather script is run through chplExperiment it will also "decorate" this .dat file with comments showing the output from printchplenv --all --anonymize along with some other information that may be useful for logging purposes.

The paint script

The simplest possible paint script looks like this:

#!/usr/bin/env python3

import sys, os
sys.path.append(os.path.join(os.environ['CHPL_HOME'], 'util', 'test'))
from chpl_plot import *

plot(load_tables())

This script finds every .dat file under the logs/ directory and produces an associated .png.
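Presumably, the discovery step behind load_tables() amounts to scanning logs/ for .dat files and keying each resulting table by experiment name. A sketch of just that step (the actual chpl_plot API may differ):

```python
import glob
import os

def find_dat_files(log_dir: str = "logs") -> dict:
    """Map experiment name -> file path for every .dat file under
    log_dir, mirroring how load_tables() is described to find its
    input tables."""
    return {os.path.splitext(os.path.basename(path))[0]: path
            for path in glob.glob(os.path.join(log_dir, "*.dat"))}
```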

Often you'll want to manipulate data or do other things to customize the plot. In the case of the "stream" plot, things are indeed more complicated:

#!/usr/bin/env python3

import sys, os
sys.path.append(os.path.join(os.environ['CHPL_HOME'], 'util', 'test'))
from chpl_plot import *

tbls = load_tables()  # loads tables from '.dat' files under 'logs/'
tbl_amd = None
tbl_nvidia = None

# If we gathered AMD data build the AMD table (and so forth with NVIDIA)
if 'amd' in tbls:
  tbl_amd = join(tbls['hip_baseline'], tbls['amd'], tbls['amd_aod']).with_title(
    'Stream on %s AMD GPU' % os.getenv('CHPL_GPU_ARCH'))
if 'nvidia' in tbls:
  tbl_nvidia = join(tbls['cuda_baseline'], tbls['nvidia'], tbls['nvidia_aod']).with_title(
    'Stream on NVIDIA GPU')

# We use the following function below to avoid duplicating code for
# each of our plots (the NVIDIA one and the AMD one)
def report(tbl, filename):
  if tbl is None:
    return None

  # print table in markdown format
  print()
  print(tbl.md("%0.1f"))  # "%0.1f" here is an optional format string specifying precision

  # start producing plot, give explicit xlabel and ylabel.
  # the title is inherited from the table (though we can also
  # explicitly give it here if we want).
  p = tbl.plot(
    xlabel="Number of Elements (M)",
    ylabel="Throughput\n(GiB/s)",
    save=False)

  # Adjust ticks and tick labels on 'x' axis
  p.set_xticks(p._x_data[5:])
  p.set_xticklabels([str(int(l)) for l in p._x_data[5:]])

  # Finally save the plot
  p.save(f'logs/{filename}')

report(tbl_amd, "stream_amd.png")
report(tbl_nvidia, "nvidia_amd.png")

With this script, load_tables() will load a TableCollection containing a Table for each .dat file in the logs/ directory. If all our experiments ran, we'll have tables for amd_aod.dat, amd.dat, cuda_baseline.dat, hip_baseline.dat, nvidia_aod.dat, and nvidia.dat, but what we really want is to produce up to two plots: one for all the NVIDIA results and one for all the AMD results.

To do this we use join and pass it the list of tables that contain the data we wish to show in the same plot. So the line: join(tbls['hip_baseline'], tbls['amd'], tbls['amd_aod']) creates a new table that is the "join" of the three smaller tables.
We follow this command with: with_title('Stream on %s AMD GPU' % os.getenv('CHPL_GPU_ARCH')), which gives a title to the table/plot.

We create a convenience function report, which takes a table and filename and produces a plot of that table (if available). This function does some manipulation of the plot to give an xlabel and ylabel and to only show the last few "tick" values on the x axis (otherwise the x-tick labels would sit too close to one another and overlap). For more information about the interface to chpl_plot, see the comments in the chpl_plot.py file.
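Conceptually, the join used above lines tables up on their shared first column and sets each table's data column side by side. A toy illustration of that idea (hypothetical code, not the chpl_plot implementation):

```python
def join_tables(*tables):
    """Each table is a (label, {x: y}) pair. Return the column labels
    plus rows joined on the x values common to every table."""
    labels = [label for label, _ in tables]
    xs = sorted(set.intersection(*(set(data) for _, data in tables)))
    rows = [[x] + [data[x] for _, data in tables] for x in xs]
    return labels, rows

hip = ("hip_baseline", {1: 16.9, 2: 17.6})
amd = ("amd", {1: 15.0, 2: 16.1})
labels, rows = join_tables(hip, amd)
# labels -> ['hip_baseline', 'amd']
# rows   -> [[1, 16.9, 15.0], [2, 17.6, 16.1]]
```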

A few other notable things:

  • Every time you run chplExperiment, log files are backed up -- when chplExperiment starts, if it sees that logs/ already exists it will move all the files under logs/ into a new logs/backup/nnn directory, where nnn is the smallest non-negative integer for which logs/backup/nnn doesn't already exist.
  • You can use this to print tables too -- Tables in the paint script can be dumped to markdown format like so: print(tbl.md())
  • There are a couple of shortcuts for enabling/disabling experiments --
    • If an experiment name passed to chplExperiment is prefixed with __reuse__, the experiment will not be rerun if a .dat file for it already exists from the previous run (that .dat file will be copied instead). This is useful if the user wishes to rerun one particular experiment as changes are being made, without rerunning all the others.
    • Similarly, an individual experiment may be skipped by prefixing its name with __skip__.
  • chplSetup might be useful outside of this framework -- chplExperiment calls out to chplSetup to configure the user's environment (for a given set of features) before building Chapel. This is something everyone on our team needs to do frequently (and we've probably ended up with tons of individual, ad-hoc scripts to do it). It's also something our nightly testing infrastructure has to do, as does my paratest launching tool. chplSetup itself might end up being a valuable tool usable in a variety of contexts. For more info see: https://github.hpe.com/hpe/chplSetup
  • You might find individual parts of this framework useful even if/when you don't want to use the "whole package" --
    • Find it easier to use Excel to manipulate data and generate plots? Fine: just use the tooling up to the point of generating .dat files and pick things up manually from there.
    • Don't want to script configuring and building Chapel? Sure, no problem: you can always do that part manually and run gather scripts directly.
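The log-backup numbering described above ("the smallest non-negative integer nnn where there isn't an existing logs/backup/nnn") is straightforward to pin down in code; here's a sketch of such a helper (hypothetical, not the framework's actual implementation):

```python
import os

def next_backup_dir(log_dir: str = "logs") -> str:
    """Return the first logs/backup/nnn path that does not exist yet,
    i.e. where the previous run's log files would be moved."""
    n = 0
    while os.path.exists(os.path.join(log_dir, "backup", str(n))):
        n += 1
    return os.path.join(log_dir, "backup", str(n))
```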

Want some more examples?

Look at these plots and the scripts used to produce them. Also check out the "log" files:
log.txt shows the output from running the drive script. Various "running logs" show the executions of the program being measured. The .dat files show results that are then gathered to produce the plots.

Stream

Table: Stream on NVIDIA GPU

|       | cuda_baseline | nvidia | nvidia_aod |
|-------|---------------|--------|------------|
| 1.0   | 225.5 | 174.9 | 167.5 |
| 2.0   | 233.2 | 201.2 | 192.9 |
| 4.0   | 238.0 | 211.6 | 209.3 |
| 8.0   | 240.4 | 202.7 | 225.3 |
| 16.0  | 241.5 | 201.3 | 220.7 |
| 32.0  | 242.3 | 216.4 | 231.5 |
| 64.0  | 242.4 | 227.8 | 236.6 |
| 128.0 | 242.4 | 235.0 | 236.6 |

Plot:

image

CHOP

join of: (chpl, cuda_only)

|      | chpl      | cuda_only |
|------|-----------|-----------|
| 15.0 | 0.335330  | 0.299000  |
| 16.0 | 1.873480  | 1.597000  |
| 17.0 | 13.584900 | 11.399000 |

join of: (chpl, cuda_only) normalized to cuda_only

|      | chpl     |
|------|----------|
| 15.0 | 1.121505 |
| 16.0 | 1.173125 |
| 17.0 | 1.191762 |

Plots:

chpl__cuda_only
normalized

Language Benchmarks Game

compile time

|                     | flat | gasnet | cpu-as-device | gpu   | GPU w/ specialization |
|---------------------|------|--------|---------------|-------|-----------------------|
| binarytrees         | 3.03 | 3.72   | 3.75          | 10.03 | 5.27 |
| chameneosredux-fast | 3.11 | 3.92   | 3.92          | 11.06 | 5.51 |
| chameneosredux      | 3.08 | 3.89   | 3.90          | 11.03 | 5.54 |
| fannkuchredux       | 3.01 | 3.71   | 3.78          | 10.70 | 5.42 |
| fasta               | 3.57 | 4.46   | 4.54          | 12.93 | 6.30 |
| knucleotide         | 4.26 | 5.08   | 5.26          | 15.86 | 7.34 |
| mandelbrot-fast     | 3.42 | 4.15   | 4.16          | 11.96 | 5.87 |
| mandelbrot          | 3.30 | 3.96   | 4.00          | 11.36 | 5.73 |
| meteor-fast         | 3.85 | 4.55   | 4.63          | 13.57 | 6.39 |
| meteor              | 3.92 | 4.86   | 5.01          | 14.78 | 6.75 |
| nbody               | 2.96 | 3.70   | 3.69          | 10.17 | 5.29 |
| pidigits-fast       | 2.78 | 3.50   | 3.49          | 9.58  | 5.00 |
| pidigits            | 2.79 | 3.51   | 3.45          | 9.59  | 5.02 |
| regexdna-redux      | 3.21 | 4.01   | 4.07          | 11.37 | 5.70 |
| regexdna            | 3.19 | 4.04   | 4.05          | 11.51 | 5.69 |
| revcomp-fast        | 3.14 | 3.94   | 3.98          | 11.30 | 5.66 |
| revcomp             | 3.01 | 3.78   | 3.83          | 10.58 | 5.42 |
| spectralnorm        | 2.97 | 3.72   | 3.74          | 10.56 | 5.34 |

execution time

|                     | flat | gasnet | cpu-as-device | gpu    | GPU w/ specialization |
|---------------------|------|--------|---------------|--------|-----------------------|
| binarytrees         | 0.04 | 0.35   | 0.03          | 1.46   | 1.45 |
| chameneosredux-fast | 0.04 | 0.34   | 0.02          | 1.44   | 1.43 |
| chameneosredux      | 0.04 | 0.35   | 0.03          | 1.42   | 1.44 |
| fannkuchredux       | 0.04 | 0.35   | 0.03          | 1.44   | 1.44 |
| fasta               | 0.04 | 0.34   | 0.03          | 1.44   | 1.43 |
| knucleotide         | 0.20 | 0.43   | 0.13          | 163.09 | 163.11 |
| mandelbrot-fast     | 0.04 | 0.35   | 0.03          | 1.45   | 1.42 |
| mandelbrot          | 0.04 | 0.35   | 0.03          | 1.44   | 1.42 |
| meteor-fast         | 0.12 | 0.37   | 0.06          | 1.46   | 1.48 |
| meteor              | 0.19 | 0.51   | 0.28          | 1.56   | 1.74 |
| nbody               | 0.04 | 0.35   | 0.03          | 1.42   | 1.43 |
| pidigits-fast       | 0.04 | 0.35   | 0.01          | 1.38   | 1.40 |
| pidigits            | 0.04 | 0.35   | 0.03          | 1.39   | 1.39 |
| regexdna-redux      | 0.05 | 0.35   | 0.03          | 1.42   | 1.42 |
| regexdna            | 0.05 | 0.36   | 0.04          | 1.44   | 1.44 |
| revcomp-fast        | 0.04 | 0.35   | 0.02          | 1.43   | 1.43 |
| revcomp             | 0.04 | 0.35   | 0.03          | 1.43   | 1.42 |
| spectralnorm        | 0.08 | 0.36   | 0.04          | 1.42   | 1.45 |

compile time ratio (over time w/ flat locale model)

|                     | gasnet | cpu-as-device | gpu  | GPU w/ specialization |
|---------------------|--------|---------------|------|-----------------------|
| binarytrees         | 1.23   | 1.24          | 3.31 | 1.74 |
| chameneosredux-fast | 1.26   | 1.26          | 3.56 | 1.77 |
| chameneosredux      | 1.26   | 1.27          | 3.58 | 1.80 |
| fannkuchredux       | 1.23   | 1.25          | 3.55 | 1.80 |
| fasta               | 1.25   | 1.27          | 3.62 | 1.76 |
| knucleotide         | 1.19   | 1.23          | 3.72 | 1.72 |
| mandelbrot-fast     | 1.22   | 1.22          | 3.50 | 1.72 |
| mandelbrot          | 1.20   | 1.21          | 3.45 | 1.74 |
| meteor-fast         | 1.18   | 1.20          | 3.53 | 1.66 |
| meteor              | 1.24   | 1.28          | 3.77 | 1.72 |
| nbody               | 1.25   | 1.25          | 3.44 | 1.79 |
| pidigits-fast       | 1.26   | 1.26          | 3.45 | 1.80 |
| pidigits            | 1.26   | 1.24          | 3.44 | 1.80 |
| regexdna-redux      | 1.25   | 1.27          | 3.54 | 1.78 |
| regexdna            | 1.27   | 1.27          | 3.61 | 1.78 |
| revcomp-fast        | 1.26   | 1.27          | 3.60 | 1.80 |
| revcomp             | 1.25   | 1.27          | 3.52 | 1.80 |
| spectralnorm        | 1.25   | 1.26          | 3.55 | 1.80 |

execution time ratio (over time w/ flat locale model)

|                     | gasnet | cpu-as-device | gpu    | GPU w/ specialization |
|---------------------|--------|---------------|--------|-----------------------|
| binarytrees         | 7.82   | 0.69          | 32.40  | 32.29 |
| chameneosredux-fast | 7.64   | 0.51          | 32.02  | 31.67 |
| chameneosredux      | 8.26   | 0.62          | 33.88  | 34.21 |
| fannkuchredux       | 8.29   | 0.71          | 34.24  | 34.36 |
| fasta               | 8.60   | 0.65          | 36.10  | 35.65 |
| knucleotide         | 2.17   | 0.67          | 815.46 | 815.56 |
| mandelbrot-fast     | 8.12   | 0.58          | 33.67  | 33.00 |
| mandelbrot          | 9.00   | 0.77          | 36.85  | 36.46 |
| meteor-fast         | 3.07   | 0.50          | 12.07  | 12.24 |
| meteor              | 2.68   | 1.49          | 8.24   | 9.13 |
| nbody               | 8.36   | 0.76          | 33.79  | 33.95 |
| pidigits-fast       | 8.82   | 0.17          | 34.50  | 35.00 |
| pidigits            | 9.78   | 0.69          | 38.47  | 38.53 |
| regexdna-redux      | 7.59   | 0.59          | 30.89  | 30.87 |
| regexdna            | 7.10   | 0.73          | 28.31  | 28.31 |
| revcomp-fast        | 9.35   | 0.59          | 38.59  | 38.59 |
| revcomp             | 8.12   | 0.60          | 33.21  | 33.00 |
| spectralnorm        | 4.49   | 0.46          | 17.80  | 18.12 |

Plots:

01__compile_time
02__exec_time
03__nrm_compile_time
04__nrm_exec_time

@e-kayrakli (Contributor) left a comment:

While I left some superficial comments inline, I don't know if reading the diff is an efficient way of reviewing this. At a high level, I am already aware of how this is architected and I like it. My two big-ticket wishes for this work's future:

Comment on util/test/chplExperimentGatherUtils/boilerplate.bash:
@@ -0,0 +1,33 @@
#!/usr/bin/env python

You know my wish already, but it'd be really great if this could read the performance keys that are reported by the test itself, rather than looking at start_test results, which are only relevant for shootouts as far as I know.

p.save(f'logs/{filename}')

report(tbl_amd, "stream_amd.png")
report(tbl_nvidia, "nvidia_amd.png")

nvidia_amd -> stream_nvidia?

rm -fr ./cuda-stream
git clone https://github.com/bcumming/cuda-stream.git
cd cuda-stream
make

The default arch is pretty old (sm_35) and some of the systems I've run on don't support that. Passing in CHPL_GPU_ARCH works for me though. I did this with:

make ARCH=$(printchplenv --all --internal 2>/dev/null | grep CHPL_GPU_ARCH | sed 's/^.*: //')

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

Would be nice to note in some of the docs that matplotlib needs to be installed. Maybe even a requirements file to help with the install?
