Chpl experiment #22873
While I left some superficial comments inline, I don't know if reading the diff is an efficient way of reviewing this. At a high level, I am already aware of how this is architected and I like it. My two big-ticket wishes for this work's future:
- near-term: can we add a doc under https://github.com/chapel-lang/chapel/tree/main/doc/rst/developer/bestPractices explaining how to use this tool? Probably a good chunk of the PR message can be reused for that purpose.
- long-term: can we incorporate at least parts of it into `start_test`/`sub_test`? It would be really cool if I could do `start_test --experiment hello.chpl`.
@@ -0,0 +1,33 @@
#!/usr/bin/env python
You know my wish already, but it'd be really great if this could read the performance keys that are reported by the test itself, rather than looking at what `start_test` reports, which as far as I know is only relevant for shootouts.
p.save(f'logs/{filename}')

report(tbl_amd, "stream_amd.png")
report(tbl_nvidia, "nvidia_amd.png")
`nvidia_amd` -> `stream_nvidia`?
rm -fr ./cuda-stream
git clone https://github.com/bcumming/cuda-stream.git
cd cuda-stream
make
The default arch is pretty old (`sm_35`) and some of the systems I've run on don't support that. Passing in `CHPL_GPU_ARCH` works for me though. I did this with:
make ARCH=$(printchplenv --all --internal 2>/dev/null | grep CHPL_GPU_ARCH | sed 's/^.*: //')
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
Would be nice to note in some of the docs that matplotlib needs to be installed. Maybe even a requirements file to help with the install?
--- Signed-off-by: Andy Stone <[email protected]>
Introduction to chplExperiment
This PR introduces a framework for running experiments (of Chapel programs or the `chpl` compiler itself) and producing plots. This framework was designed with internal use in mind, though it may well develop into something that folks outside Chapel's core implementers find useful as well.

My hope is that this framework ends up being generally useful to folks beyond myself, but it was specifically motivated by the fact that I kept having to regenerate this plot:
Why did I have to do that so often? Well, sometimes because we want to measure the impact of:
Adding to the complexity we want to consider:
There are other similar plots we'd like to generate. Our nightly performance graph infrastructure works well for tracking the performance over time of an individual test with a fixed set of parameters, but it doesn't do well for other types of plots, such as those looking at scalability or exploring across one or more parameters.
So, traditionally, for a scalability plot like the one above we would generate it by either:
So, it would be nice if we had a more general framework for creating scripts to run these sorts of experiments and generate plots.
That's what this PR does: it adds in a set of scripts forming a general framework for running experiments and generating plots. Under this framework, users are expected to write three kinds of scripts:
These scripts, respectively:
Before digging into more detail: if you want to see some examples, scroll to the bottom of the PR and look under the "Want some more examples?" section.
Architecture
So what is this framework? Well, here's an illustration of the architecture (considering each arrow as showing what uses what or in the case of the gray boxes what produces/consumes what):
The orange boxes show the scripts users need to create, and the green boxes show the various tools/scripts/libraries that form the "chplExperiment" framework. To create a new experiment/plot users are expected to write three scripts:
- a drive script, which calls `chplExperiment` with the table of experiments to run;
- one or more gather scripts, each of which runs an experiment and writes its results to a `.dat` file;
- a paint script, which takes the `.dat` files and produces one or more `.png` files containing the plots of interest.

I'll go into more detail about what these three scripts are meant to look like. But first, I'll give an overview of the underlying framework scripts/tools that are provided to the user:
- `chplExperiment` takes a series of command line arguments forming a "table" of experiments to run. Each "row" of this table is an experiment and defines: (1) the name of the experiment, (2) a set of features that we want enabled in Chapel (e.g. amd-gpu support, gasnet, etc.), and (3) a script to run to gather results.
- `chplSetup` takes a list of features (for example those that were passed to `chplExperiment` for a particular experiment) and sets the user's environment up so that Chapel can be built with those features enabled. `chplSetup` works based on a series of "cascading setup scripts", which define how to enable these features and tailor them for specific machines. This means that if the user runs `chplSetup nvidia` (to set up NVIDIA GPU support) on a given machine, it will set machine-specific variables like `CHPL_GPU_ARCH` based on the machine it's run on. For more information on `chplSetup`, see https://github.hpe.com/hpe/chplSetup
- `chpl_plot` is a Python library that takes data output from a gather script and produces a plot. The `chpl_plot` library has various functions to aid with manipulating data and formatting/tweaking the to-be-generated plot.

The drive script
To get a sense of how this all fits together, let's consider the scripts for generating the stream plot shown above, starting with `stream.plot.drive.bash`:

Notice that `chplExperiment` is passed a number of command line options.
- `--skip-if-config-error` is used to "skip" an experiment (rather than erroring out) if `printchplenv` errors out after configuring Chapel. This is necessary because we want this script to run on machines that may have NVIDIA GPUs or AMD GPUs or both, and want to gather whatever subset of experiments is applicable.
- `--paint-with ./stream.plot.paint.py` tells the script that once all experiments have finished, it should run this Python script to produce the plots. These arguments can be omitted and the `./stream.plot.paint.py` script can be run manually, but it is provided here as a convenience.
- The `# ... text` fragments serve as a kind of comment within the Bash command (these can be removed without having any effect). These comments are meant to serve as column headers, illustrating the "tabular" nature of the `chplExperiment` command.
- Each row of the table gives (1) the name of the experiment, (2) the set of features to pass to `chplSetup`, (3) some optional arguments, and (4) a script to use to conduct the experiment. As far as optional arguments:
  - The `--prebuild` argument takes a command to run before building Chapel. In this example, I show using `--prebuild` to set the environment variable `CHPL_GPU_MEM_STRATEGY=array_on_device`. In this particular example I could just as well have used the feature set `amd:aod` (similar to what's done above for nvidia), but I do it using this alternative method to demonstrate that the feature is available.
  - The `--skip-if-errs` argument gives an optional command to run before building Chapel. If the command returns a non-zero value then the experiment will be skipped. In this example we use this to skip the nvidia/hip baselines based on the absence of the `nvcc`/`hipcc` commands.

You'll notice this script dispatches to various gather scripts, specifically: `stream.plot.gather.cuda.bash`, `stream.plot.gather.hip.bash`, and `stream.plot.gather.chpl.bash`.

The gather script
Let's consider `stream.plot.gather.chpl.bash`:

Whenever a gather script is run it is passed the following as command line arguments:
- `$1`: The path to a `.dat` file we expect the script to output
- `$2`: The path to a directory where we expect the script to dump any log or intermediate files it wants to keep
- `$3`: The name of the experiment

It's good practice to then do some common setup, such as enabling exit-on-error and command tracing (`set -e -x`).

Given that these steps will be done in most gather scripts, they have been factored out into a "boilerplate" script that is sourced like so: `source $CHPL_HOME/util/test/chplExperimentGatherUtils/boilerplate.bash $@`.

The next notable thing to point out is that when running the experiment we:
- append `| tee -a $runLog` to the command, which pipes the output of the command into a running log
- `cat $runLog`, and run it through sed to extract the data we're gathering
- write the result to `$datFile`. The dat file is meant to contain a space separated table of values where the first column is unlabeled.

Coming out of the script the resulting `.dat` file might look like this:

If the gather script is run through `chplExperiment` it will also "decorate" this `.dat` file with comments showing the output from `printchplenv --all --anonymize` along with some other information that may be useful for logging purposes.

The paint script
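Since a paint script starts from the `.dat` files that the gather scripts write, it may help to see how such a file could be read by hand. The following is only a minimal Python sketch, not how `chpl_plot` does it (see `chpl_plot.py` for the real interface). It assumes the decoration comments start with `#` and that the header row simply omits a label for the first column; the function name `parse_dat` and the sample values are made up for illustration.

```python
def parse_dat(text):
    """Parse a space-separated table whose first column has no header label.

    Assumptions (not confirmed by the framework): lines starting with '#'
    are decoration comments, and the first remaining line is the header.
    Returns (first_column_values, {column_name: [float, ...]}).
    """
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if not rows:
        return [], {}

    header, body = rows[0], rows[1:]
    # The first field of each data row is the unlabeled column (e.g. the x value).
    first_column = [row[0] for row in body]
    # Header label i names field i + 1 of every data row.
    columns = {
        name: [float(row[i + 1]) for row in body]
        for i, name in enumerate(header)
    }
    return first_column, columns


# Hypothetical file contents; the numbers are placeholders, not measurements.
sample = """# decoration comment (assumed format)
        chpl  baseline
1024    1.5   2.0
2048    3.0   4.5
"""

xs, cols = parse_dat(sample)
print(xs)
print(cols)
```

Keeping the first column as strings leaves it up to the caller whether to treat it as numeric sizes or as category labels.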
The simplest possible paint script looks like this:
This script finds every `.dat` file under the `logs/` directory and produces an associated `.png`.

Oftentimes you'll want to manipulate data or do other things to customize the plot. In the case of the "stream" plot, things are indeed more complicated:

With this script, `load_tables()` will load a TableCollection that contains a Table for each `.dat` file in the `logs/` directory. If all our experiments ran we'll have tables for `amd_aod.dat`, `amd.dat`, `cuda_baseline.dat`, `hip_baseline.dat`, `nvidia_aod.dat`, and `nvidia.dat`, but really what we want is to produce up to two plots: one for all the nvidia results and one for all the amd results.

To do this we use `join` and pass it the list of tables that contain the data we wish to show in the same plot. So the line `join(tbls['hip_baseline'], tbls['amd'], tbls['amd_aod'])` creates a new table that is the "join" of the three smaller tables.

We follow this command with `with_title('Stream on %s AMD GPU' % os.getenv('CHPL_GPU_ARCH'))`, which gives a title to the table/plot.

We create a convenience function `report`, which takes a table and filename and produces a plot of that table (if available). This function does some manipulation of the plot to give an xlabel and a ylabel, and to show only the last 5 "tick" values on the x axis (this is done because otherwise the x-tick labels appear too close to one another and overlap). For more information about the interface to `chpl_plot`, see the comments in the `chpl_plot.py` file.

A few other notable things:
- When running `chplExperiment`, log files are backed up: when `chplExperiment` starts, if it sees that `logs/` already exists it will move all the files under `logs/` into a new `logs/backup/nnn` directory, where `nnn` is the smallest non-negative integer for which there isn't an existing `logs/backup/nnn`.
- Tables can be printed with `print(tbl.md())`.
- If the names of experiments passed to `chplExperiment` are prefixed with `__reuse__`, then the experiment will not be run if a `.dat` file exists for it from the previous run (instead the `.dat` file will be copied). This is useful if the user wishes to rerun a particular experiment as changes are being made without rerunning all experiments.
- Experiment names can likewise be prefixed with `__skip__`.
- […] `.dat` files and pick things up manually from there.

Want some more examples?
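One small example first: the log backup rule noted above (move the existing logs into `logs/backup/nnn`, with `nnn` the smallest unused non-negative integer) could be sketched in Python roughly as follows. This is an illustration of the rule as described, not the code `chplExperiment` actually uses; the function names are hypothetical, and it assumes `nnn` is written without zero padding and that the existing `backup` directory itself is left in place.

```python
import os
import shutil


def next_backup_dir(logs_dir="logs"):
    """Return logs_dir/backup/n for the smallest n >= 0 that doesn't exist yet."""
    backup_root = os.path.join(logs_dir, "backup")
    n = 0
    while os.path.exists(os.path.join(backup_root, str(n))):
        n += 1
    return os.path.join(backup_root, str(n))


def backup_logs(logs_dir="logs"):
    """Move everything under logs_dir (except 'backup') into a fresh backup dir.

    Returns the backup directory path, or None if logs_dir does not exist.
    """
    if not os.path.isdir(logs_dir):
        return None
    target = next_backup_dir(logs_dir)
    # List entries before creating the target so 'backup' handling stays simple.
    entries = [e for e in os.listdir(logs_dir) if e != "backup"]
    os.makedirs(target)
    for entry in entries:
        shutil.move(os.path.join(logs_dir, entry), os.path.join(target, entry))
    return target
```

Calling `backup_logs()` at the start of a run would then leave `logs/` empty apart from `backup/`, ready for new `.dat` and log files.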
Look at these plots and the scripts used to produce them. Also check out the "log" files. `log.txt` shows the output from running the drive script. Various "running logs" show the executions of the program being measured. The `.dat` files show results that are then gathered to produce the plots.

Stream
Table: Stream on NVIDIA GPU
Plot:
CHOP
join of: (chpl, cuda_only)
join of: (chpl, cuda_only) normalized to cuda_only
Plots:
Language Benchmarks Game
compile time
execution time
compile time ratio (over time w/ flat locale model)
execution time ratio (over time w/ flat locale model)
Plots: