Updated benchmark runner script #207

Closed · wants to merge 5 commits
8 changes: 5 additions & 3 deletions README.md
@@ -108,15 +108,17 @@ Then configure the `--sheet` and `--tab` arguments in benchmark_config.yaml.

### Running all of the Queries

-The included `benchmark_runner.py` script will run all queries sequentially. Configuration for this type of end-to-end run is specified in `benchmark_runner/benchmark_config.yaml`.
+The included `benchmark_runner.sh` script will run all queries sequentially. Configuration for this type of end-to-end run is specified in `benchmark_runner/benchmark_config.yaml`.

First, set `GPU_BDB_HOME` in the bash script to the location of this repository. This is the same environment variable mentioned in the configuration above.

To run all queries, cd to `gpu_bdb/` and:

```python
-python benchmark_runner.py --config_file benchmark_runner/benchmark_config.yaml
+bash benchmark_runner.sh
```

-By default, this will run each Dask query once, and, if BlazingSQL queries are enabled in `benchmark_config.yaml`, each BlazingSQL query once. You can control the number of repeats by changing the `N_REPEATS` variable in the script.
+By default, this will run each Dask query once. If BlazingSQL queries are enabled with `INCLUDE_BLAZING` in `benchmark_runner.sh` and in `benchmark_config.yaml`, this will run each BlazingSQL query once. You can control the number of repeats by changing the `N_REPEATS` variable in the script.
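
For illustration, a full end-to-end run under the new script might look like the following. The checkout path and the suggested variable values are example placeholders, not part of this diff:

```bash
# Example only: point the script at your checkout of this repository
export GPU_BDB_HOME=/path/to/gpu-bdb   # example path

# Edit the variables near the top of benchmark_runner.sh as needed, e.g.:
#   INCLUDE_DASK=true   INCLUDE_BLAZING=true   N_REPEATS=3

cd $GPU_BDB_HOME/gpu_bdb
bash benchmark_runner.sh
```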


## BlazingSQL
96 changes: 0 additions & 96 deletions gpu_bdb/benchmark_runner.py

This file was deleted.

42 changes: 42 additions & 0 deletions gpu_bdb/benchmark_runner.sh
@@ -0,0 +1,42 @@
#!/bin/bash

USERNAME=$(whoami)

if [ -z "$GPU_BDB_HOME" ]
then
    GPU_BDB_HOME=/raid/$USERNAME/prod/gpu-bdb
else
    GPU_BDB_HOME=$GPU_BDB_HOME
fi

INCLUDE_DASK=true
INCLUDE_BLAZING=false
N_REPEATS=1

# Dask queries
if $INCLUDE_DASK; then
    for qnum in {01..30}
    do
        cd $GPU_BDB_HOME/gpu_bdb/queries/q$qnum/
        for j in $(seq 1 $N_REPEATS)
        do
            python gpu_bdb_query_$qnum.py --config_file ../../benchmark_runner/benchmark_config.yaml
            sleep 3
        done
        sleep 3
    done
fi

# BlazingSQL Queries
if $INCLUDE_BLAZING; then
    for qnum in {01..30}
    do
        cd $GPU_BDB_HOME/gpu_bdb/queries/q$qnum/
        for j in $(seq 1 $N_REPEATS)
        do
            python gpu_bdb_query_${qnum}_sql.py --config_file ../../benchmark_runner/benchmark_config.yaml

Review comment (Member):

Just noting that launching a process for each script will add library import-related overheads, which for 30 queries quickly add up over the whole benchmark.

IIRC they add up to roughly 30-50s of the whole run time, which is significant.


Reply (Member, Author):


This is a good point, and something I had forgotten. Let's discuss this more offline.

            sleep 3
        done
        sleep 3
    done
fi
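
As a rough way to see the per-process import overhead mentioned in the review comment above (illustrative only; the packages and timings are assumptions about a typical RAPIDS environment, not measurements from this PR):

```bash
# Each query launched by benchmark_runner.sh is a fresh Python process,
# so heavy library imports are paid again on every run. A quick check:
time python -c "import cudf, dask_cudf"
# If this takes on the order of a second or two, 30 queries (x N_REPEATS)
# can add tens of seconds to the end-to-end wall time.
```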
6 changes: 3 additions & 3 deletions gpu_bdb/benchmark_runner/slurm/run_bench.sh
@@ -1,7 +1,6 @@
set -e pipefail

USERNAME=$(whoami)
-GPU_BDB_HOME=$HOME/gpu-bdb
LOGDIR=$HOME/dask-local-directory/logs
STATUS_FILE=${LOGDIR}/status.txt

@@ -16,6 +15,8 @@ CONDA_ENV_PATH="/opt/conda/etc/profile.d/conda.sh"
source $CONDA_ENV_PATH
conda activate $CONDA_ENV_NAME

+export GPU_BDB_HOME=$HOME/gpu-bdb

if [[ "$SLURM_NODEID" -eq 0 ]]; then
bash $GPU_BDB_HOME/gpu_bdb/cluster_configuration/cluster-startup-slurm.sh SCHEDULER &
echo "STARTED SCHEDULER"
@@ -29,8 +30,7 @@ if [[ "$SLURM_NODEID" -eq 0 ]]; then
# echo "Starting load test.."
# python queries/load_test/gpu_bdb_load_test.py --config_file benchmark_runner/benchmark_config.yaml > $LOGDIR/load_test.log
echo "Starting E2E run.."
-python benchmark_runner.py --config_file benchmark_runner/benchmark_config.yaml > $LOGDIR/benchmark_runner.log

+bash benchmark_runner.sh
echo "FINISHED" > ${STATUS_FILE}
else
sleep 15 # Sleep and wait for the scheduler to spin up
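
For orientation, a script like run_bench.sh is typically launched once per node by the SLURM job; a hypothetical invocation (not taken from this repository) might look like:

```bash
# Hypothetical: one task per node; the node with SLURM_NODEID 0 starts the
# scheduler and drives the benchmark, the remaining nodes start workers.
srun --nodes=4 --ntasks-per-node=1 bash gpu_bdb/benchmark_runner/slurm/run_bench.sh
```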