This repository shows how to distribute explanations with KernelSHAP on a single node or a Kubernetes cluster using `ray`. The predictions of a logistic regression model on 2560 instances from the Adult dataset are explained using KernelSHAP configured with a background set of 100 samples from the same dataset. The data preprocessing and model fitting steps are available in the `scripts/` folder, but both the data and the model are downloaded automatically by the benchmarking scripts.
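As a point of reference, the sketch below shows what this kind of explanation looks like with the `shap` library's `KernelExplainer` in a single process. It is a minimal stand-in, not the repository's code: the synthetic data and model are placeholders for the preprocessed Adult dataset and the fitted logistic regression.

```python
# Minimal single-process KernelSHAP sketch; the repository's own preprocessing,
# model fitting and data loading live in scripts/ and the benchmark scripts.
import shap
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic placeholder for the preprocessed Adult dataset.
X, y = make_classification(n_samples=3000, n_features=12, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

background = shap.sample(X, 100)  # background set of 100 samples
X_explain = X[:2560]              # the instances whose predictions are explained

explainer = shap.KernelExplainer(model.predict_proba, background)
# This call is the expensive step that the rest of the repository distributes.
shap_values = explainer.shap_values(X_explain)
```

Explaining all 2560 instances in a single process like this is slow, which is the motivation for distributing the work with `ray`.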
To set up the environment:

- Install `conda`
- Create a virtual environment with `conda create --name shap python=3.7`
- Activate the environment with `conda activate shap`
- Execute `pip install .` in order to install the dependencies needed to run the benchmarking scripts
Two code versions are available:

- One using a parallel pool of `ray` actors, which consume small subsets of the 2560 instances to be explained (a sketch of this pattern follows this list)
- One using `ray serve` instead of the parallel pool
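A minimal sketch of the actor-pool approach is shown below, assuming the `shap` library for the explainer and `ray.util.ActorPool` for scheduling batches onto idle actors; the class and function names (`ShapActor`, `explain_with_pool`) are illustrative rather than the repository's.

```python
# Illustrative sketch of distributing KernelSHAP over a pool of ray actors;
# not the repository's exact implementation.
import numpy as np
import ray
from ray.util import ActorPool


@ray.remote
class ShapActor:
    """Each actor holds its own KernelSHAP explainer and explains one batch at a time."""

    def __init__(self, predict_fn, background):
        import shap  # imported inside the actor process
        self.explainer = shap.KernelExplainer(predict_fn, background)

    def explain(self, batch):
        return self.explainer.shap_values(batch)


def explain_with_pool(predict_fn, background, X_explain, workers=5, batch_size=10):
    """Split X_explain into batches and let idle actors pick them up."""
    ray.init(ignore_reinit_error=True)
    pool = ActorPool([ShapActor.remote(predict_fn, background) for _ in range(workers)])
    batches = np.array_split(X_explain, int(np.ceil(len(X_explain) / batch_size)))
    # pool.map submits each batch to the next free actor and yields results in order.
    return list(pool.map(lambda actor, batch: actor.explain.remote(batch), batches))
```

Each actor builds its explainer once at start-up, so the model and the background set are shipped to every worker only once.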
The two methods can be run from the repository root using the scripts `benchmarks/ray_pool.py` and `benchmarks/serve_explanations.py`, respectively. Options that can be configured are:
- the number of actors/replicas that the task is going to be distributed over (e.g., `--workers 5` (pool), `--replicas 5` (ray serve))
- whether a benchmark (i.e., redistributing the task over an increasingly large pool or number of replicas) is to be performed (`--benchmark 0` to disable or `--benchmark 1` to enable)
- the number of times the task is run for the same configuration in benchmarking mode (e.g., `--nruns 5`)
- how many instances can be sent to an actor/replica at once; this is a required argument (e.g., `-b 1 5 10` (pool), `--batch 1 5 10` (ray serve)). If more than one value is passed after the argument name, the task (or benchmark) is executed once for each batch size
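For example, `python benchmarks/ray_pool.py --workers 5 --benchmark 0 --nruns 1 -b 10` would run the pool version once over five actors with batches of ten instances. The exact flag spellings are defined in each script's argument parser, so consult `--help` if an option is rejected.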
Running on a Kubernetes cluster requires access to the cluster and a local installation of `kubectl`. Don't forget to export the path to the cluster configuration `.yaml` file in your `KUBECONFIG` environment variable, as described here, before moving on to the next steps.
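For example, `export KUBECONFIG=/path/to/cluster-config.yaml` (with the path pointing at your own configuration file) makes the cluster reachable by `kubectl`.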
The `ray_pool.py` and `serve_explanations.py` scripts have been modified to be deployable on the Kubernetes cluster and are prefixed by `k8s_`. The benchmark experiments can be run via the `bash` scripts in the `benchmarks/` folder. These scripts:

- Apply the appropriate k8s manifest in `cluster/` to the k8s cluster
- Upload a `k8s*.py` file to it
- Run the script
- Pull the results and save them in the `results` directory
Specifically:

- Calling `bash benchmarks/k8s_benchmark_pool.sh 10 20` will run the benchmark with an increasing number of workers (the cluster is reset as the number of workers is increased). By default the experiment is run with batches of sizes 1, 5 and 10; this can be changed by updating the value of `BATCH` in `cluster/Makefile.pool`
- Calling `bash benchmarks/k8s_benchmark_serve.sh 10 20 ray` will run the benchmark with an increasing number of workers and batch sizes of 1, 5 and 10 for each worker. The batch size setting can be modified from the `.sh` script itself. The `ray` argument means that `ray` is allowed to batch single requests together and dispatch them to the same worker (a sketch of this batching mechanism follows this list). If it is replaced by `default`, minibatches are distributed to each worker instead
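To illustrate the request-batching behaviour that the `ray` argument enables, the sketch below uses the current Ray Serve API (`serve.deployment`, `serve.batch`, `serve.run`). It is an assumption about how such a deployment can be written, not the repository's code, and the names (`ShapDeployment`, `explain_batch`, `deploy`, `predict_fn`, `background`) are illustrative; the Serve API version used by the repository's scripts may differ.

```python
# Illustrative Ray Serve sketch: single requests are batched before being explained.
import numpy as np
from ray import serve
from starlette.requests import Request


@serve.deployment
class ShapDeployment:
    def __init__(self, predict_fn, background):
        import shap  # imported inside each replica
        self.explainer = shap.KernelExplainer(predict_fn, background)

    @serve.batch(max_batch_size=10)
    async def explain_batch(self, instances):
        # Serve collects up to max_batch_size single instances and passes them
        # here as a list; explain them in one call, return one result per request.
        shap_values = np.asarray(self.explainer.shap_values(np.stack(instances)))
        return [shap_values[..., i, :] for i in range(len(instances))]

    async def __call__(self, request: Request):
        instance = np.asarray(await request.json())
        result = await self.explain_batch(instance)
        return result.tolist()


def deploy(predict_fn, background, replicas=5):
    # serve.run starts Serve if needed and deploys the requested number of replicas.
    serve.run(ShapDeployment.options(num_replicas=replicas).bind(predict_fn, background))
```

With `@serve.batch`, single HTTP requests arriving within the batching window are explained in one `shap_values` call per replica, which mirrors the `ray` setting described above; without it, each request would be explained on its own.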
The single-node experiments were run on a compute-optimized dedicated machine in Digital Ocean with 32 vCPUs, which explains the attenuation of the performance gains visible below.
The results obtained running the task using the `ray` parallel pool are shown below:
Distributing using `ray serve` yields similar results:
The cluster experiments were run on a cluster consisting of two compute-optimized dedicated machines in Digital Ocean with 32 vCPUs each, which again explains the attenuation of the performance gains visible below.
The results obtained running the task using the `ray` parallel pool over a two-node cluster are shown below:
Distributing using `ray serve` yields similar results: