ShapPack is a Python package for interpretable machine learning based on Shapley values.
ShapPack is currently in beta and under active development!
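As background, Shapley values attribute a model's prediction to its individual input features. For a tiny model they can be computed exactly by enumerating all feature coalitions; Kernel SHAP (the method behind the KernelExplainer used below) approximates this when enumeration is infeasible. A minimal brute-force sketch, using a hypothetical toy model that is not part of ShapPack:

```python
import numpy as np
from itertools import combinations
from math import factorial

def exact_shapley(model, x, background):
    """Brute-force Shapley values (illustrative only; exponential in features)."""
    n = len(x)
    phi = np.zeros(n)

    def value(S):
        # Features in coalition S take the instance's values,
        # the rest stay at the background values.
        z = background.copy()
        z[list(S)] = x[list(S)]
        return model(z)

    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley weight for a coalition of this size
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[j] += w * (value(S + (j,)) - value(S))
    return phi

model = lambda z: z[0] * z[1] + z[2]   # toy model (assumption, for illustration)
x = np.array([2.0, 3.0, 1.0])
background = np.zeros(3)
phi = exact_shapley(model, x, background)
# By the efficiency property, phi sums to model(x) - model(background)
print(phi, phi.sum())
```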
$ pip install shappack
The usage of ShapPack is almost the same as that of slundberg/shap.
import shappack
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
SEED = 123
np.random.seed(SEED)
# Prepare dataset
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston["data"], boston["target"], test_size=0.2, random_state=SEED)
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
# Prepare model
model = SVR(kernel="rbf")
model.fit(X_train_std, y_train)
# Compute SHAP values
i = 2
explainer = shappack.KernelExplainer(model.predict, X_train_std[:100])
shap_value = explainer.shap_values(X_test_std[i], n_workers=-1)
For now, ShapPack does not have its own visualization mechanism, so slundberg/shap must be used for visualization.
import shap
shap.initjs()
shap.force_plot(explainer.base_val[0], shap_value, X_test[i], boston.feature_names)
An important difference from slundberg/shap is that the shap_values function in ShapPack has three new arguments: n_workers, skip_features, and characteristic_func, which contribute to faster computation and scalability. The usage of each of these arguments is described below.
The n_workers argument specifies the number of processes used for the calculation of SHAP values; n_workers=-1 means using all processors. When the program runs on a multi-core server, this can substantially reduce computation time.
shap_value = explainer.shap_values(X_test_std[i], n_workers=-1)
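ShapPack's internals are not shown here, but the general idea behind n_workers can be sketched: the sampled coalitions are split into chunks that are evaluated concurrently. The chunking scheme and evaluate_chunk function below are illustrative assumptions, not ShapPack's actual implementation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from os import cpu_count

def evaluate_chunk(chunk):
    # Stand-in for evaluating the model over one chunk of coalitions
    # (here simply the row sums).
    return chunk.sum(axis=1)

coalitions = np.arange(12.0).reshape(6, 2)        # 6 sampled coalitions
n_workers = cpu_count() or 1                      # analogous to n_workers=-1
chunks = np.array_split(coalitions, min(n_workers, len(coalitions)))
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    # map preserves chunk order, so concatenation restores the original order
    results = np.concatenate(list(pool.map(evaluate_chunk, chunks)))
print(results)
```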
We can skip the calculation of SHAP values for the features specified in skip_features. The features to be skipped can be specified by feature name or index number. Note that when specifying skip_features by feature names, a list of feature names must be passed to KernelExplainer's feature_names argument.
explainer = shappack.KernelExplainer(model.predict, X_train_std[:100], feature_names=boston.feature_names)
skip_features=["PTRATIO", "TAX"]
shap_value = explainer.shap_values(X_test_std[i], skip_features=skip_features, n_workers=-1)
feature_names = np.delete(boston.feature_names, explainer.skip_idx)
x_test = np.delete(X_test[i], explainer.skip_idx)
shap.force_plot(explainer.base_val[0], shap_value, x_test, feature_names)
We can pass our own characteristic function implementation to the characteristic_func argument. The example below replaces the expected-value calculation in the original Kernel SHAP characteristic function with a minimum-value calculation.
def my_characteristic_func(instance, subsets, model, data):
    n_subsets = subsets.shape[0]
    n_data = data.shape[0]
    # Replicate the background data once per coalition
    synth_data = np.tile(data, (n_subsets, 1))
    for i, subset in enumerate(subsets):
        offset = i * n_data
        # Overwrite the "present" features with the instance's values
        features_idx = np.where(subset == 1.0)[0]
        synth_data[offset : offset + n_data, features_idx] = instance[:, features_idx][0]
    model_preds = model(synth_data)
    ey = np.zeros(n_subsets)
    for i in range(n_subsets):
        # Minimum instead of the mean used by the original Kernel SHAP
        ey[i] = np.min(model_preds[i * n_data : i * n_data + n_data])
    return ey
shap_value = explainer.shap_values(X_test_std[i], characteristic_func=my_characteristic_func)
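To see what a characteristic function receives and returns, here is a toy run outside ShapPack, with a stand-in model (the sum of features) and two background samples; the model, data, and coalitions are illustrative assumptions, and ShapPack itself is not required:

```python
import numpy as np

def my_characteristic_func(instance, subsets, model, data):
    n_subsets = subsets.shape[0]
    n_data = data.shape[0]
    # Replicate the background data once per coalition
    synth_data = np.tile(data, (n_subsets, 1))
    for i, subset in enumerate(subsets):
        offset = i * n_data
        # Overwrite the "present" features with the instance's values
        features_idx = np.where(subset == 1.0)[0]
        synth_data[offset : offset + n_data, features_idx] = instance[:, features_idx][0]
    model_preds = model(synth_data)
    ey = np.zeros(n_subsets)
    for i in range(n_subsets):
        # Min-aggregation per coalition, as in the example above
        ey[i] = np.min(model_preds[i * n_data : i * n_data + n_data])
    return ey

model = lambda X: X.sum(axis=1)               # stand-in model
data = np.array([[0.0, 0.0], [1.0, 1.0]])     # two background samples
instance = np.array([[10.0, 20.0]])           # instance to explain
subsets = np.array([[1.0, 0.0], [1.0, 1.0]])  # coalitions: {x0} and {x0, x1}

print(my_characteristic_func(instance, subsets, model, data))
# One value per coalition: min over the background replacements
```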
This project is licensed under the terms of the MIT license. See LICENSE for details.