[REVIEW] Add weighted K-Means sampling for SHAP #4051

Nanthini10 · 2021-07-13T15:57:33Z

This PR adds a sampling method for SHAP using k-means, adapted from https://github.com/slundberg/shap/blob/9411b68e8057a6c6f3621765b89b24d82bee13d4/shap/utils/_legacy.py

It moves the code from the interpret-community package for easier maintenance.

I chose not to add a comparison with SHAP because it would add a dependency on SHAP, and I am not sure we want that.

Closes #4000
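
For readers unfamiliar with the technique, here is a rough, CPU-only sketch of weighted k-means summarization as I understand it from the linked SHAP legacy code: cluster the background data, snap each center coordinate to the nearest value actually observed in that feature, and weight each center by its cluster size. This is not the code in this PR. The function name `kmeans_summarize`, its parameters, and the plain-Python Lloyd's loop are all assumptions made for illustration; the real implementation runs on the GPU with cuML.

```python
import random


def kmeans_summarize(X, k, n_iter=20, seed=0):
    """Summarize rows of X (a list of equal-length float lists) as k weighted points.

    Hypothetical helper for illustration only; not part of cuML or SHAP.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    # Initialize centers from k distinct rows of the data.
    centers = [list(row) for row in rng.sample(X, k)]
    labels = [0] * len(X)

    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest center (squared Euclidean).
        for idx, row in enumerate(X):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centers[c])),
            )
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]

    # Snap every center coordinate to the closest value observed in that feature,
    # so the summary only contains feature values that occur in the data.
    for c in range(k):
        for j in range(n_features):
            target = centers[c][j]
            centers[c][j] = min((row[j] for row in X), key=lambda v: abs(v - target))

    # Weight each center by the number of rows assigned to it.
    weights = [labels.count(c) for c in range(k)]
    return centers, weights
```

The weights let an explainer treat the k summary points as a stand-in for the full background set, with larger clusters counting for more.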

codecov-commenter · 2021-07-27T02:07:47Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@c9abba1).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.08    #4051   +/-   ##
===============================================
  Coverage                ?   85.80%           
===============================================
  Files                   ?      232           
  Lines                   ?    18314           
  Branches                ?        0           
===============================================
  Hits                    ?    15714           
  Misses                  ?     2600           
  Partials                ?        0

Flag	Coverage Δ
dask	`48.12% <0.00%> (?)`
non-dask	`78.31% <0.00%> (?)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c9abba1...7b6c472.

dantegd · 2021-07-28T18:54:58Z

python/cuml/explainer/sampling.py

+    if output_dtype == cudf.DataFrame:
+        group_names = X.columns
+        X = X.values
+    elif output_dtype == cudf.Series:
+        group_names = X.name
+        X = X.values.reshape(-1, 1)
+    elif output_dtype == pd.DataFrame:
+        group_names = X.columns
+        X = cp.array(X.values)
+    elif output_dtype == pd.Series:
+        group_names = X.name
+        X = cp.array(X.values.reshape(-1, 1))
+    else:
+        # it's either numpy, cupy or numba
+        if output_dtype == cuda.devicearray.DeviceNDArrayBase:
+            X = cp.array(X)
+        elif output_dtype == np.ndarray:
+            X = cp.array(X)
+        try:
+            # more than one column
+            group_names = [str(i) for i in range(X.shape[1])]
+        except IndexError:
+            # one column
+            X = X.reshape(-1, 1)
+            group_names = ['0']


This code can probably be simplified further, but we can do that in a follow-up PR for 21.10.

Opened an issue: #4121
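
As a rough illustration of what such a simplification might look like, the sketch below replaces the per-type branches with duck typing on `.columns`, `.name`, and `.shape`. It is not the change tracked in the issue. The helper name `infer_group_names` and its return shape are hypothetical, the conversion to a CuPy array is left out, and it assumes plain array types expose neither `.columns` nor `.name`. Unlike the hunk above, it wraps a series name in a one-element list so the result is always a list.

```python
def infer_group_names(X):
    """Return (group_names, needs_reshape) for a dataframe-, series- or array-like X.

    Hypothetical helper: relies only on `.columns`, `.name` and `.shape`,
    so it does not import cudf, pandas, cupy or numba.
    """
    columns = getattr(X, "columns", None)
    if columns is not None:
        # DataFrame-like input: one group per column, already two-dimensional.
        return list(columns), False
    if hasattr(X, "name"):
        # Series-like input: a single column named after the series.
        return [X.name], True
    # Array-like input: names are the column positions as strings.
    if len(X.shape) == 1:
        return ["0"], True
    return [str(i) for i in range(X.shape[1])], False
```

The caller would still convert `X` to a device array and call `reshape(-1, 1)` when `needs_reshape` is true.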

dantegd · 2021-07-28T18:55:11Z

@gpucibot merge

Adding sampling method for SHAP using k-means, adapted from https://github.com/slundberg/shap/blob/9411b68e8057a6c6f3621765b89b24d82bee13d4/shap/utils/_legacy.py

Moving the code from interpret-community package for easier maintenance.

Chose not to add comparison with SHAP as it will add a dependency to SHAP not sure if we want that.

Closes rapidsai#4000

Authors:
- Nanthini (https://github.com/Nanthini10)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4051

Nanthini10 added 6 commits July 9, 2021 00:55

Initial script

2b54c99

Merge remote-tracking branch 'upstream/branch-21.08' into add-kmeans-…

47e8b42

…sampling

Add k means sampling; todo: add tests

5e0e337

Update init

7dd1b2d

ADD tests for kmeans

1767045

STYLE changes

a092a3f

Nanthini10 requested a review from a team as a code owner July 13, 2021 15:57

github-actions bot added the Cython / Python Cython or Python issue label Jul 13, 2021

Nanthini10 added 3 - Ready for Review Ready for review by team non-breaking Non-breaking change feature request New feature or request labels Jul 13, 2021

Add copyright for test file

12863ea

dantegd requested changes Jul 19, 2021

dantegd added 4 - Waiting on Author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team labels Jul 19, 2021

Nanthini10 added 4 commits July 26, 2021 18:31

support cpu types, input_utils, random state update

427a20c

doc update

a2723a2

Style fix

8648c64

ADD API decorator

7b6c472

Nanthini10 requested a review from dantegd July 26, 2021 20:52

Nanthini10 added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 4 - Waiting on Author Waiting for author to respond to review labels Jul 26, 2021

dantegd approved these changes Jul 28, 2021

rapids-bot bot merged commit 6242984 into rapidsai:branch-21.08 Jul 28, 2021

Nanthini10 mentioned this pull request Jul 28, 2021

[BUG] Simplify K-Means sampling code #4121

Closed
