
cuml.experimental SHAP improvements #3433

Merged
merged 23 commits into from
Feb 10, 2021

Conversation

dantegd
Member

@dantegd dantegd commented Jan 29, 2021

Closes #1739

Addresses most items of #3224

@dantegd dantegd added 2 - In Progress Currently a work in progress CUDA / C++ CUDA issue Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function breaking Breaking change labels Jan 29, 2021
@dantegd dantegd requested review from a team as code owners January 29, 2021 17:03

@JohnZed JohnZed left a comment

Looks good! Only small comments

Resolved review threads:
- python/cuml/experimental/explainer/kernel_shap.pyx (2 threads)
- python/cuml/experimental/explainer/permutation_shap.pyx (2 threads)
@codecov-io

codecov-io commented Feb 1, 2021

Codecov Report

Merging #3433 (3d7f6ec) into branch-0.18 (550121b) will increase coverage by 0.06%.
The diff coverage is n/a.


@@               Coverage Diff               @@
##           branch-0.18    #3433      +/-   ##
===============================================
+ Coverage        71.48%   71.55%   +0.06%     
===============================================
  Files              207      212       +5     
  Lines            16748    17082     +334     
===============================================
+ Hits             11973    12223     +250     
- Misses            4775     4859      +84     
Impacted Files Coverage Δ
python/cuml/neighbors/ann.pxd 68.23% <0.00%> (-18.83%) ⬇️
python/cuml/common/timing_utils.py 42.85% <0.00%> (-7.15%) ⬇️
.../dask/feature_extraction/text/tfidf_transformer.py 37.50% <0.00%> (-6.05%) ⬇️
python/cuml/dask/preprocessing/label.py 34.00% <0.00%> (-4.89%) ⬇️
python/cuml/dask/neighbors/nearest_neighbors.py 25.97% <0.00%> (-4.52%) ⬇️
python/cuml/dask/naive_bayes/naive_bayes.py 37.68% <0.00%> (-4.43%) ⬇️
python/cuml/dask/cluster/kmeans.py 50.00% <0.00%> (-4.00%) ⬇️
python/cuml/dask/decomposition/base.py 36.58% <0.00%> (-2.95%) ⬇️
...ython/cuml/feature_extraction/_tfidf_vectorizer.py 85.36% <0.00%> (-2.87%) ⬇️
...ython/cuml/dask/neighbors/kneighbors_classifier.py 19.80% <0.00%> (-2.53%) ⬇️
... and 71 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c9c8619...3d7f6ec.

@dantegd
Member Author

dantegd commented Feb 2, 2021

rerun tests

3 similar comments
@dantegd
Member Author

dantegd commented Feb 3, 2021

rerun tests

@dantegd
Member Author

dantegd commented Feb 3, 2021

rerun tests

@dantegd
Member Author

dantegd commented Feb 3, 2021

rerun tests


@JohnZed JohnZed left a comment

Looks good! I had only small suggestions, mostly stuff that can be deferred to future PRs since this is still experimental. My only concern is that the number of variations supported in datatypes (sklearn model with pandas background data or cuml with f-ordered numpy or ...) makes it hard to test all paths of the base shap initialization. Let's look at codecov there for additional test ideas and be open to simplifying the options if necessary.


void shap_main_effect_dataset "ML::Explainer::shap_main_effect_dataset"(
const handle_t& handle,
float* dataset,
Contributor

(in the underlying API) should dataset be const?

Member Author

dataset is where the output is generated; maybe I should change the name to avoid confusion?

----------
model : function
Function that takes a matrix of samples (n_samples, n_features) and
computes the output for those samples with shape (n_samples). Function
Contributor

Ah, bummer. So there is no way to use the tags API, because we need to take the function rather than the model.

Member Author

We already use the tags API by getting the owning object of the function (if it exists) and reading the tags from that:

def get_tag_from_model_func(func, tag, default=None):
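The mechanism described above can be sketched as follows. This is an illustrative reimplementation, not cuML's actual code: it assumes the owning model exposes a scikit-learn-style `_get_tags`, and relies on the fact that a bound method carries its owning instance in `__self__`:

```python
def get_tag_from_model_func(func, tag, default=None):
    """Illustrative sketch: look up `tag` on the object that owns
    the bound method `func`, falling back to `default`."""
    # A bound method exposes its owning instance via __self__.
    owner = getattr(func, "__self__", None)
    if owner is not None:
        get_tags = getattr(owner, "_get_tags", None)
        if callable(get_tags):
            return get_tags().get(tag, default)
    return default


class ToyModel:
    def _get_tags(self):
        return {"preferred_input_order": "F"}

    def predict(self, X):
        return X


model = ToyModel()
print(get_tag_from_model_func(model.predict, "preferred_input_order"))  # F

# A plain function has no owning object, so the default is returned.
print(get_tag_from_model_func(lambda X: X, "preferred_input_order",
                              default="C"))  # C
```

This is why passing `model.predict` (rather than a bare lambda) still lets the explainer discover model tags.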

Resolved review thread: python/cuml/experimental/explainer/base.pyx
default=np.float32)
else:
if dtype in [np.float32, np.float64]:
self.dtype = np.dtype(dtype)
Contributor

out of curiosity why do you have to convert to np.dtype?

Member Author

I was doing things in the wrong order. I use NumPy's dtype function so that we accept string descriptions of dtypes without additional work.
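The point about `np.dtype` can be seen in a short check: it normalizes strings, NumPy scalar types, and dtype objects to the same canonical dtype, so no extra parsing is needed:

```python
import numpy as np

# np.dtype canonicalizes any valid dtype description, so "float32",
# "f4", and np.float32 all resolve to the same dtype object.
for spec in ("float32", "f4", np.float32):
    assert np.dtype(spec) == np.dtype(np.float32)

# Invalid descriptions raise TypeError, which makes validation easy.
try:
    np.dtype("not-a-dtype")
except TypeError:
    print("rejected invalid dtype string")
```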

Resolved review threads: python/cuml/experimental/explainer/base.pyx (2 threads)

@JohnZed JohnZed left a comment

Changes look good - just some doc and test suggestions. I think there is still a california_housing test coming? We could split that to the next PR too.

@@ -213,13 +198,17 @@ class SHAPBase():
)
)

# public attribute saved as NumPy for compatibility with the legacy
# SHAP plotting functions
self.expected_value = cp.asnumpy(self._expected_value)
Contributor

This makes sense, but it's a deviation from our standard approach... can you add a docstring to explain this? Can be a follow up PR.

Also would be really good to have a test of compatibility with SHAP plotting so we never break this (again, follow on PR ok)
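The conversion in the diff above can be sketched with a small host-conversion helper. `as_numpy` is a hypothetical name for illustration (cuML's actual utility differs); the key fact is that `cupy.asnumpy` copies device memory back to the host, which legacy SHAP plotting code (matplotlib-based) expects:

```python
import numpy as np

def as_numpy(array):
    """Return a host (NumPy) array for `array`.

    Hypothetical helper mirroring the cp.asnumpy(...) call in the diff:
    cupy.asnumpy copies device data to host memory, while np.asarray is
    a cheap no-op for data that is already on the host.
    """
    try:
        import cupy as cp
        return cp.asnumpy(array)
    except ImportError:
        # No GPU stack available; the data is already host-side.
        return np.asarray(array)

expected_value = as_numpy([0.25, 0.75])
print(type(expected_value).__name__)  # ndarray
```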

@dantegd
Member Author

dantegd commented Feb 8, 2021

rerun tests


@JohnZed JohnZed left a comment

Pre-approving with some small suggestions/questions. Looks great!

// gemv, which could cause a very sporadic race condition in Pascal and
// Turing GPUs that caused it to give the wrong results. Details:
// https://github.com/rapidsai/cuml/issues/1739
rmm::device_uvector<math_t> tmp_vector(n_cols, stream);
Contributor

how about something like tmp_gemv_result or otherwise indicating its use?

def output_list_shap_values(X, dimensions, output_type):
if output_type == 'cupy':
if dimensions == 1:
return X[0]
else:
return X
res = []
Contributor

Super picky, but it seems like either a list comprehension or just list(X) would be nicer here.
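The suggestion can be sketched like this, assuming the elided loop converts each per-output array. The `to_output` helper below is hypothetical, standing in for cuML's output-conversion utility:

```python
import numpy as np

def to_output(arr, output_type):
    # Hypothetical stand-in for cuML's output-type conversion.
    return np.asarray(arr) if output_type == "numpy" else arr

def output_list_shap_values(X, dimensions, output_type):
    if output_type == "cupy":
        return X[0] if dimensions == 1 else X
    # Reviewer's suggestion: a list comprehension replaces the
    # explicit append loop that builds `res`.
    res = [to_output(x, output_type) for x in X]
    return res[0] if dimensions == 1 else res
```

A usage sketch: `output_list_shap_values([vals], 1, "cupy")` unwraps the single-output case, while the multi-output case returns one converted array per model output.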

@@ -399,6 +416,11 @@ def test_l1_regularization(exact_tests_dataset, l1_type):
0.00088981]
]

housing_regression_result = np.array(
Contributor

Was this obtained by running shap? Would be good to note in a comment what you did to get it and what version you used.

@JohnZed
Contributor

JohnZed commented Feb 9, 2021

rerun tests

@JohnZed
Contributor

JohnZed commented Feb 9, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8082f3b into rapidsai:branch-0.18 Feb 10, 2021
Development

Successfully merging this pull request may close these issues.

[BUG] Sporadic OLS pytest fail in test_linear_regression_model_default
4 participants