
Our own Partial Dependence Implementation #2834

Merged 5 commits into main on Sep 29, 2021

Conversation

freddyaboulton (Contributor) commented Sep 23, 2021

Pull Request Description

Fixes #2502
Fixes #2475

Same run-time as main for model understanding tests:
main: 9m 59s
this branch: 9m 14s

Plots match between this branch and main

[Screenshots: partial dependence plots, this branch vs. main, for two examples]



@freddyaboulton freddyaboulton force-pushed the 2502-our-own-partial-dependence branch from 457a2e3 to 91b7a18 Compare September 23, 2021 15:15
codecov bot commented Sep 23, 2021

Codecov Report

Merging #2834 (635fcf4) into main (06d7df7) will decrease coverage by 0.1%.
The diff coverage is 99.1%.


@@           Coverage Diff           @@
##            main   #2834     +/-   ##
=======================================
- Coverage   99.8%   99.8%   -0.0%     
=======================================
  Files        302     303      +1     
  Lines      28148   28226     +78     
=======================================
+ Hits       28070   28145     +75     
- Misses        78      81      +3     
Impacted Files                                          Coverage Δ
evalml/model_understanding/_partial_dependence.py       98.8% <98.8%> (ø)
evalml/model_understanding/graphs.py                    100.0% <100.0%> (ø)
...del_understanding_tests/test_partial_dependence.py   99.3% <100.0%> (+0.1%) ⬆️
evalml/pipelines/components/utils.py                    98.4% <0.0%> (-1.6%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

if not isinstance(feature_range, (np.ndarray, pd.Series)):
    feature_range = np.array(feature_range)
if feature_range.ndim != 1:
    raise ValueError(
freddyaboulton (Contributor, Author):

I'm ok if this isn't covered. It's impossible to trigger as a user because custom_range is not a public parameter, but I'd like to keep this check in case we refactor this in the future. It helped catch a couple of bugs during development.

@@ -653,6 +652,11 @@ def partial_dependence(
is_datetime = [_is_feature_of_type(features, X, ww.logical_types.Datetime)]

if isinstance(features, (list, tuple)):
    if any(is_datetime) and len(features) > 1:
freddyaboulton (Contributor, Author) commented Sep 23, 2021:

There used to be two `isinstance(features, (list, tuple))` checks; this consolidates them into one.

angela97lin (Contributor) left a comment:

This is epic 👏!

I left some nitpicky comments but nothing blocking. Great work @freddyaboulton!

Also, the speedups are a cherry on top :)

    pl, X, features=("amount", "provider"), grid_resolution=5
)
assert not dep2way.isna().any().any()
# Minus 1 in the columns because there is `class_label`
Contributor:

+1, not minus?

freddyaboulton (Contributor, Author):

Thank you!!

)
assert not dep2way.isna().any().any()
# Minus 1 in the columns because there is `class_label`
assert dep2way.shape == (5, X["provider"].dropna().nunique() + 1)
Contributor:

Omega nitpick, but I think it'd be a good idea to set grid_resolution as a variable and use it above and here, as in `assert dep2way.shape == (grid_resolution_variable, ...)`. Just so it's clearer where this 5 value is coming from :)

freddyaboulton (Contributor, Author):

I completely agree!
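The nitpick above can be illustrated with a self-contained shape calculation (toy data, not the test's actual dataset):

```python
import pandas as pd

grid_resolution = 5  # named variable, instead of a bare literal in the assert
X = pd.DataFrame({"provider": ["visa", "mastercard", None, "visa"]})

# One column per unique (non-NaN) provider category, plus one for `class_label`.
expected_shape = (grid_resolution, X["provider"].dropna().nunique() + 1)
```

With two unique providers, `expected_shape` works out to `(5, 3)`, and the named variable makes the 5 self-explanatory.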

Comment on lines +110 to +121
arrays = [np.asarray(x) for x in arrays]
shape = (len(x) for x in arrays)

ix = np.indices(shape)
ix = ix.reshape(len(arrays), -1).T

out = pd.DataFrame()

for n, arr in enumerate(arrays):
    out[n] = arrays[n][ix[:, n]]

return out
Contributor:

This seems to be the same as https://github.com/scikit-learn/scikit-learn/blob/844b4be24d20fc42cc13b957374c718956a0db39/sklearn/utils/extmath.py#L655 except we return a dataframe, and since it's a public sklearn method, we could just import it--whatcha think? Also totally down to take their impl 😂

Contributor:

This is a great idea. I know as we first trudged through partial dependence that we borrowed a lot, perhaps a bit more from some private methods than I would like, but it was necessary. If we can refactor to use their public methods, that's great.

freddyaboulton (Contributor, Author) commented Sep 28, 2021:

@angela97lin Great point. Originally I wanted to use their method, but the problem is that numpy arrays cannot handle mixed types very well. So if we want a grid of categoricals and datetimes, storing it in a single numpy array won't really work.

[Screenshot illustrating the mixed-type conversion issue]

There may be a way around it I'm not seeing (maybe this) but IMO that's a nice to have as opposed to a requirement?
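A runnable sketch of the DataFrame-based helper from the diff, showing why it handles mixed types where a single numpy array would coerce everything to one dtype (the name `_cartesian_dataframe` is illustrative):

```python
import numpy as np
import pandas as pd

def _cartesian_dataframe(arrays):
    # Cartesian product of 1-D arrays, one DataFrame column per input,
    # so each column keeps its own dtype.
    arrays = [np.asarray(x) for x in arrays]
    shape = tuple(len(x) for x in arrays)
    ix = np.indices(shape).reshape(len(arrays), -1).T
    out = pd.DataFrame()
    for n, arr in enumerate(arrays):
        out[n] = arr[ix[:, n]]
    return out

grid = _cartesian_dataframe(
    [np.array(["visa", "mastercard"]),
     pd.to_datetime(["2021-01-01", "2021-06-01"]).values]
)
```

`grid` has 4 rows; column 0 stays strings and column 1 stays datetime64, which a single homogeneous `np.array` could not represent without falling back to `object`.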

chukarsten (Contributor) left a comment:

wow @freddyaboulton, this is amazing. I am really impressed. I feel like you cleaned up the code substantially, improved performance, and enhanced functionality. This is a great PR. I had a question about the handling of the times, but that isn't blocking.

    pd.Series: Range of dates between percentiles.
"""
timestamps = np.array(
    [X_dt - pd.Timestamp("1970-01-01")] // np.timedelta64(1, "s")
Contributor:

I don't know why, but I fixated on this, probably because I come from a natural science background... but is it worth leaving the reference date and the quantum of time as variables? I don't think any of our common or current use cases extend to people doing time series modeling on, say, a chemical reaction timescale (~milli/microseconds), but I can definitely see pharma customers being interested in it.

Let me know what you think. I don't think we necessarily have to do the work here, but it might be nice to at least talk about it.

freddyaboulton (Contributor, Author):

Fantastic point @chukarsten ! I think what this is getting at is making our custom_range internal parameter public. I think there can be value in letting users specify how the grid for their features is computed!

I will file a separate issue for tracking that.
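Chukarsten's suggestion could look like the following hypothetical parametrization (the signature is illustrative; the PR hard-codes the 1970 epoch and a one-second quantum):

```python
import numpy as np
import pandas as pd

def to_timestamps(X_dt, epoch=pd.Timestamp("1970-01-01"), unit="s"):
    # Elapsed whole units of `unit` between `epoch` and each datetime.
    return np.array([X_dt - epoch]) // np.timedelta64(1, unit)

dates = pd.to_datetime(pd.Series(["1970-01-02", "1970-01-03"]))
secs = to_timestamps(dates)           # seconds since the Unix epoch
ms = to_timestamps(dates, unit="ms")  # finer quantum for short timescales
```

numpy handles the unit conversion in the floor division, so changing the quantum is just a keyword argument.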

prediction_method = pipeline.predict_proba

for _, new_values in grid.iterrows():
    X_eval = X.copy()
Contributor:

Do we need to copy this each time? Does it make more sense to just rebuild the new dataframe with a concat or something at the end? If it's just as performant, then whatever, this makes sense and is clear.

freddyaboulton (Contributor, Author):

Great point. I think we can move it out of the loop. Will test it out!
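The restructuring discussed above could be sketched like this (names are illustrative; the real loop calls the pipeline's prediction method):

```python
import pandas as pd

def evaluate_grid(X, grid, prediction_method):
    # Copy X once and overwrite the grid's columns each iteration,
    # rather than copying the full frame per grid row.
    X_eval = X.copy()
    results = []
    for _, new_values in grid.iterrows():
        for col, value in new_values.items():
            X_eval[col] = value  # broadcast the grid value to every row
        results.append(prediction_method(X_eval))
    return results

X = pd.DataFrame({"amount": [1.0, 2.0], "other": [0.5, 0.5]})
grid = pd.DataFrame({"amount": [10.0, 20.0]})
preds = evaluate_grid(X, grid, lambda df: df["amount"].mean())
```

This is safe because every grid column is overwritten on every iteration, so no stale values leak between grid rows.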

freddyaboulton (Contributor, Author):

@chukarsten @angela97lin Thank you so much for the reviews! I didn't think this would make it into the coming release.

Kicked off perf tests out of paranoia to make sure none of the datasets error out on partial dependence. Will merge if those look good.

@freddyaboulton freddyaboulton force-pushed the 2502-our-own-partial-dependence branch from fa4c109 to 0756eb7 Compare September 28, 2021 22:52
freddyaboulton (Contributor, Author):

Perf tests here and they look good to me!

@freddyaboulton freddyaboulton merged commit e257b1b into main Sep 29, 2021
@chukarsten chukarsten mentioned this pull request Oct 1, 2021
@freddyaboulton freddyaboulton deleted the 2502-our-own-partial-dependence branch May 13, 2022 15:03
Successfully merging this pull request may close these issues.

- Implement our own partial dependence method
- Partial dependence errors with column with string and NaN values
3 participants