
Any plans for supporting gradient boosted trees from sklearn #232

Closed
jverre-drivalytix opened this issue Aug 20, 2018 · 4 comments

Comments

@jverre-drivalytix
Contributor

Hi,

Thanks for the great package!

It is possible to build gradient boosted trees using the sklearn library in much the same way as one would using the xgboost library.
However, this model type is not currently supported by the TreeExplainer class. Are there any plans to support it?

I've tried making some changes to support it, but it does not seem as straightforward as supporting xgboost.
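For reference, a minimal sketch of what I mean by building gradient boosted trees with sklearn (the dataset and hyperparameters here are purely illustrative):

```python
# Illustrative only: fit a small sklearn gradient boosting model and
# inspect where the individual trees live.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=100, n_features=10, random_state=0)
model = GradientBoostingClassifier(n_estimators=10, learning_rate=0.1,
                                   random_state=0).fit(X, y)

# The fitted trees are exposed as a 2-D array of DecisionTreeRegressor
# objects; for binary classification the second dimension is 1, which is
# why the patch below indexes model.estimators_[:, 0].
print(model.estimators_.shape)
```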

@slundberg
Collaborator

I don't have plans right now. It would require exporting the trees in a format similar to how it is done for sklearn's random forest. But I have not looked at the GBM implementation to know if it would be easy. @jmschrei you know if this would be easy or hard?

@jverre-drivalytix
Contributor Author

I have been investigating some more and it seems like it could work with some minimal changes. I am not very familiar with exactly how the SHAP values are computed, so it is possible that this won't work for certain edge cases.

In TreeExplainer.__init__, I added:

self.model_subtype = None

if ...
elif str(type(model)).endswith("sklearn.ensemble.gradient_boosting.GradientBoostingClassifier'>"):
    self.model_subtype = 'sklean_gb_classifier'
    scale = len(model.estimators_) * model.learning_rate
    self.base_offset = model.init_.predict
    self.trees = [Tree(e.tree_, scaling=scale) for e in model.estimators_[:,0]]
    self.less_than_or_equal = True
elif...

And in TreeExplainer._tree_shap_ind, I replaced phi[-1, :] = self.base_offset * self.tree_limit with:

if self.model_subtype == 'sklean_gb_classifier':
    # init_.predict gives a per-sample base value for the classifier
    offset = self.base_offset(self._current_X[i,:].reshape(1, -1))[0][0]
else:
    offset = self.base_offset
phi[-1, :] = offset * self.tree_limit

From my testing it appears to work: the sum of the SHAP values and the expected value equals the predicted probability.
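A quick way to sanity-check the decomposition this patch relies on, without going through SHAP at all, is to verify that the classifier's decision function equals a constant base offset plus the learning-rate-scaled sum of the individual tree outputs (a sketch assuming a binary classifier with the default init):

```python
# Sanity check: decision_function(X) == base_offset + learning_rate * sum(trees).
# With the default init, the base offset is the same constant for every sample.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1,
                                   random_state=0).fit(X, y)

# Sum the raw outputs of each tree, scaled by the learning rate
# (estimators_ has shape (n_estimators, 1) for binary classification).
scaled = model.learning_rate * sum(e.predict(X) for e in model.estimators_[:, 0])

# The remainder should be the base offset, constant across samples.
offset = model.decision_function(X) - scaled
assert np.allclose(offset, offset[0])
```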

If this is useful, I can put together a pull request but will need some help with testing.

@slundberg
Collaborator

Ah right, I forgot we already have the regressor version, so you just need the classifier version. What you added looks good, so please make a PR. The only change I see right now is that base_offset should be calculated in the constructor, but I haven't run it.

@jverre-drivalytix
Contributor Author

Now supported.
