Support for SHAP #219
Comments
Hi @mortonjt, thanks for opening this feature request! Adding a SHAP wrapper has been on the unwritten issue list for some time now 😁 Would you be interested in working on this method?
Technically this would be possible but maybe not desirable, as it complicates installation. How large are the SHAP packages? I think we should make SHAP a dependency if the license is compatible, and as long as it does not introduce conflicts. CC: @ebolyen @misialq for any thoughts on this.
Yes! I agree: output the SHAP values, and these can then be passed to various other plots... this also gives more flexibility in case other relevant visualization options are added in other Q2 plugins. cc: @adamovanja
Closing this issue since the related PRs have been closed.
Re-opening as this has been requested again on the QIIME 2 forum.
Addition Description
SHAP (SHapley Additive exPlanations) is one of the state-of-the-art methods for computing feature importance using concepts from game theory.
Briefly, for each prediction, SHAP estimates how much each feature contributed to that prediction by computing leave-one-feature-out contributions across all possible subsets of features (making it optimal while remaining scalable). Shapley values can be positive or negative, indicating whether a feature contributed "positively" or "negatively" to a prediction. See the original paper for details, as well as the follow-up solution for tree-ensemble methods.
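For reference, the quantity being estimated is the Shapley value of feature $i$: the weighted average of its marginal contribution over all subsets $S$ of the remaining features, where $N$ is the full feature set and $v(S)$ denotes the model's prediction using only the features in $S$:

$$
\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]
$$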
Current Behavior
Feature importance is currently estimated by leave-one-feature-out estimation on only the full table (i.e., for 1000 features, importance is computed from 1000 iterations, each leaving out a single feature). The resulting importances are strictly positive, so directionality cannot be inferred; it is also suboptimal.
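For context, a minimal sketch of what a strictly positive, global importance score looks like, using scikit-learn's built-in impurity-based importances as a stand-in (an assumption for illustration, not necessarily the plugin's actual implementation; `model` and `X` are illustrative names):

```python
# Hedged sketch (assumed scikit-learn workflow, not the plugin's code):
# global importances are non-negative, with no per-prediction sign.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# One strictly non-negative score per feature; directionality is lost.
print(model.feature_importances_)
```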
Proposed Behavior
It would be useful to have a separate method that computes Shapley values for Gradient Boosting or Random Forest classifiers.
The syntax is simple, requiring only two lines of additional code after fitting the model (see here); I have verified that this code is functional. A hedged sketch follows below.
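The sketch assumes the `shap` package and a fitted scikit-learn tree ensemble; `model` and `X` are illustrative names, not the plugin's API:

```python
# Hedged sketch, not the plugin's actual code: compute per-sample,
# per-feature Shapley values for a fitted tree ensemble with `shap`.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# The two additional lines: signed contributions of each feature
# to each individual prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
```

The resulting `shap_values` (one row per sample, one signed value per feature; some multi-class ensembles return a list of such arrays) could then be saved as an artifact and passed to downstream Q2 visualizers, per the discussion above.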
Questions
Comments
References
- Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 30.
- Lundberg, S. M., et al. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2, 56–67.