Support for SHAP #219

Open
mortonjt opened this issue Oct 2, 2022 · 3 comments
Labels
help wanted Extra attention is needed type:improvement Making something better.

mortonjt commented Oct 2, 2022

Addition Description
SHAP is one of the state-of-the-art methods for computing feature importance, using concepts from game theory.
Briefly, for each prediction, SHAP estimates how much each feature contributed to that prediction by averaging the effect of leaving the feature out across all possible subsets of the other features (which makes it theoretically well-founded while remaining scalable). Shapley values can be positive or negative, indicating whether a feature contributed "positively" or "negatively" to a prediction. See the original paper [1] for details, as well as the follow-up solution for tree-ensemble methods [2].
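
To make the "all possible subsets" idea concrete, here is a minimal brute-force sketch of exact Shapley values. It is not how the shap package computes them (shap uses much faster, model-specific algorithms); the function name `shapley_values`, the feature names, and the `toy_value` function are all hypothetical and exist only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset (exponential cost)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a subset of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to subset s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(subset):
    """Hypothetical stand-in for a model's output given the features present."""
    score = 0.0
    if "taxon_a" in subset:
        score += 2.0  # pushes the prediction up
    if "taxon_b" in subset:
        score -= 1.0  # pushes the prediction down
    if "taxon_a" in subset and "taxon_c" in subset:
        score += 1.0  # interaction between two features
    return score


phi = shapley_values(["taxon_a", "taxon_b", "taxon_c"], toy_value)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

In this toy case the values come out signed (negative for `taxon_b`), the interaction bonus is split evenly between `taxon_a` and `taxon_c`, and the contributions sum to the value of the full feature set.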

Current Behavior
Feature importance is estimated by leave-one-feature-out estimation on the full table only (i.e., for 1000 features, feature importance is based on 1000 iterations, each leaving out one feature). Feature importances are strictly positive, so directionality cannot be inferred. It is also suboptimal, since each feature is only ever evaluated against the full feature set rather than against subsets of it.
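
As a rough illustration of why evaluating against the full table only can be limiting, the sketch below applies leave-one-feature-out to a toy case with two redundant features. This is not the plugin's actual implementation; `leave_one_out_importance`, `redundant_value`, and the feature names are hypothetical.

```python
def leave_one_out_importance(features, value):
    """Drop in value when each feature is removed from the full feature set."""
    full = frozenset(features)
    return {f: value(full) - value(full - {f}) for f in features}


def redundant_value(subset):
    """Hypothetical model output: either taxon_a or taxon_b alone is enough."""
    return 1.0 if ("taxon_a" in subset or "taxon_b" in subset) else 0.0


importance = leave_one_out_importance(
    ["taxon_a", "taxon_b", "taxon_c"], redundant_value
)
print(importance)
```

Both informative features receive an importance of zero here, because removing either one from the full set leaves the other to compensate. Shapley values for the same toy function would split the credit evenly between `taxon_a` and `taxon_b` (0.5 each) and assign zero to `taxon_c`.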

Proposed Behavior
It would be useful to have a separate method that computes Shapley values for gradient boosting or random forest classifiers.
The syntax is simple, requiring 2 lines of additional code after fitting the model (see here). I have verified that this code is functional.

Questions

  1. Would an optional dependency on the shap package be acceptable? If there is a separate command, it is easier to keep it self-contained without adding shap as a required dependency of the entire QIIME 2 suite.
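
One way an optional dependency could look is a lazy import inside the function that needs it, so the rest of the plugin still loads when shap is not installed. This is only a sketch: the function name `compute_shap_values` and the error message are hypothetical, and the shape of what `shap_values` returns for classifiers differs between shap versions, so downstream code should not assume a particular layout.

```python
import importlib


def compute_shap_values(fitted_tree_model, X):
    """Compute SHAP values for a fitted tree-ensemble model.

    shap is imported lazily so that it is only required when this
    (hypothetical) function is actually called.
    """
    try:
        shap = importlib.import_module("shap")
    except ImportError as err:
        raise ImportError(
            "This method requires the optional 'shap' package; "
            "install it to compute SHAP values."
        ) from err

    # The two extra lines after fitting the model:
    explainer = shap.TreeExplainer(fitted_tree_model)
    return explainer.shap_values(X)
```

Usage would be along the lines of `values = compute_shap_values(estimator, X)`, where `estimator` is an already fitted scikit-learn random forest or gradient boosting classifier and `X` is the feature table as an array or DataFrame.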

Comments

  1. There are many options for visualization, covering both overall feature contributions and interactions between features. While the force plot is a reasonable default visualization, I think outputting the Shapley values themselves should be the minimum output, since there are so many use cases for interpreting them.

References

  1. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
  2. https://www.nature.com/articles/s42256-019-0138-9
@nbokulich
Member

Hi @mortonjt ,

Thanks for opening this feature request! Adding a SHAP wrapper has been on the unwritten issue list for some time now 😁

Would you be interested in working on this method?

Would an optional dependency on the shap package be acceptable? If there is a separate command, it is easier to keep it self-contained without adding shap as a required dependency of the entire QIIME 2 suite.

Technically this would be possible but maybe not desirable, as it complicates installation. How large are the SHAP packages? I think we should make SHAP a dependency if the license is compatible, and as long as it does not introduce conflicts. CC: @ebolyen @misialq for any thoughts on this.

There are many options for visualization, covering both overall feature contributions and interactions between features. While the force plot is a reasonable default visualization, I think outputting the Shapley values themselves should be the minimum output, since there are so many use cases for interpreting them.

Yes! I agree: output the SHAP values, and these can then be passed to various plots... this also gives more flexibility in case other relevant visualization options are added in other Q2 plugins.

cc: @adamovanja

@lizgehret lizgehret linked a pull request Nov 1, 2022 that will close this issue
@lizgehret lizgehret moved this to Needs Triage in QIIME 2 - Triage 🚑 Nov 1, 2022
@lizgehret lizgehret moved this from Needs Triage to Awaiting Info in QIIME 2 - Triage 🚑 Nov 1, 2022
@lizgehret lizgehret added the type:improvement Making something better. label Nov 1, 2022
@lizgehret
Member

Closing this issue since the related PRs have been closed.

@lizgehret lizgehret closed this as not planned Dec 11, 2023
@nbokulich
Member

Re-opening as this has been requested again on the QIIME 2 forum.

Forum x-ref


3 participants