[DOCS] Update feature importance overview
lcawl committed Sep 16, 2020 · commit 4ba1595 (1 parent: c37056b)
Showing 2 changed files with 33 additions and 22 deletions.

55 changes: 33 additions & 22 deletions docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc
[[ml-feature-importance]]
= {feat-imp-cap}

experimental[]

{feat-imp-cap} values indicate which fields had the biggest impact on each
prediction that is generated by {classification} or {regression} analysis. Each
field (or _feature_ of the data point) is responsible for the prediction to
varying degrees. In {kib}, you can examine the most important features in JSON
objects or decision plots:

[role="screenshot"]
image::images/regression-decision-plot.png["Feature importance values for a {regression} {dfanalytics-job} in {kib}"]

A {feat-imp} value can be either positive or negative depending on its effect
on the prediction. The magnitude of the {feat-imp} value shows how significantly the feature affects the prediction both locally (for a given data point) and generally (for the whole data set).
// Is there any way to know how much of a feature importance value reflects local vs general influence?

For {reganalysis}, each decision plot starts at a shared baseline, which is
the average of the prediction values for all the data points in the training
data set. When you add all of the {feat-imp} values for a particular
data point to that baseline, you arrive at the numeric prediction value. If a
{feat-imp} value is negative, it reduces the prediction value. If a {feat-imp}
value is positive, it increases the prediction value.
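
For example, with a hypothetical baseline of 500 and {feat-imp} values of
+100, -30, and +10 for a particular data point, the predicted value is
500 + 100 - 30 + 10 = 580. These numbers are purely illustrative.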

By default, {feat-imp} values are not calculated. To generate this information,
when you create a {dfanalytics-job} you must specify the `num_top_feature_importance_values` property. For examples, see
<<flightdata-classification>> and <<flightdata-regression>>.
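
For instance, the following sketch shows how you might request the top five
{feat-imp} values per document when you create a {reganalysis} job. The job ID,
index names, and dependent variable are illustrative, not required values:

[source,console]
----
// The job ID, indices, and dependent variable below are examples only
PUT _ml/data_frame/analytics/model-flight-delays
{
  "source": { "index": "kibana_sample_data_flights" },
  "dest": { "index": "model-flight-delays-dest" },
  "analysis": {
    "regression": {
      "dependent_variable": "FlightDelayMin",
      "num_top_feature_importance_values": 5
    }
  }
}
----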

The {feat-imp} values are stored in the {ml} results field for each document
in the destination index. The number of {feat-imp} values for each document
might be less than the `num_top_feature_importance_values` property value,
since only features that have a positive or negative effect on the prediction
are returned.
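
For example, a document in the destination index for a {reganalysis} might
contain a results field similar to the following sketch. The field names and
values here are illustrative, not output copied from a real job:

[source,js]
----
"ml" : {
  "FlightDelayMin_prediction" : 42.2,
  "feature_importance" : [
    { "feature_name" : "DistanceKilometers", "importance" : 14.5 },
    { "feature_name" : "Carrier", "importance" : -2.2 }
  ],
  "is_training" : false
}
----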

You can therefore use {feat-imp} to determine whether the predictions are
sensible. Is the relationship between the dependent variable and the important
features supported by your domain knowledge? The lessons you learn about the
importance of specific features might also affect your decision to include
them in future iterations of your trained model.

[[ml-feature-importance-readings]]
== Further reading

{feat-imp-cap} in the {stack} is calculated using the SHAP (SHapley Additive
exPlanations) method as described in
https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf[Lundberg, S. M., & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NeurIPS 2017].

https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning[{feat-imp-cap} for {dfanalytics} with Elastic {ml}]
