lcawl · lcawl · Sep 23, 2020 · Sep 30, 2020 · Oct 26, 2020 · Oct 26, 2020
diff --git a/docs/en/stack/ml/df-analytics/flightdata-classification.asciidoc b/docs/en/stack/ml/df-analytics/flightdata-classification.asciidoc
@@ -346,7 +346,8 @@ GET model-flight-delay-classification/_search
 --------------------------------------------------
 // TEST[skip:TBD]
 
-The snippet below shows a part of a document with the annotated results:
+The snippet below shows the probability and score details for a document in the
+destination index:
 
 [source,console-result]
 --------------------------------------------------  
@@ -369,37 +370,10 @@ The snippet below shows a part of a document with the annotated results:
             ],
             "prediction_probability" : 0.9427605087816684,
             "prediction_score" : 0.3462468700158476,
-            "feature_importance" : [
-              {
-                "feature_name" : "DistanceMiles",
-                "classes" : [
-                  {
-                    "class_name" : false,
-                    "importance" : -1.4766536146534828
-                  },
-                  {
-                    "class_name" : true,
-                    "importance" : 1.4766536146534828
-                  }
-                ]
-              },
-              {
-                "feature_name" : "FlightTimeMin",
-                "classes" : [
-                  {
-                    "class_name" : false,
-                    "importance" : 1.0919201754729184
-                  },
-                  {
-                    "class_name" : true,
-                    "importance" : -1.0919201754729184
-                  }
-                ]
-              },
-              ...
+            ...
 --------------------------------------------------
 <1> An array of values specifying the probability of the prediction and the 
-score for each class. 
+score for each class.
 
 The class with the highest score is the prediction. In this example, `false` has
 a `class_score` of 0.35 while `true` has only 0.06, so the prediction will be
@@ -427,15 +401,18 @@ form of a decision plot:
 [role="screenshot"]
 image::images/flights-classification-importance.png["A decision plot for {feat-imp} values in {kib}"]
 
-The features with the most significant positive or negative impact appear at the
-top of the decision plot. Thus in this example, the features related to flight
-time and distance had the most significant influence on this prediction. This
-type of information can help you to understand how models arrive at their 
-predictions. It can also indicate which aspects of your data set are most
-influential or least useful when you are training and tuning your model.
+In {kib}, the decision path shows the relative impact of each feature on the
+probability of the prediction. The features with the most significant positive
+or negative impact appear at the top of the decision plot. Thus, in this example,
+the features related to flight time and distance had the most significant
+influence on the probability value for this prediction. This type of information
+can help you to understand how models arrive at their predictions. It can also
+indicate which aspects of your data set are most influential or least useful
+when you are training and tuning your model.
 
-If you do not use {kib}, you can see summarized {feat-imp} values by using the
-{ref}/get-inference.html[get trained model API].
+If you do not use {kib}, you can see the summarized {feat-imp} values by using
+the {ref}/get-inference.html[get trained model API] and the individual values by
+searching the destination index.
 
 .API example
 [%collapsible]
@@ -446,8 +423,8 @@ GET _ml/trained_models/model-flight-delay-classification*?include=total_feature_
 --------------------------------------------------
 // TEST[skip:TBD]
 
-The snippet below shows an example of the total {feat-imp} details in the
-trained model metadata:
+The snippet below shows an example of the total and baseline {feat-imp} details
+in the trained model metadata:
 
 [source,console-result]
 --------------------------------------------------
@@ -459,16 +436,28 @@ trained model metadata:
       ...
       "metadata" : {
         ...
+        "feature_importance_baseline" : { <1>
+          "classes" : [
+            {
+              "class_name" : true,
+              "baseline" : -1.5869016940485443
+            },
+            {
+              "class_name" : false,
+              "baseline" : 1.5869016940485443 
+            }
+          ]
+        },
         "total_feature_importance" : [
           {
             "feature_name" : "dayOfWeek",
             "classes" : [
               {
                 "class_name" : false,
                 "importance" : {
-                  "mean_magnitude" : 0.037513174351966404, <1>
-                  "min" : -0.20132653028125566, <2>
-                  "max" : 0.20132653028125566 <3>
+                  "mean_magnitude" : 0.037513174351966404, <2>
+                  "min" : -0.20132653028125566, <3>
+                  "max" : 0.20132653028125566 <4>
                 }
               },
               {
@@ -504,14 +493,71 @@ trained model metadata:
           },
           ...
 --------------------------------------------------
-<1> This value is the average of the absolute {feat-imp} values for the
+<1> This object contains the baselines that are used to calculate the {feat-imp}
+decision paths in {kib}.
+<2> This value is the average of the absolute {feat-imp} values for the
 `dayOfWeek` field across all the training data when the predicted class is
 `false`.
-<2> This value is the minimum {feat-imp} value across all the training data for
+<3> This value is the minimum {feat-imp} value across all the training data for
 this field when the predicted class is `false`.
-<3> This value is the maximum {feat-imp} value across all the training data for
+<4> This value is the maximum {feat-imp} value across all the training data for
 this field when the predicted class is `false`.
 
+To see the top {feat-imp} values for each prediction, search the destination
+index. For example:
+
+[source,console]
+--------------------------------------------------
+GET model-flight-delay-classification/_search
+--------------------------------------------------
+// TEST[skip:TBD]
+
+The snippet below shows an example of the {feat-imp} details for a document in
+the search results:
+
+[source,console-result]
+--------------------------------------------------  
+          ...
+          "FlightDelay" : false,
+          ...
+          "ml" : {
+            "FlightDelay_prediction" : false,
+            ...
+            "prediction_probability" : 0.9427605087816684,
+            "prediction_score" : 0.3462468700158476,
+            "feature_importance" : [
+              {
+                "feature_name" : "DistanceMiles",
+                "classes" : [
+                  {
+                    "class_name" : false,
+                    "importance" : -1.4766536146534828
+                  },
+                  {
+                    "class_name" : true,
+                    "importance" : 1.4766536146534828
+                  }
+                ]
+              },
+              {
+                "feature_name" : "FlightTimeMin",
+                "classes" : [
+                  {
+                    "class_name" : false,
+                    "importance" : 1.0919201754729184
+                  },
+                  {
+                    "class_name" : true,
+                    "importance" : -1.0919201754729184
+                  }
+                ]
+              },
+              ...
+--------------------------------------------------
+
+For each class, the sum of the {feat-imp} values for this data point
+approximates the logarithm of the odds for that class.
+
 ====
 
 [[flightdata-classification-evaluate]]

diff --git a/docs/en/stack/ml/df-analytics/flightdata-regression.asciidoc b/docs/en/stack/ml/df-analytics/flightdata-regression.asciidoc
@@ -360,7 +360,7 @@ searching the destination index.
 ====
 [source,console]
 --------------------------------------------------
-GET _ml/inference/model-flight-delays*?include=total_feature_importance
+GET _ml/inference/model-flight-delays*?include=total_feature_importance,feature_importance_baseline
 --------------------------------------------------
 // TEST[skip:TBD]
 
@@ -377,13 +377,16 @@ the trained model metadata:
       ...
       "metadata" : {
         ...
+        "feature_importance_baseline" : {
+          "baseline" : 47.43643652716527 <1>
+        },
         "total_feature_importance" : [
           {
             "feature_name" : "dayOfWeek",
             "importance" : {
-              "mean_magnitude" : 0.38674590521018903, <1>
-              "min" : -9.42823116446923, <2>
-              "max" : 8.707461689065173 <3>
+              "mean_magnitude" : 0.38674590521018903, <2>
+              "min" : -9.42823116446923, <3>
+              "max" : 8.707461689065173 <4>
             }
           },
           {
@@ -395,11 +398,13 @@ the trained model metadata:
           }
           ...
 ----
-<1> This value is the average of the absolute {feat-imp} values for the
+<1> This value is the baseline for the {feat-imp} decision path. It is the
+average of the prediction values across all the training data.
+<2> This value is the average of the absolute {feat-imp} values for the
 `dayOfWeek` field across all the training data.
-<2> This value is the minimum {feat-imp} value across all the training data for
+<3> This value is the minimum {feat-imp} value across all the training data for
 this field.
-<3> This value is the maximum {feat-imp} value across all the training data for
+<4> This value is the maximum {feat-imp} value across all the training data for
 this field.
 
 To see the top {feat-imp} values for each prediction, search the destination

diff --git a/docs/en/stack/ml/df-analytics/images/flights-classification-decision-plot.png b/docs/en/stack/ml/df-analytics/images/flights-classification-decision-plot.png
diff --git a/docs/en/stack/ml/df-analytics/images/flights-classification-importance.jpg b/docs/en/stack/ml/df-analytics/images/flights-classification-importance.jpg
diff --git a/docs/en/stack/ml/df-analytics/images/flights-regression-decision-plot.png b/docs/en/stack/ml/df-analytics/images/flights-regression-decision-plot.png
diff --git a/docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc b/docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc
@@ -32,19 +32,27 @@ how the impact of each field varies by class. For example:
 image::images/diamonds-classification-total-importance.png["Total {feat-imp} values for a {classification} {dfanalytics-job} in {kib}"]
 
 You can also examine the feature importance values for each individual
-prediction. In {kib}, you can see these values in JSON objects or decision plots:
-
-[role="screenshot"]
-image::images/flights-regression-decision-plot.png["Feature importance values for a {regression} {dfanalytics-job} in {kib}"]
-
+prediction. In {kib}, you can see these values in JSON objects or decision plots.
 For {reganalysis}, each decision plot starts at a shared baseline, which is
 the average of the prediction values for all the data points in the training
 data set. When you add all of the feature importance values for a particular
 data point to that baseline, you arrive at the numeric prediction value. If a 
 {feat-imp} value is negative, it reduces the prediction value. If a {feat-imp}
-value is positive, it increases the prediction value.
+value is positive, it increases the prediction value. For example:
 
-//TBD: Add section about classification analysis.
+[role="screenshot"]
+image::images/flights-regression-decision-plot.png["Feature importance values for a {regression} {dfanalytics-job} in {kib}"]
+
+For {classanalysis}, the sum of the {feat-imp} values approximates the predicted 
+logarithm of odds for each data point. The simplest way to understand {feat-imp}
+in the context of {classanalysis} is to look at the decision plots in {kib}. For
+each data point, there is a chart that shows the relative impact of each
+feature on the prediction probability for that class. This information helps you
+to understand which features reduce or increase the prediction probability. For
+example:
+
+[role="screenshot"]
+image::images/flights-classification-decision-plot.png["A decision plot in {kib} for a {classification} {dfanalytics-job}"]
 
 By default, {feat-imp} values are not calculated. To generate this information,
 when you create a {dfanalytics-job} you must specify the