[DOCS] Add feature importance to classification example #1359
Conversation
in your destination index. See the
{ml-docs}/flightdata-classification.html#flightdata-classification-results[Viewing {classification} results]
section in the {classification} example.

[[dfa-classification-class-score]]
=== `class_score`

The value of `class_score` controls the probability at which a class label is
`class_score` is definitely not a probability: if I choose `k` very, very small, `class_score` may be arbitrarily large, while a probability is always between 0 and 1. It's better to call it a "likelihood". Also, it doesn't "control" the probability, it simply "shows" it. The assignment is controlled by the threshold `k`, which we estimate automagically based on the `class_assignment_objective` configuration.
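A minimal sketch of the behaviour described above. The exact formula for the score is internal to the ML plugin; the reciprocal scaling by `k` below is only an assumption for illustration, showing why a small threshold can push the score far above 1 while the probability stays in [0, 1]:

```python
# Illustrative only: assumes class_score scales the class probability by 1/k,
# where k is the per-class assignment threshold. The real computation may
# differ; this just demonstrates the "arbitrarily large for small k" point.
def class_score(probability, k):
    return probability / k

print(class_score(0.9, 0.5))    # 1.8   -- already larger than 1
print(class_score(0.9, 0.001))  # 900.0 -- grows without bound as k shrinks
```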
values. A higher number means that the model is more confident.
If you want to understand how certain the model is about each prediction, you
can examine its probability and score (`ml.prediction_probability` and
`ml.prediction_score`). These values range between 0 and 1; the higher the
Strictly speaking, `class_score` can be larger than 1 in some degenerate cases, so it's defined as greater than or equal to 0.
//Does this mean the sum of the feature importance values for false in this
example should equal the logit(p), where p is the class_probability for false?
This is correct up to a constant. There is also a datapoint-independent constant, the average log-odds over all training points, which we add to the sum of the feature importances before taking the inverse logit to compute the probabilities.
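A minimal Python sketch of that relationship. The baseline and the per-feature importance values below are invented for illustration; the field names just mirror the flight-data example:

```python
import math

def sigmoid(x):
    """Inverse of the logit function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

baseline_log_odds = -2.4   # datapoint-independent constant (assumed value)
feature_importance = {     # per-feature contributions in log-odds space
    "DistanceKilometers": 1.1,
    "FlightDelayMin": 0.7,
    "OriginWeather": -0.2,
}

log_odds = baseline_log_odds + sum(feature_importance.values())
print(sigmoid(log_odds))   # class probability of the selected class
```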
any class.
//Does this mean the sum of the feature importance values for false in this
example should equal the logit(p), where p is the class_probability for false?
//Does this imply that the feature importance value itself is the result of a logit function? Or that we use the function to merely represent the distribution of feature importance values?
What happens is that the decision forest predicts the log-odds directly, and we compute feature importance on those log-odds values. When we evaluate a data point, we take the log-odds predicted by the decision forest and apply the inverse of the logit function to get the class probability.
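Spelled out (just restating the comment, no new mechanics), with `s` denoting the log-odds the forest predicts for a class:

```
logit(p) = ln(p / (1 - p))           # probability -> log-odds
p = logit^-1(s) = 1 / (1 + e^(-s))   # inverse logit (sigmoid): log-odds -> probability
```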
====
While the probability of a class ranges between 0 and 1, its log-odds range
between negative and positive infinity. In {kib}, the decision path for each
class starts near zero, which represents a class probability of 0.5.
This is unfortunately a bit more complicated: 0 would represent the class probability of a constant baseline. It relates to the average class probability for the selected class (in the Kibana UI) over the entire training set.
If you select `Canceled` as the target variable in the flight data, this nuance becomes obvious. Since there are many more data points with `Canceled = False`, let's assume that the average class probability over the entire training set is something like 0.92. This means that if the class probability of a data point is larger than 0.92 (for example 0.98), then the decision path goes to the right (the sum of feature importances is positive). On the other hand, if the class probability is smaller than 0.92 (for example 0.84), then the decision path goes to the left (the sum of feature importances is negative).
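A quick numerical check of this, using the illustrative numbers from the comment above (0.92 baseline, 0.98 and 0.84 data points):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

baseline = logit(0.92)         # constant-baseline log-odds

print(logit(0.98) - baseline)  # > 0: path goes right, importances sum positive
print(logit(0.84) - baseline)  # < 0: path goes left, importances sum negative
```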
While the probability of a class ranges between 0 and 1, its log-odds range
between negative and positive infinity. In {kib}, the decision path for each
class starts near zero, which represents a class probability of 0.5.
// Is this true for multi-class classification or just binary classification?
Yes, it's true for both multi-class and binary classification, since in the UI you are selecting a class of interest from a drop-down menu.
Related to elastic/kibana#73561
This PR drafts changes to the classification example so that it includes feature importance explanations.
It will be backported to 7.10 and does not take into consideration the changes in 7.11 and later for elastic/kibana#77874
Preview
https://stack-docs_1359.docs-preview.app.elstc.co/guide/en/machine-learning/master/flightdata-classification.html