[ML] Add probability values in decision path visualization for classification data frame analytics #80229
const filteredFeatureImportance = mappedFeatureImportance.filter(
  (f) => f !== undefined
) as ExtendedFeatureImportance[];

return buildDecisionPathData(filteredFeatureImportance);
const finalResult: DecisionPathPlotData = filteredFeatureImportance
  // sort so absolute importance so it goes from bottom (baseline) to top
Nit - typo? `Sort by absolute importance...`?
Thanks for catching that. Updated here: 8121fe6
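For reference, the ordering that the corrected comment describes could be sketched as below. This is a minimal illustration rather than the PR's implementation: the `FeatureImportanceItem` shape and the `sortByAbsoluteImportance` helper are hypothetical, and the ascending direction is an assumption about how the plot is laid out.

```typescript
// Hypothetical, simplified item shape (the PR's ExtendedFeatureImportance has more fields).
interface FeatureImportanceItem {
  featureName: string;
  importance: number;
}

// Drop undefined entries with a type predicate, so no `as` cast is needed,
// then sort the filtered copy by absolute importance.
// Assumption: ascending order, i.e. the smallest contribution sits nearest the baseline.
function sortByAbsoluteImportance(
  items: Array<FeatureImportanceItem | undefined>
): FeatureImportanceItem[] {
  return items
    .filter((f): f is FeatureImportanceItem => f !== undefined)
    .sort((a, b) => Math.abs(a.importance) - Math.abs(b.importance));
}

const sorted = sortByAbsoluteImportance([
  { featureName: 'odor', importance: -0.4 },
  undefined,
  { featureName: 'habitat', importance: 0.1 },
  { featureName: 'bruises', importance: 0.25 },
]);

// Prints the feature names in the order habitat, bruises, odor.
console.log(sorted.map((f) => f.featureName));
```

Flipping the comparator gives the opposite layout if the plot draws the largest contribution first.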
Pinging @elastic/ml-ui (:ml)
Regarding the test mock data, I wonder if we could live with some smaller and more artificial minimal dataset for the jest tests? On the other hand, could we do a test relying on a more real-world dataset using an API integration test (not necessarily in this PR)?
],
const baselineData: LineAnnotationDatum[] | undefined = useMemo(
  () =>
    baseline && isRegressionFeatureImportanceBaseline(baseline)
With Dima's suggestion making the type guard accept `any`, the `baseline &&` part here might then no longer be necessary.
Updated here 10eae2d
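To illustrate the point raised in this thread, here is a minimal sketch of a type guard that accepts any input and therefore makes a preceding truthiness check redundant. Only the guard's name comes from the PR; the interface shape, the guard's body, the `describeBaseline` helper, and the use of `unknown` in place of `any` are assumptions made for the example.

```typescript
// Hypothetical shape for illustration; only the guard's name comes from the PR.
interface RegressionFeatureImportanceBaseline {
  baseline: number;
}

// Accepting `unknown` lets the guard itself reject undefined and null,
// so callers can drop a separate `baseline &&` check.
function isRegressionFeatureImportanceBaseline(
  arg: unknown
): arg is RegressionFeatureImportanceBaseline {
  return (
    typeof arg === 'object' &&
    arg !== null &&
    typeof (arg as { baseline?: unknown }).baseline === 'number'
  );
}

// Hypothetical caller: no truthiness check is needed before the guard.
function describeBaseline(baseline: unknown): string {
  return isRegressionFeatureImportanceBaseline(baseline)
    ? `baseline: ${baseline.baseline}`
    : 'no regression baseline';
}
```

With this shape, `describeBaseline(undefined)` falls through to the fallback branch without the caller having to test `baseline` first.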
@walter That's a good point. I think it will be beneficial to also have a functional test to check that the decision path matches what we are showing in the other columns in the data grid. I'll add a follow-up PR for this.
Latest edits LGTM
@elasticmachine merge upstream
One minor comment on the code. Gave this a good test, and the baseline calculation looks correct for every classification and regression job I ran until this one on the mushroom data set:
Job config:
{
  "id": "mushroom_cap_color_class",
  "create_time": 1603980394530,
  "version": "8.0.0",
  "description": "",
  "source": {
    "index": [
      "mushroom"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "mushroom_cap_color_class",
    "results_field": "ml"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "cap-color",
      "num_top_feature_importance_values": 6,
      "class_assignment_objective": "maximize_minimum_recall",
      "num_top_classes": -1,
      "prediction_field_name": "cap-color_prediction",
      "training_percent": 18,
      "randomize_seed": -5653826009821974000
    }
  },
  "analyzed_fields": {
    "includes": [
      "bruises",
      "cap-color",
      "cap-shape",
      "cap-surface",
      "edibility",
      "gill-attachment",
      "gill-color",
      "gill-size",
      "gill-spacing",
      "habitat",
      "odor",
      "population",
      "ring-number",
      "ring-type",
      "spore-print-color",
      "stalk-color-above-ring",
      "stalk-color-below-ring",
      "stalk-root",
      "stalk-shape",
      "stalk-surface-above-ring",
      "stalk-surface-below-ring",
      "veil-color",
      "veil-type"
    ],
    "excludes": []
  },
  "model_memory_limit": "70mb",
  "allow_lazy_start": false,
  "max_num_threads": 1
}
Tested latest update 191991e and the calculation is now looking correct for all my regression and classification jobs.
@peteharverson Discussed with Valeriy and we decided to add an
Looks like this needs rebasing against #82334, but otherwise tested latest edits and the baseline calculations LGTM.
@@ -415,11 +415,20 @@ export const showDataGridColumnChartErrorMessageToast = (
// helper function to transform { [key]: [val] } => { [key]: val }
// for when `fields` is used in es.search since response is always an array of values
// since response always returns an array of values for each field
export const getProcessedFields = (originalObj: object) => {
export const getProcessedFields = (originalObj: object, omitBy?: (key: string) => boolean) => {
Guess this can be removed now that #82334 is merged?
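For readers without the full file, the transformation described in the diff comments above can be sketched as follows. The signature mirrors the changed line in the diff, but the function name `getProcessedFieldsSketch`, its body, and the sample field names other than `cap-color` are assumptions for illustration, not the code from the PR.

```typescript
// Sketch only: the signature mirrors the diff above, the body is an assumption.
function getProcessedFieldsSketch(
  originalObj: Record<string, unknown>,
  omitBy?: (key: string) => boolean
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const key of Object.keys(originalObj)) {
    // Skip any key the optional predicate asks to omit.
    if (omitBy !== undefined && omitBy(key)) continue;
    const value = originalObj[key];
    // Assumption: only single-element arrays are unwrapped, { [key]: [val] } => { [key]: val }.
    result[key] = Array.isArray(value) && value.length === 1 ? value[0] : value;
  }
  return result;
}

// Illustrative input resembling the `fields` section of a search hit.
const processed = getProcessedFieldsSketch(
  {
    'cap-color': ['brown'],
    'ml.is_training': [true],
    'ml.top_classes': [{ a: 1 }, { a: 2 }],
  },
  (key) => key === 'ml.is_training'
);
```

Here `processed` keeps `cap-color` as a plain string, leaves the two-element array untouched, and drops the omitted key.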
@@ -63,7 +63,13 @@ export const getIndexData = async (

if (!options.didCancel) {
  setRowCount(resp.hits.total.value);
  setTableItems(resp.hits.hits.map((d) => getProcessedFields(d.fields)));
  setTableItems(
Guess this is not needed now that #82334 is merged?
💚 Build Succeeded
…fication data frame analytics (elastic#80229) Co-authored-by: Kibana Machine <[email protected]>
…fication data frame analytics (#80229) (#82551) Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
Summary
This PR is part of #77874. Now that `feature_importance_baseline` is exposed as part of the trained model metadata (elastic/elasticsearch#63172), we can use the stored baseline to make the decision path in the data frame analytics exploration more complete. Changes include:

Regression
Removed the `/api/ml/data_frame/analytics/{analyticsId}/baseline` endpoint, which was previously used to calculate the baseline for regression jobs, and switched to using the baseline exposed by the trained_model metadata.

Binary classification
Multi-class classification
The prediction probability for each feature is calculated as follows: