[ML] Add decision path charts to exploration results table #73561

Merged
merged 50 commits into from
Sep 9, 2020
Changes from 28 commits
62e8918
[ML] Add decision path charts
qn895 Jul 28, 2020
031415f
[ML] Remove baseline div
qn895 Jul 28, 2020
40ef5e6
[ML] Move baseline logic to exploration results table
qn895 Jul 28, 2020
7e47db9
[ML] Fix type issues
qn895 Jul 28, 2020
e8d0b41
[ML] Improvements to data viz charts
qn895 Jul 29, 2020
5ad6690
[ML] Add info text
qn895 Jul 29, 2020
40f17cf
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Jul 29, 2020
ea12a63
[ML] Change size to pass as Chart props
qn895 Jul 29, 2020
49a541c
[ML] Change to 3 sigfig instead of decimals
qn895 Jul 29, 2020
52964e5
[ML] Fix i18n issue
qn895 Jul 30, 2020
efbeffd
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Aug 10, 2020
8b1e8c8
[ML] Decision path popover improvement + importance summary
qn895 Aug 10, 2020
e029cfc
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Aug 18, 2020
19a1aae
[ML] Change title to fit summary
qn895 Aug 18, 2020
c718437
[ML] Consolidate feature importance
qn895 Aug 18, 2020
d0fa245
[ML] useMemo for popoverContent
qn895 Aug 18, 2020
a317505
[ML] Add adjustment to baseline
qn895 Aug 18, 2020
973ddb9
[ML] Call /baseline only if it's a regression analysis for now
qn895 Aug 18, 2020
1c22df3
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Aug 19, 2020
f248722
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Aug 19, 2020
39e3749
[ML] Remove feature importance summary bars for now
qn895 Aug 19, 2020
0c57415
[ML] Update types
qn895 Aug 19, 2020
6b525b4
[ML] Update types for popOverContent
qn895 Aug 19, 2020
c7bd561
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Aug 20, 2020
c9261a4
[ML] Refactor to support classification DFA
qn895 Aug 20, 2020
38a5e48
[ML] Update filteredFeatureImportance
qn895 Aug 20, 2020
c6cf181
[ML] Add improvement to decision path chart
qn895 Aug 20, 2020
175bfd4
[ML] Refactor feature_importance to have own model & add error handling
qn895 Aug 20, 2020
c5b68d5
[ML] Remove getDataFrameAnalyticsBaselineSchema and body
qn895 Aug 21, 2020
0970f04
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Aug 21, 2020
0fa835f
[ML] Update typo at the predicted value
qn895 Aug 21, 2020
669c683
[ML] Remove duplicate code in dfa creation
qn895 Aug 21, 2020
eccdb4d
[ML] Fix datagrid popover pagination crash
qn895 Aug 21, 2020
049ba5a
[ML] Fix domain calc not working reliably
qn895 Aug 21, 2020
172a58e
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Aug 24, 2020
ef815d1
[ML] Minor refactoring
qn895 Aug 24, 2020
74ef5fd
[ML] Fix so classification support boolean class names
qn895 Aug 24, 2020
eaae596
[ML] Fix better height and change to empty annotation marker
qn895 Aug 24, 2020
b50daae
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Aug 25, 2020
9aa5d81
[ML] Adjust style & precision header for baseline
qn895 Aug 25, 2020
a71e190
[ML] Update grid and header
qn895 Aug 26, 2020
9f435f6
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Sep 3, 2020
a954cc6
[ML] Update to new client & more strict conversion
qn895 Sep 3, 2020
09495e3
[ML] Fix type
qn895 Sep 3, 2020
1345cf2
[ML] Update tolerance to check for abs difference
qn895 Sep 4, 2020
76488b6
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Sep 4, 2020
a93e740
[ML] Change to DEFAULT_RESULTS_FIELD
qn895 Sep 8, 2020
004518b
Merge remote-tracking branch 'upstream/master' into feature-importance
qn895 Sep 8, 2020
5fd5ff4
[ML] Update doc link, tick formatter, and baseline api for generic re…
qn895 Sep 8, 2020
b857ffe
Merge branch 'master' into feature-importance
elasticmachine Sep 9, 2020
6 changes: 6 additions & 0 deletions x-pack/plugins/ml/common/types/data_frame_analytics.ts
@@ -79,3 +79,9 @@ export interface DataFrameAnalyticsConfig {
version: string;
allow_lazy_start?: boolean;
}

export enum ANALYSIS_CONFIG_TYPE {
OUTLIER_DETECTION = 'outlier_detection',
REGRESSION = 'regression',
CLASSIFICATION = 'classification',
}
Comment on lines +83 to +87
Contributor

consider using const instead

Suggested change
- export enum ANALYSIS_CONFIG_TYPE {
- OUTLIER_DETECTION = 'outlier_detection',
- REGRESSION = 'regression',
- CLASSIFICATION = 'classification',
- }
+ export const ANALYSIS_CONFIG_TYPE = {
+ OUTLIER_DETECTION: 'outlier_detection',
+ REGRESSION: 'regression',
+ CLASSIFICATION: 'classification',
+ } as const;

Member Author

Just talked to Melissa, and since ANALYSIS_CONFIG_TYPE is used in quite a lot of places, I think it's better to do this in a follow-up PR.
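
For reference, the const-object form suggested above could look roughly like the sketch below. The alias AnalysisConfigType and the guard isAnalysisConfigType are hypothetical names used only for illustration; they are not part of this PR.

```typescript
// Const-object alternative to the enum, as proposed in the review suggestion.
const ANALYSIS_CONFIG_TYPE = {
  OUTLIER_DETECTION: 'outlier_detection',
  REGRESSION: 'regression',
  CLASSIFICATION: 'classification',
} as const;

// Hypothetical alias: the union 'outlier_detection' | 'regression' | 'classification'.
type AnalysisConfigType = (typeof ANALYSIS_CONFIG_TYPE)[keyof typeof ANALYSIS_CONFIG_TYPE];

// Hypothetical runtime guard for untyped input such as a value read from a job config.
function isAnalysisConfigType(value: unknown): value is AnalysisConfigType {
  return Object.keys(ANALYSIS_CONFIG_TYPE).some(
    (key) => ANALYSIS_CONFIG_TYPE[key as keyof typeof ANALYSIS_CONFIG_TYPE] === value
  );
}
```

With this form, props such as analysisType would be typed with the derived union rather than the enum, which is presumably why the change touches many call sites and was deferred.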

23 changes: 23 additions & 0 deletions x-pack/plugins/ml/common/types/feature_importance.ts
@@ -0,0 +1,23 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/

export interface ClassFeatureImportance {
class_name: string;
importance: number;
}
export interface FeatureImportance {
feature_name: string;
importance?: number;
classes?: ClassFeatureImportance[];
}

export interface TopClass {
class_name: string;
class_probability: number;
class_score: number;
}

export type TopClasses = TopClass[];
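
A minimal sketch of how these types might be consumed, assuming (as the optional fields suggest) that regression results carry a single importance value while classification results carry one entry per class in classes. The helper getImportanceForClass is hypothetical and not part of this PR.

```typescript
// Local copies of the interfaces added in feature_importance.ts.
interface ClassFeatureImportance {
  class_name: string;
  importance: number;
}
interface FeatureImportance {
  feature_name: string;
  importance?: number;
  classes?: ClassFeatureImportance[];
}

// Hypothetical helper: returns the importance of a feature, optionally for one class.
function getImportanceForClass(fi: FeatureImportance, className?: string): number | undefined {
  // Assumed regression shape: a single numeric importance.
  if (typeof fi.importance === 'number') {
    return fi.importance;
  }
  // Assumed classification shape: look up the entry for the requested class.
  if (Array.isArray(fi.classes) && className !== undefined) {
    for (const entry of fi.classes) {
      if (entry.class_name === className) {
        return entry.importance;
      }
    }
  }
  return undefined;
}
```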
46 changes: 46 additions & 0 deletions x-pack/plugins/ml/common/util/analytics_utils.ts
@@ -0,0 +1,46 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/

import {
AnalysisConfig,
ClassificationAnalysis,
OutlierAnalysis,
RegressionAnalysis,
ANALYSIS_CONFIG_TYPE,
} from '../types/data_frame_analytics';

export const isOutlierAnalysis = (arg: any): arg is OutlierAnalysis => {
const keys = Object.keys(arg);
return keys.length === 1 && keys[0] === ANALYSIS_CONFIG_TYPE.OUTLIER_DETECTION;
};

export const isRegressionAnalysis = (arg: any): arg is RegressionAnalysis => {
const keys = Object.keys(arg);
return keys.length === 1 && keys[0] === ANALYSIS_CONFIG_TYPE.REGRESSION;
};

export const isClassificationAnalysis = (arg: any): arg is ClassificationAnalysis => {
const keys = Object.keys(arg);
return keys.length === 1 && keys[0] === ANALYSIS_CONFIG_TYPE.CLASSIFICATION;
};

export const getPredictionFieldName = (
analysis: AnalysisConfig
):
| RegressionAnalysis['regression']['prediction_field_name']
| ClassificationAnalysis['classification']['prediction_field_name'] => {
// If undefined will be defaulted to dependent_variable when config is created
let predictionFieldName;
if (isRegressionAnalysis(analysis) && analysis.regression.prediction_field_name !== undefined) {
predictionFieldName = analysis.regression.prediction_field_name;
} else if (
isClassificationAnalysis(analysis) &&
analysis.classification.prediction_field_name !== undefined
) {
predictionFieldName = analysis.classification.prediction_field_name;
}
return predictionFieldName;
};
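
One thing to keep in mind with the guards above: Object.keys(arg) throws a TypeError when arg is null or undefined. The sketch below shows a defensive variant under simplified, assumed type shapes; hasSingleKey is a hypothetical helper and the interfaces are stand-ins for the real ones in data_frame_analytics.ts.

```typescript
// Simplified stand-ins for the analysis types (assumed shapes, for illustration only).
interface RegressionAnalysis {
  regression: { dependent_variable: string; prediction_field_name?: string };
}
interface ClassificationAnalysis {
  classification: { dependent_variable: string; prediction_field_name?: string };
}
interface OutlierAnalysis {
  outlier_detection: Record<string, unknown>;
}
type AnalysisConfig = RegressionAnalysis | ClassificationAnalysis | OutlierAnalysis;

// Hypothetical shared helper: true only for a non-null object with exactly this one key.
const hasSingleKey = (arg: unknown, key: string): boolean => {
  if (typeof arg !== 'object' || arg === null) {
    return false;
  }
  const keys = Object.keys(arg);
  return keys.length === 1 && keys[0] === key;
};

const isRegressionAnalysis = (arg: unknown): arg is RegressionAnalysis =>
  hasSingleKey(arg, 'regression');
const isClassificationAnalysis = (arg: unknown): arg is ClassificationAnalysis =>
  hasSingleKey(arg, 'classification');

// Same result as the PR's getPredictionFieldName: undefined when the field is not set.
function getPredictionFieldName(analysis: AnalysisConfig): string | undefined {
  if (isRegressionAnalysis(analysis)) {
    return analysis.regression.prediction_field_name;
  }
  if (isClassificationAnalysis(analysis)) {
    return analysis.classification.prediction_field_name;
  }
  return undefined;
}
```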
@@ -119,13 +119,14 @@ export const getDataGridSchemasFromFieldTypes = (fieldTypes: FieldTypes, results
schema = 'numeric';
}

- if (
- field.includes(`${resultsField}.${FEATURE_IMPORTANCE}`) ||
- field.includes(`${resultsField}.${TOP_CLASSES}`)
- ) {
+ if (field.includes(`${resultsField}.${TOP_CLASSES}`)) {
schema = 'json';
}

if (field.includes(`${resultsField}.${FEATURE_IMPORTANCE}`)) {
schema = 'featureImportance';
}

return { id: field, schema, isSortable };
});
};
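
The effect of this hunk is that feature importance columns get their own schema name, featureImportance, which matches the key used in popOverContent further down in this PR, while top classes columns keep the json schema. A standalone sketch of that mapping follows; the constant values and the helper getResultsFieldSchema are assumptions for illustration, not the actual Kibana definitions.

```typescript
// Assumed values; the real constants are imported elsewhere in the plugin.
const FEATURE_IMPORTANCE = 'feature_importance';
const TOP_CLASSES = 'top_classes';

// Hypothetical helper mirroring the two checks in the hunk above.
function getResultsFieldSchema(field: string, resultsField: string): string | undefined {
  if (field.indexOf(`${resultsField}.${FEATURE_IMPORTANCE}`) !== -1) {
    return 'featureImportance'; // rendered with the decision path popover
  }
  if (field.indexOf(`${resultsField}.${TOP_CLASSES}`) !== -1) {
    return 'json'; // still rendered as raw JSON
  }
  return undefined; // caller keeps whatever schema was derived from the field type
}
```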
@@ -250,10 +251,6 @@ export const useRenderCellValue = (
return cellValue ? 'true' : 'false';
}

- if (typeof cellValue === 'object' && cellValue !== null) {
- return JSON.stringify(cellValue);
- }
-
Comment on lines -253 to -256
Contributor

Can you explain why this is no longer necessary?

Member Author
qn895 Aug 24, 2020

Seems like this line is a duplicate of line 228 of this file so I removed it. This change shouldn't cause any issue :)

return cellValue;
};
}, [indexPattern?.fields, pagination.pageIndex, pagination.pageSize, tableItems]);
@@ -5,8 +5,7 @@
*/

import { isEqual } from 'lodash';
- import React, { memo, useEffect, FC } from 'react';
-
+ import React, { memo, useEffect, FC, useMemo } from 'react';
import { i18n } from '@kbn/i18n';

import {
@@ -24,13 +23,15 @@ import {
} from '@elastic/eui';

import { CoreSetup } from 'src/core/public';

import { DEFAULT_SAMPLER_SHARD_SIZE } from '../../../../common/constants/field_histograms';

- import { INDEX_STATUS } from '../../data_frame_analytics/common';
+ import { ANALYSIS_CONFIG_TYPE, INDEX_STATUS } from '../../data_frame_analytics/common';

import { euiDataGridStyle, euiDataGridToolbarSettings } from './common';
import { UseIndexDataReturnType } from './types';
import { DecisionPathPopover } from './feature_importance/decision_path_popover';
import { TopClasses } from '../../../../common/types/feature_importance';

// TODO Fix row hovering + bar highlighting
// import { hoveredRow$ } from './column_chart';

@@ -41,6 +42,8 @@ export const DataGridTitle: FC<{ title: string }> = ({ title }) => (
);

interface PropsWithoutHeader extends UseIndexDataReturnType {
baseline?: number;
analysisType?: ANALYSIS_CONFIG_TYPE;
dataTestSubj: string;
toastNotifications: CoreSetup['notifications']['toasts'];
}
@@ -60,6 +63,7 @@ type Props = PropsWithHeader | PropsWithoutHeader;
export const DataGrid: FC<Props> = memo(
(props) => {
const {
baseline,
chartsVisible,
chartsButtonVisible,
columnsWithCharts,
@@ -80,8 +84,9 @@ export const DataGrid: FC<Props> = memo(
toastNotifications,
toggleChartVisibility,
visibleColumns,
predictionFieldName,
analysisType,
} = props;

// TODO Fix row hovering + bar highlighting
// const getRowProps = (item: any) => {
// return {
@@ -90,6 +95,42 @@ export const DataGrid: FC<Props> = memo(
// };
// };

const popOverContent = useMemo(() => {
return analysisType === ANALYSIS_CONFIG_TYPE.REGRESSION ||
analysisType === ANALYSIS_CONFIG_TYPE.CLASSIFICATION
? {
featureImportance: ({ children }: { cellContentsElement: any; children: any }) => {
const rowIndex = children?.props?.rowIndex;
const row = data[rowIndex];
const parsedFIArray = row.ml.feature_importance;
let predictedValue: string | number | undefined;
let topClasses: TopClasses = [];
if (
predictionFieldName !== undefined &&
row &&
row.ml[predictionFieldName] !== undefined
) {
predictedValue = row.ml[predictionFieldName];
topClasses = row.ml.top_classes;
}
Contributor

ml is hard-coded here, but the results field can be set to any other value via dest.results_field in the config: https://www.elastic.co/guide/en/elasticsearch/reference/7.9/put-dfanalytics.html

Member Author

Good catch! Thanks so much. Will update.

Member Author

Updated here ef815d1


return (
<DecisionPathPopover
analysisType={analysisType}
predictedValue={predictedValue}
baseline={baseline}
featureImportance={parsedFIArray}
topClasses={topClasses}
predictionFieldName={
predictionFieldName ? predictionFieldName.replace('_prediction', '') : undefined
}
/>
);
},
}
: undefined;
}, [baseline, data]);
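
The review point about the hard-coded ml key can be illustrated with a small standalone sketch that reads the row through a configurable results field and tolerates a missing row. The helper extractRowPrediction and the RowPrediction shape are hypothetical; the default value 'ml' is assumed here, and this is not the code that landed in ef815d1.

```typescript
interface TopClass {
  class_name: string;
  class_probability: number;
  class_score: number;
}

// Assumed default results field; a job can override it through dest.results_field.
const DEFAULT_RESULTS_FIELD = 'ml';

type Row = Record<string, any>;

interface RowPrediction {
  predictedValue?: string | number | boolean;
  topClasses: TopClass[];
  featureImportance: unknown[];
}

// Hypothetical helper: pulls prediction details out of a row without assuming the 'ml' key.
function extractRowPrediction(
  row: Row | undefined,
  predictionFieldName: string | undefined,
  resultsField: string = DEFAULT_RESULTS_FIELD
): RowPrediction {
  const results = row !== undefined && row !== null ? row[resultsField] : undefined;
  if (results === undefined || results === null) {
    // Missing row or missing results object: return empty defaults instead of throwing.
    return { topClasses: [], featureImportance: [] };
  }
  return {
    predictedValue: predictionFieldName !== undefined ? results[predictionFieldName] : undefined,
    topClasses: Array.isArray(results.top_classes) ? results.top_classes : [],
    featureImportance: Array.isArray(results.feature_importance)
      ? results.feature_importance
      : [],
  };
}
```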

useEffect(() => {
if (invalidSortingColumnns.length > 0) {
invalidSortingColumnns.forEach((columnId) => {
@@ -225,6 +266,7 @@ export const DataGrid: FC<Props> = memo(
}
: {}),
}}
popoverContents={popOverContent}
pagination={{
...pagination,
pageSizeOptions: [5, 10, 25],
@@ -0,0 +1,137 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/

// adjust the height so it's compact for items with more features
Contributor

should this comment go further down where it's relevant or can it be removed?

Member Author

Updated here ef815d1

import {
AnnotationDomainTypes,
Axis,
AxisConfig,
Chart,
LineAnnotation,
LineAnnotationDatum,
LineSeries,
PartialTheme,
Position,
RecursivePartial,
ScaleType,
Settings,
} from '@elastic/charts';
import { EuiIcon } from '@elastic/eui';

import React, { useCallback, useMemo } from 'react';
import { i18n } from '@kbn/i18n';
import { DecisionPathPlotData } from './use_classification_path_data';

const baselineStyle = {
line: {
strokeWidth: 1,
stroke: 'gray',
opacity: 1,
},
details: {
fontSize: 12,
fontFamily: 'Arial',
fontStyle: 'bold',
fill: 'gray',
padding: 0,
},
};

const axes: RecursivePartial<AxisConfig> = {
tickLabelStyle: {
fontSize: 12,
},
};
Contributor

Suggested change
- const baselineStyle = {
- line: {
- strokeWidth: 1,
- stroke: 'gray',
- opacity: 1,
- },
- details: {
- fontSize: 12,
- fontFamily: 'Arial',
- fontStyle: 'bold',
- fill: 'gray',
- padding: 0,
- },
- };
- const axes: RecursivePartial<AxisConfig> = {
- tickLabelStyle: {
- fontSize: 12,
- },
- };
+ import euiVars from '@elastic/eui/dist/eui_theme_light.json';
+ const { euiColorFullShade, euiColorMediumShade } = euiVars;
+ const axisColor = euiColorMediumShade;
+ const baselineStyle: LineAnnotationStyle = {
+ line: {
+ strokeWidth: 1,
+ stroke: euiColorFullShade,
+ opacity: 0.75,
+ },
+ details: {
+ fontFamily: 'Arial',
+ fontSize: 10,
+ fontStyle: 'bold',
+ fill: euiColorMediumShade,
+ padding: 0,
+ },
+ };
+ const axes: RecursivePartial<AxisConfig> = {
+ axisLineStyle: {
+ stroke: axisColor,
+ },
+ tickLabelStyle: {
+ fontSize: 10,
+ fill: axisColor,
+ },
+ tickLineStyle: {
+ stroke: axisColor,
+ },
+ gridLineStyle: {
+ horizontal: {
+ dash: [1, 2],
+ },
+ vertical: {
+ strokeWidth: 0,
+ },
+ },
+ };

A suggestion with some style tweaks and using colors supplied by EUI:

Member Author

Updated here 9aa5d81

const theme: PartialTheme = {
axes,
};

interface DecisionPathChartProps {
decisionPathData: DecisionPathPlotData;
predictionFieldName?: string;
baseline?: number;
minDomain: number | undefined;
maxDomain: number | undefined;
}

export const DecisionPathChart = ({
decisionPathData,
predictionFieldName,
minDomain,
maxDomain,
baseline,
}: DecisionPathChartProps) => {
const heightMultiplier = Array.isArray(decisionPathData) && decisionPathData.length > 4 ? 30 : 75;
const baselineData: LineAnnotationDatum[] = useMemo(
() => [
{
dataValue: baseline ? parseFloat(baseline.toPrecision(3)) : undefined,
Contributor

We should remove the rounding to precision here (or round all other values of the decision path line too), otherwise there might be a slight offset between the annotation line for the baseline and the baseline on the y axis of the decision path, see here with and without the rounding:

Member Author

Great catch! My previous approach of rounding the baseline so that the tooltip shows only 3 significant figures was wrong. I updated it to leave the data point unrounded and to use the header accessor for the formatted tooltip instead (a71e190).

details: i18n.translate(
'xpack.ml.dataframe.analytics.explorationResults.decisionPathBaselineText',
{
defaultMessage:
'baseline (average of predictions for all data points in the training data set)',
}
),
},
],
[baseline]
);
const tickFormatter = useCallback((d) => `${Number(d).toPrecision(3)}`, []);
Contributor

Looks like this doesn't need to be wrapped in a template literal

Member Author

Updated here ef815d1
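
The two threads above (rounding the baseline value, and the unnecessary template literal) both come down to formatting for display only. A minimal sketch under that assumption; formatSigFigs and roundingOffset are hypothetical names, not functions from this PR.

```typescript
// Display-only formatter: toPrecision already returns a string, so no template literal is needed.
const formatSigFigs = (value: number | string, sigFigs: number = 3): string =>
  Number(value).toPrecision(sigFigs);

// Size of the offset introduced when the plotted value itself is rounded.
function roundingOffset(value: number, sigFigs: number = 3): number {
  return Math.abs(value - parseFloat(value.toPrecision(sigFigs)));
}

// A baseline of 123.456 rounded to 3 significant figures is plotted at 123,
// roughly 0.456 away from where the decision path line actually starts.
const exampleBaseline = 123.456;
const exampleOffset = roundingOffset(exampleBaseline);
const exampleLabel = formatSigFigs(exampleBaseline); // label text only; the data stays unrounded
```

Note that toPrecision switches to exponential notation for values with more integer digits than the requested precision, so a label for 123456 would read 1.23e+5.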


return (
<Chart size={{ height: decisionPathData.length * heightMultiplier }}>
<Settings theme={theme} rotation={90} />
{baseline && (
<LineAnnotation
id="xpack.ml.dataframe.analytics.explorationResults.decisionPathBaseline"
domainType={AnnotationDomainTypes.YDomain}
dataValues={baselineData}
style={baselineStyle}
marker={<EuiIcon type={'annotation'} />}
/>
)}

<Axis
id={'xpack.ml.dataframe.analytics.explorationResults.decisionPathXAxis'}
tickFormat={tickFormatter}
title={i18n.translate(
'xpack.ml.dataframe.analytics.explorationResults.decisionPathXAxisTitle',
{
defaultMessage: "Prediction for '{predictionFieldName}'",
values: { predictionFieldName },
}
)}
showGridLines={true}
Contributor

Suggested change
- showGridLines={true}
+ showGridLines={false}

Suggest to hide the vertical grid lines.

Member Author

Updated here a71e190

position={Position.Bottom}
showOverlappingTicks
domain={
minDomain && maxDomain
? {
min: minDomain,
max: maxDomain,
}
: undefined
}
/>
<Axis showGridLines={true} id="left" position={Position.Left} />
<LineSeries
id={'xpack.ml.dataframe.analytics.explorationResults.decisionPathLine'}
name={i18n.translate(
'xpack.ml.dataframe.analytics.explorationResults.decisionPathLineTitle',
{
defaultMessage: 'Prediction',
}
)}
xScaleType={ScaleType.Ordinal}
yScaleType={ScaleType.Linear}
xAccessor={0}
yAccessors={[2]}
data={decisionPathData}
/>
</Chart>
);
};
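
One detail in the component above: the checks minDomain && maxDomain and baseline && (...) rely on truthiness, so a legitimate value of 0 is treated the same as undefined (and in JSX, 0 && <X /> renders the number 0). The sketch below shows an explicit alternative for the domain; buildDomain and AxisDomain are hypothetical names, and this is an illustration rather than what the PR does.

```typescript
interface AxisDomain {
  min: number;
  max: number;
}

// Hypothetical helper: builds the axis domain, accepting 0 as a valid bound.
function buildDomain(
  minDomain: number | undefined,
  maxDomain: number | undefined
): AxisDomain | undefined {
  if (minDomain === undefined || maxDomain === undefined) {
    return undefined; // let the chart compute its own domain
  }
  if (!isFinite(minDomain) || !isFinite(maxDomain) || minDomain > maxDomain) {
    return undefined; // ignore NaN, Infinity, or inverted bounds
  }
  return { min: minDomain, max: maxDomain };
}
```

The same idea would apply to the baseline check, for example comparing against undefined instead of testing truthiness.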