Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Improve support for script and aggregation fields in anomaly detection jobs #81923

Merged
merged 30 commits into from
Nov 17, 2020
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
75c4924
[ML] Add support for script fields in wizard and in viz
qn895 Oct 27, 2020
d90f1ec
[ML] Fix datafeed preview for aggregations not showing if name of agg…
qn895 Oct 27, 2020
b580708
[ML] Seems like it can be just aggregations
qn895 Oct 27, 2020
77a7e92
[ML] Fix AE plotting for aggregated fields
qn895 Oct 27, 2020
ecc77ef
[ML] Export Aggregation interface
qn895 Oct 27, 2020
8cd0732
[ML] Fix plotting issue with AE script fields
qn895 Oct 27, 2020
a930e21
[ML] Fix typescript
qn895 Oct 28, 2020
571bf79
Merge remote-tracking branch 'upstream/master' into ml-fix-script-n-a…
qn895 Oct 30, 2020
25e063d
Merge upstream/master into ml-fix-script-n-aggregation-fields
qn895 Nov 3, 2020
0455ba1
[ML] Clean up validation for nested aggregation field
qn895 Nov 4, 2020
4a557f5
[ML] Add validation for missing summary count if datafeed has aggrega…
qn895 Nov 4, 2020
2712c6f
[ML] Fix anomaly search broken because max results is 0
qn895 Nov 4, 2020
f76039b
[ML] Add missing_summary_count_field_name validation to validate.ts test
qn895 Nov 4, 2020
85e1773
[ML] Fix order between datafeedConfig and allowMMLGreaterThanMax
qn895 Nov 4, 2020
79f22dd
[ML] Fix jest tests
qn895 Nov 4, 2020
739a423
[ML] Fix calculateModelMemoryLimitProvider order
qn895 Nov 4, 2020
dfd070a
Merge remote-tracking branch 'upstream/master' into ml-fix-script-n-a…
qn895 Nov 5, 2020
0ced887
Merge remote-tracking branch 'upstream/master' into ml-fix-script-n-a…
qn895 Nov 5, 2020
d96a119
[ML] Rename cardinalityField, combine if statements, add missing_summ…
qn895 Nov 5, 2020
a3a0809
[ML] Fix character escape & disable next step if misisng summaryCount…
qn895 Nov 5, 2020
9769d94
[ML] Fix allowMMLGreaterThanMax changed to undefined previously
qn895 Nov 5, 2020
f54c8e8
[ML] Update datafeedAggregations check
qn895 Nov 5, 2020
7355559
[ML] Change DatafeedOverride to Datafeed
qn895 Nov 5, 2020
1b5898f
[ML] Fix field in aggregatable check
qn895 Nov 6, 2020
20c1f67
Merge remote-tracking branch 'upstream/master' into ml-fix-script-n-a…
qn895 Nov 9, 2020
1a3c40f
Merge upstream/master into ml-fix-script-n-aggregation-fields
qn895 Nov 11, 2020
22f938c
[ML] Update text description
qn895 Nov 11, 2020
cf5d4d1
[ML] Update text translations
qn895 Nov 11, 2020
6d50e8e
Merge branch 'master' into ml-fix-script-n-aggregation-fields
kibanamachine Nov 17, 2020
a661a27
Merge branch 'master' into ml-fix-script-n-aggregation-fields
kibanamachine Nov 17, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions x-pack/plugins/ml/common/constants/messages.ts
Original file line number Diff line number Diff line change
Expand Up @@ -442,6 +442,16 @@ export const getMessages = once(() => {
url:
'https://www.elastic.co/guide/en/elasticsearch/reference/{{version}}/ml-job-resource.html#ml-job-resource',
},
missing_summary_count_field_name: {
status: VALIDATION_STATUS.ERROR,
text: i18n.translate(
'xpack.ml.models.jobValidation.messages.missingSummaryCountFieldNameMessage',
{
defaultMessage:
'A job configured with a datafeed with aggregations must set summary_count_field_name; use doc_count or suitable alternative.',
}
),
},
skipped_extended_tests: {
status: VALIDATION_STATUS.WARNING,
text: i18n.translate('xpack.ml.models.jobValidation.messages.skippedExtendedTestsMessage', {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,9 @@ export interface Datafeed {
job_id: JobId;
query: object;
query_delay?: string;
script_fields?: object;
script_fields?: {
[key: string]: any;
};
scroll_size?: number;
delayed_data_check_config?: object;
indices_options?: IndicesOptions;
Expand All @@ -30,8 +32,8 @@ export interface ChunkingConfig {
time_span?: string;
}

interface Aggregation {
buckets: {
export interface Aggregation {
[key: string]: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure we should do this. AFAIK this will allow for any type of string based key to be valid for this interface like date_histoasdf.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a tricky one. Talked to Ben and apparently the users don't necessarily need to name the item as buckets for the datafeed config to be valid. Right now on the UI side, we don't support that. I can revert this change back for this PR, but we'll address this issue in a follow up PR.

date_histogram: {
field: string;
fixed_interval: string;
Expand Down
13 changes: 13 additions & 0 deletions x-pack/plugins/ml/common/types/fields.ts
Original file line number Diff line number Diff line change
Expand Up @@ -89,3 +89,16 @@ export const mlCategory: Field = {
type: ES_FIELD_TYPES.KEYWORD,
aggregatable: false,
};

export interface FieldAggCardinality {
field: string;
percent?: any;
}

export interface ScriptAggCardinality {
script: any;
}

export interface AggCardinality {
cardinality: FieldAggCardinality | ScriptAggCardinality;
}
6 changes: 3 additions & 3 deletions x-pack/plugins/ml/common/util/job_utils.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -188,8 +188,8 @@ describe('ML - job utils', () => {
expect(isTimeSeriesViewDetector(job, 3)).toBe(false);
});

test('returns false for a detector using a script field as a metric field_name', () => {
expect(isTimeSeriesViewDetector(job, 4)).toBe(false);
test('returns true for a detector using a script field as a metric field_name', () => {
expect(isTimeSeriesViewDetector(job, 4)).toBe(true);
});
});

Expand Down Expand Up @@ -281,6 +281,7 @@ describe('ML - job utils', () => {
expect(isSourceDataChartableForDetector(job, 22)).toBe(true);
expect(isSourceDataChartableForDetector(job, 23)).toBe(true);
expect(isSourceDataChartableForDetector(job, 24)).toBe(true);
expect(isSourceDataChartableForDetector(job, 37)).toBe(true);
});

test('returns false for expected detectors', () => {
Expand All @@ -296,7 +297,6 @@ describe('ML - job utils', () => {
expect(isSourceDataChartableForDetector(job, 34)).toBe(false);
expect(isSourceDataChartableForDetector(job, 35)).toBe(false);
expect(isSourceDataChartableForDetector(job, 36)).toBe(false);
expect(isSourceDataChartableForDetector(job, 37)).toBe(false);
});
});

Expand Down
1 change: 0 additions & 1 deletion x-pack/plugins/ml/common/util/job_utils.ts
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,6 @@ export function isSourceDataChartableForDetector(job: CombinedJob, detectorIndex
// Perform extra check to see if the detector is using a scripted field.
const scriptFields = Object.keys(job.datafeed_config.script_fields);
isSourceDataChartable =
scriptFields.indexOf(dtr.field_name!) === -1 &&
scriptFields.indexOf(dtr.partition_field_name!) === -1 &&
scriptFields.indexOf(dtr.by_field_name!) === -1 &&
scriptFields.indexOf(dtr.over_field_name!) === -1;
Expand Down
19 changes: 19 additions & 0 deletions x-pack/plugins/ml/common/util/validation_utils.ts
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,22 @@ export function isValidJson(json: string) {
return false;
}
}

export function findAggField(aggs: Record<string, any>, fieldName: string): any {
let value;
Object.keys(aggs).some(function (k) {
if (k === fieldName) {
value = aggs[k];
return true;
}
if (aggs.hasOwnProperty(k) && typeof aggs[k] === 'object') {
value = findAggField(aggs[k], fieldName);
return value !== undefined;
}
});
return value;
}

export function isValidAggregationField(aggs: Record<string, any>, fieldName: string): boolean {
return findAggField(aggs, fieldName) !== undefined;
}
Original file line number Diff line number Diff line change
Expand Up @@ -125,7 +125,8 @@ export const anomalyDataChange = function (
config.timeField,
range.min,
range.max,
bucketSpanSeconds * 1000
bucketSpanSeconds * 1000,
config.datafeedConfig
)
.toPromise();
} else {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -134,6 +134,7 @@ export const useModelMemoryEstimator = (
// Update model memory estimation payload on the job creator updates
useEffect(() => {
modelMemoryEstimator.update({
datafeedConfig: jobCreator.datafeedConfig,
analysisConfig: jobCreator.jobConfig.analysis_config,
indexPattern: jobCreator.indexPatternTitle,
query: jobCreator.datafeedConfig.query,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,12 @@ export const DatafeedPreview: FC<{
if (combinedJob.datafeed_config && combinedJob.datafeed_config.indices.length) {
try {
const resp = await mlJobService.searchPreview(combinedJob);
const data = resp.aggregations
? resp.aggregations.buckets.buckets.slice(0, ML_DATA_PREVIEW_COUNT)
: resp.hits.hits;
let data = resp.hits.hits;
// the first item under aggregations can be any name
if (typeof resp.aggregations === 'object' && Object.keys(resp.aggregations).length > 0) {
const accessor = Object.keys(resp.aggregations)[0];
data = resp.aggregations[accessor].buckets.slice(0, ML_DATA_PREVIEW_COUNT);
}

setPreviewJsonString(JSON.stringify(data, null, 2));
} catch (error) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,11 @@ import {
FieldHistogramRequestConfig,
FieldRequestConfig,
} from '../../datavisualizer/index_based/common';
import { DataRecognizerConfigResponse, Module } from '../../../../common/types/modules';
import {
DatafeedOverride,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatafeedOverride is for overriding the datafeed config when calling the module setup and so i don't think should be used here.
It looks like the standard Datafeed interface would be better suited throughout this PR.

DataRecognizerConfigResponse,
Module,
} from '../../../../common/types/modules';
import { getHttp } from '../../util/dependency_cache';

export interface MlInfoResponse {
Expand Down Expand Up @@ -628,13 +632,15 @@ export function mlApiServicesProvider(httpService: HttpService) {
},

calculateModelMemoryLimit$({
datafeedConfig,
analysisConfig,
indexPattern,
query,
timeFieldName,
earliestMs,
latestMs,
}: {
datafeedConfig: DatafeedOverride;
analysisConfig: AnalysisConfig;
indexPattern: string;
query: any;
Expand All @@ -643,6 +649,7 @@ export function mlApiServicesProvider(httpService: HttpService) {
latestMs: number;
}) {
const body = JSON.stringify({
datafeedConfig,
analysisConfig,
indexPattern,
query,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,9 @@ import { ML_MEDIAN_PERCENTS } from '../../../../common/util/job_utils';
import { JobId } from '../../../../common/types/anomaly_detection_jobs';
import { MlApiServices } from '../ml_api_service';
import { CriteriaField } from './index';
import type { DatafeedOverride } from '../../../../common/types/modules';
import type { Aggregation } from '../../../../common/types/anomaly_detection_jobs/datafeed';
import { findAggField } from '../../../../common/util/validation_utils';

interface ResultResponse {
success: boolean;
Expand Down Expand Up @@ -68,8 +71,12 @@ export function resultsServiceRxProvider(mlApiServices: MlApiServices) {
timeFieldName: string,
earliestMs: number,
latestMs: number,
intervalMs: number
intervalMs: number,
dataFeedConfig?: DatafeedOverride
): Observable<MetricData> {
const scriptFields: any | undefined = dataFeedConfig?.script_fields;
const aggFields: Aggregation | undefined = dataFeedConfig?.aggregations;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the : any | undefined / : Aggregation | undefined parts necessary, I assumed they should be inferred on assignment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed here d96a119


// Build the criteria to use in the bool filter part of the request.
// Add criteria for the time range, entity fields,
// plus any additional supplied query.
Expand Down Expand Up @@ -150,15 +157,35 @@ export function resultsServiceRxProvider(mlApiServices: MlApiServices) {
body.aggs.byTime.aggs = {};

const metricAgg: any = {
[metricFunction]: {
field: metricFieldName,
},
[metricFunction]: {},
};
if (scriptFields !== undefined && scriptFields[metricFieldName] !== undefined) {
metricAgg[metricFunction].script = scriptFields[metricFieldName].script;
} else {
metricAgg[metricFunction].field = metricFieldName;
}

if (metricFunction === 'percentiles') {
metricAgg[metricFunction].percents = [ML_MEDIAN_PERCENTS];
}
body.aggs.byTime.aggs.metric = metricAgg;

// when the field is an aggregation field, because the field doesn't actually exist in the indices
// we need to pass all the sub aggs from the original datafeed config
// so that we can access the aggregated field
if (typeof aggFields === 'object' && Object.keys(aggFields).length > 0) {
// first item under aggregations can be any name, not necessarily 'buckets'
const accessor = Object.keys(aggFields)[0];
const tempAggs = { ...(aggFields[accessor].aggs ?? aggFields[accessor].aggregations) };
const foundValue = findAggField(tempAggs, metricFieldName);

if (foundValue !== undefined) {
tempAggs.metric = foundValue;
delete tempAggs[metricFieldName];
}
body.aggs.byTime.aggs = tempAggs;
} else {
body.aggs.byTime.aggs.metric = metricAgg;
}
}

return mlApiServices.esSearch$({ index, body }).pipe(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -285,7 +285,7 @@ export function resultsServiceProvider(mlApiServices) {
influencerFieldValues: {
terms: {
field: 'influencer_field_value',
size: maxFieldValues,
size: !!maxFieldValues ? maxFieldValues : ANOMALY_SWIM_LANE_HARD_LIMIT,
order: {
maxAnomalyScore: 'desc',
},
Expand Down Expand Up @@ -415,7 +415,7 @@ export function resultsServiceProvider(mlApiServices) {
influencerFieldValues: {
terms: {
field: 'influencer_field_value',
size: maxResults !== undefined ? maxResults : 2,
size: !!maxResults ? maxResults : 2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is subtle but using a falsey check like !! rather than an explicit check for undefined changes its behaviour.
if maxResults is 0 it would now default to 2
Not that 0 is a sensible number, it's just generally dangerous IMO using falsey checks for variables that are numbers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had to update this to using !! because in the case maxResults = 0, the size will be 0 and will make the query invalid.

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here 7355559

order: {
maxAnomalyScore: 'desc',
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -93,7 +93,8 @@ function getMetricData(
chartConfig.timeField,
earliestMs,
latestMs,
intervalMs
intervalMs,
chartConfig?.datafeedConfig
)
.pipe(
map((resp) => {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ import { MLCATEGORY } from '../../../common/constants/field_types';
import { AnalysisConfig } from '../../../common/types/anomaly_detection_jobs';
import { fieldsServiceProvider } from '../fields_service';
import { MlInfoResponse } from '../../../common/types/ml_server_info';
import { DatafeedOverride } from '../../../common/types/modules';
import type { MlClient } from '../../lib/ml_client';

export interface ModelMemoryEstimationResult {
Expand Down Expand Up @@ -46,7 +47,8 @@ const cardinalityCheckProvider = (client: IScopedClusterClient) => {
query: any,
timeFieldName: string,
earliestMs: number,
latestMs: number
latestMs: number,
datafeedConfig?: DatafeedOverride
): Promise<{
overallCardinality: { [key: string]: number };
maxBucketCardinality: { [key: string]: number };
Expand Down Expand Up @@ -101,7 +103,8 @@ const cardinalityCheckProvider = (client: IScopedClusterClient) => {
query,
timeFieldName,
earliestMs,
latestMs
latestMs,
datafeedConfig
);
}

Expand Down Expand Up @@ -142,7 +145,8 @@ export function calculateModelMemoryLimitProvider(
timeFieldName: string,
earliestMs: number,
latestMs: number,
allowMMLGreaterThanMax = false
allowMMLGreaterThanMax = false,
datafeedConfig?: DatafeedOverride
): Promise<ModelMemoryEstimationResult> {
const { body: info } = await mlClient.info<MlInfoResponse>();
const maxModelMemoryLimit = info.limits.max_model_memory_limit?.toUpperCase();
Expand All @@ -154,7 +158,8 @@ export function calculateModelMemoryLimitProvider(
query,
timeFieldName,
earliestMs,
latestMs
latestMs,
datafeedConfig
);

const { body } = await mlClient.estimateModelMemory<ModelMemoryEstimateResponse>({
Expand Down
Loading