Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Data Frame Analytics creation wizard: add validation step (Part 1) #93478

Merged

Conversation

alvarezmelissa87
Copy link
Contributor

@alvarezmelissa87 alvarezmelissa87 commented Mar 3, 2021

Summary

Related meta issue #88248
Also addresses: #78637

Adds a validation step to the Data Frame Analytics creation wizard.
Adds functional tests for validation step.
Adds api integration test for validate endpoint.

  • Dependent variable checks

    • is constant
    • has more than 30% empty values
  • Training Percent

    • is too low - not enough training docs for model
    • is too high - time/resource heavy due to large number of training docs
  • Analysis fields

    • fields included in analysis have more than 30% empty values (TODO: list actual fields up to a limit?)
    • too many fields included - time/resource may be increased

Current limitations used to validate

export const TRAINING_DOCS_UPPER = 200000;
export const TRAINING_DOCS_LOWER = 200;
export const INCLUDED_FIELDS_THRESHOLD = 100;
// limits number of fields for MISSING agg in request
export const MINIMUM_NUM_FIELD_FOR_CHECK = 25;
export const FRACTION_EMPTY_LIMIT = 0.3;

Regression/Classification example with warning messages:

image

Regression/Classification example success:

image

Outlier detection with warning messages:

image

Outlier detection success:

image

Checklist

Delete any items that are not applicable to this PR.

@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

];
return (
<>
<EuiFlexGroup style={{ width: '70%' }}>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to rerun the validation before returning to the validation step? For example, I set the training percent too low initially, then went back to the Configuration section and increased it, but the warning didn't disappear until I went back into the Validation step.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this to be done in a follow up 👍

@alvarezmelissa87 alvarezmelissa87 changed the title [ML] Data Frame Analytics creation wizard: add validation step [ML] Data Frame Analytics creation wizard: add validation step (Part 1) Mar 4, 2021
Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Once this and the follow-ups in #88248 are in, we can continue to test with larger data sets. Currently I don't have a clear idea on whether the thresholds used in the tests need refining, but they worked fine on my local setup.

@alvarezmelissa87
Copy link
Contributor Author

This part 1 PR has been updated and is ready for a final look when you get a chance cc @jgowdyelastic 🙏

Copy link
Contributor

@lcawl lcawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some strings were complete sentences and others were not, so I synched on the former. Otherwise, text LGTM

switch (status) {
case VALIDATION_STATUS.INFO:
return 'primary';
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, the breaks are redundant in these switch statements as you're returning from inside the case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticed this is moved code. might be worth updating it still.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in a7e5fc69c41f0b104667516fc53457eebf0a649d

import { EuiLoadingSpinner, EuiSpacer } from '@elastic/eui';
import { i18n } from '@kbn/i18n';

import { ml } from '../../../../../services/ml_api_service';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for new code we should be using the context which provides this.

import { useMlApiContext } from '../../../../../contexts/kibana';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in a7e5fc69c41f0b104667516fc53457eebf0a649d

@@ -166,4 +166,12 @@ export const dataFrameAnalytics = {
method: 'GET',
});
},
validateDataFrameAnalytics(analyticsConfig: DeepPartial<DataFrameAnalyticsConfig>) {
const body = JSON.stringify(analyticsConfig);
return http<any>({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we replace any with the return type here?
should be the return type of validateAnalyticsJob

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 943882adedeeaf1254d355f87d7b8549051e6c37

};

function getTrainingPercentMessage(trainingDocs: number) {
let trainingPercentMessage;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trainingPercentMessage probably isn't needed as we could return from within the if statement below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in a7e5fc69c41f0b104667516fc53457eebf0a649d

}

try {
const { body }: { body: ValidationSearchResult } = await asCurrentUser.search({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should be able to use the generic on search here.
something like

const { body } = await asCurrentUser.search<ValidationSearchResult>(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in a7e5fc69c41f0b104667516fc53457eebf0a649d

Copy link
Contributor

@lcawl lcawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text strings LGTM!

Copy link
Member

@jgowdyelastic jgowdyelastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alvarezmelissa87 alvarezmelissa87 force-pushed the ml-dfa-validation-step branch from 943882a to 9e15668 Compare March 5, 2021 22:19
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
ml 1735 1741 +6

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
ml 6.4MB 6.4MB +8.2KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
ml 69.9KB 70.1KB +163.0B

History

  • 💚 Build #111467 succeeded 943882adedeeaf1254d355f87d7b8549051e6c37
  • 💚 Build #111456 succeeded a7e5fc69c41f0b104667516fc53457eebf0a649d
  • 💛 Build #111410 was flaky 8458efe8cd2ff4ffe13319473cd357721dc40e89
  • 💚 Build #111164 succeeded a27f210bfc3f981021ba627aa6f32a542b2db70d
  • 💔 Build #111132 failed 64b4e9dd5da186820cd28990b244d906bc3d3964

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @alvarezmelissa87

@alvarezmelissa87 alvarezmelissa87 merged commit cac26b8 into elastic:master Mar 6, 2021
alvarezmelissa87 added a commit to alvarezmelissa87/kibana that referenced this pull request Mar 6, 2021
…1) (elastic#93478)

* wip: create validationStep component

* wip: trainingPercent check, analysisFields check. Step details

* move validation check to server

* handle no training percent in validation

* move callout component to shared dir

* use shared Callout component in AD val and update message headings

* update types

* adds functional tests for validation

* adds api integration test for validate endpoint

* consolidate messages for depvar and fields

* fix accessibility test

* update license

* update validation messages

* update types in validation model

* add jobValidationReturnType
@alvarezmelissa87 alvarezmelissa87 deleted the ml-dfa-validation-step branch March 6, 2021 02:50
alvarezmelissa87 added a commit that referenced this pull request Mar 6, 2021
…1) (#93478) (#93869)

* wip: create validationStep component

* wip: trainingPercent check, analysisFields check. Step details

* move validation check to server

* handle no training percent in validation

* move callout component to shared dir

* use shared Callout component in AD val and update message headings

* update types

* adds functional tests for validation

* adds api integration test for validate endpoint

* consolidate messages for depvar and fields

* fix accessibility test

* update license

* update validation messages

* update types in validation model

* add jobValidationReturnType
@peteharverson peteharverson added release_note:feature Makes this part of the condensed release notes and removed release_note:enhancement labels May 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Data Frame Analytics ML data frame analytics features :ml release_note:feature Makes this part of the condensed release notes v7.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants