Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Explain log rate spikes: Analysis API endpoint. #135058

Merged
merged 17 commits into from
Jun 28, 2022

Conversation

walterra
Copy link
Contributor

@walterra walterra commented Jun 23, 2022

Summary

Part of #136265.

Updates the API endpoint "/internal/aiops/explain_log_rate_spikes" to work with real data instead of the dummy example.

The analysis now includes identifying field candidates and statistically significant field value pairs. Because there is no UI in place yet to allow a user to select custom time ranges for baseline and deviation, for now we fetch a date histogram of the selected index pattern and identify the parameters to get passed as WindowParameters automatically by picking the part of the date histogram with the highest doc count.

This PR is more about getting the API endpoint into shape and updates integration tests too. The UI code should be considered temporary. At the moment it just outputs the returned analysis state as raw JSON in a EUI code block.

image

Checklist

Delete any items that are not applicable to this PR.

@walterra walterra self-assigned this Jun 23, 2022
@walterra walterra force-pushed the ml-aiops-api-explain-log-rate-spikes branch from a5d83d0 to dff118e Compare June 24, 2022 11:23
@walterra walterra force-pushed the ml-aiops-api-explain-log-rate-spikes branch from dff118e to d478d0d Compare June 27, 2022 09:05
@walterra walterra marked this pull request as ready for review June 27, 2022 12:46
@walterra walterra requested a review from a team as a code owner June 27, 2022 12:46
@walterra walterra added :ml release_note:skip Skip the PR/issue when compiling release notes v8.4.0 labels Jun 27, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

@@ -19,4 +19,4 @@ export const PLUGIN_NAME = 'AIOps';
* This is an internal hard coded feature flag so we can easily turn on/off the
* "Explain log rate spikes UI" during development until the first release.
*/
export const AIOPS_ENABLED = false;
export const AIOPS_ENABLED = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want to switch this back to false for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Pete and we're going to leave this enabled for now.

Copy link
Contributor

@alvarezmelissa87 alvarezmelissa87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ⚡ 🙏

const minWindowGap = 5 * 60 * 1000; // 5min

// work out bounds
const deviationWindow = Math.max(totalWindow / 10, minDeviationWindow);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would benefit from a comment to say why you are picking 10 and 3.5 here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 87d9d8f and fd3590b.

{
// TODO Consider time ranges.
start: 0,
end: 2147483647000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to show what this end time is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 87d9d8f.

</EuiText>
</EuiFlexItem>
<EuiFlexItem>
<EuiProgress
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bad choice of data set, but with the event_rate data set, the analysis stops at 20%. As discussed, it should abort and return 100%

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9c3ad32.

if (shouldStop) {
end();
return;
// Async IIFE to run the analysis while not blocking returning `responseWithHeaders`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you select a data view without a time field, you just get a blank page. The API should return an error so we can handle this better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, will pick this up in a follow up, it needs a bit more refactoring and error handling.

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
aiops 96 98 +2
ml 1619 1666 +47
total +49

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/aiops-utils 18 24 +6

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
aiops 275.6KB 277.4KB +1.8KB
ml 3.3MB 3.4MB +89.1KB
total +90.9KB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
kibana 381 382 +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
aiops 3.6KB 3.7KB +107.0B
Unknown metric groups

API count

id before after diff
@kbn/aiops-utils 35 46 +11
aiops 11 12 +1
total +12

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @walterra

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes LGTM

@walterra walterra merged commit a0f69e0 into elastic:main Jun 28, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jun 28, 2022
@walterra walterra deleted the ml-aiops-api-explain-log-rate-spikes branch June 28, 2022 11:26
@walterra walterra mentioned this pull request Jun 30, 2022
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting :ml release_note:skip Skip the PR/issue when compiling release notes v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants