-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] AIOps: Adds dip support to log rate analysis in ML AIOps Labs #163100
[ML] AIOps: Adds dip support to log rate analysis in ML AIOps Labs #163100
Conversation
34691f9
to
c7d80fe
Compare
3297e7e
to
2223676
Compare
Pinging @elastic/ml-ui (:ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good - just tested inside the ML AIOps Labs page. All the examples I ran correctly detected if it was a dip or a spike.
Left a few comments, mostly related to the text.
x-pack/plugins/aiops/public/components/log_rate_analysis/log_rate_analysis_results.tsx
Outdated
Show resolved
Hide resolved
...alerting/log_threshold/components/alert_details_app_section/components/log_rate_analysis.tsx
Show resolved
Hide resolved
}) | ||
: i18n.translate('xpack.aiops.analysis.analysisTypeDipCallOutContent', { | ||
defaultMessage: | ||
'The median log rate in the selected deviation time range is lower than the baseline. Therefore, the analysis results table shows statistically significant items within the baseline time range that are less present or missing within the deviation time range. The "doc count" column refers to the amount of documents in the baseline time range.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
less in number
better than less present
? Any thoughts @szabosteve ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 4f8c71a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm late to the party! Yes, less in number
is a clearer solution. Thanks for updating!
...alerting/log_threshold/components/alert_details_app_section/components/log_rate_analysis.tsx
Outdated
Show resolved
Hide resolved
*/ | ||
time: number | string; | ||
/** | ||
* Number of doc count for that time bucket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to calculate how the doc count (per bucket) for the deviation compares to the baseline? I end up wanting to know how the counts compare, rather than just a single count number for the baseline / deviation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I thought about that too, it would be good to have both a baseline and deviation column in the table. To make the numbers comparable it would be good to show median per bucket (to use the same measure we use to define if it's spike or dip). I added an item to the meta issue, I'd like to add that in a separate PR: #160247
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deferring to the @elastic/actionable-observability team for review of the alerting-related changes
x-pack/packages/ml/aiops_components/src/document_count_chart/document_count_chart.tsx
Outdated
Show resolved
Hide resolved
x-pack/packages/ml/aiops_components/src/document_count_chart/document_count_chart.tsx
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest edits LGTM
LGTM 🎉 |
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Module Count
Public APIs missing comments
Async chunks
Public APIs missing exports
Page load bundle
Unknown metric groupsAPI count
History
To update your PR or re-run it, just comment with: cc @walterra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -60,11 +58,9 @@ export const LogRateAnalysis: FC<AlertDetailsLogRateAnalysisSectionProps> = ({ r | |||
const [dataView, setDataView] = useState<DataView | undefined>(); | |||
const [esSearchQuery, setEsSearchQuery] = useState<QueryDslQueryContainer | undefined>(); | |||
const [logRateAnalysisParams, setLogRateAnalysisParams] = useState< | |||
{ significantFieldValues: SignificantFieldValue[] } | undefined | |||
| { logRateAnalysisType: LogRateAnalysisType; significantFieldValues: SignificantFieldValue[] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: There is an extra |
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's caused by our linting rules because the first item starts at a new line.
elastic#163100) This updates log rate analysis to be able to auto-detect whether the selected deviation is a spike or dip compared to the baseline time range. To achieve this, we compare the median bucket size of the two selections. If a dip gets detected, the analysis will then switch the window parameters sent to the API endpoint to run the analysis. An info callout points out the auto-selected analysis type and explains to which time range the analysis results refer to. We need to do this to make it clear that for dip analysis the significant terms and their doc counts refer to the baseline time range and vice versa for spike analysis.
Summary
Part of #161832.
This updates log rate analysis to be able to auto-detect whether the selected deviation is a spike or dip compared to the baseline time range. To achieve this, we compare the median bucket size of the two selections. If a dip gets detected, the analysis will then switch the window parameters sent to the API endpoint to run the analysis.
An info callout points out the auto-selected analysis type and explains to which time range the analysis results refer to. We need to do this to make it clear that for dip analysis the significant terms and their doc counts refer to the baseline time range and vice versa for spike analysis.
Log rate spike
Log rate dip
Functional tests
The artificial logs dataset generator for functional tests was updated to be able to also produce a dataset with a dip. Functional tests have been added to make use of that:
Observability Alert Details Page
In Observability, since we now auto-detect the analysis type, we no longer need to pass on the analysis type in the alert details page from the alert context. Instead, the analysis type will be part of the
onAnalysisComplete()
callback. The prompt for the AI assistant was updated to include the information about spike/dip that's also present in the in callout to users.Checklist