[Alerting] Surface rule execution durations in rule details #114616

Closed
ymao1 opened this issue Oct 12, 2021 · 4 comments · Fixed by #114719
Labels
estimate:small Small Estimated Level of Effort Feature:Alerting/RulesManagement Issues related to the Rules Management UX Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@ymao1
Contributor

ymao1 commented Oct 12, 2021

POC here

As part of the effort to address long-running rules, we would like to surface rule execution duration information (currently stored in the event log) in the UI.

With this PR, we are surfacing it in the Rule Management View. This issue covers surfacing it in the Rule Details view.

In the rule details view, we can calculate avg/min/max duration statistics across all the retrieved event log entries in the getAlertInstanceSummary function and show them in the UI. We could also show a warning if the average execution duration for the rule greatly exceeds the configured schedule (for example, if the rule is scheduled to run every minute and the average duration is 25 minutes, we should tell the user!).
Example:
Screen Shot 2021-09-09 at 9 21 54 AM
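
A minimal sketch of the avg/min/max calculation described above, assuming the durations have already been pulled out of the event log entries as plain millisecond values. The `ExecutionDurationStats` type and `summarizeDurations` helper are hypothetical names for illustration, not existing Kibana code:

```typescript
// Hypothetical result shape; all values are in milliseconds.
interface ExecutionDurationStats {
  average: number;
  min: number;
  max: number;
}

// Assumes one duration (ms) per retrieved event log entry.
// Returns undefined for an empty list so the UI can render a "no data" state.
function summarizeDurations(durationsMs: number[]): ExecutionDurationStats | undefined {
  if (durationsMs.length === 0) {
    return undefined;
  }
  let min = durationsMs[0];
  let max = durationsMs[0];
  let total = 0;
  for (const duration of durationsMs) {
    if (duration < min) min = duration;
    if (duration > max) max = duration;
    total += duration;
  }
  return { average: total / durationsMs.length, min, max };
}
```

Returning `undefined` rather than zeros keeps "no executions yet" distinguishable from "executions that took 0 ms".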

@ymao1 ymao1 added estimate:small Small Estimated Level of Effort Feature:Alerting/RulesManagement Issues related to the Rules Management UX Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Oct 12, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@ymao1
Contributor Author

ymao1 commented Oct 12, 2021

@mdefazio I have split out the Rule Details view into its own issue. When you are ready with the mockup, can you post to this issue? Thanks!

@mdefazio
Contributor

mdefazio commented Oct 12, 2021

Here are some initial thoughts on how to show these values in the detail page. I've taken some cues from Security's Host page and how they show metrics there (with some obvious slight differences since we want to show 3 values in relation to a single chart, compared to 1 value per chart on the Hosts page).

I'm including a dropdown that lets the user choose how many executions to show: last 60, last 120, etc. (or whatever counts we determine are best here). This may be out of scope, so I'm fine omitting it for now. We may want some indication that we are only showing the last 60 executions (assuming this is also the case for the alerts table on this page).

I've grayed out the alert table as these mockups are only meant to focus on the addition of the metrics above the table.

Video walkthrough:
https://user-images.githubusercontent.com/3756330/136995977-a1136356-0252-421a-873b-3180eea25c27.mp4


Screenshots:

No data
image

Only a few executions
I think it will be easier to read and understand if the bars for the execution chart are always in relation to 60 (as opposed to a single bar filling the width of the chart). Let me know if this poses issues though.

image
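
One way to keep the bars in relation to 60 is to pad the series to a fixed number of slots before handing it to the chart. This is only a sketch: `padToSlots` is a hypothetical helper, and placing the empty slots after the real values is an assumption about the desired layout.

```typescript
// Pads the durations to a fixed number of slots so each bar keeps the same width.
// Empty slots are represented as null; if there are more values than slots,
// only the last `slots` values are kept.
function padToSlots(durationsMs: number[], slots: number = 60): Array<number | null> {
  if (slots <= 0) {
    return [];
  }
  const result: Array<number | null> = durationsMs.slice(-slots);
  while (result.length < slots) {
    result.push(null);
  }
  return result;
}
```

With three executions and the default of 60 slots, the chart would receive three values followed by 57 empty slots instead of three bars stretched across the full width.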

At least 60 executions
image

Warning message within values
This also shows what it may look like if we wanted to show more than the last 60 executions.
image

@mdefazio
Contributor

Here are updated mockups that show the following metrics:

  • Last response
  • Average duration
  • Execution duration chart (with average indicator)

If the average duration exceeds the rule interval schedule, then I think we should show a warning on this panel.
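
A rough sketch of that check, assuming the rule interval is a string such as `1m` or `5s` (a number followed by `s`, `m`, `h`, or `d`) and the average duration is in milliseconds. `parseIntervalMs` and `shouldWarnLongRunning` are hypothetical names for illustration:

```typescript
// Milliseconds per supported interval unit (assumed format: "<number><s|m|h|d>").
const UNIT_TO_MS: Record<string, number> = {
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Converts an interval string like "1m" into milliseconds.
function parseIntervalMs(interval: string): number {
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (!match) {
    throw new Error(`Unsupported interval: ${interval}`);
  }
  return Number(match[1]) * UNIT_TO_MS[match[2]];
}

// True when the average execution duration is longer than the rule interval.
function shouldWarnLongRunning(averageDurationMs: number, interval: string): boolean {
  return averageDurationMs > parseIntervalMs(interval);
}
```

For the example from the issue description, a rule scheduled every minute with a 25 minute average, `shouldWarnLongRunning(25 * 60 * 1000, '1m')` returns `true`.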

Default view:
image

Warning panel and tooltips:
image

@ymao1 ymao1 self-assigned this Oct 14, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022

4 participants