Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display event loop delay histogram data in the Status page #120667

Closed
TinaHeiligers opened this issue Dec 7, 2021 · 12 comments · Fixed by #121052
Closed

Display event loop delay histogram data in the Status page #120667

TinaHeiligers opened this issue Dec 7, 2021 · 12 comments · Fixed by #121052
Assignees
Labels
enhancement New value added to drive a business result Feature:StatusPage Issues related to the Kibana Status Page and APIs impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:needs-research This issue requires some research before it can be worked on or estimated Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.17.0 v8.0.0 v8.1.0

Comments

@TinaHeiligers
Copy link
Contributor

TinaHeiligers commented Dec 7, 2021

In v7.x, we enhanced the ProcessMetricsCollector to report the the event loop delay distribution as well as the mean. These additional metrics help with monitoring and diagnosing performance, because they can be used to roughly estimate the number of cycles that went above the mean for the deployment.

To help diagnose performance problems, we added a subset of the percentile distribution to the metrics.ops logger that we could monitor and use if needed.

We should add these to the status page as supplemental info to the mean.

@TinaHeiligers TinaHeiligers added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc enhancement New value added to drive a business result loe:needs-research This issue requires some research before it can be worked on or estimated impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. v8.1.0 Feature:StatusPage Issues related to the Kibana Status Page and APIs labels Dec 7, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@TinaHeiligers
Copy link
Contributor Author

TinaHeiligers commented Dec 7, 2021

How should we display the data?

@pgayvallet @lukeelmers I'm leaning toward the second option of showing the percentiles as a card footer:

  • Add the mean and the percentiles as separate EuiCard components?
    e.g:
    Screen Shot 2021-12-07 at 13 48 30
  • Show a subset of the percentiles (50th, 95th and 99th for consistency with the logs) as a footer in the same delay card?
    Screen Shot 2021-12-07 at 14 44 12
    If we wanted to use a nested EuiCard we'd probably want to add a 'show-more' action to expand the details and reconsider the layout of the cards:
    Screen Shot 2021-12-07 at 16 51 26
  • Some other way...
    WDYT?

@lukeelmers
Copy link
Member

I don't have strong feelings on this, but my initial reaction is that option 2 makes the most sense as well (just using the existing delay card, maybe also including the actual 50, 95, 99 percentiles for clarity).

I don't think we need to get particularly fancy with this; IMO the main thing is just showing the data somewhere.

@mshustov
Copy link
Contributor

mshustov commented Dec 8, 2021

I like the option Show a subset of the percentiles (50th, 95th and 99th for consistency with the logs) as a footer in the same delay card that a user can find the value at the first glance.
a few suggestions for format:

  • dealy should point that is the mean value
  • percentiles should have hints about the levels.
mean delay: 11.04 ms
percentiles: 50: 10.8ms; 95: 12.56ms ; 99: 12.61ms;

I'd improve Load section with additional details as well.

@pgayvallet
Copy link
Contributor

FWIW, this cards layout (and the whole MetricTile / formatMetric wanna-be-generic approach) really restrict our options when adding new info to this page... We will probably need to stop using MetricTile for everything in favor or more specific components.

I'd probably go with Show a subset of the percentiles (50th, 95th and 99th for consistency with the logs) as a footer in the same delay card,but having a single (and tall) card on the last line seems ugly... We may need some design guidance on that one?

@TinaHeiligers
Copy link
Contributor Author

TinaHeiligers commented Dec 8, 2021

I don't think we need to get particularly fancy with this; IMO the main thing is just showing the data somewhere.

We could use the footer as a plain text rather than the MetricCard as a once-off conditional for now if we also want this to land in 7.17.0 (the option that doesn't stretch the card). Which versions should we be targeting here? I'm guessing 7.17 + based on the target version for adding the percentiles to the logs.

We will probably need to stop using MetricTile for everything in favor or more specific components.

Building some generic card components that could expand as an option in the footer or similar to the actions we have in the Saved Objects Table could be an option. Or some way on expanding a card to show more detail.

As to where and how to show the extra detail, ya, we'll need design-guidance.
@elastic/stack-design we need some guidance on the best way to add details to a card please.

@ryankeairns
Copy link
Contributor

ryankeairns commented Dec 8, 2021

A couple of quick ideas:

  • combine the metrics into groups (i.e. one panel contains several metrics).
  • use EuiStat large (l) size for the main/top number and extra small (xs) for the footer metrics (see docs here)

Screen Shot 2021-12-08 at 11 10 34 AM

It is an awful lot of panels, but short of doing a mockup (in the future) these tweaks may get you by for now.

@TinaHeiligers
Copy link
Contributor Author

TinaHeiligers commented Dec 9, 2021

having a single (and tall) card on the last line seems ugly... We may need some design guidance on that one?

@ryankeairns Do you have any suggestions on how to best display an odd number of metrics? We used to have 6 that aligned nicely on two rows or three each but now we're adding another one to make it 7. We could group a few more of the existing metrics into one card (response time ave & response time max, heap used & heap total) but then we'd again end up with an odd number of entries to display.

@pgayvallet How does this look to you:
Screen Shot 2021-12-10 at 16 02 21
I've gone with grouping the response time metrics together because the average is easier to interpret in terms of the max.
The PR's just about ready, I'll push up a draft soon for us to discuss.

It's no where perfect but we'll probably do a whole redesign anyway when the status page has a "home" in Kibana's nav, if not sooner.

@ryankeairns
Copy link
Contributor

ryankeairns commented Dec 13, 2021

@TinaHeiligers +1 to spending some time on a redesign in the near future. It would be best to work through a few options and discuss which metrics are related, etc. Short of that, it's going to remain a bit messy.

@pgayvallet
Copy link
Contributor

pgayvallet commented Dec 13, 2021

How does this look to you:

Like it would still greatly benefit from a design rework as you said. However this is WAY better than the initial screenshots, so it will do perfectly for now in my opinion.

EDIT: posted at the exact same time as @ryankeairns. If we all think we should revisit this page's design, we can also wait until doing so before adding the new metrics. But adding them now and then thinking about the design would also allow to have all the metrics in the current page to think about the re-design, so it's probably for the best?

@TinaHeiligers
Copy link
Contributor Author

@pgayvallet The PR is ready for review: #121052

@ryankeairns
Copy link
Contributor

ryankeairns commented Dec 13, 2021

To be clear, I'm good with adding metrics now and doing design work later. Just reach out once it's been prioritized on your end and we can slot in some designer time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New value added to drive a business result Feature:StatusPage Issues related to the Kibana Status Page and APIs impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:needs-research This issue requires some research before it can be worked on or estimated Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.17.0 v8.0.0 v8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants