[Infra UI] Infrastructure metrics explorer #28027

Closed
makwarth opened this issue Jan 3, 2019 · 3 comments · Fixed by #34019 or #35846
Labels
Feature:Metrics UI Metrics UI feature Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services

Comments

@makwarth

makwarth commented Jan 3, 2019

Discussion issue
This issue depends on the navigation changes in #27916
Co-authors: @roncohen @exekias

Motivation

  • Provide a dedicated, simple and fast way to chart and discover historic performance trends of infrastructure time series metrics for Infrastructure app users

TSVB somewhat supports this use case, but requires a lot of knowledge and configuration.
Here are some challenges with using TSVB for this use case:

  • Index pattern configuration required
  • Doesn't lead with the most relevant input: The time series metric you want to explore
  • Field dropdowns show all Metricbeat fields (though filtered by keyword/integer values), which is very overwhelming
  • Derivative aggregations are too complicated
  • Too many irrelevant configuration options for basic infrastructure time series exploration
  • Another app

Proposed initial solution

1
ES index pattern is pre-configured via Infrastructure's configuration options.

2
Auto-suggestion will only show metrics with numeric values within active time range.

3
Users get a graph with just one single input (metric field name)
Checkbox for derivative/rate. (Later, it'd be nice to detect the metric type so this can be handled automatically.)
Ideally the metric explorer will support the unit convention we already use for pct and bytes

4
Similar to metric field, the auto-suggestion only suggests fields with keyword values within active time range

5
One chart per beat.hostname

6
One chart per metric per beat.hostname
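As a rough illustration of the auto-suggestion rules in points 2 and 4, here is a minimal sketch that splits a list of fields into metric candidates (numeric types) and group-by candidates (keyword type). The `FieldInfo` shape, the function names, and the sample fields are hypothetical and not taken from the Infrastructure app. The sketch only covers the type filter; checking that a field has values within the active time range would need a separate query and is not shown.

```typescript
// Hypothetical shape for a field known to the configured index pattern.
interface FieldInfo {
  name: string;
  type: string; // Elasticsearch mapping type, e.g. "long" or "keyword"
}

// Elasticsearch numeric mapping types.
const NUMERIC_TYPES = [
  "long", "integer", "short", "byte",
  "double", "float", "half_float", "scaled_float",
];

// Fields that could be offered in the metric input (point 2).
function metricCandidates(fields: FieldInfo[]): string[] {
  return fields
    .filter(f => NUMERIC_TYPES.indexOf(f.type) !== -1)
    .map(f => f.name);
}

// Fields that could be offered in the group-by input (point 4).
function groupByCandidates(fields: FieldInfo[]): string[] {
  return fields.filter(f => f.type === "keyword").map(f => f.name);
}

// Illustrative sample data only.
const sampleFields: FieldInfo[] = [
  { name: "system.cpu.user.pct", type: "scaled_float" },
  { name: "system.network.in.bytes", type: "long" },
  { name: "beat.hostname", type: "keyword" },
  { name: "message", type: "text" },
];

console.log(metricCandidates(sampleFields));
console.log(groupByCandidates(sampleFields));
```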

makwarth added the discuss and Feature:Metrics UI labels on Jan 3, 2019
@elasticmachine
Contributor

Pinging @elastic/infrastructure-ui

@simianhacker
Member

Regarding how many hosts we bring back for the "grouping": if we use the TSVB backend for the data, we are limited to a traditional terms agg, which requires a limit (defaults to 10). One idea would be to use a composite aggregation to paginate through the groupings, then use TSVB to get the metrics only for the current page of results. So if we are showing 9 charts per page, there would only be 9 TSVB requests (in parallel), which would be like having 9 TSVB charts on a page (which performs well today).
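To make the pagination idea concrete, here is a minimal sketch of how the request body for an Elasticsearch composite aggregation could be built and how the `after_key` from one page feeds the next. The function names, the `groupBy` source name, and the response type are assumptions made for illustration; this is not the code from the PR, and the per-page TSVB requests are left out.

```typescript
// Key returned by a composite aggregation, used to request the next page.
type AfterKey = Record<string, string | number>;

interface CompositeRequestBody {
  size: 0;
  aggs: {
    groupings: {
      composite: {
        size: number;
        sources: Array<Record<string, { terms: { field: string } }>>;
        after?: AfterKey;
      };
    };
  };
}

// Builds one page request; pass the previous page's after_key to continue.
function buildGroupingsRequest(
  groupByField: string,
  pageSize: number,
  after?: AfterKey
): CompositeRequestBody {
  const composite: CompositeRequestBody["aggs"]["groupings"]["composite"] = {
    size: pageSize,
    sources: [{ groupBy: { terms: { field: groupByField } } }],
  };
  if (after !== undefined) {
    composite.after = after;
  }
  return { size: 0, aggs: { groupings: { composite } } };
}

// Simplified (assumed) slice of the search response we care about.
interface CompositeResponse {
  aggregations: {
    groupings: {
      after_key?: AfterKey;
      buckets: Array<{ key: AfterKey; doc_count: number }>;
    };
  };
}

// Extracts the group names for the current page plus the cursor for the next.
function readPage(
  response: CompositeResponse
): { groups: string[]; after: AfterKey | undefined } {
  const groupings = response.aggregations.groupings;
  return {
    groups: groupings.buckets.map(b => String(b.key.groupBy)),
    after: groupings.after_key,
  };
}
```

With a page size of 9, each call to `buildGroupingsRequest("beat.hostname", 9, after)` would yield at most 9 groups, and only those groups would need metric requests.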

@simianhacker
Member

simianhacker commented Apr 3, 2019

I posted a "draft" PR at: #34019

For the most part this is feature complete with regard to what's been outlined in this ticket. I invite y'all to load the Draft PR in a development env and play around to get a feel for how things work. It would be great to get early feedback before the functional tests are written. Feel free to point out any bugs you find (there should be some), but more importantly focus on usability feedback. By no means is this a finished product; there's still lots of polishing to do.

There are a few areas that we should discuss functionality wise.

  • I limited the number of metrics you can add to 5. I picked that number because charts with more than 5 series tend to become pretty busy and their utility declines from there.
  • Instead of using a checkbox/switch for rate, I decided to make it an aggregation in the dropdown. When it was just a checkbox/switch it felt very error prone. In almost all cases where the user wants a rate, what they really want is derivative(max(some.kind.of.counter.field)). The use case that we are leaving on the table is when someone wants an actual derivative (or rate) of an average or count.
  • For the rate() aggregation I'm scaling the results to per second which I think is the most common unit. Do we want that to be configurable?
  • Here are the aggregations I added.
    image
    You can see rate which isn't really an aggregation available in Elasticsearch but a special combination of aggregations and pipelines; we can add more. A few that I've been thinking about:
    • standard deviation - we could do this as a faded band so you can see where the upper and lower boundaries are.
    • percentile - this will need a field for setting the percentile when the user selects this option; otherwise we could just hard-code it and make the option "95th Percentile"
    • overall max
    • overall min
    • overall standard deviation - faded band as well
    • log event rate which is like count but a per second rate of the events. This would be useful for showing error rates.
    • log event count like above but not a scaled rate
  • The index pattern is set to metricAlias, which is set to metricbeat-* by default. This can be set via the "settings" UI. What's the harm in using both metricAlias and logsAlias to include logs as well? I can imagine a scenario where we want to allow the users to plot error rates, OR we can add a log event rate aggregation (see above) that only runs against the log alias.
  • The fields dropdown and the Kuery autocomplete include fields that may not have values or actually exist; they both use the same service. In TSVB we only show group by fields that have actual values instead of everything in the mappings. Should we enhance that service to only return fields with data? This will improve the experience for the Custom Group By feature, Search Fields in both infra and logs and this UI. This will solve the aerospike.* problem. Bad idea because there are 2400 fields and querying ES to see if those fields exist takes too long.
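For readers who want to see what "rate" as derivative(max(field)) scaled to per second could look like, here is a minimal sketch. `buildRateAggs` builds an Elasticsearch aggregation body using a `max` metric and a `derivative` pipeline with `unit: "1s"`; `perSecondRate` shows the same arithmetic locally. All names are hypothetical and this is not the implementation in the draft PR. Two assumptions to note: the `fixed_interval` parameter applies to newer Elasticsearch versions (older ones use `interval`), and treating a negative delta as a counter reset that yields no value is a choice made for this sketch.

```typescript
// Aggregation body: max per time bucket, then a derivative normalized to 1s.
// With `unit` set, Elasticsearch reports the scaled result as normalized_value.
function buildRateAggs(field: string, bucketInterval: string) {
  return {
    timeseries: {
      // Assumption: newer versions use fixed_interval; older ones use interval.
      date_histogram: { field: "@timestamp", fixed_interval: bucketInterval },
      aggs: {
        max_value: { max: { field } },
        rate: { derivative: { buckets_path: "max_value", unit: "1s" } },
      },
    },
  };
}

interface MaxBucket {
  timestampMs: number;
  max: number | null; // null when the bucket has no documents
}

interface RatePoint {
  timestampMs: number;
  rate: number | null;
}

// Local equivalent: difference of consecutive max values divided by the
// elapsed seconds between the two buckets.
function perSecondRate(buckets: MaxBucket[]): RatePoint[] {
  return buckets.map((current, i) => {
    const previous = i > 0 ? buckets[i - 1] : undefined;
    if (!previous || previous.max === null || current.max === null) {
      return { timestampMs: current.timestampMs, rate: null };
    }
    const seconds = (current.timestampMs - previous.timestampMs) / 1000;
    const delta = current.max - previous.max;
    // Assumption: a negative delta means the counter was reset, so no rate.
    const rate = seconds > 0 && delta >= 0 ? delta / seconds : null;
    return { timestampMs: current.timestampMs, rate };
  });
}
```

For example, a counter that grows by 600 between two one-minute buckets corresponds to a rate of 10 per second.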
