Improve loading time of Kubernetes package Dashboards #31021

MichaelKatsoulis · 2022-03-28T13:17:32Z

We have found some opportunities for optimization in the dashboards:

Top CPU intensive pods takes the max of kubernetes.container.cpu.usage.core.ns, then applies a derivative aggregation over it and keeps only the positive values. We can simply use kubernetes.container.cpu.usage.node.pct instead and group by the pod name.
The same applies to Top Memory intensive pods.
CPU Usage by node sums the CPU usage nanocores of all containers, then uses a Painless script to normalise the sum to the metricset period and groups by the node name. Instead we can use the node metric kubernetes.node.cpu.usage.nanocores and divide it by kubernetes.node.cpu.allocatable.cores. The same approach is used in the Metrics UI.
The same applies to Memory Usage by node: we can divide kubernetes.node.memory.usage.bytes by kubernetes.node.memory.allocatable.bytes.
The same approach works for network in and out bytes.
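To make the first and third points concrete, here is a minimal sketch of what the simplified queries could look like when written as an Elasticsearch search body. The field names come from the list above, plus kubernetes.pod.name for the grouping; the function names, the choice of an `avg` metric, the bucket size, and the `now-24h` range are assumptions for illustration, not what the dashboards actually ship.

```python
import json

NANOCORES_PER_CORE = 1_000_000_000


def top_pods_by_cpu_body(size: int = 10, time_range: str = "now-24h") -> dict:
    """Search body ranking pods by CPU usage as a share of the node.

    Replaces max + derivative + positive-only filtering with a single
    metric aggregation over the precomputed percentage field.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"range": {"@timestamp": {"gte": time_range}}},
        "aggs": {
            "pods": {
                "terms": {
                    "field": "kubernetes.pod.name",
                    "size": size,
                    "order": {"cpu_pct": "desc"},  # sort buckets by the sub-aggregation
                },
                "aggs": {
                    # avg is an assumption; max or sum may suit multi-container pods better
                    "cpu_pct": {
                        "avg": {"field": "kubernetes.container.cpu.usage.node.pct"}
                    }
                },
            }
        },
    }


def node_cpu_utilisation(usage_nanocores: float, allocatable_cores: float) -> float:
    """kubernetes.node.cpu.usage.nanocores / kubernetes.node.cpu.allocatable.cores,
    with the cores converted to nanocores so the result is a 0..1 ratio."""
    return usage_nanocores / (allocatable_cores * NANOCORES_PER_CORE)


if __name__ == "__main__":
    print(json.dumps(top_pods_by_cpu_body(), indent=2))
    print(node_cpu_utilisation(500_000_000, 2))
```

The memory variant would follow the same shape, dividing kubernetes.node.memory.usage.bytes by kubernetes.node.memory.allocatable.bytes without any unit conversion.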

I tested this by creating a separate dashboard with all those visualisations optimised, and the loading time for a 24h range decreased from 1 minute 10 seconds to 30 seconds.


MichaelKatsoulis · 2022-03-30T11:09:57Z

My suggestion regarding the default Kubernetes dashboard optimization is to split it into two different dashboards.
The split can be based on the kind of data each visualisation shows.
Some visualisations make sense to display over time, while others make more sense as a single number showing the current value.

For example, for the number of available/desired/unavailable pods or the number of nodes, the most important thing is to display the current situation in the cluster.

For other visualisations, like Top CPU intensive pods or CPU utilization per node, it would be insightful to display how the value evolves over time. A user would like to see how the CPU utilisation of a specific pod or node has changed over the past week.

The two dashboards could look like this:

Time series dashboard:

Current state dashboard:

This grouping can make the dashboards more performant, as fewer queries will be performed simultaneously.
Also, costly queries with aggregations over a large time range will only be performed for the visualisations where that makes sense.
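As a rough illustration of why the two kinds of panel cost different amounts, the sketch below builds one search body per kind. The function names, the `1h` interval, the `now-7d` and `now-5m` windows, and the use of a `top_metrics` aggregation for the latest value are all assumptions made for this example, not the queries Kibana generates.

```python
def time_series_body(field: str, interval: str = "1h", time_range: str = "now-7d") -> dict:
    """Panel that shows how a value evolves: one bucket per interval over a long range."""
    return {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": time_range}}},
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
                "aggs": {"value": {"avg": {"field": field}}},
            }
        },
    }


def current_state_body(field: str, lookback: str = "now-5m") -> dict:
    """Panel that shows a single number: only the most recent document matters,
    so a short lookback window is enough."""
    return {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": lookback}}},
        "aggs": {
            "latest": {
                "top_metrics": {
                    "metrics": {"field": field},
                    "sort": {"@timestamp": "desc"},
                }
            }
        },
    }


if __name__ == "__main__":
    print(time_series_body("kubernetes.node.cpu.usage.nanocores"))
    print(current_state_body("kubernetes.node.cpu.usage.nanocores"))
```

Keeping the current-state panels on their own dashboard means their short window is not stretched to whatever long range the user picked for the time series panels.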

MichaelKatsoulis · 2022-03-30T11:11:02Z

cc @ChrsMark , @tetianakravchenko , @ruflin, @mlunadia, @gsantoro

ruflin · 2022-03-31T06:51:28Z

++ on splitting up the dashboards. Will the dashboards link to each other?

ChrsMark · 2022-03-31T07:16:07Z

That would be great. We do the same for Istio module to split the control plane from data plane views:
https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-istio.html#_dashboard_30

MichaelKatsoulis · 2022-03-31T08:04:47Z

Yes, I was thinking of something like the Istio tab view! That would be great! Does this allow setting a different time range for each one?

ruflin · 2022-03-31T08:08:28Z

I thought by now there is a new and better way to link dashboards together. @alexfrancoeur You might be able to point us in the right direction here?

alexfrancoeur · 2022-03-31T15:18:34Z

Thanks for the ping @ruflin. I think there are a number of best practices these integration dashboards can start to leverage. I've listed a bunch here in the past.

Kibana has drilldown capabilities in a dashboard (https://www.elastic.co/guide/en/kibana/current/drilldowns.html#create-drilldowns). This is great for creating workflows from dashboard to dashboard. An overview dashboard to a details dashboard for example. We support dashboard to dashboard and dashboard to external URLs (paid feature). For the integrations dashboard, a combination of markdown for general navigation and drilldowns for workflows is probably the best option.

If you'd like to sit down with some Kibana folks and discuss best practices, we're happy to engage. For example, we no longer need to ship hundreds of visualizations referenced by a dashboard; we can now package everything in a single dashboard JSON.

This is off topic, but while we're on the topic of dashboard linking, I think it's worth raising whether we should be linking to solutions as well. There is probably some low hanging fruit here. Rather than taking that context and navigating to a dashboard, we could apply it to a solution view to create solution drilldowns. I hack this together all the time for demos using the URL drilldowns. Meaning, if there's a host IP in a dashboard, let's click into that to navigate to a filtered view in the metrics app. Building these experiences as part of our integrations makes for a much more integrated experience when onboarding a new data source. If there's interest in collaborating on something like this, let's have a quick chat with myself and @sixstringcode.

ruflin · 2022-04-01T07:11:56Z

Thanks for the list @alexfrancoeur. The drilldown one is the one I was looking for. There are lots of other great hints in the issue you linked.

About "as value", I just had a conversation with your team and we should figure out ways to convert it automatically. @ChrsMark @MichaelKatsoulis If we redo the k8s dashboards, let's use these best practices directly as an example and also switch to "value".

On the linking to solutions, ++. It is a topic we should also involve @jasonrhodes from unified observability.

ruflin · 2022-04-01T07:18:01Z

I attended the TSDB meeting yesterday and @Mpdreamz showed off a demo of using TSDB in APM. A first version of the metrics parts is merged into main in Elasticsearch and available in the snapshot builds. I think it is also worth trying this out for the k8s data to see what impact it has.

My understanding is that the storage and query parts are currently available, but we can't make use of them in Kibana yet. Also, the package-spec doesn't support the time series fields yet. This means we have to adjust the templates manually and see what effect it has.

@imotov I tried to find some public docs around mappings and TSDI that I can point the team to, but was not successful. Are these already available?

ruflin · 2022-04-01T07:23:07Z

I filed elastic/package-spec#311 to get support for TSDB in the package-spec.

MichaelKatsoulis · 2022-04-01T10:00:49Z

I read in the best practices and in @ruflin's suggestions that moving to Lens is the way forward. I don't see, or maybe I don't know, how some of our TSVB visualisations can be moved to Lens.

I will give an example regarding desired pods. The field kubernetes.deployment.replicas.desired has a value per deployment like

kube_deployment_spec_replicas{namespace="default",deployment="hello-python"} 1
kube_deployment_spec_replicas{namespace="kube-system",deployment="coredns"} 2
kube_deployment_spec_replicas{namespace="kube-system",deployment="kube-state-metrics"} 1
kube_deployment_spec_replicas{namespace="local-path-storage",deployment="local-path-provisioner"} 1

and we want to sum up the last values of this field across all deployments.
If we compare seemingly identical visualisations with the same query in TSVB and Lens, we can spot huge differences.

Neither result is correct, but the Lens one is extreme!
The TSVB result is actually affected by the interval.

There were discussions about this in elastic/integrations#2159 (comment) and @ChrsMark updated the TSVB query by using series aggregation and grouping by deployment name.

But as long as this is not supported in Lens, I don't see how we can use it for such cases.
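For clarity, the calculation being asked for (take the last value per deployment, then sum across deployments) can be sketched in plain Python. The `Sample` type, the function names, and the two timestamps are hypothetical; the deployments and replica counts are the four shown above, repeated at two collection times to show why a plain sum over the time range overcounts.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Sample(NamedTuple):
    timestamp: int      # collection time, any monotonically increasing unit
    namespace: str
    deployment: str
    desired: int        # value of kubernetes.deployment.replicas.desired


def sum_of_last_values(samples: Iterable[Sample]) -> int:
    """Keep only the newest sample per (namespace, deployment), then sum them."""
    latest: Dict[Tuple[str, str], Sample] = {}
    for sample in samples:
        key = (sample.namespace, sample.deployment)
        if key not in latest or sample.timestamp >= latest[key].timestamp:
            latest[key] = sample
    return sum(sample.desired for sample in latest.values())


def naive_sum(samples: Iterable[Sample]) -> int:
    """What a plain sum over the whole time range produces."""
    return sum(sample.desired for sample in samples)


DEPLOYMENTS = [
    ("default", "hello-python", 1),
    ("kube-system", "coredns", 2),
    ("kube-system", "kube-state-metrics", 1),
    ("local-path-storage", "local-path-provisioner", 1),
]

# The same four deployments reported at two collection times.
SAMPLES = [Sample(ts, ns, name, n) for ts in (1, 2) for ns, name, n in DEPLOYMENTS]

if __name__ == "__main__":
    print(sum_of_last_values(SAMPLES))  # desired pods right now
    print(naive_sum(SAMPLES))           # grows with every extra collection in the range
```

The naive sum depends on how many collections fall inside the bucket, which is consistent with the TSVB result changing with the interval.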

MichaelKatsoulis · 2022-04-05T15:22:53Z

Additional thoughts following @ChrsMark's suggestions in https://github.com/elastic/enhancements/issues/14008#issuecomment-1088524593

We could have:

1. Different dashboards per Kubernetes resource (Deployments, DaemonSets, StatefulSets) with useful information for the pods controlled by them (CPU, memory, network, disk).
2. Each dashboard could have a dropdown menu where the user can choose which namespace and which Deployment/DaemonSet/StatefulSet name to see pod metrics for.
3. A separate dashboard for node metrics of the cluster, with a dropdown menu for the node name.
4. An overview dashboard with some cluster-wide information (number of deployment replicas available, number of daemonset replicas available, number of nodes). Each vis on this dashboard can be a drilldown that leads to the more detailed dashboards of steps 1 and 3.
5. Streamed/logged k8s events should also be part of the dashboards.

ruflin · 2022-04-07T06:49:34Z

@MichaelKatsoulis I like your proposal above. It would be great if we could work on these dashboards in collaboration with the team from @flash1293. We can't necessarily achieve everything with Lens and all the other great features in Kibana now, but we should be able to eventually. Please keep the conversations going.

MichaelKatsoulis · 2022-04-07T09:38:34Z

I played around with drill down. It doesn't work exactly as demonstrated and documented in https://www.elastic.co/guide/en/kibana/current/drilldowns.html#_create_the_dashboard_drilldown. I was expecting a Go to dashboard option when creating a new drill down. But I only see a Go to URL. I use version 8.1.2 and also tried with 8.2.0-SNAPSHOT.

flash1293 · 2022-04-07T10:46:29Z

@MichaelKatsoulis This is expected for metric visualizations - the "Go to Dashboard" drilldown is tied to the filter trigger, that means it's only shown in case the visualization can place a filter (and if it happens, it will prompt the user to go to the other dashboard instead). There are no plans for TSVB, but we do plan to add this functionality for Lens metric visualizations: elastic/kibana#122879

MichaelKatsoulis · 2022-04-07T11:00:41Z

@flash1293 thanks for the clarification. To be honest I don't understand why it is tied to the filter trigger.
In our case I want an overview dash like

and when the user wants to see more detailed info for the Nodes (currently TSVB but could be Lens), it will point to

I have currently created the drilldown with Go to URL, but this won't work for an out-of-the-box dashboard, as the URL of the dash will be different.

flash1293 · 2022-04-07T11:08:36Z

Completely agree, that’s what we will work on for Lens in 8.3

MichaelKatsoulis · 2022-04-07T12:41:09Z

An extra thing that could be discussed regarding drilldown is the user experience. Instead of the user having to press the options button and then select the drill down name like:

There should be an easier and clearer way.
If I were the user, I would not understand that this red 1 on the vis means that there is a drilldown, and that I need two more steps to see it.
Probably pressing on the 1 (or whatever makes more sense) should navigate them to the dash.

flash1293 · 2022-04-07T12:56:09Z

The 1 is only visible in edit mode, it's not shown in view mode (which should be the common case for users)
The easiest integration for [Lens] Allow metric visualization to drill down kibana#122879 is to allow the user to click into the visualization (e.g. on the "pods" text), then getting a context menu which allows them to navigate. We can think about how to provide an affordance during implementation

MichaelKatsoulis · 2022-04-07T14:37:14Z

@flash1293 I agree with your second bullet. That would be a good way. As it is now, there is no way a user can understand there is something more. The word drilldown also does not mean anything to someone that doesn't know what it is.

MichaelKatsoulis · 2022-04-07T14:53:26Z

As an update for the effort so far:

I have created the following dashboards which are connected with drilldown (with go to url, waiting for the go to dashboards in lens)

MichaelKatsoulis · 2022-04-18T13:57:14Z

@flash1293 Could we arrange a zoom call whenever possible to ask you about best ways to show some metrics in Kibana?
I want to create some nice gauges but to get those numbers, series aggregations are needed and then mathematical formulas like division.

I can do things like this, but I cannot use those two numbers in a division to get the percentage.

jasonrhodes · 2022-04-18T19:04:09Z

@katefarrar @mlunadia it would be great for us to try to understand what it is about these dashboards that people want/need/use as we try to think through the infrastructure UI.

MichaelKatsoulis · 2022-04-21T08:43:01Z

We had a nice discussion with @flash1293 about ways to create some visualizations, and we concluded that some things are not possible yet but may be in the near future.
Until then, we can use some workarounds with the markdown option when showing information like memory reserved, memory used, cores reserved, cores used, and pods reserved.

Ideally we would like to be able to do math calculations with the numbers in each vis to get the percentage.
We are also waiting for the drilldown option to be available in Lens in the 8.3 or 8.4 release to better connect the dashboards to each other.

mlunadia · 2022-04-21T15:47:14Z

@jasonrhodes 100%. We have plans to tackle this holistically and will for now address any low hanging fruit. We have already started working on establishing a baseline with different discovery activities, one of which will be bringing your input in.

jasonrhodes · 2022-04-21T16:19:08Z

@elastic/infra-monitoring-ui I wonder if there are things we can learn from the optimizations in this issue that could be applied to any other querying we are doing for infra UI.

miltonhultgren · 2022-04-25T08:39:45Z

I guess there are two things we can do:

1. From a joint product perspective, look at which visualizations we have in the UI today that could be changed to a gauge or single value instead of a trend line. I think today we almost only use trend lines? Changing that could allow us to switch to a more performant query while also giving the user better feedback about the data. Having multiple types of visualizations feels natural.
2. Optimize the queries themselves. I'm a bit hesitant about this, since it also requires in-depth domain knowledge about which field means what and which aggregations cause that field to mean something else. What we could do, however, is take stock of the queries we do and how they perform (similar to the SM work we're doing), then pick the top X and see if we can optimize them or feed them into point 1.
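The "take stock and pick the top X" step could start from something as small as the sketch below. The function name, the input shape (query name mapped to observed durations in milliseconds), and the sample figures are all made up for illustration; real numbers would have to come from whatever instrumentation the UI already has.

```python
from statistics import mean
from typing import Dict, List, Tuple


def slowest_queries(timings: Dict[str, List[float]], top_x: int = 3) -> List[Tuple[str, float]]:
    """Rank queries by mean duration (slowest first) and keep the top X."""
    averages = {
        name: mean(durations)
        for name, durations in timings.items()
        if durations  # skip queries with no recorded runs
    }
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_x]


# Hypothetical sample data, durations in milliseconds.
SAMPLE_TIMINGS = {
    "cpu_by_node": [1000, 2000],
    "top_pods_cpu": [100, 300],
    "node_count": [50],
    "never_ran": [],
}

if __name__ == "__main__":
    for name, avg_ms in slowest_queries(SAMPLE_TIMINGS, top_x=2):
        print(f"{name}: {avg_ms} ms")
```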

jasonrhodes · 2022-04-26T17:17:50Z

@miltonhultgren thanks! These sound like good ideas to me. I'm wondering if there are specific optimizations made in the work related to this issue (from @MichaelKatsoulis and others) that we could use to inform how we might optimize our own queries, but you're right that there is likely some work we'll need to do to understand whether that kind of overlap exists.

ChrsMark · 2022-06-14T11:15:42Z

Further improvements will come from using the TSDB features. Investigations will be carried out with the Rally framework.

MichaelKatsoulis added the enhancement and Team:Cloudnative-Monitoring labels Mar 28, 2022

MichaelKatsoulis self-assigned this Mar 28, 2022

MichaelKatsoulis changed the title ~~Optimise Kubernetes Dashboards~~ Optimize Kubernetes Dashboards Mar 28, 2022

ChrsMark mentioned this issue Apr 4, 2022

Prepare requested resources visualizations for Kubernetes Observability elastic/integrations#1014

Closed

ChrsMark mentioned this issue Apr 6, 2022

Handle K8s events with invalid count or invalid Timestamps #31126

Open

MichaelKatsoulis mentioned this issue Apr 18, 2022

Add OOTB Kubernetes dashboard to Kubernetes integration & implement best dashboard creation practices elastic/integrations#3115

Merged

mlunadia changed the title ~~Optimize Kubernetes Dashboards~~ Improve loading time of Kubernetes package Dashboards Apr 28, 2022

MichaelKatsoulis mentioned this issue May 11, 2022

New kubernetes module dashboards #31591

Merged

6 tasks

MichaelKatsoulis closed this as completed in elastic/integrations#3115 Jun 14, 2022
