Improve dashboard load performance #14750

Closed
stacey-gammon opened this issue Nov 3, 2017 · 15 comments
Labels
Feature:Dashboard, performance, release_note:enhancement, Team:Visualizations

Comments

stacey-gammon (Contributor) commented Nov 3, 2017

Currently we send out a request for every embeddable on a dashboard in a single _msearch. This means one or two slow visualizations or saved searches can bog down an entire dashboard.

I'd like to explore ways to improve the performance and split up the requests.

One idea is to do a single _msearch for all requests but only ask for the hit count. Then make subsequent requests, batching the individual requests based on their hit count.
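Roughly what I'm picturing for that first pass, as a sketch only; the panel shape, proxy path, and headers here are illustrative, not how the code is actually organized today:

```ts
// Illustrative panel descriptor; the real dashboard/courier structures differ.
interface PanelRequest {
  index: string;
  query: object;
}

// Build an _msearch NDJSON body that asks only for hit counts (size: 0),
// so the first round trip stays cheap.
function buildHitCountMsearch(panels: PanelRequest[]): string {
  return (
    panels
      .map(
        (p) =>
          JSON.stringify({ index: p.index }) +
          '\n' +
          JSON.stringify({ size: 0, query: p.query })
      )
      .join('\n') + '\n'
  );
}

// The returned hit counts could then drive how the real requests are grouped,
// e.g. giving the heaviest panels their own _msearch.
async function fetchHitCounts(panels: PanelRequest[]): Promise<number[]> {
  const res = await fetch('/elasticsearch/_msearch', {
    // Path and headers are simplified for the sketch.
    method: 'POST',
    headers: { 'Content-Type': 'application/x-ndjson' },
    body: buildHitCountMsearch(panels),
  });
  const { responses } = await res.json();
  return responses.map((r: any) => r.hits.total);
}
```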

I'm not sure if hit count is the right metric to use, though. It works in my sample cases, where saved searches take the longest and their hit count is 500, but it could be that on some datasets an aggregation over a long time span with a ton of data would return only a small hit count, yet take a long time to complete.

cc @pickypg - do you know if hit count correlates with query performance? Or am I off base? Maybe it's a combination of hit count plus index size (not sure if there is a way to get that information quickly).

Another idea thrown around was to use the scroll API to get data chunked by time, not by visualization, and display the intermediate results. I'm not sure how useful this would be to people, though. Would partial data be at all worthwhile to see while a slow query finishes loading, or would people find it more useful to see visualizations complete one at a time (or one group at a time), but with the full data?

somewhat related: #7215

cc @elastic/kibana-sharing

nreese (Contributor) commented Nov 3, 2017

Why not, as a simple first step, just separate saved searches into their own _msearch request?

stacey-gammon (Contributor, Author) commented

Maybe, but I think that starts us down a path of making too many assumptions: about what we expect courier to handle (assuming we do this in courier rather than having the dashboard drive it), about what the embeddable types are, and about how long they will take.

What if someone adds a new embeddable type that takes a long time too?

Though embeddables still aren't a first class concept, so maybe I'm still thinking too far into the future.

Nothing is simple with courier, and I'm nervous about throwing in more one off code that doesn't fully solve the problem. But, still worthwhile to explore despite my initial misgivings!

trevan (Contributor) commented Nov 3, 2017

I'm pretty sure that hit count isn't the best metric, but I don't know what it would be. Looking at one of my dashboards, I have a visualization that took 224 ms and it has a hit count of ~400,000 (it is a big number metric). Another one took 2s and it has a hit count of ~15,000 (a data table with 3 aggregations).

One problem with a separate _msearch for each embeddable is that large dashboards will generate a lot of requests and I think you'll start to hit browser limits (I think FF is 6 and Chrome is 10). We have a very common dashboard that has 20 normal Kibana visualizations plus 5 TSVB.

A crazy idea would be to use one request for all of the visualizations (including TSVB) and then after the first load, figure out which visualizations were really slow, store that information to the dashboard somewhere, and in future requests, split those away. Kind of a self-learning dashboard.

stacey-gammon (Contributor, Author) commented

Interesting. I wonder if aggregation type makes a difference.

Agree on the issue with a single separate search per embeddable - we'd need to chunk it up somehow in batches.

Definitely an interesting thought re: the self-learning dashboard. I worry about that route getting complicated. For example, you're on a slow network and your dashboard learns to do a single panel per batch (unless we could split out network latency vs. ES response time...). How long would it take the dashboard to "unlearn" that and batch panels up again once you're on a fast network? Or your ES gets bogged down during a busy part of the day with a lot of requests from various sources. Does your dashboard learn quickly enough to keep up, or does it fall behind, so that while traffic is heavy it's still learning to make smaller batches, and by the time traffic is low your dashboard needs time to learn to make bigger batches again?

It sounds like a really interesting experiment; I just worry about maintainability, finding the right algorithm, and how long an effort it would take. We do have machine learning experts at Elastic, but if there were some other metric we could use, we might be able to improve things with a simpler method.

IMO, the best scenario would be if ES implemented streaming. Then they would be in charge of figuring out how to batch up the returned responses, not us on the client, and we'd only have to send out a single request for all the data.

I wonder what would happen if we put the streaming logic on the Kibana server side. The client handles streamed responses, and the server handles querying ES. It feels like this would be faster going from Kibana server -> ES server rather than Kibana client -> ES server... but I have no data to back that up. Maybe the Kibana server would end up being a bottleneck with multiple clients if we did it that way.
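As a very rough sketch of that server-side piece (the port, the request shape, and the searchPanel stub are all made up for illustration), the server could fan the panel queries out to ES in parallel and write each result back as an NDJSON line the moment it finishes:

```ts
import * as http from 'http';

interface PanelQuery {
  id: string;
  body: object;
}

// Hypothetical: run one panel's search against Elasticsearch. A real version
// would go through the ES client; this stub just simulates variable latency.
async function searchPanel(panel: PanelQuery): Promise<object> {
  await new Promise((resolve) => setTimeout(resolve, Math.random() * 2000));
  return { id: panel.id, hits: [] };
}

// The client sends all panel queries in one request; the server streams each
// result back as soon as its search resolves, in whatever order they finish.
const server = http.createServer((req, res) => {
  let raw = '';
  req.on('data', (chunk) => (raw += chunk));
  req.on('end', async () => {
    const panels: PanelQuery[] = JSON.parse(raw || '[]');
    res.writeHead(200, { 'Content-Type': 'application/x-ndjson' });
    await Promise.all(
      panels.map(async (panel) => {
        const result = await searchPanel(panel);
        // One NDJSON line per finished panel; the client renders it immediately.
        res.write(JSON.stringify(result) + '\n');
      })
    );
    res.end();
  });
});

server.listen(5602); // illustrative port, not Kibana's real one
```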

trevan (Contributor) commented Nov 3, 2017

With timelion, tsvb, kibana visualizations, and other embeddables, you'd probably want to do the streaming logic on the Kibana server side.

alepuccetti commented

@trevan:

> One problem with a separate _msearch for each embeddable is that large dashboards will generate a lot of requests and I think you'll start to hit browser limits (I think FF is 6 and Chrome is 10). We have a very common dashboard that has 20 normal Kibana visualizations plus 5 TSVB.

What about running a maximum number of _msearch requests, prioritizing the visualizations that are actually displayed on the screen? The challenge would be handling dates that use now, to keep them consistent when the queued _msearch requests are fired.
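One way the now problem could be handled, sketched here with a toy toAbsoluteRange helper (a real implementation would use proper date math parsing): resolve the relative time range to fixed timestamps once, before anything is queued, so every batch fires against the same window.

```ts
interface TimeRange {
  from: string; // e.g. "now-15m"
  to: string;   // e.g. "now"
}

interface AbsoluteRange {
  from: string; // ISO timestamp
  to: string;   // ISO timestamp
}

// Very rough resolution of "now"-relative expressions to fixed timestamps.
// Only handles "now" and "now-<N>m" for the purposes of this sketch.
function toAbsoluteRange(range: TimeRange, now: Date = new Date()): AbsoluteRange {
  const resolve = (expr: string): string => {
    if (expr === 'now') return now.toISOString();
    const match = /^now-(\d+)m$/.exec(expr);
    if (match) {
      return new Date(now.getTime() - Number(match[1]) * 60_000).toISOString();
    }
    return expr; // already absolute
  };
  return { from: resolve(range.from), to: resolve(range.to) };
}

// Every queued _msearch batch would then reuse this single pinned range,
// so later batches don't silently query a shifted time window.
const pinned = toAbsoluteRange({ from: 'now-15m', to: 'now' });
console.log(pinned);
```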

From my personal experience: we have dashboards doing a lot of aggregations over hundreds of millions of documents, and having Kibana's responsiveness tied to the slowest one is not ideal. So having multiple _msearch requests, or even a queue of them, would be nice. I would even go as far as having one request per visualization and resolving them in parallel, prioritizing the ones on the screen at the moment.

trevan (Contributor) commented Nov 21, 2017

@alepuccetti, we use really tiny visualizations to pack them onto our screen. Of those 20+5 visualizations, 12 are visible. So we'd still hit the browser limit.

I kind of like the streaming idea, though it is a bigger change. All requests for visualizations/embeddables would be sent as one request to Kibana's backend, each of those requests would then be sent individually to ES, and the results would be streamed back as they come in.

alepuccetti commented

> So we'd still hit the browser limit.

Well, we could have multiple queries in one _msearch, which would at least be better than one big request. Choosing 6 requests (the minimum between FF and Chrome) would improve responsiveness; even better would be to detect the browser and tune the number of requests used to split the queries. This would also make it easier to evaluate which visualization is slower, or at least help narrow it down.
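A sketch of what that splitting could look like; the maxParallel value of 6 is just the number mentioned above, not something measured:

```ts
// Split the panel queries into roughly equal groups, one group per _msearch.
function chunk<T>(items: T[], parts: number): T[][] {
  const out: T[][] = Array.from({ length: parts }, () => []);
  items.forEach((item, i) => out[i % parts].push(item));
  return out.filter((group) => group.length > 0);
}

// Hypothetical: how many parallel _msearch requests to issue. 6 matches the
// lowest per-host connection limit mentioned above; a real implementation
// could pick this per browser.
const maxParallel = 6;

// Each group becomes its own _msearch, so a slow panel only delays its group.
// `send` stands in for whatever actually posts one _msearch body.
async function runInGroups(
  panels: object[],
  send: (group: object[]) => Promise<unknown>
): Promise<void> {
  await Promise.all(chunk(panels, maxParallel).map(send));
}
```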

I am not sure I fully understand the streaming idea, but it seems to require a bigger redesign. Using multiple _msearch requests would be a first step.

alepuccetti commented

I filed an issue about _msearch on the elasticsearch repo (elastic/elasticsearch#27775), which could mitigate the responsiveness problem. However, I am still convinced that Kibana dashboards should be able to resolve each visualization separately.

alepuccetti commented

Update from the _msearch issue.

As was explained to me (elastic/elasticsearch#27775 (comment)), the real culprit is actually the preference parameter.
Is there any way to configure Kibana not to set a preference when running _msearch?
Why was it decided to use this configuration in the first place?
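For context, this is roughly how a preference value shows up in an _msearch body; the sessionId here is only an illustration of the kind of value that gets sent, not Kibana's actual implementation:

```ts
// Sketch of how a preference value ends up in the _msearch header lines.
// Every query in the batch shares one preference, which pins the whole batch
// to the same shard copies; omitting the field lets ES pick copies freely.
function buildMsearch(
  queries: { index: string; body: object }[],
  sessionId?: string
): string {
  return (
    queries
      .map((q) => {
        const header: Record<string, unknown> = { index: q.index };
        if (sessionId !== undefined) {
          header.preference = sessionId; // illustrative value only
        }
        return JSON.stringify(header) + '\n' + JSON.stringify(q.body);
      })
      .join('\n') + '\n'
  );
}
```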

chrisdavies (Contributor) commented

This sounds a bit like the head-of-line blocking problem. We have a batch of independent requests being held up by the slowest request. It seems to me that there is already a standard solution to this problem: http/2.

If we had an http/2 endpoint, we could possibly write our clients the same way we'd write them if we weren't optimizing at all. No manual batching or msearch or anything like that. We'd make data requests as a bunch of independent AJAX calls. Under the hood, in supporting browsers, the http/2 protocol will ensure these get multiplexed. We'd also be able to process responses out of order, which means fast requests will no longer be held up by slow ones.
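As a sketch of what the client code could collapse to under that model (the endpoint and helpers here are hypothetical, for illustration only):

```ts
// Hypothetical per-panel fetch; not a real Kibana route.
async function fetchPanelData(panelId: string): Promise<object> {
  const res = await fetch(`/api/dashboard/panel/${panelId}/data`);
  return res.json();
}

// Stand-in renderer; the embeddable would draw itself here.
function renderPanel(panelId: string, data: object): void {
  console.log('render', panelId, data);
}

// One independent request per panel; no manual batching or msearch.
// Under http/2 the browser multiplexes these onto a single connection, and
// fast panels render as soon as their own response arrives.
function loadDashboard(panelIds: string[]): void {
  for (const id of panelIds) {
    fetchPanelData(id)
      .then((data) => renderPanel(id, data))
      .catch((err) => console.error(`panel ${id} failed`, err));
  }
}
```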

We should make sure the requests are made in the same order as the visualizations, so the first visualizations on the screen are the first ones to make a request. This is a fairly easy tweak, and should improve perceived time-to-first-visualization.

http/2 requires https. This means anyone using unsecured connections will fall back to vanilla http and will have a degraded experience.

Unfortunately, Elasticsearch doesn't support http/2 yet. Until they do, we have to come up with alternative solutions. It might be worth benchmarking the current approach and comparing it to an http/2 approach (routed through an http/2 compatible proxy).

chrisdavies (Contributor) commented

Talked to @stacey-gammon about this, and she suggested that we put some good instrumentation into Kibana so we can get actual stats on dashboard / visualization load times in the wild.

I think it would also be worth putting a handful of test scenarios together and doing some benchmarking:

  • Current msearch approach
  • Remove msearch, and proxy Elasticsearch access through an http/2-compatible proxy [1]
  • Unbatched, un-proxied http (might as well measure it)
  • Separate saved searches into their own msearch request

[1] This can fall into a head-of-line blocking problem, too, though we should be able to mitigate it in various ways.

stacey-gammon (Contributor, Author) commented

Chatted a bit with @epixa today... just want to jot down a note that we can't have a client-side-only solution if we want to support plugins that want to expose REST APIs.

If we have a client-side solution that ships queries to a Kibana server-side solution, we can use the same solution for both use cases (client side and REST APIs).

wylieconlon (Contributor) commented

@stacey-gammon I think a lot of the original issues were resolved, should this issue be updated with any remaining issues or closed?

stacey-gammon (Contributor, Author) commented

I think it's safe to close this.
