Improve dashboard load performance #14750
Why not, as a simple first step, just separate saved searches into their own _msearch request?
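For concreteness, a minimal sketch of what that split might look like on the client, assuming the standard _msearch NDJSON body format (a header line followed by a query line per search); the panel shape, the toNdjson helper, and the proxy path are illustrative placeholders, not Kibana's actual courier code.

```ts
// Sketch: send saved-search queries in a second _msearch so they cannot
// block the (usually faster) visualization queries. Panel shapes and the
// endpoint path are hypothetical, not actual Kibana internals.
interface PanelSearch {
  type: 'visualization' | 'search'; // saved search vs. visualization
  index: string;
  body: object;                     // the query/aggs for this panel
}

// _msearch bodies are newline-delimited JSON: a header line, then a body line.
const toNdjson = (searches: PanelSearch[]): string =>
  searches
    .map((s) => JSON.stringify({ index: s.index }) + '\n' + JSON.stringify(s.body))
    .join('\n') + '\n';

async function fetchDashboardData(panels: PanelSearch[]) {
  const savedSearches = panels.filter((p) => p.type === 'search');
  const visualizations = panels.filter((p) => p.type === 'visualization');

  // Two independent requests instead of one combined _msearch.
  const [visResponses, searchResponses] = await Promise.all(
    [visualizations, savedSearches].map((group) =>
      fetch('/elasticsearch/_msearch', {
        method: 'POST',
        headers: { 'Content-Type': 'application/x-ndjson' },
        body: toNdjson(group),
      }).then((res) => res.json())
    )
  );
  return { visResponses, searchResponses };
}
```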
Maybe, but I think that starts us down a path of making too many assumptions: about what we expect courier to handle (assuming we do this in courier and don't have the dashboard drive it), what the embeddable types are, and how long they will take. What if someone adds a new embeddable type that also takes a long time? Then again, embeddables still aren't a first-class concept, so maybe I'm thinking too far into the future. Nothing is simple with courier, and I'm nervous about throwing in more one-off code that doesn't fully solve the problem. But still worthwhile to explore despite my initial misgivings!
I'm pretty sure that hit count isn't the best metric, but I don't know what would be. Looking at one of my dashboards, I have a visualization that took 224 ms and has a hit count of ~400,000 (it's a big-number metric). Another one took 2s and has a hit count of ~15,000 (a data table with 3 aggregations). One problem with a separate _msearch for each embeddable is that large dashboards will generate a lot of requests, and I think you'll start to hit browser limits on concurrent connections (I think FF's is 6 and Chrome's is 10). We have a very common dashboard that has 20 normal Kibana visualizations plus 5 TSVB. A crazy idea would be to use one request for all of the visualizations (including TSVB), then after the first load figure out which visualizations were really slow, store that information on the dashboard somewhere, and split those off in future requests. Kind of a self-learning dashboard.
Interesting. I wonder if aggregation type makes a difference. Agree on the issue with a single separate search per embeddable - we'd need to chunk it up into batches somehow.

Definitely an interesting thought re: the self-learning dashboard. I worry about that route getting complicated. E.g. you're on a slow network and your dashboard learns to do a single panel per batch (unless we could split out network latency vs ES response time...) - how long would it take the dashboard to "unlearn" that and batch things up again when on a fast network? Or your ES gets bogged down during a busy part of the day with a lot of requests from various sources - does your dashboard learn quickly enough to keep up, or will it fall behind, so that when traffic is busy you're still learning to make smaller batches, and by the time traffic is low your dashboard needs time to learn to make bigger batches again? It sounds like a really interesting experiment; I just worry about maintainability, finding the right algorithm, and how long of an effort it would take. We do have machine learning experts at Elastic, but if there were some other metric we could use, we might be able to improve things with a simpler method.

IMO, the best scenario would be if ES implemented streaming; then they would be in charge of figuring out how to batch up the returned responses, not us on the client, and we'd only have to send out a single request for all the data. I wonder what would happen if we put the streaming logic on the Kibana server side: the client handles streamed responses, and the server handles querying ES. It feels like this would be faster going from Kibana server -> ES server rather than Kibana client -> ES server... but I have no data to back that up. Maybe the Kibana server would end up being a bottleneck with multiple clients if we did it that way.
With Timelion, TSVB, Kibana visualizations, and other embeddables, you'd probably want to do the streaming logic on the Kibana server side.
What about running a maximum number of _msearch requests in parallel? About my personal experience: we have dashboards doing a lot of aggregations over hundreds of millions of documents, and having Kibana responsiveness tied to the slowest one is not ideal. So having multiple _msearch requests would help.
@alepuccetti, we use really tiny visualizations to pack them onto our screen. Of those 20+5 visualizations, 12 are visible, so we'd still hit the browser limit. I kind of like the streaming idea, though it is a bigger change: all requests for visualizations/embeddables would be sent as one request to Kibana's backend, which would then send each of those requests individually to ES and stream back the results as they come in.
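A rough sketch of that fan-out idea on the server side, assuming a plain Node HTTP response object rather than Kibana's real HTTP service; the PanelQuery type, esClient, and handler shape are hypothetical placeholders. The point is that fast panels get flushed to the client as soon as they finish, independent of the slow ones.

```ts
// Sketch of the streaming idea: the browser sends all panel queries in one
// request; the server runs each against Elasticsearch independently and
// streams results back as newline-delimited JSON in completion order.
// PanelQuery, esClient, and this handler shape are hypothetical.
import type { ServerResponse } from 'http';

interface PanelQuery {
  panelId: string;
  index: string;
  body: object;
}

declare const esClient: {
  search: (params: { index: string; body: object }) => Promise<unknown>;
};

export async function streamDashboardData(res: ServerResponse, panels: PanelQuery[]) {
  res.writeHead(200, { 'Content-Type': 'application/x-ndjson' });

  await Promise.all(
    panels.map(async ({ panelId, index, body }) => {
      try {
        const result = await esClient.search({ index, body });
        // Each line is one finished panel; fast panels are flushed immediately
        // instead of waiting for the slowest query in the batch.
        res.write(JSON.stringify({ panelId, result }) + '\n');
      } catch (error) {
        res.write(JSON.stringify({ panelId, error: String(error) }) + '\n');
      }
    })
  );
  res.end();
}
```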
Well, we could have multiple queries in one _msearch. I'm not sure I fully understand the streaming idea, but it seems to require a bigger redesign. Using multiple _msearch requests seems like a smaller first step.
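A minimal sketch of capping the number of in-flight _msearch requests, as suggested in the two comments above; the batch size, concurrency limit, and sendMsearch function are made-up placeholders, not an actual Kibana API.

```ts
// Sketch: split panels into small _msearch batches, but never keep more than
// `maxConcurrent` requests in flight, to stay under browser connection limits.
// Batch size, limit, and sendBatch/sendMsearch are illustrative placeholders.
async function runBatched<T, R>(
  items: T[],
  batchSize: number,
  maxConcurrent: number,
  sendBatch: (batch: T[]) => Promise<R>
): Promise<R[]> {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }

  const results: R[] = new Array(batches.length);
  let next = 0;
  // Each worker pulls the next batch as soon as its previous one finishes.
  const workers = Array.from({ length: Math.min(maxConcurrent, batches.length) }, async () => {
    while (next < batches.length) {
      const index = next++;
      results[index] = await sendBatch(batches[index]);
    }
  });
  await Promise.all(workers);
  return results;
}

// Usage: at most 4 concurrent _msearch requests of 5 panels each.
// runBatched(panelSearches, 5, 4, (batch) => sendMsearch(batch));
```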
I filed an issue about _msearch on the Elasticsearch side: elastic/elasticsearch#27775.
Update from the Elasticsearch issue: as was explained to me (elastic/elasticsearch#27775 (comment)), the real culprit is actually the ...
This sounds a bit like the head-of-line blocking problem: we have a batch of independent requests being held up by the slowest request. It seems to me that there is already a standard solution to this problem: http/2.

If we had an http/2 endpoint, we could write our clients the same way we would if we weren't optimizing at all. No manual batching or msearch or anything like that - we'd make data requests as a bunch of independent AJAX calls. Under the hood, in supporting browsers, the http/2 protocol will ensure these get multiplexed. We'd also be able to process responses out of order, which means fast requests will no longer be held up by slow ones.

We should make sure the requests are made in the same order as the visualizations, so the first visualizations on the screen are the first ones to make a request. This is a fairly easy tweak and should improve perceived time-to-first-visualization.

http/2 requires https, which means anyone using unsecured connections will fall back to vanilla http and have a degraded experience. Unfortunately, Elasticsearch doesn't support http/2 yet, so until it does we have to come up with alternative solutions. It might be worth benchmarking the current approach and comparing it to an http/2 approach (routed through an http/2-compatible proxy).
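A sketch of what the "independent requests" client could look like under that model; multiplexing would be handled by the browser over http/2, so the client just issues plain fetches in on-screen order and renders each response as it arrives. The endpoint path and the isVisible/renderPanel helpers are hypothetical.

```ts
// Sketch: one independent request per panel, issued in on-screen order, each
// rendered as soon as it resolves. Over http/2 the browser multiplexes these
// on a single connection; no manual batching or _msearch needed.
// isVisible, renderPanel, and the endpoint path are hypothetical.
interface Panel { id: string; index: string; body: object; }
declare function isVisible(panel: Panel): boolean;
declare function renderPanel(id: string, data: unknown): void;

function loadDashboard(panels: Panel[]) {
  // Visible panels first, so perceived time-to-first-visualization improves.
  const ordered = [...panels].sort((a, b) => Number(isVisible(b)) - Number(isVisible(a)));

  return Promise.allSettled(
    ordered.map((panel) =>
      fetch(`/api/dashboard/panel/${panel.id}/data`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ index: panel.index, body: panel.body }),
      })
        .then((res) => res.json())
        .then((data) => renderPanel(panel.id, data)) // out-of-order rendering
    )
  );
}
```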
Talked to @stacey-gammon about this, and she suggested that we put some good instrumentation into Kibana so we can get actual stats on dashboard/visualization load times in the wild. I think it would also be worth putting a handful of test scenarios together and doing some benchmarking.
This can fall into a head-of-line blocking problem, too, though we should be able to mitigate it in various ways.
Chatted a bit with @epixa today... just want to jot down a note that we can't have a client-side-only solution if we want to support plugins that expose REST APIs. If we have a client-side solution that ships queries to a Kibana server-side solution, we can use the same solution for both use cases (client side and REST APIs).
@stacey-gammon I think a lot of the original issues were resolved - should this issue be updated with any remaining issues, or closed?
I think it's safe to close this.
Currently we send out a request for every embeddable on a dashboard in a single _msearch. This means one or two slow visualizations or saved searches can bog down an entire dashboard.
I'd like to explore ways to improve the performance and split up the requests.
One idea is to do a single _msearch for all requests but only ask for the hit count, then make subsequent requests that batch the individual requests by hit count.
I'm not sure hit count is the right metric to use, though. It works in my sample cases, where saved searches take the longest and their hit count is 500, but on some datasets an aggregation over a long time span with a ton of data could return only a small hit count yet take a long time to complete.
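If we did go down the hit-count road, the two-phase approach might look roughly like this; the size: 0 count pass, the threshold value, and sendMsearch are assumptions for illustration, not a worked-out design.

```ts
// Sketch of the two-phase idea: a cheap first _msearch that only asks for hit
// counts (size: 0, no aggregations), then group panels into "light" and
// "heavy" _msearch batches based on those counts. The threshold and
// sendMsearch are placeholders. (In newer ES versions hits.total is an
// object, so the response shape would need adjusting.)
interface PanelSearch { id: string; index: string; body: { query: object } }
declare function sendMsearch(searches: object[]): Promise<Array<{ hits: { total: number } }>>;

async function loadByHitCount(panels: PanelSearch[], threshold = 100_000) {
  // Phase 1: hit counts only, no documents and no aggregations.
  const counts = await sendMsearch(
    panels.map((p) => ({ index: p.index, body: { size: 0, query: p.body.query } }))
  );

  // Phase 2: split into two batches so heavy panels cannot hold up light ones.
  const light: PanelSearch[] = [];
  const heavy: PanelSearch[] = [];
  panels.forEach((p, i) =>
    (counts[i].hits.total > threshold ? heavy : light).push(p)
  );

  return Promise.all([sendMsearch(light), sendMsearch(heavy)]);
}
```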
cc @pickypg - do you know if hit count correlates with query performance? Or am I off base? Maybe it's a combination of hit count plus index size (not sure if there is a way to get that information quickly).
Another idea thrown around was to use the scroll API to get data chunked by time, not by visualization, and display the intermediate results. I'm not sure how useful this would be to people, though. Would partial data be worthwhile to see while a slow query finishes loading, or would people find it more useful to see visualizations complete one at a time (or one group at a time), but with the full data?
somewhat related: #7215
cc @elastic/kibana-sharing