Result streaming on the discover tab #15662
Splitting up the requests was always a bit hacky and significantly complicated the codebase. We could explore some options if it affects a large number of users, but this is the first I've heard of this problem. Out of curiosity, what takes longer for you, the query itself or the aggregation for building the date histogram? You can grab the request Discover is sending from your browser's dev tools and play around with it in Kibana's Dev Tools app to get some timings.
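For anyone trying this, the request in question is roughly of the following shape. This is a hypothetical sketch for experimenting in Dev Tools, not Kibana's exact body; the field name, size, interval, and time zone are all assumptions:

```python
import json

# Rough shape of a Discover-style search: the document query plus a
# date_histogram aggregation for the chart. All parameter values here
# are illustrative assumptions, not Kibana's exact request.
def discover_style_body(field="@timestamp", interval="30m", tz="Europe/London"):
    return {
        "size": 500,
        "query": {"match_all": {}},
        "aggs": {
            "2": {  # agg id is arbitrary; Kibana uses numeric ids like this
                "date_histogram": {
                    "field": field,
                    "interval": interval,
                    "time_zone": tz,
                }
            }
        },
    }

print(json.dumps(discover_style_body(), indent=2))
```

Running the same body with and without the `aggs` section is a quick way to separate query time from histogram time.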
I've done some profiling, and if I'm interpreting this correctly, the DateHistogramAggregator is what is slow here, specifically the collect phase.
Thanks, I'll have to think about this some more and do some additional testing. It might take a little while with the holidays coming up. Just a guess, but if the date histogram is the bottleneck, it might speed things up to select a larger date interval instead of relying on whatever the automatic interval resolves to.
Turns out this might be solvable in ES.
I've done some basic tests and not found any speed issues due to the timezone settings in Kibana; I can do more thorough tests on Tuesday. My issue is that a query to view the Discover histogram on a billion documents takes about 5 minutes (2-node cluster) and times out. If this query is split up into chunks (one per index, à la Kibana 4) then it doesn't time out. I'm not sure whether the timezone issue is causing the query to take a long time, or whether a histogram query over that many documents is simply likely to take that long. As I say, I will test more thoroughly on Tuesday, thanks.
I've done some testing and the timezone is definitely one factor in this, but the query still times out and takes too long to be useful. This ticket is a request for the return of the histogram loading behaviour from Kibana 4 (loading the data one index at a time), which meant that we did not get timeouts. With that behaviour we would start to see results after just a few seconds, and the data would then fill in as it completed. As it is, the new behaviour makes the Discover tab mostly unusable for us. The ability to disable it in #17065 is helpful, but not really the solution we were looking for.
Thanks for the additional info @jgough. Out of curiosity, how slow is the query if you remove the date histogram agg completely? I agree some form of progressive loading might be nice for slow queries. Interval-based patterns and field stats are going away, so we couldn't implement it the same way as in Kibana 4, but we could still use simple date ranges to break up the query. Also keep in mind that, in the short term, you can increase the timeout settings in kibana.yml if you don't mind waiting for the slow queries to load.
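Breaking the query up by simple date ranges could look something like the following. This is a hypothetical helper assuming equal-width sub-ranges, not an actual Kibana implementation:

```python
from datetime import datetime

def split_time_range(start, end, n):
    """Split [start, end) into n equal sub-ranges (hypothetical helper).

    Each sub-range would become its own search, issued one after another
    (or in parallel), so partial results can render before the whole
    time range has finished querying.
    """
    step = (end - start) / n
    return [(start + i * step, start + (i + 1) * step) for i in range(n)]

# e.g. a 7-day Discover time range split into one query per day
ranges = split_time_range(datetime(2018, 1, 1), datetime(2018, 1, 8), 7)
```

Unlike the Kibana 4 approach, this needs no knowledge of which concrete indices exist, which is why it survives the removal of interval-based patterns and field stats.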
Just managed to get a few more quick benchmarks. Note this is a slightly different time range, so the results are not quite comparable to the above.
With histogram agg (Europe/London): >~120s (timed out)
We have a similar concern with the loading of the Discover tab. We have a 2B+ document (and growing) logging cluster. A search of 7 days of logs often matches over 1B results. While we don't get timeouts, the Discover tab loads pretty slowly, about 30s when caches are warm. I tested the query without aggs and it does cut the time roughly in half, but Kibana doesn't give an option to remove the histogram from the Discover tab. Changing the timezone doesn't make much difference since we are on a later version of ES that solves the timezone problem.

All this loading is blocking, and you get nothing until the entire search is done. If I search an individual day that matches ~250M documents I get a 6s load time (with caches warm). I would think you could get a much better user experience using progressive and/or parallel loading. If I took that same 7-day search and broke it up into 7 one-day searches, the total query time would be higher, but Kibana could start showing results on the screen sooner. There would also be the opportunity to parallelize the requests, which would bring the total time to full results down much lower.

We are running Elasticsearch and Kibana 7.4 on AWS.
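The parallel variant described above could be sketched like this. It is hypothetical: `search_one_day` is a stand-in for a real range-filtered Elasticsearch request, and the dates are made up:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

def search_one_day(day):
    # Placeholder: a real implementation would POST a search whose query
    # includes a range filter on the timestamp field for this single day.
    return {"day": day.isoformat(), "status": "ok"}

days = [date(2019, 11, 1) + timedelta(days=i) for i in range(7)]

# Issue the 7 one-day searches concurrently; each result could be drawn
# on screen as soon as it returns, instead of waiting for all of them.
with ThreadPoolExecutor(max_workers=7) as pool:
    results = list(pool.map(search_one_day, days))
```

The trade-off is as the commenter describes: total work on the cluster goes up, but time-to-first-result drops sharply.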
Closing because |
In version 4 of Kibana, the Discover tab was filled by making multiple `_msearch` HTTP requests, one to each index matching the index pattern. In version 6, a single `_msearch` request is made to display the Discover tab.
This meant that when viewing the Discover tab in Kibana 4 there was one request per index which, while inefficient, performed much better on slower Elasticsearch hardware. Queries over billions of documents in a hundred indexes would load progressively (slowly, as 100 small HTTP requests), but they would not time out.
With Kibana 6 and a single `_msearch` request there is no progressive loading; the single query for everything must complete and return before showing any data (one big HTTP request). On slower Elasticsearch instances this means it can often time out on large collections of data. It would seem very useful to have a toggle to bring back the progressive loading with multiple queries, one per index. This is currently a blocker for us upgrading to Kibana 6.
I discussed this issue previously here:
https://discuss.elastic.co/t/discover-tab-timing-out-with-single-msearch-request/110325