Speed up downloads of large reports #4245
Comments
I'm fairly sure that deep pagination is problematic for Lucene (and hence Solr). See, for example, https://solr.apache.org/guide/8_11/pagination-of-results.html#performance-problems-with-deep-paging
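For reference, the Solr guide linked above describes cursors (`cursorMark`) as the alternative to deep `start`/`rows` paging. Below is a minimal Python sketch of that loop, not the reporter's actual code. The `fetch` callable, `iter_cursor`, and `make_fake_solr` are hypothetical names, and the fake uses the last id seen as its cursor, whereas real Solr cursor marks are opaque strings. The request parameters (`cursorMark`, a `sort` that includes the unique key) and the `nextCursorMark` stopping rule follow the Solr documentation.

```python
from typing import Callable, Dict, Iterable, Iterator


def iter_cursor(
    fetch: Callable[[Dict[str, str]], dict],
    query: str,
    rows: int = 1000,
    unique_key: str = "id",  # assumption: the collection's uniqueKey field is "id"
) -> Iterator[dict]:
    """Yield every matching document by following Solr-style cursor marks."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "rows": str(rows),
            # Cursor requests need a sort that includes the uniqueKey field.
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
        }
        response = fetch(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same mark once there are no more results.
            return
        cursor = next_cursor


def make_fake_solr(ids: Iterable[str]) -> Callable[[Dict[str, str]], dict]:
    """In-memory stand-in for a Solr select handler, for illustration only."""
    ordered = sorted(ids)

    def fetch(params: Dict[str, str]) -> dict:
        cursor = params["cursorMark"]
        rows = int(params["rows"])
        remaining = ordered if cursor == "*" else [i for i in ordered if i > cursor]
        page = remaining[:rows]
        return {
            "response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": page[-1] if page else cursor,
        }

    return fetch


if __name__ == "__main__":
    fake = make_fake_solr(f"druid:{n:04d}" for n in range(25))
    print(len(list(iter_cursor(fake, "*:*", rows=10))))
```

Because each request resumes from the previous position instead of re-ranking everything before an offset, the cost per page should stay roughly constant however far into the result set the download is.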
Is the report being generated as if a browser were paging through results, rather than through a query that asks for a larger number of rows?
If my understanding is correct, behind the scenes the reporter is iterating over pages of results from Solr. This is the standard way of dealing with a large result set (instead of just asking for all of the results in a single request). There might be some performance gains from increasing the page size when iterating for a download.
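To make the page-size point concrete, here is a rough Python cost model. It assumes the reporter pages with `start`/`rows` offsets (not confirmed in this thread) and that serving a page at offset `start` requires Solr to rank about `start + rows` documents, as the deep-paging section of the Solr guide describes. The function name and the page sizes are hypothetical, chosen only for illustration.

```python
def ranked_docs_for_download(total_rows: int, page_size: int) -> int:
    """Approximate number of documents Solr ranks across a whole download.

    Crude model: the page at offset `start` costs about start + (rows on that page).
    """
    if total_rows < 0 or page_size <= 0:
        raise ValueError("total_rows must be >= 0 and page_size must be > 0")
    work = 0
    for start in range(0, total_rows, page_size):
        rows_on_page = min(page_size, total_rows - start)
        work += start + rows_on_page
    return work


if __name__ == "__main__":
    total = 500_000  # size of the report in the example from this issue
    for page_size in (100, 1_000, 10_000):  # hypothetical page sizes
        requests = -(-total // page_size)  # ceiling division
        work = ranked_docs_for_download(total, page_size)
        print(f"page_size={page_size:>6}  requests={requests:>5}  ranked_docs={work:,}")
```

Under this model the total work grows roughly with the square of the report size divided by the page size, which would fit a download that gets slower the further it goes, and it suggests a larger page size helps but does not remove the slowdown.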
When downloading reports of over about 75,000 druids, the download speed diminishes over time. This results in multi-hour download times for large reports.
Example: I started a report download for 500,000 druids this morning. It never finished because my wifi connection dropped two hours later.
I didn't screenshot the speed right away, but I think it started at about 60 KB/sec.
After about 15 minutes this was the speed:
And then it kept dropping:
At this point my wifi dropped, about 460,000 rows into the report.
If the report maintained the speed it started with, it still would have taken a while but I think the download would have worked. The drop in speed means the last rows take much longer than the first. I did the same download on Monday and it took over 3 hours.