
Speed up downloads of large reports #4245

Open
andrewjbtw opened this issue Oct 18, 2023 · 3 comments

@andrewjbtw

When downloading reports of over about 75,000 druids, the download speed diminishes over time. This results in multi-hour download times for large reports.

Example: I started a report download for 500,000 druids this morning. It never finished because my wifi connection dropped two hours later.

I didn't screenshot the speed right away, but I think it started at about 60 KB/sec.

After about 15 minutes this was the speed:

[Screenshot: download speed at 10:03 AM]

And then it kept dropping:

[Screenshot: download speed at 10:22 AM]
[Screenshot: download speed at 10:41 AM]
[Screenshot: download speed at 11:48 AM]

At this point my wifi dropped, about 460,000 rows into the report.

If the download had maintained the speed it started with, it still would have taken a while, but I think it would have finished. The drop in speed means the last rows take much longer than the first. I did the same download on Monday and it took over three hours.

@justinlittman
Contributor

I'm fairly sure that deep pagination is problematic for Lucene (and hence Solr). See, for example, https://solr.apache.org/guide/8_11/pagination-of-results.html#performance-problems-with-deep-paging
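For reference, the cursor-based fetching that guide recommends in place of deep `start`/`rows` paging looks roughly like this. This is a minimal sketch in Python against Solr's HTTP API, not this project's code; the core URL, query, and `id` sort field are hypothetical:

```python
# Minimal sketch of Solr cursorMark pagination (hypothetical core URL and field names).
import requests

SOLR_SELECT = "http://localhost:8983/solr/my_core/select"  # hypothetical Solr endpoint

def fetch_all_docs(query="*:*", page_size=1000):
    cursor = "*"  # Solr's initial cursorMark value
    while True:
        params = {
            "q": query,
            "rows": page_size,
            "sort": "id asc",      # cursorMark requires a sort on the uniqueKey field
            "cursorMark": cursor,
            "wt": "json",
        }
        data = requests.get(SOLR_SELECT, params=params).json()
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # the cursor stops advancing once results are exhausted
            break
        cursor = next_cursor
```

Because the cursor encodes where the previous page left off, each request costs about the same regardless of how deep into the result set it is, which is why it avoids the slowdown described in the linked page.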

@andrewjbtw
Author

So the report is being generated as if a browser were paging through the results, rather than through a query that asks for a larger number of rows at a time?

@justinlittman
Contributor

If my understanding is correct, behind the scenes the reporter is iterating over pages of results from Solr. This is the standard way of dealing with a large result set (instead of just asking for all of the results in a single request).

There might be some performance gains from increasing the page size when iterating for a download.
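To illustrate that pattern, here is a minimal sketch (hypothetical URL and parameters, not the actual reporter code) of offset-based page iteration. Each request asks Solr to skip `start` documents before returning a page, which is why deeper pages get progressively slower; a larger page size only reduces the number of requests, not that per-page cost:

```python
# Minimal sketch of offset-based (start/rows) iteration over a Solr result set.
import requests

SOLR_SELECT = "http://localhost:8983/solr/my_core/select"  # hypothetical Solr endpoint

def fetch_by_pages(query="*:*", page_size=100):
    start = 0
    while True:
        params = {"q": query, "start": start, "rows": page_size, "wt": "json"}
        data = requests.get(SOLR_SELECT, params=params).json()
        docs = data["response"]["docs"]
        if not docs:
            break
        yield from docs
        start += page_size  # deeper offsets make each successive page more expensive
```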
