[RFC] Add support for read-ahead doc values prefetch #16727
Labels
enhancement
Enhancement or improvement to existing feature or request
Search:Performance
Search:Remote Search
Search:Searchable Snapshots
Is your feature request related to a problem? Please describe
A searchable snapshot index reads data from a snapshot repository on demand at search time, using a block-based storage mechanism, rather than downloading all index data to cluster storage at restore time. This makes it possible to fetch from the repository only the parts of the Lucene IndexInput files that a query accesses, instead of downloading entire files to disk, which reduces the total storage requirement per node.
However, for aggregation-heavy workloads, fetching data blocks on demand can introduce latency because each query requires multiple I/O operations to fetch the data it needs.
To improve performance in such scenarios, we can introduce a read-ahead prefetch mechanism that proactively fetches the next N blocks in anticipation of demand when the current block is accessed, thus reducing I/O bottlenecks.
Describe the solution you'd like
The read-ahead prefetch mechanism leverages the sequential access behavior typically seen in aggregation queries. Aggregation queries access .dvd files to retrieve the doc values required for processing. Such queries typically touch all documents matching the query clause, resulting in reads across multiple blocks of the .dvd files. The compactness of .dvd files relative to the shard size increases the probability of accessing contiguous blocks. For instance, we’ve seen that for a shard of 48.5 GB with 620 million documents, the .dvd file size was about 8.05 GB (approximately 1031 blocks if we consider 8 MB blocks). Given this compactness, most matching docs are likely to fall in adjacent or nearby blocks. This access pattern is well suited to read-ahead prefetching, since the next set of blocks needed by the query can often be anticipated from the block currently being processed. This can reduce latency and I/O wait times significantly, as the blocks will likely already be in the local store by the time the query needs them.
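To make the idea concrete, here is a minimal sketch of the block-selection logic, assuming a fixed block size and a per-file tracker of blocks that have already been requested. The class `ReadAheadSketch` and its methods are hypothetical names used for illustration only; they are not existing OpenSearch or Lucene APIs, and the actual download of a block from the repository is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of read-ahead block selection for one file (e.g. a .dvd file).
 * It only decides WHICH blocks to request; fetching them is out of scope.
 */
public class ReadAheadSketch {
    private final long blockSize;      // bytes per block, e.g. 8 MB
    private final long fileLength;     // total file length in bytes
    private final int readAheadCount;  // N: number of blocks to prefetch after the current one
    // Blocks that have already been requested, so they are not requested twice.
    private final Set<Long> scheduled = ConcurrentHashMap.newKeySet();

    public ReadAheadSketch(long blockSize, long fileLength, int readAheadCount) {
        if (blockSize <= 0 || fileLength <= 0 || readAheadCount < 0) {
            throw new IllegalArgumentException("invalid block size, file length, or read-ahead count");
        }
        this.blockSize = blockSize;
        this.fileLength = fileLength;
        this.readAheadCount = readAheadCount;
    }

    /** Number of blocks needed to cover the file (ceiling division). */
    public long totalBlocks() {
        return (fileLength + blockSize - 1) / blockSize;
    }

    /**
     * Called when a read touches the given byte offset. Returns the block ids to
     * request now: the current block plus up to N following blocks, clamped to the
     * end of the file and skipping blocks that were already requested.
     */
    public List<Long> onAccess(long offset) {
        if (offset < 0 || offset >= fileLength) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        long current = offset / blockSize;
        long last = Math.min(current + readAheadCount, totalBlocks() - 1);
        List<Long> toFetch = new ArrayList<>();
        for (long block = current; block <= last; block++) {
            if (scheduled.add(block)) { // add() returns false if the block was already requested
                toFetch.add(block);
            }
        }
        return toFetch;
    }
}
```

In this sketch, a read at offset 0 of a 20 MB file with 8 MB blocks and N = 2 requests blocks 0, 1 and 2, and a later read that lands in block 1 requests nothing new. A real implementation would also need to bound the number of in-flight fetches and account for cache eviction, which this sketch ignores.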
Since aggregation queries make use of the underlying Lucene .dvd files, we can start by integrating read-ahead prefetch for .dvd files. It can then be extended to other files where we expect a similar contiguous access pattern, for example the kNN exact search use case without pre-filtering, which scans the .vec files sequentially. We would also need to consider .dvd accesses within .cfs files when the compound format is enabled.
I will work on a POC for this and gather performance numbers after testing read-ahead for aggregation queries on searchable snapshots. Meanwhile, our internal testing showed that with read-ahead prefetch, aggregation query times were reduced by up to 60-70%.
Related component
Search:Searchable Snapshots
Describe alternatives you've considered
No response
Additional context
No response