[Searchable Snapshots / Segment Replication] Custom query routing for performance improvements #7436
Comments
Why is the default mechanism not sufficient for SegRep/Searchable Snapshots?
Searchable snapshots enable querying of indices stored in snapshots on repositories by fetching only the data a query needs, on demand. They use a file cache to manage downloads and to track frequently used pieces of data, called blocks. Repeated or similar queries can see lower latency when they are routed to the same shard, because cache efficiency is higher.
Segment replication provides an alternative replication mechanism between nodes by copying Lucene segment files from the primary shard to its replicas. Indexing requests are first processed by the primary shard. When the primary is refreshed, it creates new Lucene segments and opens a reader to make the newly indexed documents searchable. An asynchronous event then copies the newly created segments to the replicas over the network. The documents become searchable on the replicas only once this process is complete. The time this takes is referred to as replication lag, and depending on cluster configuration the replication lag may be unacceptable for sensitive queries. Routing search requests to primaries can provide better read-after-write consistency and avoid the replication lag for such queries.
Preferences supported by routing which need to be weighed in for custom routing
These preferences are bypassed when one of the following preferences is set at query time:
Test/Performance Considerations
Proposed Solution
Segment Replication
Searchable Snapshots
To enable high file cache efficiency, we need to route requests for a particular shard in the index repeatedly to the same copy of that shard. This can be achieved with the following solutions, where Solutions 1 and 2 can be precursors to Solution 3:
2. Alternative: Hash the search request, use the hash key as string preference
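The hashing alternative could look roughly like the sketch below. It is written from the client side and is only an illustration: the helper names `preference_for` and `search_path`, the choice of SHA-256, and the 16-character truncation are assumptions, not part of the proposal. The only API it relies on is the `preference` query parameter of the search endpoint, which accepts a custom string (one that does not start with `_`) and routes requests carrying the same string to the same shard copies.

```python
import hashlib
import json
from urllib.parse import urlencode


def preference_for(body: dict, length: int = 16) -> str:
    """Derive a stable preference string from a search request body.

    Hypothetical helper: the body is serialized canonically (sorted keys,
    no extra whitespace) so that logically identical requests hash equally.
    """
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # A hex digest never starts with "_", so it is a valid custom preference.
    return digest[:length]


def search_path(index: str, body: dict) -> str:
    """Build the request path for a search with the derived preference."""
    query = urlencode({"preference": preference_for(body)})
    return f"/{index}/_search?{query}"


request_a = {"query": {"match": {"title": "snapshot"}}, "size": 10}
request_b = {"size": 10, "query": {"match": {"title": "snapshot"}}}

# Same logical request, different key order: both map to the same path.
print(search_path("my-index", request_a))
print(search_path("my-index", request_b))
```

Because the hash covers the whole request body, only identical requests are guaranteed to share a preference; "similar" queries would need a coarser key (for example, hashing only the queried fields), which is a design choice left open here.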
Moving ahead with Solution 1 for Searchable Snapshots as the next step.
Created an issue for the pending task: #7593
Is your feature request related to a problem? Please describe.
Searchable snapshots enable querying of indices stored in snapshots on repositories by fetching only the data a query needs, on demand. They use a file cache to manage downloads and to track frequently used pieces of data, called blocks. Repeated or similar queries can see lower latency when they are routed to the same shard, because cache efficiency is higher.
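The cache-efficiency argument can be illustrated with a toy model. The sketch below is not the actual file cache: it assumes two copies of a shard, each with a small LRU cache of blocks, and a workload of seven distinct queries that each touch one block and repeat in a fixed cycle. All names and sizes are made up for the illustration.

```python
from collections import OrderedDict


class LruCache:
    """Minimal LRU cache that tracks which blocks are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        """Return True on a cache hit, False on a miss (block is then loaded)."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def count_hits(route, num_queries=7, passes=10, capacity=4):
    """Replay a cyclic workload against two shard copies and count cache hits.

    route(query, request_no) returns the index (0 or 1) of the copy to use.
    """
    caches = [LruCache(capacity), LruCache(capacity)]
    hits = 0
    request_no = 0
    for _ in range(passes):
        for query in range(num_queries):
            copy = route(query, request_no)
            hits += caches[copy].access(query)
            request_no += 1
    return hits


# Sticky routing: the same query always goes to the same copy.
sticky_hits = count_hits(lambda query, request_no: query % 2)
# Round-robin routing: copies alternate regardless of the query.
round_robin_hits = count_hits(lambda query, request_no: request_no % 2)

print("sticky hits:", sticky_hits)
print("round-robin hits:", round_robin_hits)
```

In this deliberately unfavorable setup, sticky routing splits the seven blocks across the two caches so each working set fits, while round-robin routing makes every copy cycle through all seven blocks with room for only four. Real workloads will sit somewhere between these extremes.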
Segment replication provides an alternative replication mechanism between nodes by copying Lucene segment files from the primary shard to its replicas. Indexing requests are first processed by the primary shard, so querying the primary shard can provide better consistency.
To improve performance, we would like to achieve the following:
Describe the solution you'd like
Enable custom routing of queries to maximize performance for the above use cases.
As a first step, additional preferences for query routing were added as a part of #7375
Describe alternatives you've considered
Additional context