
[Searchable Snapshots / Segment Replication] Custom query routing for performance improvements #7436

Closed
kotwanikunal opened this issue May 4, 2023 · 2 comments

@kotwanikunal
Member

kotwanikunal commented May 4, 2023

Is your feature request related to a problem? Please describe.
Searchable snapshots make it possible to query indices stored in snapshot repositories by fetching only the data a query needs, on demand. The feature uses a node-level file cache to hold downloaded pieces of data, called blocks, and to track which blocks are used frequently. Repeated or similar queries see better latency when they are routed to the same shard, because the blocks they need are more likely to already be in that shard's cache.

Segment replication provides an alternative replication mechanism between nodes by copying Lucene segment files from the primary shard to its replicas. Indexing requests are processed by the primary shard first, so querying the primary can provide better consistency.

As performance improvements, we would like to achieve the following -

  • Route queries to primary shards for better query consistency on segment-replication-enabled indices
  • Route queries to specific shards to maximize cache efficiency for searchable snapshot indices

Describe the solution you'd like
Enable custom routing of queries to maximize performance for the above use cases.
As a first step, additional preferences for query routing were added as a part of #7375

Describe alternatives you've considered

  • N/A

Additional context

@kotwanikunal added the enhancement and untriaged labels May 4, 2023
@kotwanikunal self-assigned this May 4, 2023
@kotwanikunal
Member Author

Why is the default mechanism not sufficient for SegRep/Searchable Snapshots?

Searchable snapshots make it possible to query indices stored in snapshot repositories by fetching only the data a query needs, on demand. A node-level file cache holds downloaded pieces of data, called blocks, and tracks which blocks are used frequently. Repeated or similar queries see better latency when routed to the same shard, because the blocks they need are more likely to already be cached.

Segment replication provides an alternative replication mechanism between nodes by copying Lucene segment files from the primary shard to its replicas. Indexing requests are first processed by the primary shard. When the primary refreshes, it creates new Lucene segments and opens a reader to make the newly indexed documents searchable. An asynchronous event then copies the newly created segments to the replicas over the network, and only once this completes are the documents searchable on the replicas. The time this takes is referred to as replication lag, and depending on cluster configuration the lag may be unacceptable for sensitive queries. Routing search requests to primaries provides better read-after-write consistency and avoids the replication lag for such queries.

Routing preferences that custom routing needs to account for (a configuration sketch follows this list)

  1. Adaptive Replica Selection
    • An adaptive mechanism that takes in the following constraints:
      • Response time of prior requests between the coordinating node and the eligible node
      • How long the eligible node took to run previous searches
      • Queue size of the eligible node’s search threadpool
    • Ranks the nodes on the above criteria and sorts the shard routing according to the node rankings
  2. Awareness Attributes
    • Takes the configured awareness attributes and ranks shards based on them
  3. Weighted Shard Routing
    • Weights per awareness attribute control the distribution of search or HTTP traffic across zones; commonly used for zonal deployments, heterogeneous instances, and routing traffic away from a zone during a zonal failure
    • Assigns a weight per awareness attribute and orders shards according to the assigned weights
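
As a rough illustration of how the awareness-based mechanisms above are configured today, here is a minimal Python sketch using requests. The cluster endpoint and zone names are assumptions, and the weighted routing request shape should be verified against the OpenSearch cluster awareness documentation.

```python
import requests

OPENSEARCH = "http://localhost:9200"  # assumed local cluster endpoint

# Awareness attributes: rank shard copies by the "zone" attribute.
requests.put(
    f"{OPENSEARCH}/_cluster/settings",
    json={"persistent": {"cluster.routing.allocation.awareness.attributes": "zone"}},
)

# Weighted shard routing: shift search traffic away from zone-c.
# Zone names are illustrative; check the request body against the docs.
requests.put(
    f"{OPENSEARCH}/_cluster/routing/awareness/zone/weights",
    json={"weights": {"zone-a": "1", "zone-b": "1", "zone-c": "0"}, "_version": -1},
)
```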

These routing mechanisms are bypassed when one of the following preferences is set at query time (see the sketch after this list):

  1. Specific Shards
  2. Specific Nodes
  3. Primary/Primary First
  4. Replica/Replica First
  5. Local
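
For reference, a minimal sketch of how these query-time preferences are supplied; the endpoint, index name, and node ID are placeholders.

```python
import requests

OPENSEARCH = "http://localhost:9200"  # assumed cluster endpoint
INDEX = "my-index"                    # hypothetical index name

# Each of these preference values bypasses the adaptive/awareness/weighted ranking above.
for preference in (
    "_shards:0,1",         # specific shards
    "_only_nodes:node-1",  # specific nodes (node ID is a placeholder)
    "_primary_first",      # primary / primary first
    "_replica_first",      # replica / replica first
    "_local",              # shard copies on the coordinating node, if available
):
    resp = requests.get(f"{OPENSEARCH}/{INDEX}/_search", params={"preference": preference})
    print(preference, resp.status_code)
```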

Test/Performance Considerations

  1. Too many requests routed to a single node (SegRep/Searchable Snapshots)
  2. Primary shards being overloaded by both indexing and querying operations (SegRep)

Proposed Solution

Segment Replication

  • For segment replication, the client should be able to route requests to the primary shard for stronger consistency.
  • This can be achieved by setting the _primary preference on search queries (see the sketch below this list).
  • Pros:
    • Simple solution
    • Client can specify the preference when needed
  • Cons:
    • Nodes holding primaries will work harder, since both indexing and query requests are routed to the same nodes/shards.
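
A minimal sketch of the client-side usage, assuming a local cluster and a hypothetical segment-replication-enabled index:

```python
import requests

OPENSEARCH = "http://localhost:9200"  # assumed cluster endpoint
INDEX = "segrep-index"                # hypothetical segment-replication-enabled index

# Route the search to primary shards only, trading extra load on the primary
# nodes for read-after-write consistency (replicas may still be waiting on
# segment copy).
resp = requests.get(f"{OPENSEARCH}/{INDEX}/_search", params={"preference": "_primary"})
print(resp.json()["_shards"])
```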

Searchable Snapshots

To achieve high file cache efficiency, repeated requests for a particular shard of the index need to be routed to the same shard copy. This can be achieved with the following solutions, where Solutions 1 and 2 can serve as precursors to Solution 3 -

  1. Use primary as the default preference
  • The default routing of requests for searchable snapshot indices should have a _primary preference
  • This ensures that only the primary copy of each shard answers queries for that shard, leading to high cache efficiency
  • Pros:
    • Simple, straightforward solution
    • Can achieve high cache efficiency
  • Cons:
    • The user is unaware of the under-the-hood change; it skips any adaptive/weighted routing settings the user might have enabled
    • A hybrid (data + search) node might see utilization imbalance and become a hot node if it holds the primaries for searchable snapshot indices and also serves primaries for local indices (indexing + searching)

  2. Alternative: hash the search request and use the hash key as the string preference

  • A string-based preference key, supplied as the search preference, provides stickiness to particular shard copies and is used where a session needs to be maintained across search operations
  • The query is routed to one copy of each shard, which may or may not be the primary; the copy chosen is a function of the supplied preference string (the same key leads to the same shard choices)
  • With this approach, we can create a hash of the user request and implicitly supply it as the query preference, so identical queries are served by the same shards (see the first sketch after these solutions)
  • Pros:
    • Simple solution
    • Requests will be evenly distributed across shards
    • Repeated queries will have a high cache efficiency
  • Cons:
    • Any minor change to the query leads to a different shard preference, which reduces cache efficiency
    • The implicit nature conflicts with user-supplied adaptive/weighted routing settings
  3. Add an adaptive algorithm with support for file cache stats
  • Introduce a new adaptive algorithm that ranks nodes on the factors mentioned above, plus additional attributes such as the node's file cache size and free space within the cache (from the node stats API), to make better decisions about where the request should be routed (see the second sketch after these solutions)
  • This might lead to additional repository calls, but it can balance requests more evenly (possibly at the cost of cache efficiency)
  • Pros:
    • Reduced likelihood of hot nodes or hot shards
    • Can function with other adaptive attributes which the user has currently set
  • Cons:
    • Might not achieve as high a cache efficiency as Solution 1 and can cause higher query latency due to its adaptive nature
    • The adaptive algorithm needs to be designed and tuned carefully to weigh the file cache node stats correctly while also preventing unnecessary block fetches
    • Can cause thrashing, since a block might be re-downloaded across nodes, evicting other blocks needed for a preferred shard
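
Sketch for Solution 2: derive a stable string preference from the normalized query body so identical queries keep hitting the same shard copies. This is an illustrative Python sketch only; the endpoint, index name, and helper are hypothetical, and in practice the hashing would happen implicitly on the coordinating node rather than in the client.

```python
import hashlib
import json
import requests

OPENSEARCH = "http://localhost:9200"    # assumed cluster endpoint
INDEX = "my-searchable-snapshot-index"  # hypothetical index name

def query_preference(body: dict) -> str:
    """Map a query body to a stable preference string.

    The same (normalized) query always hashes to the same string, so the same
    shard copies are chosen and their file caches stay warm; any change to the
    query produces a different hash and therefore different shard choices."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

body = {"query": {"term": {"status": "published"}}}
resp = requests.post(
    f"{OPENSEARCH}/{INDEX}/_search",
    params={"preference": query_preference(body)},
    json=body,
)
print(resp.json()["hits"]["total"])
```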

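Sketch for Solution 3: a toy ranking function that blends the existing adaptive signals with file cache headroom. All field names, weights, and values are placeholders, not the actual node stats schema or the real adaptive replica selection formula.

```python
from dataclasses import dataclass

@dataclass
class NodeSignals:
    # Illustrative inputs only; these do not mirror the real node stats fields.
    avg_response_ms: float        # coordinator-observed response time (EWMA)
    search_queue_size: int        # queue length of the node's search threadpool
    file_cache_free_ratio: float  # free space in the node's file cache (0..1)

def rank(node: NodeSignals) -> float:
    """Lower is better; weights are placeholders that would need tuning."""
    return (
        1.0 * node.avg_response_ms
        + 10.0 * node.search_queue_size
        - 50.0 * node.file_cache_free_ratio
    )

nodes = {
    "node-a": NodeSignals(avg_response_ms=12.0, search_queue_size=3, file_cache_free_ratio=0.10),
    "node-b": NodeSignals(avg_response_ms=15.0, search_queue_size=1, file_cache_free_ratio=0.60),
}
best = min(nodes, key=lambda name: rank(nodes[name]))
print(f"route to {best}")  # node-b: slightly slower, but far more cache headroom
```
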
Moving ahead with Solution 1 for Searchable Snapshots as the next step.

@kotwanikunal
Member Author

Created an issue for the pending task: #7593
