[DISCUSS] Add back preference for searching _primaries or _replicas #6046

mattweber · 2023-01-27T17:50:50Z

With segment replication and the potential lag of replicas, it would be nice to support search routing for _primaries and _replicas. With this preference we can keep traffic off our indexers and/or only on primaries to ensure the most up-to-date data. This was previously supported in Elasticsearch <= 6.x but removed in 7.x+. I imagine it wouldn't be too hard to revert that PR (it is pre-license change) to re-enable this support. What do you think?
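A minimal sketch of how a client might attach such a preference to a search request, assuming the value is passed as a `preference` query parameter and that `_primary` is an accepted value (the name used in the follow-up issues referenced further down). The helper `search_url`, the host, and the index name are hypothetical:

```python
from urllib.parse import urlencode


def search_url(base_url, index, preference=None):
    """Build a _search URL, optionally pinning the request to one kind of shard copy.

    The preference value (for example "_primary") is an assumption for
    illustration; check the documentation of the release you run.
    """
    url = f"{base_url.rstrip('/')}/{index}/_search"
    if preference:
        url += "?" + urlencode({"preference": preference})
    return url


# Hypothetical host and index name.
print(search_url("http://localhost:9200", "my-index", "_primary"))
# prints http://localhost:9200/my-index/_search?preference=_primary
```

Without a preference the URL is left untouched, so default routing still applies.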

mattweber · 2023-01-27T17:51:32Z

cc @reta @dblock I can work on this if you would consider merging it

andrross · 2023-01-27T18:32:54Z

@mch2 What do you think about this idea?

andrross · 2023-01-27T18:42:10Z

We have some longer term plans regarding separating readers and writers, but this seems like low hanging fruit for a useful capability today.

mch2 · 2023-01-27T19:48:31Z

Love this idea, @mattweber Thanks for raising this! We had been considering the possibility of a priority/urgent request param to fwd requests only to primaries given the added replication lag, but having an option to control routing to both primaries & replicas is even better. This would be a great addition.

anasalkouz · 2023-01-27T19:52:27Z

@mattweber Thanks for raising this idea. Feel free to pick this up and submit a PR. I will assign it to you.

nknize · 2023-01-27T20:43:30Z

@mattweber I remember the conversation to remove that feature in 7x+ to avoid trappy hot spots. It was a controversial change for sure.

I'm all for reverting as a simple mechanism, but maybe just enable it for indexes using segrep?

Another idea we had been toying w/ as @mch2 mentioned was to add something like priority={CRITICAL | HIGH | MEDIUM | LOW} as an API mechanism to help triage query routing and guide ARS. CRITICAL queries will only be executed on the primary and pre-empt other lower priority queries, lower priority queries can run on primaries or replicas but they're ordered and pre-empted by priority. Perhaps that could be a follow on enhancement.
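A rough sketch of the priority idea, under stated assumptions: the `Priority` enum, `eligible_copies`, and `QueryQueue` are hypothetical names, no such API exists in this thread, and the queue only orders pending queries rather than interrupting running ones:

```python
import heapq
from enum import IntEnum
from itertools import count


class Priority(IntEnum):
    # Lower value means more urgent (hypothetical ordering).
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def eligible_copies(priority, copies):
    """CRITICAL queries may only run on the primary; others may use any copy."""
    if priority is Priority.CRITICAL:
        return [c for c in copies if c == "primary"]
    return list(copies)


class QueryQueue:
    """Orders pending queries by priority, then by submission order."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that keeps FIFO order within a priority

    def submit(self, priority, query):
        heapq.heappush(self._heap, (priority, next(self._seq), query))

    def next_query(self):
        priority, _, query = heapq.heappop(self._heap)
        return priority, query
```

A query submitted as `Priority.CRITICAL` is dequeued before any pending `HIGH`, `MEDIUM`, or `LOW` query and is only offered the primary copy.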

andrross · 2023-01-27T20:50:27Z

> With this preference we can keep traffic off our indexers

@mattweber Just curious how far this feature will get you towards this goal, given that the default behavior is to allocate primary and replica shards evenly across all nodes. Do you have any mechanism to change the shard allocation?

mattweber · 2023-01-27T21:24:50Z

@andrross you are right. By default, and wherever nodes hold more than one shard, this still won't 100% solve things. My main use case currently is querying the primary for the most up-to-date data to validate document counts after a load.
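The allocation point can be seen with a toy layout. This sketch assumes a hypothetical three-node cluster with three shards and one replica each, spread evenly; `nodes_hit` and the node names are made up for illustration:

```python
def nodes_hit(layout, preference):
    """Return the set of nodes a search touches under a given copy preference."""
    wanted = {"_primary": "primary", "_replica": "replica"}[preference]
    return {
        node
        for node, copies in layout.items()
        for _shard, role in copies
        if role == wanted
    }


# Hypothetical even allocation: every node holds one primary and one replica.
layout = {
    "node-a": [(0, "primary"), (2, "replica")],
    "node-b": [(1, "primary"), (0, "replica")],
    "node-c": [(2, "primary"), (1, "replica")],
}

print(sorted(nodes_hit(layout, "_primary")))
# prints ['node-a', 'node-b', 'node-c']
```

With this layout a primaries-only search still reaches every node, so the preference controls which copy answers, not which nodes stay idle.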

mattweber · 2023-01-27T21:28:51Z

@nknize sure I can do that if you prefer. IMO, a trappy hot spot is still possible even with segrep, so it might be better to just add it back and document the potential hot spot?

nknize · 2023-01-27T22:04:05Z

> main use-case is currently querying the primary for most up-to-date data to validate document counts after a load.

This is the purpose of the priority query idea mentioned by @mch2. We haven't concretely pursued this yet because we haven't quantified (average) consistency metrics to determine if it's a mechanism worth introducing or if it will just exacerbate hot spots. (@mch2 maybe you have more info here). In essence, mechanisms like preference are really just "hacks" to push the system closer to "strongly consistent" than its "eventually consistent" nature.

nknize · 2023-01-27T22:07:45Z

> ... add it back and document the potential hot spot?

I lean towards this as well. Some folks are less accommodating, though, and strongly oppose trappy leniency so maybe we open a PR to do this (revert the change) and give sufficient feedback time for strong viewpoints to veto and offer an alternative?

mattweber · 2023-01-27T22:27:24Z

@nknize sure I will work on the revert and open the PR for others to review.

I believe we should not handicap people because others might do something bad with it. We can have sane defaults and documentation about potential issues. When someone decides not to follow those and change the value, it is really on them if they run into the issue we told them could happen. This is one of the things that has turned me off of Elasticsearch, too much hand holding and blocking advanced settings/features because someone might do something dumb with it.

nknize · 2023-01-27T22:42:36Z

> This is one of the things that has turned me off of Elasticsearch, too much hand holding and blocking advanced settings/features because someone might do something dumb with it.

💯 * 1000

mch2 · 2023-01-27T23:00:32Z

> We haven't concretely pursued this yet because we haven't quantified (average) consistency metrics yet to determine if it's a mechanism worth introducing or if it will just exacerbate hot spots. (@mch2 maybe you have more info here).

These metrics are proportional to the replication delay, which varies with network performance, shard size, refresh times, and shard/replica/node counts. That said, we've tested some configurations that showed delays of milliseconds to seconds and others that showed delays of minutes. We will document all this as guidance, but using the feature is going to require weighing all these factors to come up with appropriate settings and to decide whether options like this search routing make sense.

Related, am also working on #4478 to provide an optional backpressure guardrail to prevent replicas from falling too far behind.

mattweber · 2023-01-27T23:08:14Z

@mch2 FYI, just used it on a 2.8TB primary index with 155M docs with amazing results. Very happy with it so far.

mattweber · 2023-01-30T17:31:09Z

I have the revert done, but I get a test failure in the REPLICA_FIRST preference tests, so I need to dig into that a bit more before I open the PR. Hopefully I will have something ready later this week.

dblock · 2023-01-30T21:58:03Z

> @mch2 FYI, just used it on a 2.8TB primary index with 155M docs with amazing results. Very happy with it so far.

but does it work for large indexes? :)

anasalkouz · 2023-01-31T22:42:27Z

> but does it work for large indexes? :)

I think many factors determine whether SegRep is a good option: the number of primary shards, replicas, and nodes, the index size, the workload, etc. We expect this to perform even better once we integrate SegRep with Remote Storage.
@mch2 we should have a clear guide for users on when we recommend enabling SegRep.

mch2 · 2023-03-07T20:13:58Z

Hi @mattweber checking in here. I'd be happy to help out with digging into the failing test if you'd like to throw up a draft of your revert?

nknize · 2023-03-16T19:35:04Z

@mattweber I'm also happy to help move this forward if you'd like. Do you have a WIP branch we could collaborate on? I'd like to move forward with this sooner rather than later and I'm happy to create one, but I think you could knock out rev1 pretty quickly.

shwetathareja · 2023-04-25T14:08:09Z

ping @mattweber are you planning to take it forward? else happy to help.

kotwanikunal · 2023-04-28T18:30:06Z

@shwetathareja / @nknize / @mattweber : We have a similar use case for searchable snapshots where we would like to route requests to particular shards to maximize cache efficiency.
I am diving into this issue to see if we can achieve both the goals with a single/combined solution.
I will keep this issue up to date with my findings.

@anasalkouz / @andrross

shwetathareja · 2023-05-02T05:36:01Z

@kotwanikunal As a first step, we can bring this commit back and then improve on it. Thoughts?

kotwanikunal · 2023-05-02T23:18:01Z

@shwetathareja Raised a PR to add the commit back in as is: #7375

anasalkouz assigned mattweber Jan 27, 2023

anasalkouz added the distributed framework and enhancement labels Jan 27, 2023

nknize added the discuss label Jan 27, 2023

nknize changed the title ~~Add preference for searching _primaries or _replicas~~ [DISCUSS] Add back preference for searching _primaries or _replicas Jan 27, 2023

mch2 mentioned this issue Mar 7, 2023

[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh #6464

Merged

6 tasks

nknize mentioned this issue Apr 24, 2023

[RFC] Reader and Writer Separation in OpenSearch #7258

Open

kotwanikunal self-assigned this Apr 28, 2023

anasalkouz added this to Concurrent Search May 2, 2023

anasalkouz moved this to In Progress in Concurrent Search May 2, 2023

kotwanikunal mentioned this issue May 2, 2023

Add back primary shard preference for queries #7375

Merged

6 tasks

kotwanikunal closed this as completed in #7375 May 4, 2023

This was referenced May 4, 2023

[Searchable Snapshots / Segment Replication] Custom query routing for performance improvements #7436

Closed

[DOC] Add documentation for new preferences for search requests opensearch-project/documentation-website#4034

Closed

kotwanikunal mentioned this issue May 17, 2023

[Searchable Snapshots] Use _primary as the default search preference for searchable snapshot indices #7593

Closed

anasalkouz moved this from In Progress to Done in Concurrent Search May 18, 2023

cwperks mentioned this issue Jun 30, 2023

Set ConfigurationLoader mget request preference to _primary for strong consistency opensearch-project/security#2903

Closed

3 tasks

mch2 mentioned this issue Jul 14, 2023

[DISCUSS - Segment Replication] SegRep consistency limitations #8700

Closed

dreamer-89 mentioned this issue Sep 13, 2023

[DOC] Segment replication changes with remote store GA and 2.10 release opensearch-project/documentation-website#5011

Closed

4 tasks

anasalkouz removed this from Concurrent Search Dec 4, 2023
