Compatibility with segment replication #936

dreamer-89 · 2023-06-29T02:10:07Z

Summary

With the 2.9.0 release, a lot of enhancements are going into the segment replication [1][2] feature (which went GA in 2.7.0), so we need to ensure that different plugins are compatible with the current state of this feature. Previously, we ran tests on plugin repos to verify this compatibility, but we want plugin owners to be aware of these changes so that any required updates can be made. With the 2.10.0 release, the remote store feature is going GA; it internally uses only the SEGMENT replication strategy, i.e., it enforces SEGMENT replication on all indices. So it is important to validate that plugins are compatible with segment replication.

What changed

1. Refresh policy behavior

RefreshPolicy.IMMEDIATE refreshes only the primary shards immediately, not the replica shards. Instead, after the refresh, the primary starts a round of segment replication to update the replica shard copies, which leads to eventual consistency.
RefreshPolicy.WAIT_UNTIL ensures that the indexing operation is searchable in your cluster, i.e., RAW (read-after-write) guarantees. With segment replication, this guarantee is not provided because replica shard updates are delayed by asynchronous background refreshes.
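To make the refresh policy behavior above concrete, here is a toy model in Python of what an IMMEDIATE write looks like under segment replication. It is not OpenSearch code: the `Shard` and `SegRepGroup` classes and their methods are hypothetical, a shard's searchable state is reduced to a list of segments, and the asynchronous replication round is a method called explicitly.

```python
# Toy model (not OpenSearch code) of document visibility after a write with
# RefreshPolicy.IMMEDIATE under segment replication. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Shard:
    buffer: list = field(default_factory=list)    # indexed but not yet searchable
    segments: list = field(default_factory=list)  # refreshed, searchable segments

    def refresh(self) -> None:
        # A refresh turns the buffered documents into a new searchable segment.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer.clear()

    def search(self, doc_id: str) -> bool:
        return any(doc_id in segment for segment in self.segments)


class SegRepGroup:
    """One primary and one replica copy of a shard using segment replication."""

    def __init__(self) -> None:
        self.primary = Shard()
        self.replica = Shard()

    def index_immediate(self, doc_id: str) -> None:
        # Only the primary indexes the document and refreshes immediately.
        self.primary.buffer.append(doc_id)
        self.primary.refresh()

    def replicate(self) -> None:
        # A (normally asynchronous) replication round: the replica copies the
        # primary's segments it does not have yet instead of re-indexing.
        missing = self.primary.segments[len(self.replica.segments):]
        self.replica.segments.extend(missing)


if __name__ == "__main__":
    group = SegRepGroup()
    group.index_immediate("doc-1")
    print(group.primary.search("doc-1"), group.replica.search("doc-1"))  # True False
    group.replicate()
    print(group.replica.search("doc-1"))  # True
```

The window between `index_immediate` and `replicate` is the period in which a search routed to the replica misses the document even though the write used IMMEDIATE.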

2. Refresh lag on replicas

With segment replication, there is an inherent delay before documents become searchable on replica shard copies, because replica shards copy the data (segment) files over from the primary. Thus, compared to document replication, replica shards will on average take longer to become consistent with their primaries.

3. System/hidden indices support

With opensearch-project/OpenSearch#8200, system and hidden indices are now supported with the SEGMENT replication strategy. We need to ensure there are no bottlenecks that prevent system/hidden indices from using segment replication.

Next steps

With segment replication, strong reads are not guaranteed. Thus, if a plugin needs strong read guarantees, especially to compensate for the changed refresh policy behavior and the replica lag (points 1 and 2 above), its search requests need to be updated to target primary shards only. With opensearch-project/OpenSearch#7375, core now supports primary-shard-only search. Please follow the documentation for examples and details.
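A minimal sketch of a primary-only search request using only the Python standard library. It assumes that the capability added in opensearch-project/OpenSearch#7375 is exposed as the `_primary` value of the search `preference` query parameter; confirm the exact value in the documentation for your OpenSearch version. The endpoint `http://localhost:9200`, the index name `my-results-index`, and the helper name `primary_only_search` are placeholders.

```python
# Hypothetical helper: build (but do not send) a search restricted to primaries.
import json
import urllib.parse
import urllib.request


def primary_only_search(base_url: str, index: str, query: dict) -> urllib.request.Request:
    """Build a POST /<index>/_search request with preference=_primary.

    Assumption: `_primary` is the preference value for primary-only search.
    """
    path = "/" + urllib.parse.quote(index, safe="") + "/_search"
    params = urllib.parse.urlencode({"preference": "_primary"})
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path + "?" + params,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `primary_only_search("http://localhost:9200", "my-results-index", {"match_all": {}}).full_url` is `http://localhost:9200/my-results-index/_search?preference=_primary`. The function only builds the request so that it can be checked without a cluster; sending it with `urllib.request.urlopen` is left to the caller.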

Open questions

In case of any questions or issues, please post them in the core issue.

Reference

[1] Design

[2] Documentation


dreamer-89 · 2023-06-29T19:51:27Z

Requesting owners to add the v2.9.0 label to this issue.

kaituo · 2023-07-06T19:04:23Z

@dreamer-89 Our AD indices need RAW (e.g., we want users to be able to consume AD results as soon as possible and send out an alert if anomalies are found). Can we just add an index setting override to make sure our replication.type is DOCUMENT?

dreamer-89 · 2023-07-11T00:32:04Z

> @dreamer-89 Our AD indices need RAW (e.g., we want users to be able to consume AD results as soon as possible and send out an alert if anomalies are found). Can we just add an index setting override to make sure our replication.type is DOCUMENT?

Hi @kaituo,

Thanks for sharing your use case. You can override system/hidden indices to use DOCUMENT replication, but then we are not truly testing the system indices with segment replication. This validation is important because, going forward, SEGMENT will be the only replication strategy supported for certain configurations (e.g., remote store at the cluster level). Thus, the only option is to validate indices that are actually created with SEGMENT replication.
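For reference, an override of the kind discussed here amounts to setting `index.replication.type` when the index is created. The sketch below is hypothetical and is not the change made in #948: `with_replication_type` is an illustrative name, the shard count is arbitrary, and it assumes the caller's settings use flat `index.*` keys rather than a nested `"index": {...}` object.

```python
# Hypothetical helper: force a replication strategy in index creation settings.
import copy
import json

VALID_TYPES = ("DOCUMENT", "SEGMENT")


def with_replication_type(settings: dict, replication_type: str = "DOCUMENT") -> dict:
    """Return a copy of flat index settings with index.replication.type set."""
    if replication_type not in VALID_TYPES:
        raise ValueError(f"unknown replication type: {replication_type}")
    merged = copy.deepcopy(settings)  # leave the caller's dict untouched
    merged["index.replication.type"] = replication_type
    return merged


if __name__ == "__main__":
    # Body for a create index request (PUT /<index>); shard count is illustrative.
    body = {"settings": with_replication_type({"index.number_of_shards": 1})}
    print(json.dumps(body, sort_keys=True))
```

As noted above, such an override opts the index out of segment replication testing, and it cannot be relied on in configurations where SEGMENT is the only supported strategy.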

Will using primary-shard-based searching, as mentioned in the issue description, solve your use case?

kaituo · 2023-07-11T00:47:39Z

@dreamer-89 It can solve part of the problem. But the changes are wide-ranging, and it is easy to miss edge cases (we have to scan all of our frontend code and public APIs). Also, we open our result indices for users to query. We will need to come up with a UX to warn users to search primary shards only. These changes cannot be done in one or two days; they could take weeks of testing, UX design, and documentation.

What is remote store at cluster level?

Also, will segment replication be the default starting from version 2.10? If so, can we discuss the decision further? It's a breaking change that might affect both internal and external plugins. These plugins might not be familiar with the specifics of segment replication and may require strong reads. (cc @xinlamzn @anirudha @macohen @vamshin)

dreamer-89 · 2023-07-11T01:25:34Z

> @dreamer-89 It can solve part of the problem. But the changes are wide-ranging, and it is easy to miss edge cases (we have to scan all of our frontend code and public APIs). Also, we open our result indices for users to query. We will need to come up with a UX to warn users to search primary shards only. These changes cannot be done in one or two days; they could take weeks of testing, UX design, and documentation.

> What is remote store at cluster level?

Remote store [1] is a storage feature in which OpenSearch cluster data is persisted to a remote store for better durability. This feature is currently experimental and slated to go GA in 2.10.0.

By remote store at cluster level, I meant the remote store feature enabled at the cluster level (via a cluster-level setting), which means all created indices are remote store enabled by default. Due to its better durability guarantees, it will be the preferred solution for all types of indices, including system/plugin indices. It will be the de facto storage model for clusters with hardware-constrained nodes.

> Also, will segment replication be the default starting from version 2.10? If so, can we discuss the decision further? It's a breaking change that might affect both internal and external plugins. These plugins might not be familiar with the specifics of segment replication and may require strong reads. (cc @xinlamzn @anirudha @macohen @vamshin)

Today, remote store works only with SEGMENT replication, not with DOCUMENT. Thus, when remote store is set at the cluster level, all indices use segment replication by default. So yes, SEGMENT will be the default for remote store enabled indices (or when the cluster-level setting is used).
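The rule described here can be written down as a small, hypothetical resolution function. It is a toy encoding of the constraint stated in this reply, not OpenSearch's actual validation logic: `effective_replication_type` is an illustrative name, raising an error for an explicit DOCUMENT request is an assumption about how such a conflict would surface, and returning DOCUMENT when nothing is requested and remote store is off assumes the 2.x default.

```python
# Toy encoding (not OpenSearch's validation code) of the rule that remote store
# works only with SEGMENT replication.
from typing import Optional


def effective_replication_type(requested: Optional[str], remote_store_enabled: bool) -> str:
    """Resolve the replication type an index ends up with (hypothetical)."""
    if remote_store_enabled:
        # Assumption: an explicit DOCUMENT request is rejected, not silently ignored.
        if requested not in (None, "SEGMENT"):
            raise ValueError("remote store enabled indices support only SEGMENT replication")
        return "SEGMENT"
    # Assumption: without remote store, DOCUMENT remains the default in 2.x.
    return requested or "DOCUMENT"


if __name__ == "__main__":
    print(effective_replication_type(None, remote_store_enabled=True))   # SEGMENT
    print(effective_replication_type(None, remote_store_enabled=False))  # DOCUMENT
```

This is why a per-index DOCUMENT override is only a stopgap: once remote store is enabled at the cluster level, the first branch applies to every index.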

[1] https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote-store/index/

kaituo · 2023-07-11T18:36:25Z

@dreamer-89 I'll follow opensearch-project/job-scheduler#407 as a temporary solution until opensearch-project/OpenSearch#8536 is fixed.

ohltyler · 2023-07-13T21:54:01Z

With #948 merged, can this issue be resolved, and a new one created for 2.10 once it is clear what more will be needed from plugins?

kaituo · 2023-07-13T22:03:39Z

Yes, closed the issue.

dreamer-89 added the enhancement (New feature or request) and untriaged labels Jun 29, 2023

dreamer-89 mentioned this issue Jun 29, 2023: [Meta] Validate plugins compatibility with segment replication opensearch-project/OpenSearch#8211 (Closed, 37 tasks)

dreamer-89 mentioned this issue Jul 3, 2023: Compatibility with segment replication #937 (Closed)

gaiksaya added the v2.9.0 label Jul 3, 2023

kaituo removed the untriaged label Jul 6, 2023

kaituo self-assigned this Jul 6, 2023

This was referenced Jul 11, 2023:

Enforce DOCUMENT Replication for AD Indices and Adjust Primary Shards #948 (Merged)

[backport to 2.x] Enforce DOCUMENT Replication for AD Indices #949 (Merged)

dreamer-89 mentioned this issue Jul 12, 2023: Add support for Segrep to Dashboard saved object service opensearch-project/OpenSearch-Dashboards#4522 (Closed)

kaituo closed this as completed Jul 13, 2023

dreamer-89 mentioned this issue Aug 21, 2023: Compatibility with segment replication #989 (Closed)
