
[Segment Replication] Support for mixed cluster versions (Rolling Upgrade) #3881

Closed
Tracked by #2194
Bukhtawar opened this issue Jul 13, 2022 · 20 comments
Labels: >breaking, discuss, distributed framework, enhancement, Indexing:Replication, v2.8.0

Comments

@Bukhtawar
Collaborator

Is your feature request related to a problem? Please describe.
Once we enable segment-based replication for an index, we need to solve for version upgrades that lead to mixed-version clusters.
We can have an index whose primary is on OS 3.x and whose replica is on OS 4.x. This by itself is not a problem, since each Lucene major version can read indices written by the previous major version. At some point during the upgrade process, however, it is possible that the primary moves to OS 4.x while its replica still sits on OS 3.x. When segrep then tries to replicate segments to a replica on a lower-version host, the replica fails to read the segments because they may have been created by a higher Lucene version that it does not know about.
During a rolling upgrade, primary shards assigned to a node running the new version cannot have their replicas on nodes running the old version.

This is not a problem with document-based replication, because segments are created independently on nodes of both versions. If a replica fails on the older version, it cannot be assigned to a node with a lower version than its primary, and shards cannot move from a higher-version node to a lower-version node either.

Describe the solution you'd like
We should support mixed-version clusters with segrep. One way to achieve this is to keep creating segments with the lower version, even on nodes running the higher version, while the upgrade is in progress, i.e. the cluster should operate in a BWC mode.

Describe alternatives you've considered
Segregate ALL primaries on one set of nodes and replicas on a different set of nodes. Upgrade the set of nodes hosting replicas first, then upgrade the set hosting primaries. However, this can overload the primary-only nodes in a homogeneous cluster setup.


@Bukhtawar added enhancement, untriaged, >breaking labels and removed untriaged label on Jul 13, 2022
@Bukhtawar
Collaborator Author

Bukhtawar commented Oct 27, 2022

Based on the solution proposed above, do we just need to set the Lucene version on the writer using IndexWriterConfig#setIndexCreatedVersionMajor during peer recovery, so that the IndexWriter continues using the older Lucene version for segment creation and stays backward compatible?

The only place I can find this being used today is for shrink/split operations during local shard recovery:

final int luceneIndexCreatedVersionMajor = Lucene.readSegmentInfos(sources[0]).getIndexCreatedVersionMajor();
final Directory hardLinkOrCopyTarget = new org.apache.lucene.misc.store.HardlinkCopyDirectoryWrapper(target);
IndexWriterConfig iwc = new IndexWriterConfig(null).setSoftDeletesField(Lucene.SOFT_DELETES_FIELD)
.setCommitOnClose(false)
// we don't want merges to happen here - we call maybe merge on the engine
// later once we stared it up otherwise we would need to wait for it here
// we also don't specify a codec here and merges should use the engines for this index
.setMergePolicy(NoMergePolicy.INSTANCE)
.setOpenMode(IndexWriterConfig.OpenMode.CREATE)
.setIndexCreatedVersionMajor(luceneIndexCreatedVersionMajor);

@mch2
Member

mch2 commented Nov 1, 2022

@Bukhtawar This seems like a possible alternative worth exploring. This setting is not dynamically updatable within LiveIndexWriterConfig, so we'd need to recreate the writer/engine on primaries once all nodes have finished upgrading; maybe not so bad?

At a minimum, we could also block any replica from syncing to segments that are ahead of its version. This would slow freshness but still gives a path forward.
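A minimal sketch of such a guard on the replica side (illustrative only; the class and method are not actual OpenSearch replication code, and the check relies on Lucene's SegmentInfos commit version):

import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.util.Version;

public class ReplicaVersionGuard {
    // Returns true if this node's Lucene can open the segments referenced by the incoming
    // checkpoint. If the primary's commit was written by a newer Lucene release, the replica
    // would skip this round of replication instead of failing the shard.
    public static boolean canApplyCheckpoint(SegmentInfos incoming) {
        Version commitVersion = incoming.getCommitLuceneVersion();
        return commitVersion == null || Version.LATEST.onOrAfter(commitVersion);
    }
}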

@Bukhtawar
Collaborator Author

Thanks @mch2. I guess the cluster usually operates in a backward-compatible mode until all nodes have upgraded, and primaries move to the newer version before replicas do. So we just need to ensure that, in a mixed-version cluster, upgraded primaries start with the older IW version and replicate as usual to the older-version replicas. Then, once all nodes (and therefore all replicas) have upgraded, we recreate the IW on the primary. It might not be so bad if we could handle in-flight operations gracefully by buffering them and re-driving them during the IW config switch (similar to the primary relocation handoff).

@muralikpbhat

Thanks Bukhtawar, this is an interesting problem and definitely a blocker for the segment replication release.

Thanks @mch2 I guess the cluster usually operates in a backward compatible mode till all nodes have upgraded.

Doesn't this imply that nodes on the newer version continue to create segments with the older version until all nodes are upgraded? Or are we saying it is just backward compatible (from a read perspective), but no upgraded node can be downgraded even if the entire cluster is not yet on the latest version?

@mch2
Member

mch2 commented Nov 15, 2022

Doesn't this imply that nodes on the newer version continue to create segments with the older version until all nodes are upgraded? Or are we saying it is just backward compatible (from a read perspective), but no upgraded node can be downgraded even if the entire cluster is not yet on the latest version?

@muralikpbhat, @Bukhtawar. I understand this idea as the newer version continuing to create segments with the older codec until all nodes are upgraded, but I don't think the cluster operates this way currently. I played around with this a bit with the Lucene demo. Setting setIndexCreatedVersionMajor in the IWC will bypass the check in SegmentInfos.init, but it does not control the codec with which new segments are written. We would need the new shard to understand the codec in use on replicas, ensure it is available, and set it with IndexWriterConfig.setCodec for the segments to be readable.
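A minimal sketch combining the two knobs (illustrative only, not the actual OpenSearch wiring; it also assumes the named codec is actually writeable on this node, which, as noted later in the thread, Lucene only permits for codecs of the current major version):

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;

public class BwcWriterSketch {
    public static IndexWriter openBwcWriter(Directory dir, int oldIndexCreatedMajor, String oldCodecName) throws Exception {
        IndexWriterConfig iwc = new IndexWriterConfig(null)
            // only relaxes the "index created version" check performed in SegmentInfos
            .setIndexCreatedVersionMajor(oldIndexCreatedMajor)
            // this is what actually controls the format new segments are written with
            .setCodec(Codec.forName(oldCodecName));
        return new IndexWriter(dir, iwc);
    }
}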

@mch2
Member

mch2 commented Feb 6, 2023

We are going to start working on this. Writing up some smaller steps.

@Poojita-Raj
Contributor

Poojita-Raj commented Mar 10, 2023

We are going to start working on this. Writing up some smaller steps.

* [ ]  [[Segment Replication] - Create a POC proving out mixed clusters with SR enabled.](https://github.com/opensearch-project/OpenSearch/issues/6211)

Next step:

@dreamer-89
Member

Looking into it

@dreamer-89 added v2.8.0, discuss labels on Apr 27, 2023
@Poojita-Raj
Contributor

Poojita-Raj commented May 1, 2023

Support for mixed cluster versions:

Additionally, check the behavior of:

  • 2.7 to 2.8 rolling upgrade with no changes - default codec
  • 2.7 to 2.8 rolling upgrade with no changes - non-default codecs

@mch2
Member

mch2 commented May 3, 2023

The POC we have done proves that we can load an older codec to write with. An issue with this approach is that we will need to provide BWC versions of non-default codecs that can be loaded by CodecService, and map to them in CodecService: PerFieldMappingPostingFormatCodec and any plugin-provided codec, in particular k-NN.

Some additional thoughts:

  1. We need to benchmark any impact to recovery/upgrade times.
  2. We could provide an option to not go this route for users that are ok with lagging replicas during an upgrade.
  3. We could provide an option to block/avoid a primary relocation to an upgraded node if a certain % of replicas are live on non upgraded nodes.

@Poojita-Raj @Bukhtawar @nknize curious of your thoughts on this.

@anasalkouz anasalkouz changed the title [Segment Replication: Breaking] Support for mixed cluster versions [Segment Replication: Breaking] Support for mixed cluster versions (Rolling Upgrade) May 4, 2023
@mch2 mch2 moved this from Todo to In Progress in Segment Replication May 11, 2023
@Bukhtawar
Collaborator Author

Bukhtawar commented May 15, 2023

One of the major frictions for users upgrading to newer versions could be downtime. Trying to understand the behaviour better: during an upgrade, once a primary shard moves over to the new version, at what point can we flip it back to the newer IndexWriter once all its replicas have upgraded? Would this amount to disruption while the index writer is closing and reopening with the new codecs, or can it be handled gracefully, more like allowing older requests to finish concurrently with newer in-flight requests served by the new writer?

@dreamer-89
Member

dreamer-89 commented Jun 8, 2023

Coming from #7698, I see that the current Lucene library is not write-compatible with previous-major codecs, i.e. using an N-1 codec in the Nth major version of Lucene. Any indexing operation fails with UnsupportedOperationException. I verified on different Lucene 9.x versions (listed below) that creating an index with Lucene87 fails during indexing. Lucene only allows codecs from the current major (e.g. Lucene90 through Lucene95 for 9.x, part of the core library) for writing; older/BWC codecs are only meant for reading older segments. Thus, it appears downgrading the Lucene codec on the primary won't work across major Lucene bumps.

  1. Lucene95 - latest main
  2. Lucene92 5358502
  3. Lucene90 006c832

Step 1. Create an index with an older Lucene codec using the current Lucene version 9.x (any of the above three):

{
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 1,
            "codec": "Lucene87"
        }
    }
}

Step 2. Index a document; the operation fails with:

{
    "error": {
        "root_cause": [
            {
                "type": "unsupported_operation_exception",
                "reason": "Old codecs may only be used for reading"
            }
        ],
        "type": "unsupported_operation_exception",
        "reason": "Old codecs may only be used for reading"
    },
    "status": 500
}

CC @mch2 @nknize @anasalkouz

@mch2
Member

mch2 commented Jun 14, 2023

Thanks for this @dreamer-89, this breaks our current downgrade proposal across major versions. I've had some offline discussions with @Bukhtawar and @nknize on the possibility of running two Lucene versions side by side for major version upgrades. The initial idea was to load the old IW version inside a separate engine implementation and support that with each OS release. Unfortunately this won't work, because we would need the same major version of Lucene across all ingest packages, particularly during analysis. So I don't think we have a clear path with this solution. Perhaps it could be part of a migration component in the future, but we would need to figure out how to load essentially two versions of OpenSearch.

I think we can solve this for segrep by preventing this invalid mixed cluster state (primary on upgraded node copying to replicas on old node) rather than supporting it directly.

The two ways you would get into this state are:

  1. Through a rolling upgrade where each node is restarted in place.

From our docs, we suggest turning off replica allocation during the upgrade process. In this case we end up with a cluster where all replicas are UNASSIGNED until the setting is flipped back. So IMO nothing is required here?

  2. Adding an upgraded node to a cluster and relocating shards to it.

Switching lagging replicas to docrep is an idea, but it would be problematic: replicas are relocated directly to the upgraded nodes, so we would have to force recovery from the primary in these cases to end up with the same segments. Also, if a primary dies and a docrep shard is promoted, we have to reconcile that with the other replicas / remote storage.

In this case, we could provide an option to move replicas first if relocation is triggered. We have a setting to move primaries first (#1445), so we could do the inverse. The rationale for that setting was to prevent all copies of any single shard from being relocated before any copies of another shard, so we would lose this guarantee unless we moved a single replica copy for each replication group before moving the rest. This would eventually leave us in a temporary state where all primaries are on old nodes and replicas on new nodes until the upgrade is completed. If the old nodes/primaries become overwhelmed, the primary-first setting could be enabled.

@dreamer-89
Member

dreamer-89 commented Jun 16, 2023

Thanks for this @dreamer-89, this breaks our current downgrade proposal across major versions. [...] In this case, we could provide an option to move replicas first if relocation is triggered.

Thank you @mch2 for your comment and for splitting this problem into two separate sub-problems, rolling upgrades and relocation-based upgrades, along with possible solutions.

Rolling upgrades - No fix is needed

I see that for the rolling upgrade case there is a recommendation [2] to set cluster.routing.allocation.enable to primaries. This setting prevents replica shard allocation and avoids the replica relocation storm that OS process restarts would otherwise cause. The end result of using this setting during a rolling upgrade is that we never reach a state where replica shards are on non-upgraded nodes while their primary is on an upgraded node. A node restart affects:

  • Replica shard
    • The shard is not allowed to allocate due to the cluster.routing.allocation.enable setting and thus remains UNASSIGNED.
  • Primary shard
    • If a replica copy is available on another node, it is promoted to primary (failover case).
    • If no replica copy is available due to previous node restarts, the cluster manager may allocate the primary on the upgraded node where a replica copy previously existed. This is a best-effort attempt to keep a primary copy available.
    • If no replicas are configured, the shard is allocated back on the same node after restart.

It is still possible to force-move primary shards to upgraded nodes, but it is not recommended; the rolling upgrade documentation needs to be updated to call this out. Based on this, the only case that needs to be solved is relocation-based upgrades (case 2 in the previous comment).
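For completeness, the recommended allocation setting expressed with the OpenSearch Settings builder (a sketch; in practice operators apply it via the cluster settings API before starting the upgrade and set it back to all afterwards):

import org.opensearch.common.settings.Settings;

public class RollingUpgradePrep {
    // Restrict allocation to primaries only for the duration of the rolling upgrade,
    // so replica shards stay UNASSIGNED instead of landing on mismatched nodes.
    static final Settings ALLOCATE_PRIMARIES_ONLY = Settings.builder()
        .put("cluster.routing.allocation.enable", "primaries")
        .build();
}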

Relocation based upgrades

Using the proposals from the previous comment, here is a compiled list of possible solutions for relocation-based upgrades.

1. No change

Do not change anything around version upgrades. With primary shards sitting on upgraded nodes, there will be replica shard failures on non-upgraded nodes. These replica shards eventually go UNASSIGNED and the cluster turns yellow. A replica recovers automatically once its allocation is attempted on an upgraded node.

Pros

  • No code change, thus the cleanest option

Cons

  • Problematic for customers sensitive to search availability, because of unassigned replicas in the mixed cluster

2. Reset codec in mixed cluster state

When the cluster is running in a mixed-version state, reset the codec on the primary to the version running on the non-upgraded nodes, so that segment files (now written with the older Lucene codec) can be read by replicas that are still on non-upgraded nodes. Note that replicas running on upgraded nodes can still read these files from the primary. We used an OpenSearch-version-to-Lucene-codec-name mapping that allows the primary to pick the BWC codec name, and we reset the codec to the latest one once the upgrade is complete. This solution was attempted in #7698 but has the two limitations below.
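For illustration, a rough sketch of such a lookup; the version-to-codec pairs here are examples only and not an authoritative mapping:

import java.util.Map;

public class BwcCodecLookup {
    // Example values only; the real mapping would come from the default codec shipped with each release.
    private static final Map<String, String> DEFAULT_CODEC_BY_VERSION = Map.of(
        "2.7.0", "Lucene95",
        "2.6.0", "Lucene95",
        "2.5.0", "Lucene94"
    );

    // Codec name the primary writes with while the given (older) OpenSearch version is still in the cluster.
    public static String codecFor(String oldestNodeVersion, String currentDefault) {
        return DEFAULT_CODEC_BY_VERSION.getOrDefault(oldestNodeVersion, currentDefault);
    }
}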

Custom codecs

Plugins with custom codec implementations load their own current version of the Lucene codec, which prevents the codec downgrade. This solution therefore also needs changes in downstream components that override CodecService; one such example is the k-NN plugin:

@Override
public Codec codec(String name) {
    // k-NN always wraps the delegate codec with its current codec version, defeating a downgrade on the primary.
    return KNNCodecVersion.current().getKnnCodecSupplier().apply(super.codec(name), mapperService);
}

Major lucene version bumps

For major version bumps this solution doesn't work, because Lucene treats previous-major (N-1) codecs as read-only and maintains them only for backward compatibility [1]. Using a previous-major codec with IndexWriter results in UnsupportedOperationException during indexing operations.

For example, in the current Lucene 9.x, the Lucene87StoredFieldsFormat class (which defines the format of the files that store field data and metadata, e.g. .fdt, .fdm) lives in backward-codecs and blocks all write-related methods. Any attempt to perform indexing operations results in UnsupportedOperationException.

3. Move replicas first to upgraded nodes

In relocation-based upgrades, move replica shard copies to upgraded nodes first, and move a primary only when all (or a certain %) of its replicas have moved to upgraded nodes. Replica shards on upgraded nodes, using the updated codecs, can read segment files written with the older Lucene version on the non-upgraded node holding the primary. This holds due to Lucene's BWC read guarantee, which allows the current major version to read segment files written by any minor of the previous major. For example, data files written by Lucene 8.x can be read by a replica shard running Lucene 9.x.

Pros

  • Non-intrusive and clean.
  • Proven approach, backed by Lucene's BWC read guarantees.

Cons

  • For overwhelmed clusters, there is a risk of primaries running on degraded (non-upgraded) nodes for extended periods.
  • A cluster-level setting means the move-replicas-first behavior applies uniformly to all indices.

4. Use docrep during rolling upgrades

Use document replication in the mixed-cluster state, which allows primary and replicas to index and create segment files independently. The primary triggers this engine switch only when a replica is on a non-upgraded node while the primary is on an upgraded node. When a node containing only replica shards is upgraded, the primary ignores the event. When the primary shard is upgraded, it pings all replica shards to promote themselves to a writeable engine; replica shards on non-upgraded nodes follow this directive and switch to InternalEngine, while replicas already on upgraded nodes ignore it. Similarly, when the upgrade completes, the primary pings all replica shards to switch back to NRTReplicationEngine; only the replica shards that previously switched demote their engine. During the switch back, replica shards perform a force sync from the primary to mirror its files exactly; during this sync the replica ignores segment-file diffs (which arise from new segment files written with InternalEngine). Once the sync completes, the replica performs a store cleanup that removes all segment files not belonging to the latest commit.
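A purely illustrative sketch of the per-replica decision described above; none of these types or helpers are existing OpenSearch APIs:

public class ReplicationModeSketch {
    enum ReplicationMode { SEGMENT, DOCUMENT }

    // The primary falls back to document replication only for replicas on nodes older than
    // its own; replicas already on upgraded nodes keep using segment replication.
    static ReplicationMode modeForReplica(int primaryNodeVersionId, int replicaNodeVersionId) {
        return primaryNodeVersionId > replicaNodeVersionId ? ReplicationMode.DOCUMENT : ReplicationMode.SEGMENT;
    }
}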

Example state (diagram): Replica 1 and Replica 2 are replica shard copies of Primary; Other Replica is a replica copy belonging to a different index.

Pros

  • Works equally well for both types of upgrades.

Cons

  • Intrusive, as it involves switching engines.
  • Needs extra effort to implement the segrep ↔ docrep switch, which is not available today.
  • During the switch back, there will be search-related disruption (e.g. scroll/PIT queries), as segment files written with InternalEngine are cleaned up.

Proposal

Based on the above, moving replicas first looks like the cleanest and least risky solution compared to the others, and is therefore the preferred one. Tagging folks for review and feedback: @mch2 @andrross @nknize @Bukhtawar

References

[1] https://github.com/apache/lucene/tree/main/lucene/backward-codecs
[2] https://opensearch.org/docs/latest/install-and-configure/upgrade-opensearch/index/

@dblock
Member

dblock commented Jun 20, 2023

Is rolling back a partial upgrade also a concern, and can we solve it here? If 1/3 of the nodes in a cluster have been upgraded to 4.0, I think we want to allow rolling that back, or recovering from / failing over to a replica. I think this gives us actual rollback from failed in-flight upgrades!

@Bukhtawar
Collaborator Author

Is rolling back a partial upgrade also a concern and can we solve it here?

I don't think rollback is an option given segments created in higher versions cannot be understood by Lucene in lower version.

Ideally I would like to address this BWC issue in Lucene itself, to support creating segments with IW across major versions just the way minor versions are supported. Is there a discuss thread on Lucene where we can brainstorm this?

I would potentially want us to evaluate a Blue-Green option that updates the replica copies first and then the primaries, to get past the issue of primaries overloading the last few non-upgraded nodes.

@dreamer-89
Member

dreamer-89 commented Jun 27, 2023

Is rolling back a partial upgrade also a concern and can we solve it here?

I don't think rollback is an option given segments created in higher versions cannot be understood by Lucene in lower version.

Ideally I would like to address this BWC issue in Lucene itself, to support creating segments with IW across major versions just the way minor versions are supported. Is there a discuss thread on Lucene where we can brainstorm this?

Thanks @Bukhtawar for your comment and feedback. I opened an issue in Lucene, where I found a similar ask in the past for N-1 writer and reader support. That ask was identified as a bigger effort, with the suggestion to solve it in the distributed system instead. Please feel free to comment on the issue to initiate more discussion.

I would potentially want us to evaluate a Blue-Green option that updates the replica copies first and then the primaries, to get past the issue of primaries overloading the last few non-upgraded nodes.

Yes, we are proposing to move replica copies first for relocation-based upgrades (the Blue-Green option); this work is tracked in #8265.

For rolling upgrades, the documentation recommends that users disable replica shard allocation using the "cluster.routing.allocation.enable": "primaries" setting (to prevent a recovery storm). In my experimentation so far, with the primaries-only allocation setting enabled, I have not found any state where it is possible to have a replica on a non-upgraded node while its primary is on an upgraded node. I provided more details in the proposal above. Please let me know if this is not correct.

@Poojita-Raj
Contributor

After the discussion we've had here, the final conclusion for providing mixed-version cluster support is to move replicas first to upgraded nodes. We are introducing a replica-first shard movement setting to achieve this.

To support mixed cluster versions:

  1. Prioritize replica shard movement during shard relocation (for successful blue green deployments). [Segment Replication] Prioritize replica shard movement during shard relocation #8875
  2. Handle failover of primary shards in mixed cluster mode [Segment Replication] Handle failover in mixed cluster mode #9027

tlfeng pushed a commit that referenced this issue Aug 7, 2023
…relocation (#8875) (#9153)

When some node or set of nodes is excluded, shards are moved away in random order. When segment replication is enabled for a cluster, we might end up in a mixed-version state where replicas on a lower version are unable to read segments sent from higher-version primaries and fail.

To avoid this, we could prioritize replica shard movement to avoid entering this situation.

Adding a new setting called shard movement strategy - `SHARD_MOVEMENT_STRATEGY_SETTING` - that will allow us to specify in which order we want to move our shards: `NO_PREFERENCE` (default), `PRIMARY_FIRST` or `REPLICA_FIRST`. 

The `PRIMARY_FIRST` option performs the same behavior as the previous setting `SHARD_MOVE_PRIMARY_FIRST_SETTING`, which will now be deprecated in favor of the shard movement strategy setting.

Expected behavior: 

If `SHARD_MOVEMENT_STRATEGY_SETTING` is changed from its default behavior to be either `PRIMARY_FIRST` or `REPLICA_FIRST` then we perform this behavior whether or not `SHARD_MOVE_PRIMARY_FIRST_SETTING` is enabled. 

If `SHARD_MOVEMENT_STRATEGY_SETTING` is still at its default setting of `NO_PREFERENCE` and `SHARD_MOVE_PRIMARY_FIRST_SETTING` is enabled we move the primary shards first. This ensures that users still using this setting will not see any changes in behavior. 
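A small sketch of that precedence, using the enum values named above; the method is illustrative rather than the actual allocator code:

public class ShardMovementStrategySketch {
    enum ShardMovementStrategy { NO_PREFERENCE, PRIMARY_FIRST, REPLICA_FIRST }

    static ShardMovementStrategy effectiveStrategy(ShardMovementStrategy movementStrategy, boolean movePrimaryFirst) {
        if (movementStrategy != ShardMovementStrategy.NO_PREFERENCE) {
            return movementStrategy; // an explicitly chosen strategy always wins
        }
        // otherwise honor the deprecated move-primaries-first setting
        return movePrimaryFirst ? ShardMovementStrategy.PRIMARY_FIRST : ShardMovementStrategy.NO_PREFERENCE;
    }
}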

Reference: #1445

Parent issue: #3881
---------

Signed-off-by: Poojita Raj <[email protected]>
(cherry picked from commit c6e4bcd)
@hdhalter

If the update requires documentation for 2.10, please create a doc issue. Thanks!

@Poojita-Raj
Contributor

Closing out this issue as all work has been completed. Doc changes are being tracked here: opensearch-project/documentation-website#4827
