[Segment Replication] Synchronize checkpoint updates with failover. #4136
Closed
Labels
distributed framework
enhancement
v2.5.0
With #4135 and #3989, basic failover support is added for shards with segment replication enabled.
However, these changes do not consider what happens to ongoing or incoming copy events during failover.
Replicas should remain swappable backups that recover quickly, so I do not think we should wait for file copy to complete for an ongoing replication. The replica should cancel the event and begin its failover steps (commit & rewire its engine). However, if a replica has an ongoing copy event that is in the finalize step, meaning all segments for a new checkpoint have arrived and the only remaining step is to wire them into its directory reader, I think we can let it complete and then continue with failover.
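To make the proposal concrete, here is a rough, language-neutral sketch of the decision in Python. It is not OpenSearch code: `Stage`, `CopyEvent`, `Replica`, `on_new_checkpoint`, and `promote` are all hypothetical names, and rejecting incoming checkpoints once promotion has started is my assumption about how "incoming" events could be handled. The sketch is single-threaded, so the actual synchronization between the copy event and failover is left out.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Stage(Enum):
    """Hypothetical stages of one copy event on a replica."""
    GET_CHECKPOINT_INFO = auto()
    FILE_COPY = auto()
    FINALIZE = auto()   # all segments arrived; only the reader wiring remains
    DONE = auto()
    CANCELLED = auto()


@dataclass
class CopyEvent:
    checkpoint_version: int
    stage: Stage


class Replica:
    """Toy model of a replica shard; names are illustrative only."""

    def __init__(self) -> None:
        self.ongoing: Optional[CopyEvent] = None
        self.promoting = False
        self.log: List[str] = []

    def on_new_checkpoint(self, version: int) -> bool:
        # Assumption: once failover has begun, new copy events are ignored.
        if self.promoting:
            self.log.append(f"ignored checkpoint {version}")
            return False
        self.ongoing = CopyEvent(version, Stage.GET_CHECKPOINT_INFO)
        return True

    def promote(self) -> None:
        # Mark promotion first so no new copy event can start afterwards.
        self.promoting = True
        event = self.ongoing
        if event is not None:
            if event.stage is Stage.FINALIZE:
                # Segments are already local, so let the event complete.
                event.stage = Stage.DONE
                self.log.append(f"finalized checkpoint {event.checkpoint_version}")
            elif event.stage not in (Stage.DONE, Stage.CANCELLED):
                # Do not wait for file copy; cancel and move on.
                event.stage = Stage.CANCELLED
                self.log.append(f"cancelled checkpoint {event.checkpoint_version}")
            self.ongoing = None
        # Failover steps named in this issue.
        self.log.append("commit")
        self.log.append("rewire engine")


if __name__ == "__main__":
    replica = Replica()
    replica.on_new_checkpoint(5)
    replica.ongoing.stage = Stage.FILE_COPY  # simulate a copy still in flight
    replica.promote()
    print(replica.log)
```

In this model a copy event caught in `FILE_COPY` is cancelled before the commit and rewire steps, while one already in `FINALIZE` is completed first. A real implementation would also need a lock or similar guard around the stage check so the event cannot change stage between the check and the cancel.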