[Segment Replication] - Fix corrupt index when verifying after file copy. #2331
Comments
Replica shards request the bytes of the latest SegmentInfos object on the primary and a list of file metadata computed from that SegmentInfos. That SegmentInfos is an in-memory version of the segments_N file on disk, which references all active segments. The segments_N file on disk is only updated during a commit, so the in-memory SegmentInfos can be ahead of the file on disk: it may reference newly created segments, and/or omit segments that have been merged away but are still referenced by the on-disk segments_N.

Under load the primary is continuously indexing new documents and publishing new checkpoints. As the replica receives these checkpoints it discards them if a replication event is already in progress and catches up the next time replication runs. As segments grow larger, copying them from the primary to replicas takes longer. If a replica falls multiple published checkpoints behind the primary, it can miss an entire commit generation. The next time the replica requests a checkpoint, it will not yet have the segment files referenced by the primary's segments_N file, and if those segments were merged away on the primary they are not referenced by the in-memory SegmentInfos returned in the response either. This leaves the replica's on-disk segments_N file referencing segment files that do not exist on disk, so the shard is corrupt if we ever need to restore from the file system, particularly during node restarts.

Option 1 - Do nothing: ignore the segments_N file on replicas and always recover from a primary. If the primary is lost, we will not be able to recover. Remote storage solves the problem because all segments are stored elsewhere.

I'm thinking Option 1 should be configurable with a setting so that with remote storage we don't copy the extra data. I am going to try option 2 - that gives us the best performance and solves the problem of a corrupt index because we'd still be able to recover from store.

Option 2 is not a great solution here because we run the risk of never catching up and never cleaning up old files. I think we should go with option 3, with a setting for option 1 until remote storage is enabled.
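To make the divergence concrete, here is a minimal standalone Lucene sketch (plain Lucene, not OpenSearch code; the class name and index path are made up) that flushes a segment without committing and then compares the in-memory SegmentInfos seen through an NRT reader with the on-disk segments_N:

```java
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.index.StandardDirectoryReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SegmentInfosDivergence {
    public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(Paths.get("/tmp/segment-infos-divergence"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {

            Document doc = new Document();
            doc.add(new StringField("id", "1", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit();   // writes a segments_N file referencing the first segment

            doc = new Document();
            doc.add(new StringField("id", "2", Field.Store.YES));
            writer.addDocument(doc);
            writer.flush();    // creates a second segment on disk, but no new commit

            // On-disk view: the last committed segments_N.
            SegmentInfos onDisk = SegmentInfos.readLatestCommit(dir);

            // In-memory view: what an NRT reader (and a primary serving replicas) sees,
            // including flushed-but-uncommitted segments.
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                SegmentInfos inMemory = ((StandardDirectoryReader) reader).getSegmentInfos();

                Set<String> aheadOfCommit = new HashSet<>(inMemory.files(false));
                aheadOfCommit.removeAll(onDisk.files(false));
                System.out.println("Referenced in memory but not by "
                        + onDisk.getSegmentsFileName() + ": " + aheadOfCommit);
            }
        }
    }
}
```

After a merge the gap runs the other way as well: the committed segments_N can still reference files that the in-memory SegmentInfos has dropped, and that is exactly the set of files a replica must not delete.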
Describe the bug
The SegmentInfos object on primaries can be newer (after an index or merge operation) than the segments_N file written to disk. We currently delete segment files on replicas based on a metadata list sent back from primary shards, which is generated from the in-memory SegmentInfos. After files are copied, we verify the index by reading the on-disk segments_N file. If that file is behind the in-memory SegmentInfos, verification throws an exception because we have deleted files that the on-disk segments_N still references.
We should update ReplicationTarget to also retain segments referenced by the on-disk segments_N when deleting stale files.
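As a rough sketch of what that could look like (assuming direct Lucene access; `ReplicaFileCleanup` and `cleanupStaleFiles` are hypothetical names, not the actual ReplicationTarget code), the retained set would be the union of the files in the incoming SegmentInfos snapshot and the files referenced by the replica's last on-disk commit:

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.IndexFileNames;
import org.apache.lucene.index.IndexNotFoundException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;

public final class ReplicaFileCleanup {

    /**
     * Delete local files that are referenced neither by the incoming SegmentInfos
     * snapshot from the primary nor by the replica's last on-disk commit point,
     * so the on-disk segments_N never points at files we have removed.
     */
    static void cleanupStaleFiles(Directory replicaDir, SegmentInfos incomingInfos) throws Exception {
        Set<String> retain = new HashSet<>(incomingInfos.files(true));

        try {
            SegmentInfos lastCommit = SegmentInfos.readLatestCommit(replicaDir);
            retain.addAll(lastCommit.files(true));
        } catch (IndexNotFoundException e) {
            // No local commit yet (fresh replica) - nothing extra to retain.
        }

        for (String file : replicaDir.listAll()) {
            if (retain.contains(file)
                    || file.startsWith(IndexFileNames.SEGMENTS)
                    || file.equals(IndexWriter.WRITE_LOCK_NAME)) {
                continue; // keep referenced files, commit points, and the write lock
            }
            replicaDir.deleteFile(file);
        }
    }
}
```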