[Segment Replication] - Fix corrupt index when verifying after file copy. #2331
Comments
Replica shards request the bytes of the latest SegmentInfos object on the primary and a list of file metadata computed from that SegmentInfos. That SegmentInfos is an in-memory version of the segments_N file on disk, which references all active segments. The segments_N file on disk is only updated during a commit, so the in-memory SegmentInfos can be ahead of the file on disk: it may reference newly created segments, and/or omit segments that have been merged away but are still referenced by the on-disk segments_N.

Under load the primary is continuously indexing new documents and publishing new checkpoints. As the replica receives these checkpoints it discards them if a replication event is already in progress and catches up the next time replication runs. As segments grow larger, copying them from the primary to replicas takes longer. If a replica falls multiple published checkpoints behind the primary, it can miss an entire commit generation. The next time the replica requests a checkpoint, it will not yet have the segment files referenced by the primary's segments_N file, and if those segments were merged away on the primary they are not referenced by the in-memory SegmentInfos returned in the response either. This leaves the replica's on-disk segments_N file referencing segment files that do not exist on disk, so the shard is corrupt if we ever need to restore from the file system, particularly during node restarts.

Option 1 - Do nothing: ignore the segments_N file on replicas and always recover from a primary. If the primary is lost, we will not be able to recover. Remote storage solves the problem because all segments are stored elsewhere.

I'm thinking Option 1 should be configurable with a setting so that with remote storage we don't copy the extra data. I am going to try option 2 - that gives us the best performance and solves the problem of a corrupt index because we'd still be able to recover from store.

Option 2 is not a great solution here because we run the risk of never catching up and never cleaning up old files. I think we should go with option 3, with a setting for option 1 until remote storage is enabled.
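To make the divergence concrete, here is a minimal standalone Lucene sketch (plain Lucene, not OpenSearch code; the class name and index path are made up) that flushes a segment without committing and then compares the in-memory SegmentInfos seen through an NRT reader with the on-disk segments_N:

```java
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.index.StandardDirectoryReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SegmentInfosDivergence {
    public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(Paths.get("/tmp/segment-infos-divergence"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {

            Document doc = new Document();
            doc.add(new StringField("id", "1", Field.Store.YES));
            writer.addDocument(doc);
            writer.commit();   // writes a segments_N file referencing the first segment

            doc = new Document();
            doc.add(new StringField("id", "2", Field.Store.YES));
            writer.addDocument(doc);
            writer.flush();    // creates a second segment on disk, but no new commit

            // On-disk view: the last committed segments_N.
            SegmentInfos onDisk = SegmentInfos.readLatestCommit(dir);

            // In-memory view: what an NRT reader (and a primary serving replicas) sees,
            // including flushed-but-uncommitted segments.
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                SegmentInfos inMemory = ((StandardDirectoryReader) reader).getSegmentInfos();

                Set<String> aheadOfCommit = new HashSet<>(inMemory.files(false));
                aheadOfCommit.removeAll(onDisk.files(false));
                System.out.println("Referenced in memory but not by "
                        + onDisk.getSegmentsFileName() + ": " + aheadOfCommit);
            }
        }
    }
}
```

After a merge the gap runs the other way as well: the committed segments_N can still reference files that the in-memory SegmentInfos has dropped, and that is exactly the set of files a replica must not delete.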
Describe the bug
The SegmentInfos object on primaries can be newer (after an index or merge operation) than the segments_N file written to disk. We currently delete segment files on replicas based on a metadata list sent back from primary shards, which is generated from the in-memory SegmentInfos. After files are copied, we verify the index by reading the on-disk segments_N file. If that file is behind the in-memory SegmentInfos, verification throws an exception because we have deleted files that the on-disk segments_N still references.
We should update ReplicationTarget to also retain segments referenced by the on-disk segments_N when deleting stale files.
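As a rough sketch of what that could look like (assuming direct Lucene access; `ReplicaFileCleanup` and `cleanupStaleFiles` are hypothetical names, not the actual ReplicationTarget code), the retained set would be the union of the files in the incoming SegmentInfos snapshot and the files referenced by the replica's last on-disk commit:

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.IndexFileNames;
import org.apache.lucene.index.IndexNotFoundException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;

public final class ReplicaFileCleanup {

    /**
     * Delete local files that are referenced neither by the incoming SegmentInfos
     * snapshot from the primary nor by the replica's last on-disk commit point,
     * so the on-disk segments_N never points at files we have removed.
     */
    static void cleanupStaleFiles(Directory replicaDir, SegmentInfos incomingInfos) throws Exception {
        Set<String> retain = new HashSet<>(incomingInfos.files(true));

        try {
            SegmentInfos lastCommit = SegmentInfos.readLatestCommit(replicaDir);
            retain.addAll(lastCommit.files(true));
        } catch (IndexNotFoundException e) {
            // No local commit yet (fresh replica) - nothing extra to retain.
        }

        for (String file : replicaDir.listAll()) {
            if (retain.contains(file)
                    || file.startsWith(IndexFileNames.SEGMENTS)
                    || file.equals(IndexWriter.WRITE_LOCK_NAME)) {
                continue; // keep referenced files, commit points, and the write lock
            }
            replicaDir.deleteFile(file);
        }
    }
}
```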