[Segment Replication] [BUG] Primary hits NoSuchFileException computing metadata snapshot. #4178
Comments
Looking into it.
@mch2 : In order to repro this issue, I am trying to run integration tests (please ignore other changes added for debugging) in SegmentReplicationIT.java, but I am not able to hit this issue (I saw it a couple of times previously but not right now). The only issues I am consistently getting are #4205 and #4216. I will keep this issue open and come back in case I see this failure again. Until then I am focusing on #4216, which is failing more consistently.
I'm able to see this consistently when running benchmarks, but it's more difficult to get the primary into this state within an IT. I think at a minimum we should add retries here on the replica instead of failing the shard, along the lines of the sketch below.
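A minimal sketch of what that replica-side retry might look like; the handler name, the retry limit, and keying off NoSuchFileException are assumptions for illustration, not the actual OpenSearch code:
// Hypothetical replica-side handler: retry a failed replication round a bounded number
// of times before failing the shard, since the primary may simply have moved on to a
// newer commit point in the meantime.
private static final int MAX_REPLICATION_RETRIES = 3;

void onReplicationFailure(ReplicationCheckpoint checkpoint, Exception e, int attempt) {
    if (attempt < MAX_REPLICATION_RETRIES && e.getCause() instanceof java.nio.file.NoSuchFileException) {
        // Kick off a new round against the latest checkpoint instead of marking the shard failed.
        startReplication(checkpoint, attempt + 1);
    } else {
        failShard("segment replication failed after " + attempt + " attempts", e);
    }
}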
Ok, figured out a bit more of what's going on here. I think there are two things lacking.
final String segmentsFile = segmentInfos.getSegmentsFileName();
checksumFromLuceneFile(directory, segmentsFile, builder, logger, maxVersion, true);
The primary is in a state where the latest in-memory infos has a segments file that is no longer on disk. The on-disk version is one generation ahead. I think in this case we can send the replica the latest segments_N and ignore the missing file. The replica does not actually need the missing _N file, particularly if the primary doesn't either, because we send the SegmentInfos bytes over the wire. I think in this case we could skip the metadata for the missing file and continue?
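A minimal sketch of that "skip and continue" idea around the snippet above; wrapping only the segments_N entry and logging a warning are assumptions, not the actual Store implementation:
// Only the commit point (segments_N) is safe to skip here: its contents reach the
// replica as SegmentInfos bytes anyway. A missing file of any other kind is still an error.
final String segmentsFile = segmentInfos.getSegmentsFileName();
try {
    checksumFromLuceneFile(directory, segmentsFile, builder, logger, maxVersion, true);
} catch (java.nio.file.NoSuchFileException | java.io.FileNotFoundException e) {
    logger.warn("segments file [{}] is no longer on disk; skipping its metadata", segmentsFile);
}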
I'm thinking we could do something like this on the primary to return the on-disk infos if the generation is higher than what's in memory -
@Override
public GatedCloseable<SegmentInfos> getSegmentInfosSnapshot() {
final SegmentInfos latestInMemoryInfos = getLatestSegmentInfos();
SegmentInfos result;
try {
final long lastCommitGeneration = SegmentInfos.getLastCommitGeneration(store.directory());
final long generation = latestInMemoryInfos.getGeneration();
logger.info("Latest gen {} - {}", generation, lastCommitGeneration);
if (generation < lastCommitGeneration) {
// the latest in memory infos is behind disk, read the latest SegmentInfos from disk and return that.
final SegmentInfos latestCommit = SegmentInfos.readLatestCommit(store.directory());
result = latestCommit;
} else {
result = latestInMemoryInfos;
}
indexWriter.incRefDeleter(result);
} catch (IOException e) {
throw new EngineException(shardId, e.getMessage(), e);
}
return new GatedCloseable<>(result, () -> indexWriter.decRefDeleter(result));
}
@mch2 : This bug seems interesting. Can this inconsistency between the in-memory and on-disk state be problematic for segment replication?
I think this is happening when the primary commits without creating any new segments, because a subsequent call to refresh does not actually refresh the active reader. It happens quite frequently on the primary in testing. If we return the on-disk version here there is no risk. We'll send the infos byte[] and recreate it on the replica to load into its reader.
Hmm, I think we need to throw & retry replication if the missing file is not segments_N. All files other than _N will be incRef'd until replication completes, so if one of those files is missing there is a bug. If segments_N is missing, the replica will still fetch the byte[] and be able to read any new segments; however, in this case we can't clear the replica's xlog & must preserve the latest on-disk commit point, which is done here (see the sketch after this comment).
With the primary-only version of segrep, we need to ensure we have a valid commit point fsync'd on disk at all times that we can potentially restart/recover from & replay ops from the xlog.
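A minimal sketch of the missing-file policy described above; the helper name, the exception type, and where it would be wired in are assumptions for illustration:
// Hypothetical replica-side check: tolerate a missing segments_N (its contents arrive as
// SegmentInfos bytes), but treat any other missing file as a bug and fail the round so it
// can be retried.
void validateMissingFile(String fileName, java.io.IOException cause) {
    if (fileName.startsWith(org.apache.lucene.index.IndexFileNames.SEGMENTS)) {
        // segments_N may be skipped, but the latest on-disk commit point must be preserved
        // and the replica's translog must not be trimmed for this round.
        return;
    }
    // Anything else should have been incRef'd on the primary until replication completed.
    throw new IllegalStateException("Replication is missing required file [" + fileName + "]", cause);
}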
Using the test below, the No such file exception happens without the fix. I saw one instance where the No such file exception was happening, but I am not able to repro it anymore after running the test below multiple times.
One observation from a code scan is that, during replica promotion, the replica commits SegmentInfos using …
@dreamer-89 I've drafted a PR that could solve this by only sending segments over the wire and ignoring commit points - #4366. I think this is OK because we send the byte[] of the SegmentInfos to replicas and can commit that if needed for durability. Right now the replica only "commits" if it's being promoted as primary, but we'll want to perform commits when a new segment generation is received by the primary and whenever the engine closes. My draft doesn't do that right now; I am looking at adding it but may want to split it up into two changes. A sketch of the SegmentInfos byte[] round trip is below.
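For reference, a minimal sketch of the SegmentInfos byte[] round trip using public Lucene APIs; this is an illustration under the assumption that the infos are serialized with SegmentInfos.write(IndexOutput) and rebuilt with SegmentInfos.readCommit, not a copy of the PR's code:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.BufferedChecksumIndexInput;
import org.apache.lucene.store.ByteBuffersDataInput;
import org.apache.lucene.store.ByteBuffersDataOutput;
import org.apache.lucene.store.ByteBuffersIndexInput;
import org.apache.lucene.store.ByteBuffersIndexOutput;
import org.apache.lucene.store.ChecksumIndexInput;
import org.apache.lucene.store.Directory;

final class SegmentInfosBytes {
    // Primary side: serialize the in-memory SegmentInfos into a byte[] to ship to replicas.
    static byte[] toBytes(SegmentInfos infos) throws IOException {
        ByteBuffersDataOutput buffer = new ByteBuffersDataOutput();
        try (ByteBuffersIndexOutput out = new ByteBuffersIndexOutput(buffer, "segment_infos", "segment_infos")) {
            infos.write(out);
        }
        return buffer.toArrayCopy();
    }

    // Replica side: rebuild the SegmentInfos from the received bytes. The result can be
    // loaded into the replica's reader and committed locally if durability requires it.
    static SegmentInfos fromBytes(byte[] bytes, long generation, Directory directory) throws IOException {
        try (ChecksumIndexInput in = new BufferedChecksumIndexInput(
                new ByteBuffersIndexInput(new ByteBuffersDataInput(Arrays.asList(ByteBuffer.wrap(bytes))), "segment_infos"))) {
            return SegmentInfos.readCommit(directory, in, generation);
        }
    }
}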
Describe the bug
While running some perf tests for Segment Replication, I hit an exception while computing a metadata snapshot off of the primary's latest segment infos. I think this is caused by the primary missing a segments_N file on disk that is currently referenced by the latest SegmentInfos returned from getLatestSegmentInfos. The exception is hit
This means the primary needs to refresh its reader before we compute the metadata, or we could hit this situation.
Further, we are invoking indexWriter.incRefDeleter while fetching the latest SegmentInfos and returning it as a closeable to release the files. However, incRefDeleter does not incRef the segments_N file - https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L5831. This means that this file could be merged away before we finish copying it to a replica.
Trace: