[Remote Translog] Deleting remote translog considering latest remote metadata #5869
Gradle Check (Jenkins) Run Completed with:
Codecov Report
@@            Coverage Diff             @@
##               main    #5869    +/-   ##
============================================
+ Coverage     70.88%   70.91%   +0.02%
- Complexity    58720    58776     +56
============================================
  Files          4768     4768
  Lines        280575   280584      +9
  Branches      40514    40516      +2
============================================
+ Hits         198881   198971     +90
+ Misses        65334    65327      -7
+ Partials      16360    16286     -74
server/src/main/java/org/opensearch/index/translog/RemoteFsTranslog.java
}
}

public void trimUnreferencedReaders() throws IOException {
Add an @Override annotation?
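To illustrate the suggestion, here is a minimal sketch assuming trimUnreferencedReaders is declared in a parent class. The class names BaseTranslogSketch and RemoteTranslogSketch are hypothetical stand-ins, not the real OpenSearch types.

```java
import java.io.IOException;

// Hypothetical stand-in for the parent translog class.
abstract class BaseTranslogSketch {
    abstract void trimUnreferencedReaders() throws IOException;
}

// Hypothetical stand-in for the remote translog implementation.
final class RemoteTranslogSketch extends BaseTranslogSketch {
    int trimCalls = 0;

    // With @Override, the compiler rejects this method if the parent
    // signature is renamed or changed, instead of silently adding a new method.
    @Override
    void trimUnreferencedReaders() throws IOException {
        trimCalls++;
    }
}
```

The annotation does not change behavior; it only turns a silent signature mismatch into a compile error.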
logger.trace("delete remote translog generation file [{}], not referenced by metadata anymore", generation);
deleteRemoteGeneration(generation);
} else {
break;
Why are we breaking out of the for loop?
One reason I can think of is that we may want to retry the uploads. In that case, should we start the for loop from the minimum generation number that exists?
This would increase the flush times significantly. The cleanup can be handled as a separate async job.
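The trade-off in this thread can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: the loop direction (walking downward from just below the minimum referenced generation), the class name BoundedTrimSketch, and the in-memory set standing in for fileTransferTracker are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the Set stands in for the tracker of uploaded generations.
final class BoundedTrimSketch {

    /**
     * Deletes generations below minReferencedGeneration, stopping at the first
     * generation that is not tracked as uploaded. Returns what was deleted.
     */
    static List<Long> trim(Set<Long> uploaded, long minReferencedGeneration) {
        List<Long> deleted = new ArrayList<>();
        for (long generation = minReferencedGeneration - 1; generation >= 0; generation--) {
            if (uploaded.remove(generation)) {
                deleted.add(generation); // stands in for deleteRemoteGeneration(generation)
            } else {
                // Stop early: the work per flush stays proportional to the
                // newly unreferenced generations, not to the total history.
                break;
            }
        }
        return deleted;
    }
}
```

With the early break, a gap in the sequence leaves older files behind, which is the kind of leftover a separate async cleanup job would have to pick up.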
String translogFilename = Translog.getFilename(generation);
if (fileTransferTracker.uploaded(translogFilename)) {
logger.trace("delete remote translog generation file [{}], not referenced by metadata anymore", generation);
deleteRemoteGeneration(generation);
The deleteRemoteGeneration method uses the current primaryTerm. This probably would not work after a failover, when the primary term has increased. Could we rely on the metadata to fetch the right generation-to-primary-term mapping for cleaning up the files?
That's a great catch, @ashking94. The issue here is that the latest metadata doesn't know the right primary term for this generation. I will create a backlog item for cleaning up translog files from older primary terms. This can be done on every failover or as an async job. I don't want to complicate the usual deletion flow for this corner case.
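For the backlog item, the failover problem could be sketched like this. Everything here is an assumption for illustration: the class PrimaryTermLookupSketch, the idea of recording a term per generation, and the remote path layout with the primary term as a directory are hypothetical and not confirmed by this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a generation-to-primary-term mapping.
final class PrimaryTermLookupSketch {
    private final Map<Long, Long> generationToPrimaryTerm = new HashMap<>();

    // Remember which primary term a generation was uploaded under.
    void recordUpload(long generation, long primaryTerm) {
        generationToPrimaryTerm.put(generation, primaryTerm);
    }

    // Assumed layout: "<primaryTerm>/translog-<generation>.tlog".
    // Falls back to the current term when no mapping was recorded.
    String remotePath(long generation, long currentPrimaryTerm) {
        long term = generationToPrimaryTerm.getOrDefault(generation, currentPrimaryTerm);
        return term + "/translog-" + generation + ".tlog";
    }
}
```

Without such a mapping, a deletion issued after failover would build the path from the new primary term and miss files uploaded under the old one.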
Please address the comments.
Force-pushed from 9be02e5 to 246924d.
Force-pushed from 2422c10 to 950c2bc.
Signed-off-by: Gaurav Bafna <[email protected]>
Signed-off-by: Ashish Singh <[email protected]>
Force-pushed from 950c2bc to 61344d4.
Created PR #6086 for the rest of the development due to an access issue.
Signed-off-by: Gaurav Bafna [email protected]
Description
We should delete remote translog files only after considering the latest uploaded metadata file. The tlog/ckp files referenced by that metadata file must not be deleted. This ensures that we can still restore the translog from that metadata.
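A minimal sketch of that rule, assuming the latest metadata file can be reduced to a minimum referenced generation. That reduction, the class RemoteTranslogCleanupSketch, and its method are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of OpenSearch.
final class RemoteTranslogCleanupSketch {

    /**
     * Returns the uploaded generations that are safe to delete, in ascending order.
     *
     * @param uploadedGenerations generations whose tlog/ckp files exist remotely
     * @param minReferencedGeneration smallest generation the latest metadata references (assumed)
     */
    static List<Long> deletableGenerations(Set<Long> uploadedGenerations, long minReferencedGeneration) {
        List<Long> deletable = new ArrayList<>();
        for (long generation : new TreeSet<>(uploadedGenerations)) {
            // Anything at or above the minimum is still needed for restore.
            if (generation < minReferencedGeneration) {
                deletable.add(generation);
            }
        }
        return deletable;
    }
}
```

Generations at or above the minimum are kept, so a restore driven by the latest metadata file still finds every file it references.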
Issues Resolved
#5845
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.