-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up temporary files created during segment merge incase force segment merge fails #6324
Conversation
baf3fec
to
8a89f7c
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Looks ok. |
@@ -2009,6 +2009,7 @@ public void forceMerge( | |||
throw ex; | |||
} catch (Exception e) { | |||
try { | |||
indexWriter.flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flush()
calls internally flush(boolean triggerMerge, boolean applyAllDeletes)
. In cases where the forceMerge is failing, should we just do a flush(false, true)
rather than flush(true, true)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flush(false, true)
is not a public function. We cannot call it from outside InternalEngine
(Lucene).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes. May be let's just check and confirm that the triggerMerge
is not triggering merge again after the temporary files clean up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this below maybeFailEngine("force merge", e);
. In corruption cases, flush is not going to help out .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flush is optional for a given force merge . Does it guarantee for clean up the temporary files ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, InternalEngine
flush()
is different from IndexWriter
flush()
. Optional force merge option is for InternalEngine
flush()
. Deletion of temporary new files is happening inside IndexWriter
deleteNewFiles
which is called from IndexWriter
flush()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed the comments.
…merge fails Signed-off-by: Rishav Sagar <[email protected]>
8a89f7c
to
d15e628
Compare
Gradle Check (Jenkins) Run Completed with:
|
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6324-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6b35c32a9ede76d6e8aca79095e42b9ec67d22c1
# Push it to GitHub
git push --set-upstream origin backport/backport-6324-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x Then, create a pull request where the |
@@ -2010,6 +2010,7 @@ public void forceMerge( | |||
} catch (Exception e) { | |||
try { | |||
maybeFailEngine("force merge", e); | |||
indexWriter.flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are invoking a second merge within the context of a failed merge, which might be tricky and non-deterministic. Essentially we are saying that a second merge(merge + delete new files) will succeed when the first one failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had raised the same - #6324 (comment). @RS146BIJAY you had mentioned that the merge does not get triggered when in exception block. Hoping that we have confirmed the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, lucene only allows a flush call where a regular segment merge is triggered after we delete new lucene files. It does not exposes another overload of flush(boolean triggerMerge, boolean applyAllDeletes) (where we can set triggerMerge to false to not allow segment merge along with file deletion).
As per the corresponding Lucene issue link, this is kept intentionally that if we delete some file we perform a regular segment merge after it.
+1 on the patch and agreed that we should only expose a no-argument flush method, it's actually nice that the patch decreased the visibility of flush(boolean, boolean).
I don't think we should add anything but a no-arg flush() here.
If its going to be more engineered than that, then this is the wrong direction: lets just not expose it at all.
Did you want to backport this to 2.x @RS146BIJAY? Will need to be done manually if it wasn't / link that PR. |
This was reverted in #6403. The issue (#5710) remains open. @RS146BIJAY what is the status here? |
Description
Currently, during segment merge (both during auto merge and force merge), OpenSearch creates multiple temporary files while merging individual segment components. Once individual components of the segments are merged, IndexWriter creates a Compound file (*.cfs) containing all the Lucene components for a segment. At the end, OpenSearch deletes these temporary files.
Now, in case these temporary files consume all the space on the node (because enough amount of space is unavailable for segment merge to go through), segment merge will fail. Also, the files which got created during segment merge do not get cleaned up and it continues to occupy space on the node. This can cause FSHealthService checks to fail for this node, ultimately causing these nodes to be removed from cluster.
In case Force segment merge fails, OpenSearch should remove the temporary files created during segment merge.
Issues Resolved
#5710
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.