-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensures cleanup of temporary index-* generational blobs during snapshotting #21469
Conversation
index-* blobs are generational files that maintain the snapshots in the repository. To write these atomically, we first write a `pending-index-*` blob, then move it to `index-*`, which also deletes `pending-index-*` in case its not a file-system level move (e.g. S3 repositories) . For example, to write the 5th generation of the index blob for the repository, we would first write the bytes to `pending-index-5` and then move `pending-index-5` to `index-5`. It is possible that we fail after writing `pending-index-5`, but before moving it to `index-5` or deleting `pending-index-5`. In this case, we will have a dangling `pending-index-5` blob laying around. Since snapshot elastic#5 would have failed, the next snapshot assumes a generation number of 5, so it tries to write to `index-5`, which first tries to write to `pending-index-5` before moving the blob to `index-5`. Since `pending-index-5` is leftover from the previous failure, the snapshot fails as it cannot overwrite this blob. This commit solves the problem by first, adding a UUID to the `pending-index-*` blobs, and secondly, strengthen the logic around failure to write the `index-*` generational blob to ensure pending files are deleted on cleanup. Closes elastic#21462
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add tests that show that the issue is fixed?
@@ -867,16 +867,23 @@ private long listBlobsToGetLatestIndexId() throws IOException { | |||
} | |||
|
|||
private void writeAtomic(final String blobName, final BytesReference bytesRef) throws IOException { | |||
final String tempBlobName = "pending-" + blobName; | |||
final String tempBlobName = "pending-" + UUIDs.randomBase64UUID() + "-" + blobName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's nicer to append the random uuid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done
// exceptionToThrow will capture this and be thrown at the end | ||
} catch (IOException e) { | ||
if (exceptionToThrow == null) { | ||
exceptionToThrow = e; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exceptionToThrow
can never be null?
Maybe it's easier to just do the ex.addSuppressed(...)
?
IOException exceptionToThrow = ex; | ||
try { | ||
snapshotsBlobContainer.deleteBlob(tempBlobName); | ||
} catch (NoSuchFileException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should special-case this. Depending on underlying blobcontainer, a different exception might be thrown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its part of the contract of the BlobContainer
to throw a NoSuchFileException
if the blob doesn't exist on calling deleteBlob
, and we changed all implementations to adhere to this contract. That said, you are right, I went through two iterations of this, and now the only way we get to this block is if an exception was thrown, in which case we don't care about the nature of the exception here. Same with below
retest this please |
@ywelsch I pushed some commits that add a test and addresses your feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left minor comments (easy to fix). LGTM
* target blob already exists, an exception is thrown. | ||
* Renames the source blob into the target blob. If the source blob does not exist or the | ||
* target blob already exists, an exception is thrown. Atomicity of the move operation is | ||
* can only be guaranteed on an implementation-by-implementation basis. The only current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is can
snapshotsBlobContainer.move(tempBlobName, blobName); | ||
} catch (IOException ex) { | ||
// Move failed - try cleaning up | ||
snapshotsBlobContainer.deleteBlob(tempBlobName); | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe readd comment here (temp file creation or move failed - clean up)
maybeIOExceptionOrBlock(targetBlob); | ||
super.move(sourceBlob, targetBlob); | ||
super.writeBlob(targetBlob, readBlob(sourceBlob), 0L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super.readBlock
instead of readBlock
to prevent double maybeIOExceptionOrBlock
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
logger.info("--> creating repository"); | ||
final Path repoPath = randomRepoPath(); | ||
assertAcked(client().admin().cluster().preparePutRepository("test-repo").setType("mock").setVerify(false).setSettings( | ||
Settings.builder().put("location", repoPath).put("random_control_io_exception_rate", 0.2))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add some randomization to the random_control_io_exception_rate?
assertThat(shardFailure.reason(), containsString("Random IOException")); | ||
} | ||
} | ||
} catch (Exception ex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we catch a more specific exception type? IOException with the Random IOException
string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is that either SnapshotCreationException
or RepositoryException
can be thrown, but what we can do here is ensure the stack trace has the Random IOException
in the nested stack trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be specified as catch (SnapshotCreationException | RepositoryException ex)
in Java ;-)
@ywelsch I pushed a commit to address your latest feedback. Thanks for reviewing! |
…otting (#21469) Ensures pending index-* blobs are deleted when snapshotting. The index-* blobs are generational files that maintain the snapshots in the repository. To write these atomically, we first write a `pending-index-*` blob, then move it to `index-*`, which also deletes `pending-index-*` in case its not a file-system level move (e.g. S3 repositories) . For example, to write the 5th generation of the index blob for the repository, we would first write the bytes to `pending-index-5` and then move `pending-index-5` to `index-5`. It is possible that we fail after writing `pending-index-5`, but before moving it to `index-5` or deleting `pending-index-5`. In this case, we will have a dangling `pending-index-5` blob laying around. Since snapshot #5 would have failed, the next snapshot assumes a generation number of 5, so it tries to write to `index-5`, which first tries to write to `pending-index-5` before moving the blob to `index-5`. Since `pending-index-5` is leftover from the previous failure, the snapshot fails as it cannot overwrite this blob. This commit solves the problem by first, adding a UUID to the `pending-index-*` blobs, and secondly, strengthen the logic around failure to write the `index-*` generational blob to ensure pending files are deleted on cleanup. Closes #21462
…otting (#21469) Ensures pending index-* blobs are deleted when snapshotting. The index-* blobs are generational files that maintain the snapshots in the repository. To write these atomically, we first write a `pending-index-*` blob, then move it to `index-*`, which also deletes `pending-index-*` in case its not a file-system level move (e.g. S3 repositories) . For example, to write the 5th generation of the index blob for the repository, we would first write the bytes to `pending-index-5` and then move `pending-index-5` to `index-5`. It is possible that we fail after writing `pending-index-5`, but before moving it to `index-5` or deleting `pending-index-5`. In this case, we will have a dangling `pending-index-5` blob laying around. Since snapshot #5 would have failed, the next snapshot assumes a generation number of 5, so it tries to write to `index-5`, which first tries to write to `pending-index-5` before moving the blob to `index-5`. Since `pending-index-5` is leftover from the previous failure, the snapshot fails as it cannot overwrite this blob. This commit solves the problem by first, adding a UUID to the `pending-index-*` blobs, and secondly, strengthen the logic around failure to write the `index-*` generational blob to ensure pending files are deleted on cleanup. Closes #21462
maybeIOExceptionOrBlock(targetBlob); | ||
super.move(sourceBlob, targetBlob); | ||
super.writeBlob(targetBlob, super.readBlob(sourceBlob), 0L); | ||
super.deleteBlob(sourceBlob); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, forgot to add here that we could randomize between atomic and non-atomic move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed adb7aad to randomize between atomic and non-atomic
* master: ShardActiveResponseHandler shouldn't hold to an entire cluster state Ensures cleanup of temporary index-* generational blobs during snapshotting (#21469) Remove (again) test uses of onModule (#21414) [TEST] Add assertBusy when checking for pending operation counter after tests Revert "Add trace logging when aquiring and releasing operation locks for replication requests" Allows multiple patterns to be specified for index templates (#21009) [TEST] fixes rebalance single shard check as it isn't guaranteed that a rebalance makes sense and the method only tests if rebalance is allowed Document _reindex with random_score
This PR ensures pending
index-*
blobs are deleted when snapshotting. Theindex-*
blobs are generational files that maintain the snapshotsin the repository. To write these atomically, we first write a
pending-index-*
blob, then move it toindex-*
, which also deletespending-index-*
in case its not a file-system level move (e.g.S3 repositories) . For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
pending-index-5
and then movepending-index-5
toindex-5
. It ispossible that we fail after writing
pending-index-5
, but beforemoving it to
index-5
or deletingpending-index-5
. In this case,we will have a dangling
pending-index-5
blob laying around. Sincesnapshot number 5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to
index-5
, which first tries towrite to
pending-index-5
before moving the blob toindex-5
. Sincepending-index-5
is leftover from the previous failure, the snapshotfails as it cannot overwrite this blob.
This commit solves the problem by first, adding a UUID to the
pending-index-*
blobs, and secondly, strengthen the logic aroundfailure to write the
index-*
generational blob to ensure pendingfiles are deleted on cleanup.
Closes #21462