Resume partial download from S3 on connection drop #46589 (Merged)

DaveCTurner merged 14 commits into elastic:master from DaveCTurner:2019-09-11-retry-s3-download-on-partial-content on Sep 17, 2019.
Commits (14):
- 9865c44 Resume partial download from S3 on connection drop (DaveCTurner)
- 35fb904 Simplify (DaveCTurner)
- ad43a77 inline noop (DaveCTurner)
- ebd4e08 Suppress exceptions when closing (DaveCTurner)
- f97cec8 Extract common handler (DaveCTurner)
- 0e396dc Merge branch 'master' into 2019-09-11-retry-s3-download-on-partial-co… (DaveCTurner)
- 9a6f5c6 Add timebomb to ensure we remove this when no longer necessary (DaveCTurner)
- efb2422 Count retries per blob not per read (DaveCTurner)
- d9890c6 Ensure we do not use the stream after close (DaveCTurner)
- 81f8a35 Make test helpers static and collect at bottom (DaveCTurner)
- 410fa62 Unnecessary throws (DaveCTurner)
- 22f5703 Include a bounded number of suppressed exceptions on failure (DaveCTurner)
- 9604df1 Merge branch 'master' into 2019-09-11-retry-s3-download-on-partial-co… (DaveCTurner)
- 3f8c20e Review feedback (DaveCTurner)
File changed: .../repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3RetryingInputStream.java (159 additions, 0 deletions)
```java
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.elasticsearch.repositories.s3;

import com.amazonaws.AmazonClientException;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.message.ParameterizedMessage;
import org.elasticsearch.core.internal.io.IOUtils;
import org.elasticsearch.Version;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.NoSuchFileException;
import java.util.ArrayList;
import java.util.List;

/**
 * Wrapper around an S3 object that will retry the {@link GetObjectRequest} if the download fails part-way through, resuming from where
 * the failure occurred. This should be handled by the SDK but it isn't today. This should be revisited in the future (e.g. before removing
 * the {@link Version#V_7_0_0} version constant) and removed when the SDK handles retries itself.
 *
 * See https://github.com/aws/aws-sdk-java/issues/856 for the related SDK issue
 */
class S3RetryingInputStream extends InputStream {

    private static final Logger logger = LogManager.getLogger(S3RetryingInputStream.class);

    static final int MAX_SUPPRESSED_EXCEPTIONS = 10;

    private final S3BlobStore blobStore;
    private final String blobKey;
    private final int maxAttempts;

    private InputStream currentStream;
    private int attempt = 1;
    private List<IOException> failures = new ArrayList<>(MAX_SUPPRESSED_EXCEPTIONS);
    private long currentOffset;
    private boolean closed;

    S3RetryingInputStream(S3BlobStore blobStore, String blobKey) throws IOException {
        this.blobStore = blobStore;
        this.blobKey = blobKey;
        this.maxAttempts = blobStore.getMaxRetries() + 1;
        currentStream = openStream();
    }

    private InputStream openStream() throws IOException {
        try (AmazonS3Reference clientReference = blobStore.clientReference()) {
            final GetObjectRequest getObjectRequest = new GetObjectRequest(blobStore.bucket(), blobKey);
            if (currentOffset > 0) {
                getObjectRequest.setRange(currentOffset);
            }
            final S3Object s3Object = SocketAccess.doPrivileged(() -> clientReference.client().getObject(getObjectRequest));
            return s3Object.getObjectContent();
        } catch (final AmazonClientException e) {
            if (e instanceof AmazonS3Exception) {
                if (404 == ((AmazonS3Exception) e).getStatusCode()) {
                    throw addSuppressedExceptions(new NoSuchFileException("Blob object [" + blobKey + "] not found: " + e.getMessage()));
                }
            }
            throw addSuppressedExceptions(e);
        }
    }

    @Override
    public int read() throws IOException {
        ensureOpen();
        while (true) {
            try {
                final int result = currentStream.read();
                currentOffset += 1;
                return result;
            } catch (IOException e) {
                reopenStreamOrFail(e);
            }
        }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        while (true) {
            try {
                final int bytesRead = currentStream.read(b, off, len);
                if (bytesRead == -1) {
                    return -1;
                }
                currentOffset += bytesRead;
                return bytesRead;
            } catch (IOException e) {
                reopenStreamOrFail(e);
            }
        }
    }

    private void ensureOpen() {
        if (closed) {
            assert false : "using S3RetryingInputStream after close";
            throw new IllegalStateException("using S3RetryingInputStream after close");
        }
    }

    private void reopenStreamOrFail(IOException e) throws IOException {
        if (attempt >= maxAttempts) {
            throw addSuppressedExceptions(e);
        }
        logger.debug(new ParameterizedMessage("failed reading [{}/{}] at offset [{}], attempt [{}] of [{}], retrying",
            blobStore.bucket(), blobKey, currentOffset, attempt, maxAttempts), e);
        attempt += 1;
        if (failures.size() < MAX_SUPPRESSED_EXCEPTIONS) {
            failures.add(e);
        }
        IOUtils.closeWhileHandlingException(currentStream);
        currentStream = openStream();
    }

    @Override
    public void close() throws IOException {
        currentStream.close();
        closed = true;
    }

    @Override
    public long skip(long n) {
        throw new UnsupportedOperationException("S3RetryingInputStream does not support seeking");
    }

    @Override
    public void reset() {
        throw new UnsupportedOperationException("S3RetryingInputStream does not support seeking");
    }

    private <T extends Exception> T addSuppressedExceptions(T e) {
        for (IOException failure : failures) {
            e.addSuppressed(failure);
        }
        return e;
    }
}
```

Review comment (on the class javadoc): +1, otherwise we'll be adding retries over retries.
Review comment: can you link to the corresponding open AWS SDK issue? i.e. aws/aws-sdk-java#856
Reply: Yes, done in 3f8c20e. I am not convinced that that's the whole issue, because the problem we were chasing was to do with S3 actively closing the connection rather than a network timeout, but there doesn't seem to be an issue for that.