No retries on network timeouts S3 InputStream #856
This is probably not something we'd consider taking on until the next major version bump, as it is a big departure from what we do today. The retry policy does not apply to streaming operations while the content is being read, because we've already passed control back to the caller. Presumably we could retry transparently by capturing a reference to the client in a special input stream and, on calls to read, catching the IO exception and making another ranged GET starting from the last successful byte. The transfer manager utility has more robust retry and resume behavior; would that meet your needs for now?
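For illustration, a minimal sketch of that idea for the v1 SDK: a wrapper InputStream that tracks the last successfully read offset and, on an IOException, re-issues a ranged GET from that position. The class and field names here are hypothetical, and a real implementation would need to consult the client's retry policy (attempt limits, backoff) instead of retrying unconditionally.

    import java.io.IOException;
    import java.io.InputStream;

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;

    // Hypothetical wrapper; not part of the SDK.
    public class ResumingS3InputStream extends InputStream {

        private final AmazonS3 s3;
        private final String bucket;
        private final String key;
        private final long totalLength;

        private InputStream delegate;
        private long position; // offset of the next byte to read

        public ResumingS3InputStream(AmazonS3 s3, String bucket, String key) {
            this.s3 = s3;
            this.bucket = bucket;
            this.key = key;
            this.totalLength = s3.getObjectMetadata(bucket, key).getContentLength();
            reopen(0);
        }

        private void reopen(long startByte) {
            // Ranged GET starting at the first byte we have not yet returned.
            S3Object object = s3.getObject(
                    new GetObjectRequest(bucket, key).withRange(startByte, totalLength - 1));
            delegate = object.getObjectContent();
            position = startByte;
        }

        @Override
        public int read() throws IOException {
            try {
                int b = delegate.read();
                if (b >= 0) {
                    position++;
                }
                return b;
            } catch (IOException e) {
                if (position >= totalLength) {
                    throw e; // nothing left to resume
                }
                // Reconnect and continue from the last successful byte.
                // A real implementation would honor the retry policy here.
                reopen(position);
                return read();
            }
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }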
Hi @shorea,
Regards,
Yeah, I think it's definitely possible and makes a lot of sense to honor the retry policy even for streaming operations, but I don't think we can add it to the SDK without a major version bump due to the performance implications.
Using aws-java-sdk-s3:1.11.18 here. Introducing a new method that accepts an OutputStream instead of a File would be great, as streaming is a much desired use case -- Also, @shorea, it'd be great if some retry examples or a PR existed before the official rollout within the SDK -- #893! All my objects are stored using multi-part upload (5 MB or greater part size).
@phraktle: Did you come up with a workaround?
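For anyone looking for a stop-gap in the meantime, a minimal sketch of the TransferManager-based approach @shorea mentioned above (download to a local file and let TransferManager handle retry/resume). Bucket, key, and file path are placeholders, and this assumes a 1.11.x release that includes TransferManagerBuilder:

    import java.io.File;

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.transfer.Download;
    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    TransferManager tm = TransferManagerBuilder.standard().withS3Client(s3).build();

    // Downloads to disk rather than streaming, but transient failures are
    // handled by TransferManager itself instead of surfacing to the caller.
    File target = new File("/tmp/my-object");          // placeholder path
    Download download = tm.download("my-bucket", "my-key", target);
    download.waitForCompletion();                       // blocks until done or failed

    tm.shutdownNow(false);                               // keep the underlying client alive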
Is this lack of retries the cause of the error I have been getting very frequently while streaming data from S3 to an EC2 instance in a VPC? I really don't want to download these files (I don't want to deal with the disk at all -- and streaming seems like it ought to work). But the error rate when downloading files is increasing dramatically, and it's a big operational pain. The failure happens at random places in the files (when I retry, the same file will often fail again, but at a different place). Stack trace:
Hi @OrigamiMarie, sorry to hear you're having issues. We do have #893 in our backlog, which is to allow downloading to an OutputStream.
If anyone ever does add transparent retries to failures in input stream reads, can I, as a representative of the Hadoop team who maintain the S3A connector, have a way to turn this off? Because we do our own reconnect logic and think we've got it under control (now), and having something underneath trying to be helpful might be a regression. Happy to discuss what could be done here, including what exceptions should be treated as recoverable...
There's a similar problem when the underlying S3 client fails the download. @dagnir Is there any workaround other than catching the exceptions from the TransferManager and retrying the whole download?
V2 supports retrying streaming operations using the retry policy, if the proper API is used. We do not intend to make this change in 1.11.x.
@millems Can you give an example or link to some documentation on how to do this properly with V2?
@electrum In 2.x, you can use the response transformer abstraction to allow retrying failures that occur while reading the response:

    s3.getObject(r -> r.bucket("bucket").key("key"), (response, inputStream) -> {
        try {
            // Do something with the stream.
            IoUtils.copy(inputStream, System.out);
            return null;
        } catch (IOException e) {
            throw RetryableException.create("Failed to read from input stream.", e);
        }
    });

Note that the response transformer can be called multiple times, once for each retry. It's a new input stream each time, so it will start back at the beginning of the object.
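Along the same lines, the built-in transformers in software.amazon.awssdk.core.sync.ResponseTransformer (for example toBytes or toFile) consume the stream inside the getObject call itself, so no InputStream is handed back to the caller mid-download; read failures then surface from the call rather than from a returned stream. A small sketch with placeholder bucket and key names:

    import software.amazon.awssdk.core.ResponseBytes;
    import software.amazon.awssdk.core.sync.ResponseTransformer;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectResponse;

    S3Client s3 = S3Client.create();

    // Buffers the whole object in memory; the stream is read inside the
    // getObject call, so a read failure surfaces from the call itself
    // rather than from an InputStream handed back to the caller.
    ResponseBytes<GetObjectResponse> bytes =
            s3.getObject(r -> r.bucket("bucket").key("key"), ResponseTransformer.toBytes());
    byte[] content = bytes.asByteArray();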
It appears that there are no retries attempted when there's a network timeout on the underlying HTTP connection while reading the InputStream from S3Object#getObjectContent. It should instead transparently reconnect (as per the retry policy) and continue from the last byte's position.

Stack trace