S3 TransferManager Should Allow Downloading to Stream #893

Closed
jeffquinn-nuna opened this issue Oct 15, 2016 · 10 comments
Labels
feature-request A feature should be added or improved.

Comments

@jeffquinn-nuna

The current implementation of TransferManager only allows a File as the download destination. I think it would be helpful if there were an interface for supplying an OutputStream as the final sink for the downloaded data. Overall this would give greater flexibility in usage, and anyone who simply wants to download to a file could still supply a FileOutputStream.

@kiiadi
Contributor

kiiadi commented Oct 15, 2016

You can get at the stream by using the AmazonS3Client.getObject method directly. This returns an S3Object, which has a getObjectContent() method that gives access to the stream. Does this solve your use case?
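For reference, a minimal sketch of that direct-stream approach (the bucket, key, and consuming loop are placeholders):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

public class DirectStreamRead {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // getObject returns the object as a single, unparallelized stream
        S3Object object = s3.getObject("my-bucket", "path/to/key"); // placeholders
        try (S3ObjectInputStream in = object.getObjectContent()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                // hand the bytes to whatever OutputStream or consumer you like
            }
        }
    }
}
```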

@jeffquinn-nuna
Author

Hi @kiiadi, thanks for your response. Yes, it is true that the API for S3Object works well for my use case; however, my understanding is that my download will always be single-threaded if I use that API.

For my use case, the consumer of my stream can consume data much faster than a single thread can download from S3, so I believe I can get performance gains if multiple threads download from S3 simultaneously, buffer, and then dump their contents to a single stream sequentially as they complete.

I tried to achieve this within the existing API by passing a Unix named pipe as the file parameter to TransferManager, but this fails because TransferManager's design assumes a regular file.

@kiiadi
Contributor

kiiadi commented Oct 17, 2016

As you may already be aware, TransferManager will only parallelize downloads for S3 objects that were uploaded in multiple parts (which in turn only happens if we know the content length, either through a File or through an InputStream with the content length specified in the ObjectMetadata).

Historically we've stuck with File because it's easier for us to control the retry logic and also to parallelize uploads (see above). That said, we've had a few similar requests, so I can add it as a feature request to our backlog.
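For comparison, a minimal sketch of the File-based path that TransferManager supports today (the bucket, key, and destination path are placeholders; the download is only parallelized when the object was uploaded in multiple parts):

```java
import java.io.File;

import com.amazonaws.services.s3.transfer.Download;
import com.amazonaws.services.s3.transfer.TransferManager;

public class FileBasedDownload {
    public static void main(String[] args) throws Exception {
        TransferManager tm = new TransferManager();
        // TransferManager requires a regular File as the destination
        Download download = tm.download("my-bucket", "path/to/key", new File("/tmp/my-object"));
        download.waitForCompletion();
        tm.shutdownNow();
    }
}
```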

@kiiadi kiiadi added the feature-request A feature should be added or improved. label Oct 17, 2016
@jeffquinn-nuna
Author

Ah yes, I did not know that at the time I wrote this issue, but I noticed it while playing around with the TransferManager. I have seen a lot of libraries out in the wild that use Range requests to achieve better parallelism, and they seem to get good results. I believe the SDK does not provide this functionality; is there any reason why? Is there something inherently more robust about the multipart approach vs. Range requests? (Maybe it makes retries more robust, etc.?)

Happy to contribute a Range request parallel downloader if there's no fundamental reason against it. We have written it ourselves several times within my company, in different languages.

@kiiadi
Contributor

kiiadi commented Oct 18, 2016

We're somewhat limited by what S3 itself can support; as it works today S3 only supports multi-part download on objects that have been uploaded in multiple parts. Range is not currently supported.

Unless I've misunderstood your proposal...

@kiiadi
Contributor

kiiadi commented Oct 18, 2016

Oops - @varunnvs92 just pointed out to me that you may have meant using the range property on the GetObjectRequest (which I admit I didn't know was there) - in which case this could work! A PR would be great!
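A rough sketch of what such a range-based parallel downloader might look like, using the range property on GetObjectRequest and buffering each part in memory before writing the parts to a single OutputStream in order. The bucket, key, part size, and thread count below are placeholder choices, and a real implementation would add per-range retries:

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

public class RangedParallelDownload {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String bucket = "my-bucket";          // placeholder
        String key = "path/to/large-object";  // placeholder
        long objectSize = s3.getObjectMetadata(bucket, key).getContentLength();
        long partSize = 8L * 1024 * 1024;     // 8 MiB per range (placeholder choice)

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<byte[]>> parts = new ArrayList<>();
        for (long start = 0; start < objectSize; start += partSize) {
            final long rangeStart = start;
            final long rangeEnd = Math.min(start + partSize, objectSize) - 1; // withRange is inclusive
            parts.add(pool.submit(() -> {
                GetObjectRequest req = new GetObjectRequest(bucket, key).withRange(rangeStart, rangeEnd);
                try (S3Object obj = s3.getObject(req); InputStream in = obj.getObjectContent()) {
                    // buffer this range fully so the parts can be written out in order
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return buf.toByteArray();
                }
            }));
        }

        // Dump each buffered range to the single destination stream sequentially.
        OutputStream out = System.out; // any OutputStream sink works here
        for (Future<byte[]> part : parts) {
            out.write(part.get());
        }
        out.flush();
        pool.shutdown();
    }
}
```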

@jeffquinn-nuna
Author

Ok great! I'm glad to hear we are not doing anything fundamentally wrong by using the Range property (overwhelming the API, etc.). I will try to review the contributing guidelines and open a PR soon :)

@stevematyas

stevematyas commented Nov 4, 2016

+1 (the sooner the better, or an unofficial workaround would be appreciated)

@kiiadi: Completely understand the history here:

#893 (comment)

@kiiadi: Following your earlier suggestion, community users are using AmazonS3Client.getObject to access the underlying stream (to achieve streaming support), and unfortunately, due to #856 (java.net.SocketTimeoutException: Read timed out), we're forced to retry our large streams again -- a poor experience. And storing to a File is ill-advised in our use case.
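One interim workaround, sketched below under assumptions not stated in this thread (the caller supplies the bucket, key, and output sink, and a real version would cap the retry loop), is to resume a failed stream from the last byte written by re-requesting only the remaining range instead of restarting from byte zero:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

public class ResumableStreamCopy {
    // Copies an object to `out`, re-requesting the remaining byte range after a
    // dropped or timed-out stream instead of starting over. A real version
    // should cap the number of retries rather than looping indefinitely.
    static void copyWithResume(AmazonS3 s3, String bucket, String key, OutputStream out) throws IOException {
        long objectSize = s3.getObjectMetadata(bucket, key).getContentLength();
        long offset = 0;
        while (offset < objectSize) {
            GetObjectRequest req = new GetObjectRequest(bucket, key).withRange(offset, objectSize - 1);
            try (S3Object obj = s3.getObject(req); InputStream in = obj.getObjectContent()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    offset += n;
                }
            } catch (IOException e) {
                // e.g. java.net.SocketTimeoutException: Read timed out -- loop and resume from `offset`
            }
        }
    }
}
```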

@justinuang

I have a proof-of-concept implementation here: https://issues.apache.org/jira/browse/HADOOP-16132, and the PR is here: palantir/hadoop#47

@debora-ito
Member

@jeffquinn-nuna @stevematyas @justinuang

The SDK team has reviewed the feature request list for V1, and since they're concentrating efforts on new features in V2, they decided not to implement this one in V1. It's still being considered for the TransferManager refactor in V2; see the referenced issue above. I'll go ahead and close this.

Please feel free to comment on the V2 tracking issue with your use case, and reach out if you have further questions.
