S3 TransferManager Should Allow Downloading to Stream #893
Comments
You can get at the stream by using the |
Hi @kiiadi thanks for your response. Yes it is true that the api for For my use case the consumer of my stream is going to be able to consume data much faster than a single thread downloads from S3, so I believe I will be able to get performance gains if multiple threads are downloading from S3 simultaneously and buffering, and then dumping their contents to a single stream sequentially as they complete. I tried to achieve this working within the existing API by passing a unix named pipe as the |
As you may already be aware Historically we've stuck with |
Ah yes, I did not know that at the time I wrote this issue, but noticed it while playing around with the Happy to contribute a Range request parallel downloader if there's no fundamental reason against it. We have written it ourselves several times within my company in different languages. |
We're somewhat limited by what S3 itself can support; as it works today S3 only supports multi-part download on objects that have been uploaded in multiple parts. Range is not currently supported. Unless I've misunderstood your proposal... |
Oops - @varunnvs92 just pointed out to me that you may have meant using the range property on the GetObjectRequest (which I admit I didn't know was there) - in which case this could work! A PR would be great! |
Ok great! I'm glad to hear we are not doing anything fundamentally wrong by using the Range property (overwhelming the API etc.) Will try to review the contributing guidelines and open a PR soon :) |
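For readers following along, the approach discussed above (several threads each fetching one byte range into a buffer, with the buffers then written to a single stream in order) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the SDK's API or the implementation proposed in this thread: the class `RangedDownloadSketch`, the `RangeFetcher` interface, and all parameter names are hypothetical, and the S3 call itself is abstracted away so the sketch depends only on the JDK.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: parallel ranged fetches, sequential write to one sink.
final class RangedDownloadSketch {

    /** Hypothetical abstraction: returns the bytes in [start, endInclusive]. */
    @FunctionalInterface
    interface RangeFetcher {
        byte[] fetch(long start, long endInclusive) throws IOException;
    }

    static void download(RangeFetcher fetcher, long objectSize, int partSize,
                         int threads, OutputStream sink)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per byte range; tasks run concurrently on the pool.
            List<Future<byte[]>> parts = new ArrayList<>();
            for (long start = 0; start < objectSize; start += partSize) {
                final long from = start;
                final long to = Math.min(start + partSize, objectSize) - 1;
                parts.add(pool.submit(() -> fetcher.fetch(from, to)));
            }
            // Drain the futures in submission order so bytes reach the sink
            // in object order, regardless of which range finished first.
            for (Future<byte[]> part : parts) {
                try {
                    sink.write(part.get());
                } catch (ExecutionException e) {
                    throw new IOException("Range download failed", e.getCause());
                }
            }
        } finally {
            // Cancel any outstanding work if we exit early.
            pool.shutdownNow();
        }
    }
}
```

With the V1 SDK, a `RangeFetcher` could plausibly be backed by a `GetObjectRequest` configured via `withRange(from, to)`, with the object size taken from the object metadata's content length; that wiring is left out here. Note that this sketch queues every range up front, so completed parts can accumulate in memory if the sink is slow; a real implementation would likely bound the number of in-flight parts.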
+1 (sooner the better or an unofficial work-around would be appreciated) @kiiadi : Completely understand the history, here: @kiiadi : From your earlier suggestion, community users are using |
I have a proof-of-concept implementation here: https://issues.apache.org/jira/browse/HADOOP-16132, with the PR being here: palantir/hadoop#47 |
@jeffquinn-nuna @stevematyas @justinuang The SDK team has reviewed the feature request list for V1, and since they're concentrating efforts on V2 new features they decided to not implement this one in V1. It's still being considered for the TransferManager refactor in V2, see the referenced issue above. I'll go ahead and close this. Please feel free to comment on the V2 tracking issue with your use case, and reach out if you have further questions. |
The current implementation of TransferManager only allows a File as the download destination. I think it would be helpful if there was an interface for supplying an OutputStream as the final sink for the downloaded data. Overall it seems this would give greater flexibility in usage, and one could still just supply a FileOutputStream if they simply want to download to a file.