3929 Slow download performance for Storage API. Added new downloadToPathWithMediaHttpDownloader method with better performance. #4337
@frankyn |
@andrey-qlogic, can you please add a descriptive title? |
I would suggest creating a method that is decoupled from java.nio.file.Path. I would assume that a lot of users (myself included) would not have a file to write to. For example, you may want to stream the data from the storage API to a browser via HTTP response
and/or
|
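A minimal sketch of the decoupling suggested above, using only JDK types. The class name, method names, and the bufferSize parameter are hypothetical and not part of google-cloud-storage; the idea is only that the core method writes to any OutputStream (a file, an HTTP response body, an in-memory buffer) and the Path variant is a thin wrapper around it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of google-cloud-storage.
final class StreamDownloadSketch {

  // Core method: copies the source into any OutputStream and returns the byte count.
  static long downloadTo(InputStream source, OutputStream sink, int bufferSize) {
    byte[] buffer = new byte[bufferSize];
    long total = 0;
    try {
      int read;
      while ((read = source.read(buffer)) != -1) {
        sink.write(buffer, 0, read);
        total += read;
      }
      sink.flush();
    } catch (IOException e) {
      // Mirrors the PR's pattern of rethrowing IOException as an unchecked exception.
      throw new UncheckedIOException(e);
    }
    return total;
  }

  // Convenience overload for callers that do have a file to write to.
  static long downloadTo(InputStream source, Path path, int bufferSize) {
    try (OutputStream out = Files.newOutputStream(path)) {
      return downloadTo(source, out, bufferSize);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A servlet-style caller could pass its response output stream as the sink, so no temporary file is needed.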
@mcantrell
|
Sorry, I don't know the internals in detail but wouldn't the blob's storage object contain the transport options? Building the client like this?
|
I suppose that another alternative to using the media downloader is to do something similar to the BlobReaderChannel. You can use this.storage.getOptions().getStorageRpcV1().read() to fetch the required bytes using the CountingOutputStream's offset and bytes remaining. edit: never mind, that won't work. it returns a byte array. Not sure what I was thinking here. Maybe I hadn't had enough coffee yet :) |
the change to OutputStream is reasonable, but building com.google.api.services.storage.Storage object from Blob may not be a good idea. |
} catch (IOException e) {
  throw new StorageException(e);
}
} |
@andrey-qlogic, I appreciate your patience.
I'd recommend passing an additional option (USE_DIRECT_DOWNLOAD) through to downloadTo() if using getMediaHttpDownloader is considered a breaking change:
Then, in the underlying RPC class, handle the request in HttpStorageRpc.getCall(), which is called by HttpStorageRpc.get().
WDYT?
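As a rough illustration of why an opt-in option avoids a breaking change, here is a sketch with a varargs option parameter. The enum, class, and method names are hypothetical and do not reflect the library's actual option types; existing callers that pass no options keep the current behavior.

```java
import java.util.Arrays;

// Hypothetical types for illustration; the real library's option classes differ.
final class DownloadOptionSketch {

  enum DownloadOption {
    USE_DIRECT_DOWNLOAD
  }

  // Returns true only when the caller explicitly opted in.
  // In a real implementation this flag could be forwarded to
  // MediaHttpDownloader.setDirectDownloadEnabled(boolean).
  static boolean directDownloadEnabled(DownloadOption... options) {
    return Arrays.asList(options).contains(DownloadOption.USE_DIRECT_DOWNLOAD);
  }

  // Names the code path a download would take; stands in for the RPC-layer branch.
  static String strategyFor(DownloadOption... options) {
    return directDownloadEnabled(options) ? "direct" : "chunked";
  }
}
```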
Added USE_DIRECT_DOWNLOAD to set directDownloadEnabled for MediaHttpDownloader
Thanks @andrey-qlogic, I left a few comments. Also have you benchmarked this change to compare?
google-cloud-clients/google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
google-cloud-clients/google-cloud-storage/src/test/java/com/google/cloud/storage/BlobTest.java
google-cloud-clients/google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
I still need more benchmarking |
@frankyn, it turned out that adding the new USE_DIRECT_DOWNLOAD option and setting MediaHttpDownloader's directDownloadEnabled to true or false based on that option does not change the performance. The throughput is still ~10Mb/s, whereas the legacy client showed 40Mb/s. At the same time, if I increase the buffer allocation size 100x in the 'downloadTo' method Line 217 in 9c47e1b
|
The increased byte buffer size is kind of a band-aid. To achieve reasonable download speeds, I'll have to limit the number of concurrent downloads that can be achieved. We've moved from a speed problem to a memory problem. |
This fell through the cracks in my email. Apologies for the delay. @mcantrell this is a trade-off between the two. IIUC, what you were asking for was replacing the multiple GET requests with a single GET request to mitigate the overhead. Is this correct? |
Thanks @andrey-qlogic, given the new helper method |
...d-clients/google-cloud-storage/src/main/java/com/google/cloud/storage/spi/v1/StorageRpc.java
google-cloud-clients/google-cloud-storage/src/main/java/com/google/cloud/storage/Blob.java
Not exactly. I wanted reasonable performance compared to the deprecated client. I think it's a mistake to trade download performance for memory performance. This is a huge downgrade for us. The multiple requests appear to be the cause. I would suggest again that resumable downloads are a better pattern for retries than what is currently implemented. |
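A minimal sketch of the resumable pattern being argued for, under stated assumptions: one streaming request is kept open as long as it works, and on a read failure the download reopens from the number of bytes already written instead of issuing many small requests. RangeSource, download, and flakyOnce are hypothetical names using only JDK types; a real source would translate the offset into an HTTP Range header. This is not the PR's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch; not the library's or the PR's actual code.
final class ResumableDownloadSketch {

  // Opens the object starting at the given byte offset.
  interface RangeSource {
    InputStream open(long offset) throws IOException;
  }

  // Streams the source into the sink, reopening at the current offset after a read failure.
  static long download(RangeSource source, OutputStream sink, int maxRetries) {
    byte[] buffer = new byte[8192];
    long written = 0;
    int failures = 0;
    while (true) {
      try (InputStream in = source.open(written)) {
        int read;
        while ((read = in.read(buffer)) != -1) {
          write(sink, buffer, read);
          written += read; // doubles as the resume offset
        }
        return written;
      } catch (IOException e) {
        if (++failures > maxRetries) {
          throw new UncheckedIOException(e);
        }
      }
    }
  }

  // Sink failures are not retried: they surface immediately as unchecked exceptions.
  private static void write(OutputStream sink, byte[] buffer, int length) {
    try {
      sink.write(buffer, 0, length);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Demo-only source: the first connection breaks after failAfter bytes, later ones succeed.
  static RangeSource flakyOnce(byte[] data, int failAfter) {
    boolean[] failed = {false};
    return offset -> {
      InputStream in = new ByteArrayInputStream(data, (int) offset, data.length - (int) offset);
      if (failed[0]) {
        return in;
      }
      failed[0] = true;
      return new InputStream() {
        private int served = 0;

        @Override
        public int read() throws IOException {
          if (served++ >= failAfter) {
            throw new IOException("simulated connection reset");
          }
          return in.read();
        }
      };
    };
  }
}
```

The buffer here stays small and fixed, so memory per download does not grow with throughput; only the offset has to be remembered between attempts.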
Let's step back and talk about the practical implication of the proposed fix. To achieve reasonable download performance, I need 10x the memory. That means that for every 10 concurrent downloads I could handle before, I can now only handle 1. I'll need 10x the compute engines to handle our peak traffic. |
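The arithmetic behind that concern can be sketched as follows. The helper name and the concrete sizes are illustrative assumptions, not measurements from this PR: with a fixed memory budget, the number of downloads that fit is the budget divided by the per-download buffer, so a 10x larger buffer leaves room for a tenth as many downloads.

```java
// Hypothetical helper; the numbers used with it are illustrative only.
final class BufferBudgetSketch {

  // How many downloads fit in the budget if each one holds a single buffer of the given size.
  static long maxConcurrentDownloads(long memoryBudgetBytes, long bufferBytesPerDownload) {
    if (memoryBudgetBytes < 0 || bufferBytesPerDownload <= 0) {
      throw new IllegalArgumentException("budget must be >= 0 and buffer size must be > 0");
    }
    return memoryBudgetBytes / bufferBytesPerDownload;
  }
}
```

For example, with an assumed 200 MiB budget, 2 MiB buffers allow 100 concurrent downloads, while 20 MiB buffers allow 10.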
Sorry, I missed the comment regarding the commit. Just to be clear (there are a lot of threads going on here), we're talking about using resumable downloads instead of an increased buffer size to fix the issue? |
Codecov Report
@@ Coverage Diff @@
## master #4337 +/- ##
============================================
- Coverage 49.15% 49.15% -0.01%
- Complexity 21934 21936 +2
============================================
Files 2077 2077
Lines 207174 207216 +42
Branches 24099 24100 +1
============================================
+ Hits 101841 101857 +16
- Misses 97160 97186 +26
Partials 8173 8173
|
@andrey-qlogic can you please answer this comment: #4337 (comment) |
gentle ping |
@JustinBeckwith, @sduskis, @mcantrell. That is correct, the PR is about using a resumable download with MediaHttpDownloader instead of increasing the buffer size to fix the issue. |
@sduskis , PTAL |
@mcantrell, can you PTAL? Your question was answered. @JesseLovelace or @frankyn: would you be able to review this? |
@JesseLovelace or @frankyn: would you be able to review this? |
@frankyn, this PR is stale. I'm closing it. Please reach out offline if you would like to restore this PR. |
Hey there! Any news? I see this was closed like one month ago and I'm wondering whether this is addressed on another PR or there are plans to fix this at all... |
@franDiazBitmover we closed this PR without merging it. We will have a new PR early next week addressing this issue. |
Fixes #3929 Added new downloadToPathWithMediaHttpDownloader method with better performance.