Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3929 Slow download performance for Storage API. Added new downloadToPathWithMediaHttpDownloader method with better performance. #4337

Closed
wants to merge 13 commits into from

Conversation

andrey-qlogic
Copy link

@andrey-qlogic andrey-qlogic commented Jan 14, 2019

Fixes #3929 Added new downloadToPathWithMediaHttpDownloader method with better performance.

@andrey-qlogic andrey-qlogic requested a review from a team as a code owner January 14, 2019 18:48
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jan 14, 2019
@andrey-qlogic
Copy link
Author

andrey-qlogic commented Jan 14, 2019

@frankyn
The approach is tied to an obsolete client, but the performance of the new method downloadToPathWithMediaHttpDownloader
looks better than the other downloadTo method . I would suggest adding this approach to the user separately from another method. If it looks reasonable, then I will add also unit tests.

@sduskis sduskis added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 15, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 15, 2019
@sduskis
Copy link
Contributor

sduskis commented Jan 15, 2019

@andrey-qlogic, can you please add a descriptive title?

@andrey-qlogic andrey-qlogic changed the title 3929 3929 Added new downloadToPathWithMediaHttpDownloader method with better performance. Jan 15, 2019
@andrey-qlogic andrey-qlogic changed the title 3929 Added new downloadToPathWithMediaHttpDownloader method with better performance. 3929 Slow download performance for Storage API. Added new downloadToPathWithMediaHttpDownloader method with better performance. Jan 15, 2019
@ajaaym ajaaym added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 15, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 15, 2019
@mcantrell
Copy link

I would suggest creating a method that is decoupled from java.nio.file.Path. I would assume that a lot of users (myself included) would not have a file to write to. For example, you may want to stream the data from the storage API to a browser via HTTP response

    downloadWithMediaHttpDownloaderTo(OutputStream out)

and/or

    downloadWithMediaHttpDownloaderTo(WritableChannel out)

@andrey-qlogic
Copy link
Author

@mcantrell
Changing to downloadWithMediaHttpDownloaderTo(OutputStream out, ...) is more appreciate way than using WritableChannel.
But the definition of the next parameter becomes complicated for User we have either build com.google.api.services.storage.Storage with HttpStorageOptions that may not available in StorageOptions for getting MediaHttpDownloader.We have to make the second parameter com.google.api.services.storage.Storage and make the user responsible for building it the right way. I would appreciate your input here.

public void downloadToPathWithMediaHttpDownloader(Path path, com.google.api.services.storage.Storage storage)

@mcantrell
Copy link

Sorry, I don't know the internals in detail but wouldn't the blob's storage object contain the transport options? Building the client like this?

  public void downloadWithMediaHttpDownloader(OutputStream targetOut) {
    com.google.api.services.storage.Storage storage = buildStorage(this.storage.getOptions());

@mcantrell
Copy link

mcantrell commented Jan 17, 2019

I suppose that another alternative to using the media downloader is to do something similar to the BlobReaderChannel. You can use this.storage.getOptions().getStorageRpcV1().read() to fetch the required bytes using the CountingOutputStream's offset and bytes remaining.

edit: never mind, that won't work. it returns a byte array. Not sure what I was thinking here. Maybe I hadn't had enough coffee yet :)

@andrey-qlogic
Copy link
Author

Sorry, I don't know the internals in detail but wouldn't the blob's storage object contain the transport options? Building the client like this?

  public void downloadWithMediaHttpDownloader(OutputStream targetOut) {
    com.google.api.services.storage.Storage storage = buildStorage(this.storage.getOptions());

the change to OutputStream is reasonable, but building com.google.api.services.storage.Storage object from Blob may not be a good idea.

} catch (IOException e) {
throw new StorageException(e);
}
}
Copy link
Member

@frankyn frankyn Jan 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrey-qlogic, I appreciate your patience.

I'd recommend passing through an additional option (USE_DIRECT_DOWNLOAD) to downloadTo() if using getMediaHttpDownloader is considered a breaking change:

Then in the underlying RPC class handle the request in
HttpStorageRpc.getCall() which called by HttpStorageRpc.get().

WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added USE_DIRECT_DOWNLOAD to set directDownloadEnabled for MediaHttpDownloader

@AVaksman AVaksman added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 24, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 24, 2019
Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andrey-qlogic, I left a few comments. Also have you benchmarked this change to compare?

@andrey-qlogic
Copy link
Author

I still need more benchmarking

@andrey-qlogic
Copy link
Author

andrey-qlogic commented Jan 25, 2019

@frankyn, it turned out that adding the new option USE_DIRECT_DOWNLOAD and setting MediaHttpDownloader directDownloadEnable to true/false depends on the option does not change the performance. The throughput still ~10Mb/s when the legacy client showed 40Mb/s. At the same time, if I increase buffer allocating size 100x times at the 'downloadTo' method

, it improves the performance up to 3x time on files of size 1Gb. What if instead, we add the next parameter to the method 'downloadTo' to give the user a chance to set the buffer size on his own?

@mcantrell
Copy link

The increased byte buffer size is kind of a band-aid. To achieve reasonable download speeds, I'll have to limit the number of concurrent downloads that can be achieved. We've moved from a speed problem to a memory problem.

@frankyn
Copy link
Member

frankyn commented Feb 5, 2019

This fell through my email. Apologies for the delay. @mcantrell this is a trade-off between the two.

IIUC what you were asking for was removing the need to make multiple GET requests to a single GET request to mitigate the overhead. Is this correct?

@frankyn
Copy link
Member

frankyn commented Feb 5, 2019

Thanks @andrey-qlogic, given the new helper method downloadTo(OutputStream) WDYT about removing the code around useDirectDownload? The issue I see is that there are two overloads with different download expectations.

@andrey-qlogic andrey-qlogic self-assigned this Feb 6, 2019
@mcantrell
Copy link

This fell through my email. Apologies for the delay. @mcantrell this is a trade-off between the two.

IIUC what you were asking for was removing the need to make multiple GET requests to a single GET request to mitigate the overhead. Is this correct?

Not exactly. I wanted reasonable performance compared to the deprecated client. I think it's a mistake to trade download performance for memory performance. This is a huge downgrade for us.

The multiple requests appears to be the cause. I would suggest again that resumable downloads is a better pattern for retries than what is currently implemented.

@mcantrell
Copy link

Let's step back and talk about the practical implication of the proposed fix. To achieve reasonable download performance, I need 10x the memory. That means that for every 10 concurrent downloads I could handle before, I can now only handle 1. I'll need 10x the compute engines to handle out peak traffic.

@mcantrell
Copy link

Sorry, I missed the comment regarding commit. Just to be clear (there are a lot of threads going on here), we're talking about using a resume instead of increased buffer size to fix the issue?

@JustinBeckwith JustinBeckwith added the 🚨 This issue needs some love. label Feb 7, 2019
@yoshi-automation yoshi-automation removed the 🚨 This issue needs some love. label Feb 7, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Feb 7, 2019
@sduskis sduskis added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 11, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 11, 2019
@codecov
Copy link

codecov bot commented Feb 11, 2019

Codecov Report

Merging #4337 into master will decrease coverage by <.01%.
The diff coverage is 38.09%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #4337      +/-   ##
============================================
- Coverage     49.15%   49.15%   -0.01%     
- Complexity    21934    21936       +2     
============================================
  Files          2077     2077              
  Lines        207174   207216      +42     
  Branches      24099    24100       +1     
============================================
+ Hits         101841   101857      +16     
- Misses        97160    97186      +26     
  Partials       8173     8173
Impacted Files Coverage Δ Complexity Δ
...va/com/google/cloud/storage/spi/v1/StorageRpc.java 62.85% <ø> (ø) 0 <0> (ø) ⬇️
...om/google/cloud/storage/spi/v1/HttpStorageRpc.java 1.83% <0%> (-0.06%) 1 <0> (ø)
...ud/storage/contrib/nio/testing/FakeStorageRpc.java 63.36% <0%> (-0.32%) 40 <0> (ø)
...e/src/main/java/com/google/cloud/storage/Blob.java 82.32% <84.21%> (+0.22%) 32 <2> (+2) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 07ab941...c7a8938. Read the comment docs.

@sduskis
Copy link
Contributor

sduskis commented Feb 26, 2019

@andrey-qlogic can you please answer this comment: #4337 (comment)

@JustinBeckwith
Copy link
Contributor

gentle ping

@andrey-qlogic
Copy link
Author

@JustinBeckwith, @sduskis, @mcantrell . That is correct, the PR is about using a resumable download with Http MediaDownloader instead of increase buffer size to fix the issue.

@andrey-qlogic
Copy link
Author

@sduskis , PTAL

@sduskis
Copy link
Contributor

sduskis commented Mar 8, 2019

@mcantrell, can you PTAL? Your question was answered. @JesseLovelace or @frankyn: would you be able to review this?

@sduskis
Copy link
Contributor

sduskis commented Mar 19, 2019

@JesseLovelace or @frankyn: would you be able to review this?

@sduskis sduskis added the api: storage Issues related to the Cloud Storage API. label Mar 19, 2019
@sduskis
Copy link
Contributor

sduskis commented Apr 30, 2019

@frankyn, this PR is stale. I'm closing it. Please reach out offline if you would like to restore this PR.

@sduskis sduskis closed this Apr 30, 2019
@franDiazBitmover
Copy link

franDiazBitmover commented Jun 6, 2019

Hey there! Any news? I see this was closed like one month ago and I'm wondering whether this is addressed on another PR or there are plans to fix this at all...

@ajaaym
Copy link
Contributor

ajaaym commented Jun 6, 2019

@franDiazBitmover we closed this pr without merging it. We will have a new pr early next week addressing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. cla: yes This human has signed the Contributor License Agreement. 🚨 This issue needs some love.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow download performance for Storage API