Slow download performance for Storage API #3929
It seems that a new API request is issued to fill each byte buffer passed to the channel reader (com.google.cloud.storage.BlobReadChannel#read). The byte buffer argument is 2MB, so the client issues ~500 service requests to download this ~1GB file. The time difference between the two clients is ~60s, which puts the overhead at ~120ms per request. I was able to use a 50MB byte buffer to get closer to the legacy client (~33MB/s), but I'm not sure a byte buffer that large is healthy for our system given the number of concurrent downloads we handle. Is the need for individual service requests driven by retry requirements?
Just as a concept, I created a quick and dirty downloader that uses the internal storage client's executeMediaAndDownloadTo method with retry support. It adjusts the mediaDownloader to the last good byte offset written to the output stream:
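The original snippet isn't reproduced here; below is a minimal sketch of the same idea, not the actual code. The BlobDownloadHelper name, the Guava CountingOutputStream wrapper, and the bare retry loop are assumptions; the legacy-client calls (executeMediaAndDownloadTo, MediaHttpDownloader.setBytesDownloaded) are what the approach relies on.

```java
// Rough sketch only, not the original gist. Class name, CountingOutputStream wrapper,
// and retry loop are assumptions; executeMediaAndDownloadTo / setBytesDownloaded are
// the legacy-client calls being leaned on.
import com.google.api.services.storage.Storage;
import com.google.common.io.CountingOutputStream;

import java.io.IOException;
import java.io.OutputStream;

public class BlobDownloadHelper {

  private final Storage storage; // legacy com.google.api.services.storage.Storage client

  public BlobDownloadHelper(Storage storage) {
    this.storage = storage;
  }

  /** Downloads an object, resuming from the last good byte offset after each failure. */
  public void download(String bucket, String name, OutputStream out, int maxAttempts)
      throws IOException {
    CountingOutputStream counting = new CountingOutputStream(out);
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        Storage.Objects.Get get = storage.objects().get(bucket, name);
        // Point the media downloader at the last byte already written to the stream.
        get.getMediaHttpDownloader().setBytesDownloaded(counting.getCount());
        get.executeMediaAndDownloadTo(counting);
        return;
      } catch (IOException e) {
        last = e; // retry, continuing from counting.getCount()
      }
    }
    throw last;
  }
}
```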
It's not well implemented, but it illustrates the concept.
@mcantrell, that is a good example, and I confirmed that performance can differ between the legacy client and the new API from google-cloud-java. However, there is a reason the downloadTo method was added to Blob.java (issue #2107): the method uses the new Channel API (Blob.java, line 213 at commit 2aa9d80).
Your approach adds a dependency on the legacy client (com.google.apis:google-api-services), which may not be a good option. It is also not clear to me where a class like BlobDownloader or BlobDownloadHelper would live, if it were added at all. I would suggest hearing from @frankyn or @garrettjonesgoogle.
The approach is tied to the legacy client but from what I can tell, so is everything else. For instance, the Reader created for the downloadTo method uses the legacy client:
I'm not sure the code referenced is really meant to be implemented exactly as is. It was just an example to demonstrate that you don't need to issue so many service requests to achieve a resumable download.
The approach is tied to an obsolete client, but the performance of the new method downloadToPathWithMediaHttpDownloader ...
I would suggest creating a method that is decoupled from java.nio.file.Path. I would assume that a lot of users (myself included) would not have a file to write to. For example, you may want to stream the data from the storage API to a browser via an HTTP response, along the lines of the sketch below.
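A minimal sketch of what that could look like, assuming a servlet-style HTTP response; the servlet wiring and the 2MB buffer size are illustrative, while blob.reader() is the existing public API:

```java
// Sketch only: streams a blob straight to a servlet response instead of a Path.
// The HttpServletResponse usage and buffer size are assumptions.
import com.google.cloud.ReadChannel;
import com.google.cloud.storage.Blob;

import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

final class BlobStreaming {

  /** Copies the blob's content into the HTTP response body. */
  static void streamTo(Blob blob, HttpServletResponse response) throws IOException {
    if (blob.getContentType() != null) {
      response.setContentType(blob.getContentType());
    }
    WritableByteChannel out = Channels.newChannel(response.getOutputStream());
    ByteBuffer buffer = ByteBuffer.allocate(2 * 1024 * 1024);
    try (ReadChannel reader = blob.reader()) {
      while (reader.read(buffer) >= 0) {
        buffer.flip();
        out.write(buffer);
        buffer.clear();
      }
    }
  }
}
```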
The change to OutputStream is reasonable, but building a com.google.api.services.storage.Storage object from a Blob may not be a good idea.
@frankyn, it turned out that adding the new USE_DIRECT_DOWNLOAD option and setting the MediaHttpDownloader's directDownloadEnabled flag to true/false depending on that option does not change the performance. The throughput is still ~10MB/s where the legacy client showed ~40MB/s. At the same time, if I increase the buffer allocation size in the downloadTo method (ByteBuffer bytes = ByteBuffer.allocate(DEFAULT_CHUNK_SIZE);), it improves performance by up to 3x on files of ~1GB. What if, instead, we add a parameter to downloadTo that gives the user a chance to set the buffer size on their own?
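A rough sketch of what such an overload could look like, mirroring the copy loop in the existing downloadTo; the bufferSize parameter and the static-helper packaging are hypothetical, not necessarily what was merged:

```java
// Hypothetical shape of the proposed overload: the same reader/channel copy loop,
// but with a caller-supplied buffer size instead of the fixed DEFAULT_CHUNK_SIZE.
import com.google.cloud.ReadChannel;
import com.google.cloud.storage.Blob;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public final class Downloads {

  private Downloads() {}

  /** Hypothetical overload: downloadTo with a caller-chosen buffer size. */
  public static void downloadTo(Blob blob, Path path, int bufferSize) throws IOException {
    try (OutputStream out = Files.newOutputStream(path);
         ReadChannel reader = blob.reader()) {
      WritableByteChannel channel = Channels.newChannel(out);
      ByteBuffer bytes = ByteBuffer.allocate(bufferSize); // instead of DEFAULT_CHUNK_SIZE
      while (reader.read(bytes) >= 0) {
        bytes.flip();
        channel.write(bytes);
        bytes.clear();
      }
    }
  }
}
```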
I think that sounds like a better idea. Thanks for digging into this, @andrey-qlogic!
The PR makes performance the same as the legacy client.
The new version of the storage client (com.google.cloud:google-cloud-storage:1.52.0) appears to download storage content at a MUCH slower rate than the legacy client (com.google.apis:google-api-services-storage:v1-rev141-1.25.0).
Legacy client (~40MB/s):
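(Illustrative sketch only; the real test case is in the attached zip below. It assumes an already-constructed legacy com.google.api.services.storage.Storage client and placeholder bucket/object/path names.)

```java
// Sketch of the legacy-client download path; bucket, object, and output path are placeholders.
import com.google.api.services.storage.Storage;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

class LegacyDownload {
  static void download(Storage storage) throws IOException {
    try (OutputStream out = Files.newOutputStream(Paths.get("/tmp/file-1gb"))) {
      storage.objects().get("my-bucket", "file-1gb").executeMediaAndDownloadTo(out);
    }
  }
}
```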
New client (~10MB/s):
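(Illustrative sketch only; the real test case is in the attached zip below. Bucket, object, and path names are placeholders.)

```java
// Sketch of the new-client download path via Blob.downloadTo; names are placeholders.
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.nio.file.Paths;

class NewClientDownload {
  static void download() {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    Blob blob = storage.get("my-bucket", "file-1gb");
    blob.downloadTo(Paths.get("/tmp/file-1gb"));
  }
}
```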
I'm attaching a couple of test cases that I ran from a GCE instance (Ubuntu 16.04 with Java 1.8.0_191):
storage-performance-legacy.zip
storage-performance-new.zip