GCP: CP CLI poor download performance #435
Version:
The performance of download and upload operations for the same amount of data before and after the changes is shown below.
Pipe CLI integration tests were performed and no new failures were found.
Version:
After lengthy research, several issues with the Google Cloud Storage support in pipe CLI were found. It turned out that
Buffering size
First of all, the simple download operation uses a buffer of
To increase the performance of pipe CLI, the buffering size can be increased. As a simple heuristic, a size of
Connection resets
Nevertheless, the buffering size cannot simply be replaced, because there is a deeper problem in
As a way to resolve the connection reset issue, a resumable download mechanism can be introduced.
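The resumable download idea described above can be sketched roughly as follows. This is only a minimal illustration and not the actual pipe CLI implementation: `read_range` is a hypothetical callable standing in for a ranged read from the storage backend, and the default `buffer_size` and `max_attempts` values are assumptions chosen for the example.

```python
import io


def resumable_download(read_range, total_size, out,
                       buffer_size=1024 * 1024, max_attempts=5):
    """Copy total_size bytes into out, resuming after connection resets.

    read_range(start, size) is a hypothetical callable that returns up to
    `size` bytes starting at offset `start` and may raise ConnectionResetError.
    """
    offset = 0
    failures = 0  # consecutive failures at the current offset
    while offset < total_size:
        size = min(buffer_size, total_size - offset)
        try:
            data = read_range(offset, size)
        except ConnectionResetError:
            failures += 1
            if failures >= max_attempts:
                raise
            continue  # retry from the same offset; written bytes are kept
        if not data:
            raise IOError('Source ended early at offset %d' % offset)
        out.write(data)
        offset += len(data)
        failures = 0  # progress was made, so start counting again
    return offset


# Usage with an in-memory source (an assumption made for the example).
source = b'example payload ' * 100
target = io.BytesIO()
resumable_download(lambda start, size: source[start:start + size],
                   len(source), target, buffer_size=256)
```

Because the loop only advances `offset` after a successful write, a reset in the middle of a large blob costs one buffer's worth of data rather than the whole download.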
Resumable downloading of Google Storage blobs resolves the connection reset issue, and custom buffering increases overall download performance. Both changes are described in detail in the corresponding issue: #435.
…#475)
* Add resumable downloads and custom buffering size for GCP blobs. Resumable downloading of Google Storage blobs resolves the connection reset issue, and custom buffering increases overall download performance. Both changes are described in detail in the corresponding issue: #435.
* Add checksum validation for GCP resumable downloads
* Fix and refactor google storage downloading classes
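For the checksum validation step mentioned in the commit above, a minimal sketch could look like this. It is not the code from the commit: the function names are hypothetical, and it assumes the expected value is a base64-encoded MD5 digest, which is the format Google Cloud Storage uses when it reports an object's MD5 hash.

```python
import base64
import hashlib


def md5_base64(path, buffer_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of a local file, read in buffers."""
    digest = hashlib.md5()
    with open(path, 'rb') as source:
        for chunk in iter(lambda: source.read(buffer_size), b''):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode('ascii')


def validate_download(path, expected_md5_base64):
    """Raise ValueError if the downloaded file does not match the expected hash."""
    actual = md5_base64(path)
    if actual != expected_md5_base64:
        raise ValueError('Checksum mismatch for %s: expected %s, got %s'
                         % (path, expected_md5_base64, actual))
```

Validating after the last resumed part is written guards against a download that was stitched together incorrectly across resume attempts.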
Use the buffering size as the download chunk size to improve overall Google Storage blob download performance. Also increase the default number of download resume attempts to work around the connection reset issue described in #435 in most cases.
Download performance looks reasonable now.
Performance benchmarks
Version:
Download
Version:
0.16.0.1477.64ff1d341960a21d1839a378253e963de023aebc
Originally, Cloud Pipeline used the simple download strategy for all file downloads, since it is the default in the google-cloud-storage library. Later, a chunked download strategy was introduced in #253. It turned out that the chunked download strategy has a severe negative effect on download performance and therefore has to be replaced with the original simple download strategy.
Example of the copying time with simple and chunked download strategies.
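One plausible reason for the difference is the number of round trips: a chunked download issues a separate ranged request for every chunk, while a simple download streams the whole blob in one request. The sketch below only counts requests for a given chunk size. It is an illustrative model, not a measurement, and the file and chunk sizes are assumptions picked for the example.

```python
def range_request_count(file_size, chunk_size):
    """Number of ranged requests needed to fetch file_size bytes in chunks."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    # Integer ceiling division: the last chunk may be shorter than the rest.
    return (file_size + chunk_size - 1) // chunk_size


KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Hypothetical 1 GiB blob fetched with a small and a large chunk size.
small_chunks = range_request_count(1 * GIB, 256 * KIB)  # 4096 requests
large_chunks = range_request_count(1 * GIB, 64 * MIB)   # 16 requests
```

Every request adds a fixed latency cost, so a small chunk size multiplies that cost thousands of times for a large blob.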