This repository has been archived by the owner on Dec 3, 2023. It is now read-only.

Increase DEFAULT_CHUNK_SIZE to reduce transfer overhead, increase throughput #86

Closed
domZippilli opened this issue Dec 4, 2019 · 0 comments · Fixed by #87
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

domZippilli commented Dec 4, 2019

Is your feature request related to a problem? Please describe.
In working with GCS customers, I've observed that the Java client library for storage tends to have poor single-threaded, single-stream upload throughput. The issue presented itself a lot like this one, but for uploads instead of downloads. After debugging it for a while, I found that overriding the DEFAULT_CHUNK_SIZE value in the storage WriteChannel with the setChunkSize method to a larger value significantly improved transfer times.
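To illustrate why the chunk size matters, here is a minimal sketch of a chunk-buffering writer. `ChunkBufferingWriter` and `flushesFor` are hypothetical stand-ins written for this example, not the library's `WriteChannel` or `BaseWriteChannel`; the sketch only assumes that each full buffer is sent as one request, so a larger chunk size means fewer round trips for the same payload.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a chunk-buffering write channel; not the library's class. */
final class ChunkBufferingWriter {
    private byte[] buffer;
    private int filled = 0;
    private int flushCount = 0;

    ChunkBufferingWriter(int chunkSize) {
        setChunkSize(chunkSize);
    }

    /** Mirrors the idea of setChunkSize: resize the buffer before any data is written. */
    void setChunkSize(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (filled > 0) {
            throw new IllegalStateException("set the chunk size before writing");
        }
        buffer = new byte[chunkSize];
    }

    /** Copies src into the buffer, flushing every time the buffer fills up. */
    int write(ByteBuffer src) {
        int written = 0;
        while (src.hasRemaining()) {
            int n = Math.min(src.remaining(), buffer.length - filled);
            src.get(buffer, filled, n);
            filled += n;
            written += n;
            if (filled == buffer.length) {
                flush();
            }
        }
        return written;
    }

    /** Sends whatever is left in the buffer as a final, possibly short, chunk. */
    void close() {
        if (filled > 0) {
            flush();
        }
    }

    private void flush() {
        // Assumption: a real channel would send buffer[0..filled) as one request here.
        flushCount++;
        filled = 0;
    }

    int flushCount() {
        return flushCount;
    }

    /** Writes totalBytes in 64 KiB pieces and reports how many flushes occurred. */
    static int flushesFor(int totalBytes, int chunkSize) {
        ChunkBufferingWriter writer = new ChunkBufferingWriter(chunkSize);
        byte[] piece = new byte[64 * 1024];
        int remaining = totalBytes;
        while (remaining > 0) {
            int n = Math.min(piece.length, remaining);
            writer.write(ByteBuffer.wrap(piece, 0, n));
            remaining -= n;
        }
        writer.close();
        return writer.flushCount();
    }

    public static void main(String[] args) {
        int mib = 1024 * 1024;
        System.out.println("1 MiB chunks: " + flushesFor(10 * mib, mib) + " flushes");
        System.out.println("5 MiB chunks: " + flushesFor(10 * mib, 5 * mib) + " flushes");
    }
}
```

With the real client, the equivalent knob is the `setChunkSize` call on the `WriteChannel` mentioned above; this model ignores retries and network behavior entirely.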

In fact, I put a customer's (idiomatic) code in a test harness and experimented with different chunk sizes to find out whether there is an inflection point beyond which larger chunks stop paying off (there is) and where that point lies, since it would be the best balance of memory usage, retry cost, and transfer throughput. I can provide raw data upon request, but here's the graph of transfer times for a 1GB upload at different chunk sizes (lower Y value is better).

[Image: graph of 1GB transfer time versus chunk size]

This test was run from a GCE VM against a same-region Regional bucket. Each step from 1MB to 2MB, 5MB, and 10MB gains a lot. The gain from 10MB to 15MB is smaller, and beyond 15MB little is gained: 15MB is where throughput improvements stop.
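As a rough way to see where the per-request overhead goes, the sketch below counts how many chunk-sized requests a transfer needs at each tested size. It is arithmetic only, not the measured data: `ChunkCount` and `requestsFor` are names made up for this example, "1GB" is assumed to mean 1 GiB, and the chunk sizes are assumed to be MiB.

```java
/** Hypothetical helper for this example: counts requests per transfer by ceiling division. */
final class ChunkCount {
    static final long MIB = 1024L * 1024L;

    /** Number of chunk-sized requests needed to send totalBytes. */
    static long requestsFor(long totalBytes, long chunkBytes) {
        if (totalBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and chunkBytes > 0");
        }
        return (totalBytes + chunkBytes - 1) / chunkBytes;
    }

    public static void main(String[] args) {
        long total = 1024 * MIB; // assumption: "1GB" treated as 1 GiB
        for (long mib : new long[] {1, 2, 5, 10, 15}) {
            System.out.printf("%2d MiB chunks -> %d requests%n", mib, requestsFor(total, mib * MIB));
        }
    }
}
```

Going from 1 MiB to 15 MiB chunks cuts the request count from 1024 to 69 under these assumptions. The count alone does not predict the measured times or explain why gains stop at 15MB; it only shows how quickly the number of round trips falls.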

Describe the solution you'd like
Increase the DEFAULT_CHUNK_SIZE to 15MB (or MiB, since that's what it uses) in BaseWriteChannel.

A larger chunk size would also be consistent with the way GSUtil handles uploads to GCS, with 100MB chunks, heavily optimized for transfer speed.

Describe alternatives you've considered
The alternative is the status quo, in which customers should override this value unless they are optimizing heavily for low memory / low retry cost over throughput. I think this optimization is less common than optimization for transfer speed, and regardless, at 10-15MiB a good balance among all three considerations is achieved - especially considering the advancements in mobile computing power and WANs since we last set this value >4 years ago. A balance between likely optimizations is a good place for a default.

Additional context
Disclaimer: I am a Googler, and some amount of discussion on the matter occurred internally, with @frankyn suggesting this approach to me. All the relevant contents of that discussion are presented here.

@chingor13 chingor13 added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Dec 4, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Feb 26, 2020
Upgrading the GCS SDK to the most recent version.
Adjusting (i.e. improving) the REST mock accordingly.
This should significantly boost performance by pulling in
googleapis/java-core#86 in some cases.
original-brownbear added a commit to elastic/elasticsearch that referenced this issue Mar 5, 2020
Upgrading the GCS SDK to the most recent version.
Adjusting (i.e. improving) the REST mock accordingly.
This should significantly boost performance by pulling in
googleapis/java-core#86 in some cases.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Mar 5, 2020
Upgrading the GCS SDK to the most recent version.
Adjusting (i.e. improving) the REST mock accordingly.
This should significantly boost performance by pulling in
googleapis/java-core#86 in some cases.