
gcp,s3,azure: make the storage client upload chunk size configurable #80668

Merged
merged 1 commit on May 3, 2022

Conversation

@adityamaru (Contributor) commented Apr 27, 2022

This change adds a `cloudstorage.write_chunk_size` cluster setting
that allows us to control the size of the chunks buffered by the
cloud storage client when uploading a file to storage. The setting defaults to
8 MiB.

Prior to this change, GCS used a 16 MB buffer, S3 a 5 MB buffer, and Azure a
4 MB buffer. A follow-up change will add memory monitoring to each external
storage writer to account for these buffered chunks during upload.

This change was motivated by the fact that in google-cloud-storage
SDK versions prior to v1.21.0, every chunk is given a hardcoded
timeout of 32s to successfully upload to storage. This includes retries
due to transient errors. If any chunk during a backup were to hit this
timeout, the entire backup would fail. We have additional work to do
to make the job more resilient to such failures, but dropping the default
chunk size might mean we see fewer chunks hit their timeouts.

Release note: None
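
As context for the buffering described above, here is a minimal sketch of where a configurable chunk size lands on the GCS side, assuming the `cloud.google.com/go/storage` SDK; the `newChunkedWriter` helper and its plumbing are illustrative only, not the actual diff in this PR:

```go
package cloudexample

import (
	"context"

	"cloud.google.com/go/storage"
)

// defaultWriteChunkSize mirrors the 8 MiB default described above.
const defaultWriteChunkSize = 8 << 20

// newChunkedWriter is a hypothetical helper: it opens a GCS object writer
// whose upload buffer is chunkSize bytes instead of the SDK's 16 MB default.
func newChunkedWriter(
	ctx context.Context, client *storage.Client, bucket, object string, chunkSize int,
) *storage.Writer {
	w := client.Bucket(bucket).Object(object).NewWriter(ctx)
	// The writer buffers chunkSize bytes in memory and uploads the object in
	// chunks of that size; each such chunk is what older SDK versions gave a
	// fixed upload timeout.
	w.ChunkSize = chunkSize
	return w
}
```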

@adityamaru adityamaru requested review from dt, stevendanna and a team April 27, 2022 21:52
@cockroach-teamcity (Member)

This change is Reviewable

@dt (Member) commented Apr 27, 2022

I wonder if it makes sense to be per-provider, or if we should just have a "cloudstorage.write_chunk_size" or something that everyone reads? I'm not really sure either way.

@adityamaru (Contributor, Author)

Good point. S3 and Azure both expose configurable chunk sizes: S3 defaults to 5 MB and Azure to 4 MB. Let's tie them all to this shared setting.
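
For reference, the same knob on the S3 side goes through the upload manager's part size. This is a sketch assuming the aws-sdk-go `s3manager` package, with the shared setting's value passed in as a `chunkSize` parameter (an assumed plumbing detail, not this PR's code); the Azure SDK exposes a similar buffer-size option for its uploads.

```go
package cloudexample

import (
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// newUploader builds an S3 upload manager whose part size is taken from the
// shared chunk-size value instead of the SDK's 5 MB default
// (s3manager.DefaultUploadPartSize).
func newUploader(sess *session.Session, chunkSize int64) *s3manager.Uploader {
	return s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = chunkSize // bytes buffered per multipart upload part
	})
}
```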

@adityamaru changed the title from "gcp: make the GCS chunked upload chunk size configurable" to "gcp,s3,azure: make the storage client upload chunk size configurable" on Apr 28, 2022
@adityamaru (Contributor, Author)

Changed to a single, shared cluster setting. I've not marked the setting as public since we don't expect users to change this.
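
A rough sketch of what registering the shared, non-public setting could look like; the exact signature of the settings helper in this release may differ, so treat the arguments below as illustrative rather than the PR's actual code:

```go
package cloudexample

import "github.com/cockroachdb/cockroach/pkg/settings"

// writeChunkSize is the shared cluster setting described above. Settings are
// not surfaced as public unless explicitly marked, which appears consistent
// with leaving this one unmarked. The registration signature is approximate.
var writeChunkSize = settings.RegisterByteSizeSetting(
	"cloudstorage.write_chunk_size",
	"controls the size of the chunks buffered by the cloud storage client "+
		"when uploading a file to storage",
	8<<20, // 8 MiB default
)
```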

@adityamaru (Contributor, Author)

friendly ping!

@adityamaru (Contributor, Author)

TFTR!

bors r=dt

@craig (bot) commented May 3, 2022

Build succeeded:
