gcp,s3,azure: make the storage client upload chunk size configurable #80668
Conversation
I wonder if it makes sense for this to be per-provider, or if we should just have a "cloudstorage.write_chunk_size" or something that everyone reads? I'm not really sure either way.

Good point, S3 and Azure both expose configurable chunk sizes. S3 defaults to 5MB and Azure to 4MB; let's tie them all to this shared setting.
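For reference, a minimal sketch of where that knob lives in the AWS SDK for Go's upload manager (the bucket, key, and the 8MiB value are placeholders, not from this PR); Azure's blob client exposes a similar buffer-size option:

```go
package main

import (
	"bytes"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())

	// The uploader buffers input into parts of PartSize bytes; the default is
	// s3manager.DefaultUploadPartSize (5MB), the 5MB mentioned above.
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 8 << 20 // a shared 8MiB chunk size, for illustration
	})

	if _, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"),       // placeholder bucket
		Key:    aws.String("backup/data.sst"), // placeholder key
		Body:   bytes.NewReader([]byte("...")),
	}); err != nil {
		log.Fatal(err)
	}
}
```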
Force-pushed from 6fda95f to 33d36a6.
Changed to a single, shared cluster setting. I've not marked the setting as public since we don't expect users to change this.
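A rough sketch of what that registration might look like, assuming CockroachDB's `settings` package and its `RegisterByteSizeSetting` helper; the variable name and description string here are guesses, and the helper's signature varies by version:

```go
package cloud

import "github.com/cockroachdb/cockroach/pkg/settings"

// writeChunkSize sketches the shared setting discussed above. The exact
// signature of RegisterByteSizeSetting differs between CockroachDB
// versions, so this is illustrative rather than the PR's actual code.
var writeChunkSize = settings.RegisterByteSizeSetting(
	"cloudstorage.write_chunk_size",
	"chunk size used by cloud storage clients when uploading files",
	8<<20, // 8MiB default, per the PR description
)
```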
This change adds a `cloudstorage.write_chunk_size` cluster setting that allows us to control the size of the chunks buffered by the cloud storage client when uploading a file to storage. The setting defaults to 8MiB.

Prior to this change, GCS used a 16MB buffer, S3 a 5MB buffer, and Azure a 4MB buffer. A follow-up change will add memory monitoring to each external storage writer to account for these buffered chunks during upload.

This change was motivated by the fact that in google-cloud-storage SDK versions prior to v1.21.0, every chunk is given a hardcoded timeout of 32s to successfully upload to storage, including retries due to transient errors. If any chunk during a backup were to hit this timeout, the entire backup would fail. We have additional work to do to make the job more resilient to such failures, but dropping the default chunk size might mean we see fewer chunks hit their timeouts.

Release note: None
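For context on the GCS side, the chunk being uploaded corresponds to the `ChunkSize` field on `storage.Writer` in `cloud.google.com/go/storage`. A minimal standalone sketch (bucket and object names are placeholders):

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Placeholder bucket/object names, for illustration only.
	w := client.Bucket("my-bucket").Object("backup/data.sst").NewWriter(ctx)

	// Each ChunkSize-sized piece is uploaded (and retried) as a unit; in SDK
	// versions before v1.21.0, each chunk had a fixed 32s deadline, so
	// smaller chunks are less likely to hit it.
	w.ChunkSize = 8 << 20 // e.g. the shared 8MiB setting

	if _, err := w.Write([]byte("...")); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```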
Force-pushed from 33d36a6 to c5abb54.
friendly ping!

TFTR! bors r=dt

Build succeeded.