gcp,s3,azure: make the storage client upload chunk size configurable
This change adds a `cloudstorage.write_chunk.size` cluster setting
that controls the size of the chunks buffered by the cloud storage
client when uploading a file to storage. The setting defaults to 8MiB.

Prior to this change, GCS used a 16MiB buffer, S3 a 5MiB buffer, and Azure a
4MiB buffer. A follow-up change will add memory monitoring to each external
storage writer to account for these buffered chunks during upload.

This change was motivated by the fact that in google-cloud-storage SDK
versions prior to v1.21.0, every chunk is given a hardcoded timeout of 32s to
upload successfully to storage, including retries due to transient errors. If
any chunk during a backup hits this timeout, the entire backup fails. We have
additional work to do to make the job more resilient to such failures, but
dropping the default chunk size may mean fewer chunks hit their timeouts.

Release note: None
adityamaru committed May 24, 2022
1 parent 9769856 commit 147bd0a
Showing 4 changed files with 14 additions and 2 deletions.
pkg/cloud/amazon/s3_storage.go (3 additions, 1 deletion)
@@ -352,7 +352,9 @@ func newClient(
 	sess.Config.Region = aws.String(region)
 
 	c := s3.New(sess)
-	u := s3manager.NewUploader(sess)
+	u := s3manager.NewUploader(sess, func(uploader *s3manager.Uploader) {
+		uploader.PartSize = cloud.WriteChunkSize.Get(&settings.SV)
+	})
 	return s3Client{client: c, uploader: u}, region, nil
 }

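A note on the S3 path (an editorial aside, not part of this commit): the aws-sdk-go s3manager package enforces a 5MiB minimum part size via the exported s3manager.MinUploadPartSize constant, so a cluster setting value below that would cause multipart uploads to fail. A hypothetical guard, mirroring the hunk above, could clamp the configured value:

u := s3manager.NewUploader(sess, func(uploader *s3manager.Uploader) {
	// Hypothetical clamp (not in this diff): respect S3's minimum part size.
	partSize := cloud.WriteChunkSize.Get(&settings.SV)
	if partSize < s3manager.MinUploadPartSize {
		partSize = s3manager.MinUploadPartSize
	}
	uploader.PartSize = partSize
})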
pkg/cloud/azure/azure_storage.go (1 addition, 1 deletion)
@@ -137,7 +137,7 @@ func (s *azureStorage) Writer(ctx context.Context, basename string) (io.WriteClo
 	defer sp.Finish()
 	_, err := azblob.UploadStreamToBlockBlob(
 		ctx, r, blob, azblob.UploadStreamToBlockBlobOptions{
-			BufferSize: 4 << 20,
+			BufferSize: int(cloud.WriteChunkSize.Get(&s.settings.SV)),
 		},
 	)
 	return err
pkg/cloud/cloud_io.go (9 additions, 0 deletions)
@@ -49,6 +49,15 @@ var httpCustomCA = settings.RegisterStringSetting(
 	"",
 ).WithPublic()
 
+// WriteChunkSize is used to control the size of each chunk that is buffered and
+// uploaded by the cloud storage client.
+var WriteChunkSize = settings.RegisterByteSizeSetting(
+	settings.TenantWritable,
+	"cloudstorage.write_chunk.size",
+	"controls the size of each file chunk uploaded by the cloud storage client",
+	8<<20,
+)
+
 // HTTPRetryOptions defines the tunable settings which control the retry of HTTP
 // operations.
 var HTTPRetryOptions = retry.Options{
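For illustration only (not part of this diff), a caller or test could override the registered default through the settings API; the sketch below assumes the cluster.MakeTestingClusterSettings helper and the setting's Override method:

// Sketch: lower the chunk size to 4MiB on a testing cluster.Settings.
st := cluster.MakeTestingClusterSettings()
cloud.WriteChunkSize.Override(ctx, &st.SV, 4<<20)
chunk := cloud.WriteChunkSize.Get(&st.SV) // 4194304 bytes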
pkg/cloud/gcp/gcs_storage.go (1 addition, 0 deletions)
@@ -162,6 +162,7 @@ func (g *gcsStorage) Writer(ctx context.Context, basename string) (io.WriteClose
 		path.Join(g.prefix, basename))})
 
 	w := g.bucket.Object(path.Join(g.prefix, basename)).NewWriter(ctx)
+	w.ChunkSize = int(cloud.WriteChunkSize.Get(&g.settings.SV))
 	if !gcsChunkingEnabled.Get(&g.settings.SV) {
 		w.ChunkSize = 0
 	}
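For context on the gcsChunkingEnabled fallback above (an aside, not part of this diff): in the cloud.google.com/go/storage SDK, a non-zero Writer.ChunkSize buffers data and uploads it in resumable chunks that can be retried, while a ChunkSize of 0 disables chunking and sends the object in a single request. A minimal sketch of that behavior, with made-up bucket and object names and an assumed client, ctx, and src reader:

w := client.Bucket("my-backups").Object("data/part-0001.sst").NewWriter(ctx)
w.ChunkSize = 8 << 20 // buffer and upload 8MiB resumable chunks
// w.ChunkSize = 0    // would disable chunking: one single upload request
if _, err := io.Copy(w, src); err != nil {
	_ = w.Close()
	return err
}
return w.Close()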
