docs: Add snippets for upload_chunks_concurrently and add chunk_size #1135
Changes from 2 commits. PR commits: 8421e5f, 5a3874a, 89dbd11, aa35cfd, 18faaf1, 7cf3f00.
Changes to the download sample (diff reconstructed from the two-column view):

```diff
@@ -13,7 +13,9 @@
 # limitations under the License.

 # [START storage_transfer_manager_download_chunks_concurrently]
-def download_chunks_concurrently(bucket_name, blob_name, filename, processes=8):
+def download_chunks_concurrently(
+    bucket_name, blob_name, filename, chunk_size=32 * 1024 * 1024, processes=8
+):
     """Download a single file in chunks, concurrently in a process pool."""

     # The ID of your GCS bucket
@@ -25,6 +27,11 @@ def download_chunks_concurrently(bucket_name, blob_name, filename, processes=8):
     # The destination filename or path
     # filename = ""

+    # The size of each chunk. The performance impact of this value depends on
+    # the use case. The remote service has a minimum of 5 MiB and a maximum of
+    # 5 GiB.
+    # chunk_size = 32 * 1024 * 1024 (32 MiB)
+
     # The maximum number of processes to use for the operation. The performance
     # impact of this value depends on the use case, but smaller files usually
     # benefit from a higher number of processes. Each additional process occupies
@@ -37,7 +44,11 @@ def download_chunks_concurrently(bucket_name, blob_name, filename, processes=8):
     bucket = storage_client.bucket(bucket_name)
     blob = bucket.blob(blob_name)

-    transfer_manager.download_chunks_concurrently(blob, filename, max_workers=processes)
+    transfer_manager.download_chunks_concurrently(
+        blob, filename, chunk_size=chunk_size, max_workers=processes
+    )

     print("Downloaded {} to {}.".format(blob_name, filename))


 # [END storage_transfer_manager_download_chunks_concurrently]
```

Review thread on the `processes` comment:

> Reviewer: Is the rule of thumb here "the number of cores your CPU has"?
>
> Author: No. Workloads with small files benefit from many times that number, and workloads with large files max out the NIC below it, so the number of cores is not a good starting place.
>
> Reviewer: Is this something that would go into the docs instead of sample comments?
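The review discussion above turns on what `chunk_size` controls. As a minimal illustration (not part of the PR, and not a library API — `compute_chunk_ranges` is a hypothetical name; the transfer manager does this splitting internally), this sketch shows how a blob of a given size divides into byte ranges for a given `chunk_size`:

```python
def compute_chunk_ranges(total_size, chunk_size=32 * 1024 * 1024):
    """Return (start, end) byte ranges covering a blob of total_size bytes.

    Hypothetical helper for illustration only; each range would be fetched
    by one worker in the process pool.
    """
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size)
        ranges.append((start, end))
        start = end
    return ranges


# A 100 MiB blob with the default 32 MiB chunk_size splits into 4 chunks:
# three full 32 MiB chunks and one 4 MiB remainder.
print(compute_chunk_ranges(100 * 1024 * 1024))
```

With the default of 8 processes, such a blob would occupy only 4 workers, which is one reason the useful process count depends on the workload rather than the core count.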
The new upload sample file, added in full (@@ -0,0 +1,56 @@):

```python
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the 'License');
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# [START storage_transfer_manager_upload_chunks_concurrently]
def upload_chunks_concurrently(
    bucket_name,
    source_filename,
    destination_blob_name,
    chunk_size=32 * 1024 * 1024,
    processes=8,
):
    """Upload a single file, in chunks, concurrently in a process pool."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The path to your file to upload
    # source_filename = "local/path/to/file"

    # The ID of your GCS object
    # destination_blob_name = "storage-object-name"

    # The size of each chunk. The performance impact of this value depends on
    # the use case. The remote service has a minimum of 5 MiB and a maximum of
    # 5 GiB.
    # chunk_size = 32 * 1024 * 1024 (32 MiB)

    # The maximum number of processes to use for the operation. The performance
    # impact of this value depends on the use case. Each additional process
    # occupies some CPU and memory resources until finished.
    # processes=8

    from google.cloud.storage import Client, transfer_manager

    storage_client = Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    transfer_manager.upload_chunks_concurrently(
        source_filename, blob, chunk_size=chunk_size, max_workers=processes
    )

    print(f"File {source_filename} uploaded to {destination_blob_name}.")


# [END storage_transfer_manager_upload_chunks_concurrently]
```

Review threads on this file:

> Reviewer (on the copyright line): 2023
>
> Author: Will fix.

> Reviewer (on the `processes` parameter): workers?
>
> Author: Yeah, it might be easier to understand as "workers" for this and the other samples; I'll change it.
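The sample comments quote service bounds of 5 MiB minimum and 5 GiB maximum per chunk. A hedged sketch of validating a caller-supplied `chunk_size` against those bounds (`validate_chunk_size` is a hypothetical helper, not part of the PR or the client library):

```python
# Bounds taken from the sample comments: the remote service accepts
# chunks between 5 MiB and 5 GiB.
MIN_CHUNK = 5 * 1024 * 1024
MAX_CHUNK = 5 * 1024 * 1024 * 1024


def validate_chunk_size(chunk_size):
    """Raise ValueError if chunk_size falls outside the quoted service bounds."""
    if not MIN_CHUNK <= chunk_size <= MAX_CHUNK:
        raise ValueError(
            f"chunk_size {chunk_size} is outside the service bounds "
            f"[{MIN_CHUNK}, {MAX_CHUNK}]"
        )
    return chunk_size


validate_chunk_size(32 * 1024 * 1024)  # the sample default is well within bounds
```

Validating up front would fail fast locally instead of surfacing a service error partway through a large transfer.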
Review thread on the accompanying snippet test:

> Reviewer: This doesn't exercise multiple chunks; recommend increasing the object size to test chunk_size.
>
> Author: It's sufficient to test the snippet. The feature itself is not under test here; it is fully covered in the integration tests, with appropriately-sized test files.
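On the reviewer's point about object size: a transfer only exercises the multi-chunk path when the object is larger than `chunk_size`. A small arithmetic sketch of that relationship (`num_chunks` is a hypothetical helper, not test code from the PR):

```python
def num_chunks(object_size, chunk_size):
    """Number of chunks a transfer splits into (ceiling division)."""
    return -(-object_size // chunk_size)


# An object no larger than chunk_size produces a single chunk, so a tiny
# test object does not exercise concurrency:
assert num_chunks(1024, 32 * 1024 * 1024) == 1

# To exercise multiple chunks, the test object must exceed chunk_size:
assert num_chunks(3 * 1024 * 1024, 1024 * 1024) == 3
```

As the author notes, the snippet tests only cover the call itself; the multi-chunk behavior is left to integration tests with appropriately sized files.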