Changing over Blob.upload*() methods to use google-resumable-media. #3362
Conversation
.. _API reference: https://cloud.google.com/storage/\
   docs/json_api/v1/objects
"""
# NOTE: This assumes `self.name` is unicode.
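For context on that note: a blob name can be arbitrary unicode, so it has to be percent-encoded before being embedded in the JSON API upload URL. A minimal sketch of that step (the helper name is hypothetical; only the `quote` over the UTF-8 bytes is the point):

```python
try:  # Python 3
    from urllib.parse import quote
except ImportError:  # Python 2
    from urllib import quote

def _quoted_name(blob_name):
    # Percent-encode the UTF-8 bytes of the unicode blob name so it is
    # safe to embed in an upload URL path.
    return quote(blob_name.encode('utf-8'))
```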
storage/google/cloud/storage/blob.py
* An object metadata dictionary
* The ``content_type`` as a string (according to precedence)
"""
transport = self._make_transport(client)
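The precedence that docstring refers to is: an explicit `content_type` argument wins, then the value stored on the blob, then a guess from the filename, then a generic default. A sketch of that resolution, with a hypothetical helper name standing in for the actual `_get_content_type()`:

```python
import mimetypes

_DEFAULT_CONTENT_TYPE = 'application/octet-stream'

def _resolve_content_type(explicit, blob_content_type, filename=None):
    # Explicit argument > value stored on the blob > a guess from the
    # filename > a generic default.
    content_type = explicit or blob_content_type
    if content_type is None and filename is not None:
        content_type, _ = mimetypes.guess_type(filename)
    return content_type or _DEFAULT_CONTENT_TYPE
```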
def _do_multipart_upload(self, client, stream, content_type, size):
    """Perform a multipart upload.

    Assumes ``chunk_size`` is :data:`None` on the current blob.
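For reference, the multipart (single-request) path in google-resumable-media looks roughly like this; the URL, metadata, and payload below are illustrative stand-ins, not the exact code from this PR:

```python
import google.auth
import google.auth.transport.requests as tr_requests
from google.resumable_media.requests import MultipartUpload

credentials, _ = google.auth.default()
transport = tr_requests.AuthorizedSession(credentials)

# Illustrative JSON API endpoint; bucket and object names are placeholders.
upload_url = (
    'https://www.googleapis.com/upload/storage/v1/b/'
    'my-bucket/o?uploadType=multipart')
upload = MultipartUpload(upload_url)
# The entire payload goes out in one request: metadata, bytes, and
# content type together. No chunking is involved.
response = upload.transmit(
    transport, b'file contents', {'name': 'my-object'}, 'text/plain')
```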
storage/google/cloud/storage/blob.py
upload.initiate(
    transport, stream, object_metadata, content_type,
    total_bytes=size, stream_final=False)
while not upload.finished:
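The loop body that follows (elided in this fragment) drives the resumable upload one chunk at a time; with google-resumable-media that is a single call per iteration:

```python
# ``upload`` is a google.resumable_media.requests.ResumableUpload that
# has already been initiate()d; ``transport`` is an authorized session.
while not upload.finished:
    # Sends at most ``chunk_size`` bytes from the stream and updates
    # ``upload.finished`` / ``upload.bytes_uploaded`` from the response.
    response = upload.transmit_next_chunk(transport)
```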
to the ``client`` stored on the blob's bucket.
"""
content_type = self._get_content_type(content_type, filename=filename)

with open(filename, 'rb') as file_obj:
    total_bytes = os.fstat(file_obj.fileno()).st_size
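A note on that pattern: `os.fstat` stats the already-open descriptor rather than the path, so the size matches the exact handle being streamed even if the path is replaced concurrently. A self-contained sketch of the same idiom:

```python
import os

def file_size(file_obj):
    # Stat the open descriptor, not the filename, so the reported size
    # refers to the file handle that will actually be uploaded.
    return os.fstat(file_obj.fileno()).st_size

with open('data.bin', 'rb') as file_obj:  # placeholder filename
    total_bytes = file_size(file_obj)
```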
Overall this looks fine, just some small concerns.
Do you plan to address those in this PR?
Will you file a bug to track that, or do you have confidence you won't forget?
I'm okay with this.
Do it. File a bug if needed to track.
Absolutely.
I realized that I would go below 100% line coverage in the
You mean like do it in this PR?
Your call.
In addition, switched over `Blob.create_resumable_upload_session()` to use google-resumable-media instead of the vendored-in `google.cloud.streaming` package.
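For context, creating a resumable session with google-resumable-media amounts to an `initiate()` call whose response carries the session URL; a rough sketch, with placeholder bucket/object names (not the exact code from this PR):

```python
import io

import google.auth
import google.auth.transport.requests as tr_requests
from google.resumable_media.requests import ResumableUpload

credentials, _ = google.auth.default()
transport = tr_requests.AuthorizedSession(credentials)

# Illustrative JSON API endpoint for a resumable upload.
upload_url = (
    'https://www.googleapis.com/upload/storage/v1/b/'
    'my-bucket/o?uploadType=resumable')
chunk_size = 1024 * 1024  # Must be a multiple of 256 KiB.
upload = ResumableUpload(upload_url, chunk_size)
# Initiating with an empty stream and ``stream_final=False`` leaves the
# total size unspecified; the response establishes the session.
upload.initiate(
    transport, io.BytesIO(b''), {'name': 'my-object'}, 'text/plain',
    stream_final=False)
# The URL a client can later resume against:
session_url = upload.resumable_url
```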
storage/google/cloud/storage/blob.py
extra_headers=extra_headers)
curr_chunk_size = self.chunk_size
try:
    # Temporarily patch the chunk size. A user should still be able
This is to avoid monkey-patching the instance when "pure" behavior will suffice. Also removed the transport from `Blob._get_upload_arguments()`.
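The "pure" alternative referred to here is to thread the desired chunk size through as an argument instead of temporarily mutating the instance and restoring it in a `finally` block. Schematically (the helper name is hypothetical):

```python
def _pick_chunk_size(blob_chunk_size, override=None):
    # Pure helper: prefer the explicit override, else the value
    # configured on the blob. Nothing on the instance is mutated, so
    # no try/finally restore step is needed and concurrent readers of
    # ``blob.chunk_size`` are unaffected.
    return override if override is not None else blob_chunk_size
```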
Merging this now after discussions with @lukesneeringer and @jonparrott. This needs a follow-up PR ASAP that supports `num_retries`.
Some notes:

- For now, the build is failing unit tests, because I haven't done those (what I'm sure will be very time-consuming) updates
- I haven't yet updated `create_resumable_upload_session()`, but I can (it should be fairly straightforward, but I'd rather do it in a separate PR)
- Dropped support for the `num_retries` argument. IMO supporting it is a "bad idea™"
- Changed what happens in `upload_from_file()` when `size` is passed in. This is mostly because the existing implementation is confused about what the `size` / `chunk_size` combination should mean. The implementation now does the "sane" thing: use a resumable upload IF AND ONLY IF there is a chunk size specified on the blob (see the sketch after this list)
- Moved the `fstat` into `upload_from_filename()`. This size check never really made sense in the generic "give me an `IO[bytes]` and I'll stream it" method. What's more, a resumable upload works perfectly fine if the size isn't known, so there is no good reason to add in that extra check.
- Previously, if `chunk_size` was unset (i.e. a simple upload), then `blob.upload_from_file` would completely fail if `size` was not passed in. This is not necessary, since we can just do `file_obj.read()` when there is no size specified
- Dropped the `rewind` keyword argument (related: "Dropping internal usage of rewind in Blob.upload_from_string()." #3365)
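A sketch of the "sane" dispatch described above. `_do_multipart_upload` appears in this PR's diff; `_do_resumable_upload` and the surrounding glue are assumed by analogy, not quoted from the change:

```python
def _do_upload(self, client, stream, content_type, size):
    # Resumable upload if and only if a chunk size is set on the blob;
    # otherwise a single multipart request (which can just read the
    # whole stream, so no up-front ``size`` is required).
    if self.chunk_size is None:
        response = self._do_multipart_upload(
            client, stream, content_type, size)
    else:
        response = self._do_resumable_upload(
            client, stream, content_type, size)
    return response.json()
```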