Changing over Blob.upload*() methods to use google-resumable-media. #3362
Conversation
.. _API reference: https://cloud.google.com/storage/\
   docs/json_api/v1/objects
"""
# NOTE: This assumes `self.name` is unicode.
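For context on that note: a blob name can be arbitrary unicode, so it has to be percent-encoded before being embedded in the JSON API upload URL. A minimal sketch of that step (the helper name is hypothetical; only the `quote` over the UTF-8 bytes is the point):

```python
try:  # Python 3
    from urllib.parse import quote
except ImportError:  # Python 2
    from urllib import quote

def _quoted_name(blob_name):
    # Percent-encode the UTF-8 bytes of the unicode blob name so it is
    # safe to embed in an upload URL path.
    return quote(blob_name.encode('utf-8'))
```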
storage/google/cloud/storage/blob.py
* An object metadata dictionary
* The ``content_type`` as a string (according to precedence)
"""
transport = self._make_transport(client)
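The precedence that docstring refers to is: an explicit `content_type` argument wins, then the value stored on the blob, then a guess from the filename, then a generic default. A sketch of that resolution, with a hypothetical helper name standing in for the actual `_get_content_type()`:

```python
import mimetypes

_DEFAULT_CONTENT_TYPE = 'application/octet-stream'

def _resolve_content_type(explicit, blob_content_type, filename=None):
    # Explicit argument > value stored on the blob > a guess from the
    # filename > a generic default.
    content_type = explicit or blob_content_type
    if content_type is None and filename is not None:
        content_type, _ = mimetypes.guess_type(filename)
    return content_type or _DEFAULT_CONTENT_TYPE
```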
def _do_multipart_upload(self, client, stream, content_type, size):
    """Perform a multipart upload.

    Assumes ``chunk_size`` is :data:`None` on the current blob.
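For reference, the multipart (single-request) path in google-resumable-media looks roughly like this; the URL, metadata, and payload below are illustrative stand-ins, not the exact code from this PR:

```python
import google.auth
import google.auth.transport.requests as tr_requests
from google.resumable_media.requests import MultipartUpload

credentials, _ = google.auth.default()
transport = tr_requests.AuthorizedSession(credentials)

# Illustrative JSON API endpoint; bucket and object names are placeholders.
upload_url = (
    'https://www.googleapis.com/upload/storage/v1/b/'
    'my-bucket/o?uploadType=multipart')
upload = MultipartUpload(upload_url)
# The entire payload goes out in one request: metadata, bytes, and
# content type together. No chunking is involved.
response = upload.transmit(
    transport, b'file contents', {'name': 'my-object'}, 'text/plain')
```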
storage/google/cloud/storage/blob.py
upload.initiate(
    transport, stream, object_metadata, content_type,
    total_bytes=size, stream_final=False)
while not upload.finished:
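The loop body that follows (elided in this fragment) drives the resumable upload one chunk at a time; with google-resumable-media that is a single call per iteration:

```python
# ``upload`` is a google.resumable_media.requests.ResumableUpload that
# has already been initiate()d; ``transport`` is an authorized session.
while not upload.finished:
    # Sends at most ``chunk_size`` bytes from the stream and updates
    # ``upload.finished`` / ``upload.bytes_uploaded`` from the response.
    response = upload.transmit_next_chunk(transport)
```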
to the ``client`` stored on the blob's bucket.
"""
content_type = self._get_content_type(content_type, filename=filename)

with open(filename, 'rb') as file_obj:
    total_bytes = os.fstat(file_obj.fileno()).st_size
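A note on that pattern: `os.fstat` stats the already-open descriptor rather than the path, so the size matches the exact handle being streamed even if the path is replaced concurrently. A self-contained sketch of the same idiom:

```python
import os

def file_size(file_obj):
    # Stat the open descriptor, not the filename, so the reported size
    # refers to the file handle that will actually be uploaded.
    return os.fstat(file_obj.fileno()).st_size

with open('data.bin', 'rb') as file_obj:  # placeholder filename
    total_bytes = file_size(file_obj)
```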
Overall this looks fine, just some small concerns.
Do you plan to address those in this PR?
Will you file a bug to track that, or do you have confidence you won't forget?
I'm okay with this.
Do it. File a bug if needed to track.
Absolutely.
I realized that I would go below 100% line coverage in the
You mean like do it in this PR?
Your call.
In addition, switched over `Blob.create_resumable_upload_session()` to use google-resumable-media instead of the vendored-in `google.cloud.streaming` package.
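For context, creating a resumable session with google-resumable-media amounts to an `initiate()` call whose response carries the session URL; a rough sketch, with placeholder bucket/object names (not the exact code from this PR):

```python
import io

import google.auth
import google.auth.transport.requests as tr_requests
from google.resumable_media.requests import ResumableUpload

credentials, _ = google.auth.default()
transport = tr_requests.AuthorizedSession(credentials)

# Illustrative JSON API endpoint for a resumable upload.
upload_url = (
    'https://www.googleapis.com/upload/storage/v1/b/'
    'my-bucket/o?uploadType=resumable')
chunk_size = 1024 * 1024  # Must be a multiple of 256 KiB.
upload = ResumableUpload(upload_url, chunk_size)
# Initiating with an empty stream and ``stream_final=False`` leaves the
# total size unspecified; the response establishes the session.
upload.initiate(
    transport, io.BytesIO(b''), {'name': 'my-object'}, 'text/plain',
    stream_final=False)
# The URL a client can later resume against:
session_url = upload.resumable_url
```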
storage/google/cloud/storage/blob.py
extra_headers=extra_headers)
curr_chunk_size = self.chunk_size
try:
    # Temporarily patch the chunk size. A user should still be able
This is to avoid monkey-patching the instance when "pure" behavior will suffice. Also removed the transport from `Blob._get_upload_arguments()`.
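The "pure" alternative referred to here is to thread the desired chunk size through as an argument instead of temporarily mutating the instance and restoring it in a `finally` block. Schematically (the helper name is hypothetical):

```python
def _pick_chunk_size(blob_chunk_size, override=None):
    # Pure helper: prefer the explicit override, else the value
    # configured on the blob. Nothing on the instance is mutated, so
    # no try/finally restore step is needed and concurrent readers of
    # ``blob.chunk_size`` are unaffected.
    return override if override is not None else blob_chunk_size
```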
Merging this now after discussions with @lukesneeringer and @jonparrott. This needs a follow-up PR ASAP that supports `num_retries`.
Some notes:

- For now, the build is failing unit tests, because I haven't done those (what I'm sure will be very time-consuming) updates
- I haven't yet updated `create_resumable_upload_session()`, but I can (it should be fairly straightforward, but I'd rather do it in a separate PR)
- Dropped support for the `num_retries` argument. IMO supporting it is a "bad idea™"
- Changed what happens in `upload_from_file()` when `size` is passed in. This is mostly because the existing implementation is confused about what the `size` / `chunk_size` combination should mean. The implementation now does the "sane" thing: use a resumable upload IF AND ONLY IF there is a chunk size specified on the blob (see the sketch after this list)
- Moved the `fstat` into `upload_from_filename()`. This size check never really made sense in the generic "give me an `IO[bytes]` and I'll stream it" method. What's more, a resumable upload works perfectly fine if the size isn't known, so there is no good reason to add in that extra check.
- Previously, if `chunk_size` was unset (i.e. a simple upload), then `blob.upload_from_file` would completely fail if `size` was not passed in. This is not necessary, since we can just do `file_obj.read()` when there is no size specified
- Dropped the `rewind` keyword argument (related: "Dropping internal usage of rewind in Blob.upload_from_string()." #3365)
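A sketch of the "sane" dispatch described above. `_do_multipart_upload` appears in this PR's diff; `_do_resumable_upload` and the surrounding glue are assumed by analogy, not quoted from the change:

```python
def _do_upload(self, client, stream, content_type, size):
    # Resumable upload if and only if a chunk size is set on the blob;
    # otherwise a single multipart request (which can just read the
    # whole stream, so no up-front ``size`` is required).
    if self.chunk_size is None:
        response = self._do_multipart_upload(
            client, stream, content_type, size)
    else:
        response = self._do_resumable_upload(
            client, stream, content_type, size)
    return response.json()
```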