
Changing over Blob.upload*() methods to use google-resumable-media. #3362

Merged · 5 commits · May 4, 2017

Conversation

dhermes (Contributor) commented May 3, 2017

Some notes:

  • For now, the build is failing unit tests, because I haven't made those (what I'm sure will be very time-consuming) updates
  • The ACTUAL IMPORTANT thing to note is that the system tests in the build are passing
  • I haven't yet updated create_resumable_upload_session(), but I can (it should be fairly straightforward, but I'd rather do it in a separate PR)
  • I would really like some suggestions about what to do about the num_retries argument. IMO supporting it is a "bad idea™"
  • I have ever-so-slightly changed the behavior of upload_from_file() when size is passed in. This is mostly because the existing implementation is confused about what the size / chunk_size combination should mean. The implementation now does the "sane" thing: use a resumable upload IF AND ONLY IF there is a chunk size specified on the blob (see the sketch after this list)
  • I have also dropped the "extra" size check from the implementation and moved an actual fstat into upload_from_filename(). That size check never really made sense in the generic "give me an IO[bytes] and I'll stream it" method. What's more, a resumable upload works perfectly fine even if the size isn't known, so there is no good reason for that extra check
  • Previously, if chunk_size was unset (i.e. a simple upload), blob.upload_from_file() would fail outright when size was not passed in. That is unnecessary, since we can just do file_obj.read() when no size is specified
  • I would really like to deprecate the rewind keyword argument (related: Dropping internal usage of rewind in Blob.upload_from_string(). #3365)
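
A minimal sketch of the dispatch rule described in the list above. The `do_multipart` / `do_resumable` arguments are placeholders standing in for the PR's private upload helpers, not real library APIs:

```python
def upload_dispatch_sketch(blob, stream, size, content_type,
                           do_multipart, do_resumable):
    """Illustrative only: resumable upload iff ``blob.chunk_size`` is set.

    ``do_multipart`` and ``do_resumable`` stand in for the PR's
    private upload helpers; they are not real library APIs.
    """
    if blob.chunk_size is None:
        # Simple (multipart) upload. If no size was given, read the
        # whole stream rather than failing, per the bullets above.
        data = stream.read() if size is None else stream.read(size)
        return do_multipart(data, content_type)
    # Resumable upload: works even when the total size is unknown.
    return do_resumable(stream, size, content_type)
```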

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 3, 2017
@dhermes dhermes added api: storage Issues related to the Cloud Storage API. and removed cla: yes This human has signed the Contributor License Agreement. labels May 3, 2017

Several diff excerpts survive from the review thread; the inline comments attached to them were marked as spam and collapsed:

```python
    .. _API reference: https://cloud.google.com/storage/\
        docs/json_api/v1/objects
    """
    # NOTE: This assumes `self.name` is unicode.
```

```python
    * An object metadata dictionary
    * The ``content_type`` as a string (according to precedence)
    """
    transport = self._make_transport(client)
```

```python
def _do_multipart_upload(self, client, stream, content_type, size):
    """Perform a multipart upload.

    Assumes ``chunk_size`` is :data:`None` on the current blob.
```

```python
upload.initiate(
    transport, stream, object_metadata, content_type,
    total_bytes=size, stream_final=False)
while not upload.finished:
```

```python
    to the ``client`` stored on the blob's bucket.
    """
    content_type = self._get_content_type(content_type, filename=filename)

    with open(filename, 'rb') as file_obj:
        total_bytes = os.fstat(file_obj.fileno()).st_size
```

theacodes (Contributor)

Overall this looks fine, just some small concerns.

> For now, the build is failing unit tests, because I haven't made those (what I'm sure will be very time-consuming) updates

Do you plan to address those in this PR?

> I haven't yet updated create_resumable_upload_session(), but I can (it should be fairly straightforward, but I'd rather do it in a separate PR)

Will you file a bug to track that, or do you have confidence you won't forget?

> I have ever-so-slightly changed the behavior of upload_from_file() when size is passed in. This is mostly because the existing implementation is confused about what the size / chunk_size combination should mean. The implementation now does the "sane" thing: use a resumable upload IF AND ONLY IF there is a chunk size specified on the blob

I'm okay with this.

> I would really like to deprecate the rewind keyword argument

Do it. File a bug if needed to track.

dhermes (Contributor, Author) commented May 3, 2017

> Do you plan to address those (i.e. unit tests) in this PR?

Absolutely.

> Will you file a bug to track that (that == fixing the impl. of create_resumable_upload_session), or do you have confidence you won't forget?

I realized that I would go below 100% line coverage in the _create_upload function if I left that alone. Since _create_upload needs to go anyway, I will just make the create_resumable_upload_session change in this PR (rather than fighting coverage).

> Do it. File a bug if needed to track.

You mean like do it in this PR?

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 3, 2017
theacodes (Contributor)

> You mean like do it in this PR?

Your call.

dhermes added 3 commits May 3, 2017 19:24
In addition, switched over Blob.create_resumable_upload_session() to use google-resumable-media instead of the vendored-in `google.cloud.streaming` package.

```python
    extra_headers=extra_headers)
curr_chunk_size = self.chunk_size
try:
    # Temporarily patch the chunk size. A user should still be able
```

This is to avoid monkey-patching the instance when "pure" behavior
will suffice.

Also removed the transport from Blob._get_upload_arguments().
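
A minimal sketch of the save/patch/restore pattern the excerpt and commit message describe. The `_chunk_size` backing attribute and the `start_session` callable are assumptions for illustration, not the PR's exact code:

```python
def create_session_sketch(blob, chunk_size, start_session):
    """Temporarily patch the blob's chunk size, then restore it.

    ``start_session`` stands in for whatever initiates the resumable
    upload session; it is a placeholder, not the library's API.
    """
    curr_chunk_size = blob.chunk_size
    try:
        # Assumed private backing attribute for the chunk_size
        # property; patched rather than monkey-patching methods.
        blob._chunk_size = chunk_size
        return start_session(blob)
    finally:
        # Always restore, even if starting the session raises.
        blob._chunk_size = curr_chunk_size
```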
dhermes (Contributor, Author) commented May 4, 2017

Merging this now after discussions with @lukesneeringer and @jonparrott.

This needs a follow-up PR ASAP that supports num_retries, so I will be working on that.
