Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage: google-resumable-media==0.4.0 Breaks Gzipped Downloads #9188

Closed
william-silversmith opened this issue Sep 6, 2019 · 6 comments
Closed
Assignees
Labels
api: storage Issues related to the Cloud Storage API. 🚨 This issue needs some love. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@william-silversmith
Copy link

william-silversmith commented Sep 6, 2019

Environment details

  1. Specify the API at the beginning of the title (for example, "BigQuery: ...")
    General, Core, and Other are also allowed as types

Google Storage blob.py

  1. OS type and version

Ubuntu 14.04

  1. Python version and virtual environment information: python --version

Python 3.6.8

  1. google-cloud- version: pip show google-<service> or pip freeze

google-cloud-storage==1.19.0

Steps to reproduce

  1. pip install google-resumable-media==0.4.0
  2. blob.download_as_string()

Per the latest release of google-resumable-media, no decompression of content-encoding gzip is performed and raw bytes are returned.

See https://github.com/googleapis/google-resumable-media-python/releases

blob.download_as_string() formerly returned decompressed bytes, and now returns compressed bytes. We are using .blob instead of .get_blob for an HPC application and thus have no way of knowing what the content encoding is as the information is erased.

We actually LIKE the new functionality as we can now decide when to decompress, but we need to know the content encoding to avoid various kinds of problems that would be introduced by speculative decompression.

Code example

Here is our desired functionality.

    blob = bucket.blob( key )
    try:
      # blob handles the decompression so the encoding is None
      resp = blob.download_as_string(start=start, end=end)
      return resp, blob.content_encoding
    except google.cloud.exceptions.NotFound as err:
      return None, None

Adding this patch to google.cloud.storage.blob.py would solve this problem for us:

    def _do_download(
        self, transport, file_obj, download_url, headers, start=None, end=None
    ):
        """Perform a download without any error handling.

        This is intended to be called by :meth:`download_to_file` so it can
        be wrapped with error handling / remapping.

        :type transport:
            :class:`~google.auth.transport.requests.AuthorizedSession`
        :param transport: The transport (with credentials) that will
                          make authenticated requests.

        :type file_obj: file
        :param file_obj: A file handle to which to write the blob's data.

        :type download_url: str
        :param download_url: The URL where the media can be accessed.

        :type headers: dict
        :param headers: Optional headers to be sent with the request(s).

        :type start: int
        :param start: Optional, the first byte in a range to be downloaded.

        :type end: int
        :param end: Optional, The last byte in a range to be downloaded.
        """
        if self.chunk_size is None:
            download = Download(
                download_url, stream=file_obj, headers=headers, start=start, end=end
            )
>           response = download.consume(transport)
>           if 'Content-Encoding' in response.headers:
>               self.content_encoding = response.headers['Content-Encoding']
        else:
            download = ChunkedDownload(
                download_url,
                self.chunk_size,
                file_obj,
                headers=headers,
                start=start if start else 0,
                end=end,
            )

            while not download.finished:
                download.consume_next_chunk(transport)
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Sep 7, 2019
@tseaver tseaver added type: question Request for information or clarification. Not an issue. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. api: storage Issues related to the Cloud Storage API. and removed triage me I really want to be triaged. type: question Request for information or clarification. Not an issue. labels Sep 9, 2019
@tseaver
Copy link
Contributor

tseaver commented Sep 9, 2019

@william-silversmith Can you achieve what you need to reloading the blob's metadata? E.g.:

blob.reload()

At that point, the content_encoding property will be populated from the server.

@william-silversmith
Copy link
Author

william-silversmith commented Sep 9, 2019 via email

@tseaver
Copy link
Contributor

tseaver commented Sep 9, 2019

@william-silversmith Thanks for clarifying. One issue here is that we would want to have the header-driven content_encoding value also available for chunked downloads, so we would need to rework your patch a bit.

@crwilcox, @frankyn Please chime in.

@tseaver tseaver changed the title storage.blob: google-resumable-media==0.4.0 Breaks Gzipped Downloads Storage: google-resumable-media==0.4.0 Breaks Gzipped Downloads Sep 10, 2019
@frankyn
Copy link
Member

frankyn commented Sep 17, 2019

@crwilcox reverted changes made in googleapis/google-resumable-media-python#103 and releasing a new version to unblock this issue: googleapis/google-resumable-media-python#104

Reassigning to him.

@frankyn frankyn added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Sep 17, 2019
@frankyn frankyn assigned crwilcox and unassigned frankyn Sep 17, 2019
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. triage me I really want to be triaged. labels Sep 17, 2019
@crwilcox
Copy link
Contributor

Hi @william-silversmith I have released v0.4.1 that backs out this change.

Due to the way we have pinned this package within google-cloud-storage, we are rethinking the way we make this change to avoid disrupting folks using existing libraries.

@william-silversmith
Copy link
Author

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. 🚨 This issue needs some love. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

5 participants